Quick Start Guide: Heuristic Regressors
This is a Python version of the corresponding Heuristics quick start guide.
In this example we will use regressors from Heuristics on the yacht hydrodynamics dataset. First, we load the data and split it into training and test datasets:
import pandas as pd
df = pd.read_csv(
    "yacht_hydrodynamics.data",
    sep=r'\s+',
    header=None,
    names=['position', 'prismatic', 'length_displacement', 'beam_draught',
           'length_beam', 'froude', 'resistance'],
)
position prismatic length_displacement ... length_beam froude resistance
0 -2.3 0.568 4.78 ... 3.17 0.125 0.11
1 -2.3 0.568 4.78 ... 3.17 0.150 0.27
2 -2.3 0.568 4.78 ... 3.17 0.175 0.47
3 -2.3 0.568 4.78 ... 3.17 0.200 0.78
4 -2.3 0.568 4.78 ... 3.17 0.225 1.18
5 -2.3 0.568 4.78 ... 3.17 0.250 1.82
6 -2.3 0.568 4.78 ... 3.17 0.275 2.61
.. ... ... ... ... ... ... ...
301 -2.3 0.600 4.34 ... 2.73 0.300 4.15
302 -2.3 0.600 4.34 ... 2.73 0.325 6.00
303 -2.3 0.600 4.34 ... 2.73 0.350 8.47
304 -2.3 0.600 4.34 ... 2.73 0.375 12.27
305 -2.3 0.600 4.34 ... 2.73 0.400 19.59
306 -2.3 0.600 4.34 ... 2.73 0.425 30.48
307 -2.3 0.600 4.34 ... 2.73 0.450 46.66
[308 rows x 7 columns]
from interpretableai import iai
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
(train_X, train_y), (test_X, test_y) = iai.split_data('regression', X, y,
                                                      seed=1)
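Conceptually, split_data performs a seeded random split of the rows into training and test sets. As a rough sketch in plain Python (assuming a default 70/30 train/test proportion, which is an assumption here, not something this guide states):

```python
import random

def split_data_sketch(X, y, train_proportion=0.7, seed=1):
    """Hypothetical sketch of a seeded train/test split (not IAI's exact logic)."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # deterministic shuffle
    cut = int(round(train_proportion * n))    # size of the training set
    train_idx, test_idx = idx[:cut], idx[cut:]
    train = ([X[i] for i in train_idx], [y[i] for i in train_idx])
    test = ([X[i] for i in test_idx], [y[i] for i in test_idx])
    return train, test

(train_sketch_X, train_sketch_y), (test_sketch_X, test_sketch_y) = \
    split_data_sketch(list(range(10)), [v * 2 for v in range(10)])
```

The key point is that the split is deterministic for a fixed seed, so results are reproducible run to run.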
Random Forest Regressor
We will use a GridSearch to fit a RandomForestRegressor with some basic parameter validation:
grid = iai.GridSearch(
    iai.RandomForestRegressor(
        random_seed=1,
    ),
    max_depth=range(5, 11),
)
grid.fit(train_X, train_y)
All Grid Results:
Row │ max_depth train_score valid_score rank_valid_score
│ Int64 Float64 Float64 Int64
─────┼───────────────────────────────────────────────────────
1 │ 5 0.998792 0.995212 6
2 │ 6 0.999109 0.995511 5
3 │ 7 0.999189 0.995521 2
4 │ 8 0.999205 0.995522 1
5 │ 9 0.999207 0.995519 3
6 │ 10 0.999207 0.995519 4
Best Params:
max_depth => 8
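The selection rule behind Best Params is simply the arg-max of the validation score; it can be reproduced from the grid results table above:

```python
# Validation scores for each max_depth, copied from the grid results above
valid_scores = {5: 0.995212, 6: 0.995511, 7: 0.995521,
                8: 0.995522, 9: 0.995519, 10: 0.995519}

# GridSearch keeps the parameter setting with the highest validation score
best_depth = max(valid_scores, key=valid_scores.get)
```

With these scores, `best_depth` is 8, matching the reported best parameters.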
Best Model - Fitted RandomForestRegressor
We can make predictions on new data using predict:
grid.predict(test_X)
array([ 0.09691999, 0.28367277, 1.27294232, ..., 12.93131952,
33.007085 , 50.49560667])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
grid.score(train_X, train_y, criterion='mse')
0.9993065912313013
Or on the test set:
grid.score(test_X, test_y, criterion='mse')
0.9937779440350902
We can also look at the variable importance:
grid.get_learner().variable_importance()
Feature Importance
0 froude 0.990682
1 prismatic 0.004045
2 beam_draught 0.002431
3 position 0.001415
4 length_displacement 0.001226
5 length_beam 0.000201
XGBoost Regressor
We will use a GridSearch to fit an XGBoostRegressor with some basic parameter validation:
grid = iai.GridSearch(
    iai.XGBoostRegressor(
        random_seed=1,
    ),
    max_depth=range(2, 6),
    num_round=[20, 50, 100],
)
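Note that GridSearch evaluates every combination of the supplied parameter ranges, so this grid tries 4 depths for each of 3 round counts:

```python
from itertools import product

max_depths = range(2, 6)
num_rounds = [20, 50, 100]

# The grid is the full cross-product of the parameter ranges
candidates = list(product(num_rounds, max_depths))
len(candidates)   # 12 settings, matching the 12 rows in the results below
```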
grid.fit(train_X, train_y)
All Grid Results:
Row │ num_round max_depth train_score valid_score rank_valid_score
│ Int64 Int64 Float64 Float64 Int64
─────┼──────────────────────────────────────────────────────────────────
1 │ 20 2 0.997817 0.995225 6
2 │ 20 3 0.999371 0.9953 5
3 │ 20 4 0.999748 0.992537 7
4 │ 20 5 0.999816 0.992295 8
5 │ 50 2 0.999118 0.99551 4
6 │ 50 3 0.999904 0.995645 3
7 │ 50 4 0.999953 0.991627 9
8 │ 50 5 0.999966 0.990115 11
9 │ 100 2 0.999632 0.9962 1
10 │ 100 3 0.999904 0.995646 2
11 │ 100 4 0.999953 0.991627 10
12 │ 100 5 0.999966 0.990115 12
Best Params:
num_round => 100
max_depth => 2
Best Model - Fitted XGBoostRegressor
We can make predictions on new data using predict:
grid.predict(test_X)
array([ 0.23346329, 0.37805462, 1.26587391, ..., 11.98635006,
33.26702118, 49.76500702])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
grid.score(train_X, train_y, criterion='mse')
0.999507
Or on the test set:
grid.score(test_X, test_y, criterion='mse')
0.997345
We can also look at the variable importance:
grid.get_learner().variable_importance()
Feature Importance
0 froude 0.993665
1 prismatic 0.002547
2 beam_draught 0.001372
3 length_displacement 0.001154
4 position 0.000648
5 length_beam 0.000614
GLMNet Regressor
We can use a GLMNetCVRegressor to fit a GLMNet model using cross-validation:
lnr = iai.GLMNetCVRegressor(
    random_seed=1,
    nfolds=10,
)
lnr.fit(train_X, train_y)
Fitted GLMNetCVRegressor:
Constant: -22.0757
Weights:
froude: 113.256
We can access the coefficients from the fitted model with get_prediction_weights and get_prediction_constant:
numeric_weights, categoric_weights = lnr.get_prediction_weights()
numeric_weights
{'froude': 113.25649906}
categoric_weights
{}
lnr.get_prediction_constant()
-22.07569551
We can make predictions on new data using predict:
lnr.predict(test_X)
array([-7.91863312, -5.08722065, 3.40701678, ..., 20.39549164,
26.05831659, 28.88972907])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
lnr.score(train_X, train_y, criterion='mse')
0.6541519917396235
Or on the test set:
lnr.score(test_X, test_y, criterion='mse')
0.6504195810342512
We can also look at the variable importance:
lnr.variable_importance()
Feature Importance
0 froude 1.0
1 beam_draught 0.0
2 length_beam 0.0
3 length_displacement 0.0
4 position 0.0
5 prismatic 0.0
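Since froude is the only feature with a nonzero weight, each GLMNet prediction reduces to the constant plus the froude weight times the feature value. Using the coefficients printed above (and a froude value of 0.125, as in the first rows of the data head), the first test prediction can be reproduced by hand:

```python
constant = -22.07569551   # from lnr.get_prediction_constant()
weight = 113.25649906     # 'froude' weight from lnr.get_prediction_weights()

def predict_by_hand(froude):
    """Linear model: resistance ~= constant + weight * froude."""
    return constant + weight * froude

pred = predict_by_hand(0.125)   # ~= -7.9186
```

This agrees with the first value returned by lnr.predict(test_X), which is one way to sanity-check the extracted coefficients.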