Quick Start Guide: Heuristic Regressors
This is the Python version of the corresponding Heuristics quick start guide.
In this example we use regressors from Heuristics on the yacht hydrodynamics dataset. First, we load the data and split it into training and test datasets:
import pandas as pd
df = pd.read_csv(
"yacht_hydrodynamics.data",
sep=r'\s+',
header=None,
names=['position', 'prismatic', 'length_displacement', 'beam_draught',
'length_beam', 'froude', 'resistance'],
)
position prismatic length_displacement ... length_beam froude resistance
0 -2.3 0.568 4.78 ... 3.17 0.125 0.11
1 -2.3 0.568 4.78 ... 3.17 0.150 0.27
2 -2.3 0.568 4.78 ... 3.17 0.175 0.47
3 -2.3 0.568 4.78 ... 3.17 0.200 0.78
4 -2.3 0.568 4.78 ... 3.17 0.225 1.18
5 -2.3 0.568 4.78 ... 3.17 0.250 1.82
6 -2.3 0.568 4.78 ... 3.17 0.275 2.61
.. ... ... ... ... ... ... ...
301 -2.3 0.600 4.34 ... 2.73 0.300 4.15
302 -2.3 0.600 4.34 ... 2.73 0.325 6.00
303 -2.3 0.600 4.34 ... 2.73 0.350 8.47
304 -2.3 0.600 4.34 ... 2.73 0.375 12.27
305 -2.3 0.600 4.34 ... 2.73 0.400 19.59
306 -2.3 0.600 4.34 ... 2.73 0.425 30.48
307 -2.3 0.600 4.34 ... 2.73 0.450 46.66
[308 rows x 7 columns]
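The `sep=r'\s+'` pattern splits columns on any run of whitespace, which matches the layout of the `.data` file. A tiny illustration with in-memory data (the values here are invented for illustration only):

```python
import io

import pandas as pd

# sep=r'\s+' treats any run of spaces or tabs as a single delimiter.
sample = "1.0 2.0   3.0\t4.0"
df = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None)
print(df.shape)  # (1, 4)
```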
from interpretableai import iai
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
(train_X, train_y), (test_X, test_y) = iai.split_data('regression', X, y,
seed=1)
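`split_data` handles seeding and the train/test proportions internally. A minimal numpy sketch of the same idea, assuming a 70/30 split (the ratio is an assumption for illustration, not necessarily what `split_data` uses):

```python
import numpy as np

# Hypothetical stand-in for iai.split_data: a seeded random 70/30 split
# of the 308-row yacht dataset.
rng = np.random.default_rng(1)
n = 308
idx = rng.permutation(n)
cut = int(0.7 * n)
train_idx, test_idx = idx[:cut], idx[cut:]
print(len(train_idx), len(test_idx))  # 215 93
```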
Random Forest Regressor
We will use a GridSearch to fit a RandomForestRegressor with some basic parameter validation:
grid = iai.GridSearch(
iai.RandomForestRegressor(
random_seed=1,
),
max_depth=range(5, 11),
)
grid.fit(train_X, train_y)
All Grid Results:
Row │ max_depth train_score valid_score rank_valid_score
│ Int64 Float64 Float64 Int64
─────┼───────────────────────────────────────────────────────
1 │ 5 0.998792 0.995212 6
2 │ 6 0.999109 0.995511 5
3 │ 7 0.999189 0.995521 2
4 │ 8 0.999205 0.995522 1
5 │ 9 0.999207 0.995519 3
6 │ 10 0.999207 0.995519 4
Best Params:
max_depth => 8
Best Model - Fitted RandomForestRegressor
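The best parameter set is simply the row of the grid with the highest validation score; the selection can be reproduced by hand from the results table above:

```python
# (max_depth, valid_score) pairs copied from the grid results table.
results = [
    (5, 0.995212), (6, 0.995511), (7, 0.995521),
    (8, 0.995522), (9, 0.995519), (10, 0.995519),
]
# The parameter set with the highest validation score wins.
best_depth, best_score = max(results, key=lambda row: row[1])
print(best_depth)  # 8
```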
We can make predictions on new data using predict:
grid.predict(test_X)
array([ 0.09691999, 0.28367277, 1.27294232, ..., 12.93131952,
33.007085 , 50.49560667])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
grid.score(train_X, train_y, criterion='mse')
0.9993065912313013
Or on the test set:
grid.score(test_X, test_y, criterion='mse')
0.9937779440350902
We can also look at the variable importance:
grid.get_learner().variable_importance()
Feature Importance
0 froude 0.990682
1 prismatic 0.004045
2 beam_draught 0.002431
3 position 0.001415
4 length_displacement 0.001226
5 length_beam 0.000201
XGBoost Regressor
We will use a GridSearch to fit an XGBoostRegressor with some basic parameter validation:
grid = iai.GridSearch(
iai.XGBoostRegressor(
random_seed=1,
),
max_depth=range(2, 6),
num_round=[20, 50, 100],
)
grid.fit(train_X, train_y)
All Grid Results:
Row │ num_round max_depth train_score valid_score rank_valid_score
│ Int64 Int64 Float64 Float64 Int64
─────┼──────────────────────────────────────────────────────────────────
1 │ 20 2 0.997817 0.995225 6
2 │ 20 3 0.999371 0.9953 5
3 │ 20 4 0.999748 0.992537 7
4 │ 20 5 0.999816 0.992295 8
5 │ 50 2 0.999118 0.99551 4
6 │ 50 3 0.999904 0.995645 3
7 │ 50 4 0.999953 0.991627 9
8 │ 50 5 0.999966 0.990115 11
9 │ 100 2 0.999632 0.9962 1
10 │ 100 3 0.999904 0.995646 2
11 │ 100 4 0.999953 0.991627 10
12 │ 100 5 0.999966 0.990115 12
Best Params:
num_round => 100
max_depth => 2
Best Model - Fitted XGBoostRegressor
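Note that the grid evaluates every combination of the candidate parameter values, which is why the results table above has 3 × 4 = 12 rows:

```python
from itertools import product

# The grid is the Cartesian product of the candidate parameter values.
combos = list(product([20, 50, 100], range(2, 6)))
print(len(combos))  # 12
```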
We can make predictions on new data using predict:
grid.predict(test_X)
array([ 0.23346329, 0.37805462, 1.26587391, ..., 11.98635006,
33.26702118, 49.76500702])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
grid.score(train_X, train_y, criterion='mse')
0.999507
Or on the test set:
grid.score(test_X, test_y, criterion='mse')
0.997345
We can also look at the variable importance:
grid.get_learner().variable_importance()
Feature Importance
0 froude 0.993665
1 prismatic 0.002547
2 beam_draught 0.001372
3 length_displacement 0.001154
4 position 0.000648
5 length_beam 0.000614
GLMNet Regressor
We can use a GLMNetCVRegressor to fit a GLMNet model using cross-validation:
lnr = iai.GLMNetCVRegressor(
random_seed=1,
nfolds=10,
)
lnr.fit(train_X, train_y)
Fitted GLMNetCVRegressor:
Constant: -22.0757
Weights:
froude: 113.256
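The nfolds parameter sets how many cross-validation folds are used when fitting. Conceptually, each observation is assigned to exactly one validation fold; a numpy sketch of that assignment (an illustration, not the library's internal implementation):

```python
import numpy as np

# Assign 200 observations to 10 equal-sized folds, in random order.
n, nfolds = 200, 10
rng = np.random.default_rng(1)
folds = rng.permutation(np.arange(n) % nfolds)
print(np.bincount(folds))  # 20 observations per fold
```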
We can access the coefficients from the fitted model with get_prediction_weights and get_prediction_constant:
numeric_weights, categoric_weights = lnr.get_prediction_weights()
numeric_weights
{'froude': 113.25649906}
categoric_weights
{}
lnr.get_prediction_constant()
-22.07569551
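Since the fitted model is linear with a single nonzero weight, a prediction is just the constant plus the froude coefficient times the feature value. Using the coefficients above with froude = 0.125 reproduces the first test-set value shown under predict below:

```python
# Reproducing a GLMNet prediction by hand from the reported coefficients.
constant = -22.07569551
weights = {'froude': 113.25649906}
pred = constant + weights['froude'] * 0.125
print(round(pred, 4))  # -7.9186
```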
We can make predictions on new data using predict:
lnr.predict(test_X)
array([-7.91863312, -5.08722065, 3.40701678, ..., 20.39549164,
26.05831659, 28.88972907])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
lnr.score(train_X, train_y, criterion='mse')
0.6541519917396235
Or on the test set:
lnr.score(test_X, test_y, criterion='mse')
0.6504195810342512
We can also look at the variable importance:
lnr.variable_importance()
Feature Importance
0 froude 1.0
1 beam_draught 0.0
2 length_beam 0.0
3 length_displacement 0.0
4 position 0.0
5 prismatic 0.0