Quick Start Guide: Heuristic Regressors
This is a Python version of the corresponding Heuristics quick start guide.
In this example, we will use regressors from Heuristics on the yacht hydrodynamics dataset. First we load in the data and split it into training and test datasets:
import pandas as pd
df = pd.read_csv(
    "yacht_hydrodynamics.data",
    sep=r'\s+',  # raw string avoids the invalid-escape warning on newer Python
    header=None,
    names=['position', 'prismatic', 'length_displacement', 'beam_draught',
           'length_beam', 'froude', 'resistance'],
)
position prismatic length_displacement ... length_beam froude resistance
0 -2.3 0.568 4.78 ... 3.17 0.125 0.11
1 -2.3 0.568 4.78 ... 3.17 0.150 0.27
2 -2.3 0.568 4.78 ... 3.17 0.175 0.47
3 -2.3 0.568 4.78 ... 3.17 0.200 0.78
4 -2.3 0.568 4.78 ... 3.17 0.225 1.18
5 -2.3 0.568 4.78 ... 3.17 0.250 1.82
6 -2.3 0.568 4.78 ... 3.17 0.275 2.61
.. ... ... ... ... ... ... ...
301 -2.3 0.600 4.34 ... 2.73 0.300 4.15
302 -2.3 0.600 4.34 ... 2.73 0.325 6.00
303 -2.3 0.600 4.34 ... 2.73 0.350 8.47
304 -2.3 0.600 4.34 ... 2.73 0.375 12.27
305 -2.3 0.600 4.34 ... 2.73 0.400 19.59
306 -2.3 0.600 4.34 ... 2.73 0.425 30.48
307 -2.3 0.600 4.34 ... 2.73 0.450 46.66
[308 rows x 7 columns]
from interpretableai import iai
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
(train_X, train_y), (test_X, test_y) = iai.split_data('regression', X, y,
                                                      seed=1)
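As a quick sanity check, we can confirm the shapes of the resulting splits. If the split_data defaults are unchanged, this should be roughly a 70/30 train/test split; the exact sizes in the comments below are an assumption, not output from the original guide:
print(train_X.shape, test_X.shape)           # roughly (216, 6) and (92, 6) of 308 rows
print(len(train_y) + len(test_y) == len(y))  # True - no rows dropped in the split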
Random Forest Regressor
We will use a GridSearch to fit a RandomForestRegressor with some basic parameter validation:
grid = iai.GridSearch(
    iai.RandomForestRegressor(
        random_seed=1,
    ),
    max_depth=range(5, 11),
)
grid.fit(train_X, train_y)
All Grid Results:
Row │ max_depth train_score valid_score rank_valid_score
│ Int64 Float64 Float64 Int64
─────┼───────────────────────────────────────────────────────
1 │ 5 0.994449 0.990294 6
2 │ 6 0.994511 0.990322 1
3 │ 7 0.994515 0.990321 2
4 │ 8 0.994515 0.990321 3
5 │ 9 0.994515 0.990321 4
6 │ 10 0.994515 0.990321 5
Best Params:
max_depth => 6
Best Model - Fitted RandomForestRegressor
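Rather than reading the chosen parameters off the printed summary, we can also retrieve them programmatically. A minimal sketch, assuming the standard GridSearch accessors get_best_params and get_learner:
best_params = grid.get_best_params()  # e.g. {'max_depth': 6} per the results above
best_learner = grid.get_learner()     # the refitted RandomForestRegressor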
We can make predictions on new data using predict:
grid.predict(test_X)
array([ 0.12836191, 0.24591598, 1.29305023, ..., 13.86254817,
33.50526059, 53.29463706])
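To get a feel for how far these predictions deviate from the actual resistances, we can compute a quick residual summary. This sketch uses only plain numpy and is not part of the IAI API:
import numpy as np

pred_y = grid.predict(test_X)
residuals = np.asarray(test_y) - pred_y
print(np.abs(residuals).mean())  # mean absolute error on the test set
print(np.abs(residuals).max())   # worst-case absolute error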
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
grid.score(train_X, train_y, criterion='mse')
0.9954616758740213
Or on the test set:
grid.score(test_X, test_y, criterion='mse')
0.9898501909798273
We can also look at the variable importance:
grid.get_learner().variable_importance()
Feature Importance
0 froude 0.994405
1 prismatic 0.001657
2 length_displacement 0.001617
3 beam_draught 0.001429
4 length_beam 0.000664
5 position 0.000228
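Since variable_importance returns a pandas DataFrame with Feature and Importance columns, it is easy to visualize. A minimal sketch using matplotlib (an assumption here, not an IAI dependency):
import matplotlib.pyplot as plt

imp = grid.get_learner().variable_importance()
imp.plot.barh(x='Feature', y='Importance', legend=False)  # horizontal bar chart
plt.xlabel('Importance')
plt.tight_layout()
plt.show()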
XGBoost Regressor
We will use a GridSearch to fit an XGBoostRegressor with some basic parameter validation:
grid = iai.GridSearch(
    iai.XGBoostRegressor(
        random_seed=1,
    ),
    max_depth=range(2, 6),
    num_round=[20, 50, 100],
)
grid.fit(train_X, train_y)
All Grid Results:
Row │ num_round max_depth train_score valid_score rank_valid_score
│ Int64 Int64 Float64 Float64 Int64
─────┼──────────────────────────────────────────────────────────────────
1 │ 20 2 0.99756 0.993216 3
2 │ 20 3 0.999231 0.99266 9
3 │ 20 4 0.999735 0.993189 4
4 │ 20 5 0.999831 0.99201 10
5 │ 50 2 0.999287 0.995097 2
6 │ 50 3 0.999892 0.993184 5
7 │ 50 4 0.999959 0.992946 7
8 │ 50 5 0.999973 0.990473 11
9 │ 100 2 0.999698 0.996021 1
10 │ 100 3 0.9999 0.993153 6
11 │ 100 4 0.999959 0.992946 8
12 │ 100 5 0.999973 0.990472 12
Best Params:
num_round => 100
max_depth => 2
Best Model - Fitted XGBoostRegressor
We can make predictions on new data using predict:
grid.predict(test_X)
array([ 0.23346585, 0.37805766, 1.26587629, ..., 11.98634815,
33.26702881, 49.76499939])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
grid.score(train_X, train_y, criterion='mse')
0.999507
Or on the test set:
grid.score(test_X, test_y, criterion='mse')
0.997345
We can also look at the variable importance:
grid.get_learner().variable_importance()
Feature Importance
0 froude 0.954647
1 length_displacement 0.018293
2 prismatic 0.013922
3 beam_draught 0.006395
4 position 0.004111
5 length_beam 0.002631
We can calculate the SHAP values:
s = grid.predict_shap(test_X)
s['shap_values']
array([[-6.32807910e-02, -1.14607625e-02, -3.91643867e-02,
-1.84658561e-02, -8.15268755e-02, -9.86756802e+00],
[-6.64551705e-02, -1.14607625e-02, -3.78506146e-02,
-7.68390484e-03, -8.15268755e-02, -9.73189735e+00],
[-7.23956525e-02, -1.27419746e-02, -4.30041552e-02,
7.60984235e-03, -1.34582460e-01, -8.79394150e+00],
...,
[-2.23140568e-01, -3.45702350e-01, 7.95381218e-02,
3.98655888e-04, -2.72205651e-01, 2.43252921e+00],
[-1.47517592e-01, -6.43372834e-01, 5.53900421e-01,
-3.00649643e-01, -2.73264706e-01, 2.37629852e+01],
[-1.73150003e-01, -1.51304734e+00, 7.99156308e-01,
-2.85290539e-01, 1.32692412e-01, 4.04897118e+01]])
We can then use the SHAP library to visualize these results in whichever way we prefer.
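For example, a summary plot. This is a minimal sketch assuming the shap package is installed and that the dictionary returned by predict_shap also carries the feature matrix under the 'features' key:
import shap

# Assumption: s['features'] holds the test features matching s['shap_values']
shap.summary_plot(s['shap_values'], s['features'])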
GLMNet Regressor
We can use a GLMNetCVRegressor to fit a GLMNet model using cross-validation:
lnr = iai.GLMNetCVRegressor(
    random_seed=1,
    n_folds=10,
)
lnr.fit(train_X, train_y)
Fitted GLMNetCVRegressor:
Constant: -22.2638
Weights:
froude: 113.914
We can access the coefficients from the fitted model with get_prediction_weights and get_prediction_constant:
numeric_weights, categoric_weights = lnr.get_prediction_weights()
numeric_weights
{'froude': 113.91421371}
categoric_weights
{}
lnr.get_prediction_constant()
-22.26379885
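Since these coefficients fully define the model, we can reconstruct a prediction by hand: the output is the constant plus the weighted sum of the features, and here only froude carries a nonzero weight. For instance, a row with froude = 0.125 gives:
# resistance = constant + weight * froude, using the fitted values above
-22.26379885 + 113.91421371 * 0.125  # = -8.02452214, matching the first prediction below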
We can make predictions on new data using predict:
lnr.predict(test_X)
array([-8.02452214, -5.17666679, 3.36689923, ..., 20.45403129,
26.14974197, 28.99759732])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
lnr.score(train_X, train_y, criterion='mse')
0.6545717189938915
Or on the test set:
lnr.score(test_X, test_y, criterion='mse')
0.6510068277128165
We can also look at the variable importance:
lnr.variable_importance()
Feature Importance
0 froude 1.0
1 beam_draught 0.0
2 length_beam 0.0
3 length_displacement 0.0
4 position 0.0
5 prismatic 0.0