Quick Start Guide: Heuristic Regressors
This is a Python version of the corresponding Heuristics quick start guide.
In this example, we will use regressors from Heuristics on the yacht hydrodynamics dataset. First, we load in the data and split it into training and test datasets:
import pandas as pd
df = pd.read_csv(
    "yacht_hydrodynamics.data",
    sep=r'\s+',  # the file is whitespace-delimited; raw string avoids an invalid-escape warning
    header=None,
    names=['position', 'prismatic', 'length_displacement', 'beam_draught',
           'length_beam', 'froude', 'resistance'],
)
     position  prismatic  length_displacement  ...  length_beam  froude  resistance
0        -2.3      0.568                 4.78  ...         3.17   0.125        0.11
1        -2.3      0.568                 4.78  ...         3.17   0.150        0.27
2        -2.3      0.568                 4.78  ...         3.17   0.175        0.47
3        -2.3      0.568                 4.78  ...         3.17   0.200        0.78
4        -2.3      0.568                 4.78  ...         3.17   0.225        1.18
5        -2.3      0.568                 4.78  ...         3.17   0.250        1.82
6        -2.3      0.568                 4.78  ...         3.17   0.275        2.61
..        ...        ...                  ...  ...          ...     ...         ...
301      -2.3      0.600                 4.34  ...         2.73   0.300        4.15
302      -2.3      0.600                 4.34  ...         2.73   0.325        6.00
303      -2.3      0.600                 4.34  ...         2.73   0.350        8.47
304      -2.3      0.600                 4.34  ...         2.73   0.375       12.27
305      -2.3      0.600                 4.34  ...         2.73   0.400       19.59
306      -2.3      0.600                 4.34  ...         2.73   0.425       30.48
307      -2.3      0.600                 4.34  ...         2.73   0.450       46.66

[308 rows x 7 columns]

from interpretableai import iai
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
(train_X, train_y), (test_X, test_y) = iai.split_data('regression', X, y,
                                                      seed=1)
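Before fitting any models, we can sanity-check the sizes of the two partitions (a trivial sketch; the exact ratio follows split_data's default train proportion):

# Inspect the partition sizes produced by split_data
print(len(train_X), len(test_X))
print(len(train_y), len(test_y))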
Random Forest Regressor
We will use a GridSearch to fit a RandomForestRegressor with some basic parameter validation:
grid = iai.GridSearch(
    iai.RandomForestRegressor(
        random_seed=1,
    ),
    max_depth=range(5, 11),
)
grid.fit(train_X, train_y)
All Grid Results:
 Row │ max_depth  train_score  valid_score  rank_valid_score
     │ Int64      Float64      Float64      Int64
─────┼───────────────────────────────────────────────────────
   1 │         5     0.994449     0.990294                 6
   2 │         6     0.994511     0.990322                 1
   3 │         7     0.994515     0.990321                 2
   4 │         8     0.994515     0.990321                 3
   5 │         9     0.994515     0.990321                 4
   6 │        10     0.994515     0.990321                 5
Best Params:
max_depth => 6
Best Model - Fitted RandomForestRegressor
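The winning parameter combination can also be retrieved programmatically. A small sketch, assuming get_best_params is available on the grid as in the GridSearch API:

# Retrieve the parameter combination with the best validation score
# (get_best_params is assumed available on GridSearch objects)
print(grid.get_best_params())  # e.g. {'max_depth': 6}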
We can make predictions on new data using predict:

grid.predict(test_X)
array([ 0.12836191,  0.24591598,  1.29305023, ..., 13.86254817,
       33.50526059, 53.29463706])

We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
grid.score(train_X, train_y, criterion='mse')
0.9954616758740213

Or on the test set:
grid.score(test_X, test_y, criterion='mse')
0.9898501909798273

We can also look at the variable importance:
grid.get_learner().variable_importance()
               Feature  Importance
0               froude    0.994405
1            prismatic    0.001657
2  length_displacement    0.001617
3         beam_draught    0.001429
4          length_beam    0.000664
5             position    0.000228
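Because variable_importance returns a plain DataFrame, it is easy to visualize. A minimal sketch using matplotlib, which is not part of the IAI API:

import matplotlib.pyplot as plt

# Horizontal bar chart of the importances from the fitted forest
imp = grid.get_learner().variable_importance()
plt.barh(imp['Feature'], imp['Importance'])
plt.xlabel('Importance')
plt.tight_layout()
plt.show()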
XGBoost Regressor

We will use a GridSearch to fit an XGBoostRegressor with some basic parameter validation:
grid = iai.GridSearch(
    iai.XGBoostRegressor(
        random_seed=1,
    ),
    max_depth=range(2, 6),
    num_round=[20, 50, 100],
)
grid.fit(train_X, train_y)
All Grid Results:
 Row │ num_round  max_depth  train_score  valid_score  rank_valid_score
     │ Int64      Int64      Float64      Float64      Int64
─────┼──────────────────────────────────────────────────────────────────
   1 │        20          2      0.99756     0.993216                 3
   2 │        20          3     0.999231      0.99266                 9
   3 │        20          4     0.999735     0.993189                 4
   4 │        20          5     0.999831      0.99201                10
   5 │        50          2     0.999287     0.995097                 2
   6 │        50          3     0.999892     0.993184                 5
   7 │        50          4     0.999959     0.992946                 7
   8 │        50          5     0.999973     0.990473                11
   9 │       100          2     0.999698     0.996021                 1
  10 │       100          3       0.9999     0.993153                 6
  11 │       100          4     0.999959     0.992946                 8
  12 │       100          5     0.999973     0.990472                12
Best Params:
num_round => 100
max_depth => 2
Best Model - Fitted XGBoostRegressor
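With two parameters in the grid, it can be useful to pull the full set of results into a DataFrame for closer inspection. A sketch, assuming get_grid_result_summary is available on the grid in this version of the API:

# Full grid search results as a pandas DataFrame
# (get_grid_result_summary is assumed available in this API version)
results = grid.get_grid_result_summary()
print(results.sort_values('rank_valid_score').head())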
We can make predictions on new data using predict:

grid.predict(test_X)
array([ 0.23346585,  0.37805766,  1.26587629, ..., 11.98634815,
       33.26702881, 49.76499939])

We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
grid.score(train_X, train_y, criterion='mse')
0.999507

Or on the test set:
grid.score(test_X, test_y, criterion='mse')
0.997345

We can also look at the variable importance:
grid.get_learner().variable_importance()
               Feature  Importance
0               froude    0.954647
1  length_displacement    0.018293
2            prismatic    0.013922
3         beam_draught    0.006395
4             position    0.004111
5          length_beam    0.002631
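Before computing SHAP values, we can cross-check the test-set score with an independent $R^2$ calculation. A quick sketch using scikit-learn, which is not part of the IAI API:

from sklearn.metrics import r2_score

# With criterion='mse', score reports R^2, so this should closely
# match the value returned by grid.score(test_X, test_y) above
print(r2_score(test_y, grid.predict(test_X)))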
We can calculate the SHAP values:

s = grid.predict_shap(test_X)
s['shap_values']
array([[-6.32807910e-02, -1.14607625e-02, -3.91643867e-02,
-1.84658561e-02, -8.15268755e-02, -9.86756802e+00],
[-6.64551705e-02, -1.14607625e-02, -3.78506146e-02,
-7.68390484e-03, -8.15268755e-02, -9.73189735e+00],
[-7.23956525e-02, -1.27419746e-02, -4.30041552e-02,
7.60984235e-03, -1.34582460e-01, -8.79394150e+00],
...,
[-2.23140568e-01, -3.45702350e-01, 7.95381218e-02,
3.98655888e-04, -2.72205651e-01, 2.43252921e+00],
[-1.47517592e-01, -6.43372834e-01, 5.53900421e-01,
-3.00649643e-01, -2.73264706e-01, 2.37629852e+01],
[-1.73150003e-01, -1.51304734e+00, 7.99156308e-01,
-2.85290539e-01,  1.32692412e-01,  4.04897118e+01]])

We can then use the SHAP library to visualize these results in whichever way we prefer; for example, a summary plot:
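A minimal sketch, assuming the shap package is installed (it is a separate dependency, not part of the IAI API):

import shap

# Summary plot of the SHAP values returned by predict_shap;
# test_X supplies the feature values used to color the points
shap.summary_plot(s['shap_values'], test_X)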
GLMNet Regressor
We can use a GLMNetCVRegressor to fit a GLMNet model using cross-validation:
lnr = iai.GLMNetCVRegressor(
    random_seed=1,
    n_folds=10,
)
lnr.fit(train_X, train_y)
Fitted GLMNetCVRegressor:
Constant: -22.2638
Weights:
froude: 113.914

We can access the coefficients from the fitted model with get_prediction_weights and get_prediction_constant:
numeric_weights, categoric_weights = lnr.get_prediction_weights()
numeric_weights
{'froude': 113.91421371}

categoric_weights

{}

lnr.get_prediction_constant()

-22.26379885
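Since the fitted model has a single nonzero weight, its predictions are just a linear function of froude. As a quick check, we can reconstruct them by hand from the coefficient values shown above (a sketch, not part of the IAI API):

# Manual reconstruction: prediction = constant + weight * froude
manual_pred = -22.26379885 + 113.91421371 * test_X['froude']
# These values should match the output of lnr.predict(test_X) below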
We can make predictions on new data using predict:

lnr.predict(test_X)
array([-8.02452214, -5.17666679,  3.36689923, ..., 20.45403129,
       26.14974197, 28.99759732])

We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
lnr.score(train_X, train_y, criterion='mse')
0.6545717189938915

Or on the test set:
lnr.score(test_X, test_y, criterion='mse')
0.6510068277128165

We can also look at the variable importance:
lnr.variable_importance()
               Feature  Importance
0               froude         1.0
1         beam_draught         0.0
2          length_beam         0.0
3  length_displacement         0.0
4             position         0.0
5            prismatic         0.0
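The single-feature linear model trades accuracy for simplicity compared with the tree ensembles above. To see how much, we can plot actual against predicted resistance on the test set; a sketch using matplotlib, which is not part of the IAI API:

import matplotlib.pyplot as plt

# Compare actual and predicted resistance for the linear model
pred_y = lnr.predict(test_X)
plt.scatter(test_y, pred_y)
plt.plot([test_y.min(), test_y.max()],
         [test_y.min(), test_y.max()])  # y = x reference line
plt.xlabel('Actual resistance')
plt.ylabel('Predicted resistance')
plt.show()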