Quick Start Guide: Optimal Regression Trees

This is a Python version of the corresponding OptimalTrees quick start guide.

In this example we will use Optimal Regression Trees (ORT) on the yacht hydrodynamics dataset. First we load in the data and split into training and test datasets:

import pandas as pd
df = pd.read_csv(
    "yacht_hydrodynamics.data",
    sep=r'\s+',  # raw string: columns are separated by runs of whitespace
    header=None,
    names=['position', 'prismatic', 'length_displacement', 'beam_draught',
           'length_beam', 'froude', 'resistance'],
)
     position  prismatic  length_displacement  ...  length_beam  froude  resistance
0        -2.3      0.568                 4.78  ...         3.17   0.125        0.11
1        -2.3      0.568                 4.78  ...         3.17   0.150        0.27
2        -2.3      0.568                 4.78  ...         3.17   0.175        0.47
3        -2.3      0.568                 4.78  ...         3.17   0.200        0.78
4        -2.3      0.568                 4.78  ...         3.17   0.225        1.18
5        -2.3      0.568                 4.78  ...         3.17   0.250        1.82
6        -2.3      0.568                 4.78  ...         3.17   0.275        2.61
..        ...        ...                  ...  ...          ...     ...         ...
301      -2.3      0.600                 4.34  ...         2.73   0.300        4.15
302      -2.3      0.600                 4.34  ...         2.73   0.325        6.00
303      -2.3      0.600                 4.34  ...         2.73   0.350        8.47
304      -2.3      0.600                 4.34  ...         2.73   0.375       12.27
305      -2.3      0.600                 4.34  ...         2.73   0.400       19.59
306      -2.3      0.600                 4.34  ...         2.73   0.425       30.48
307      -2.3      0.600                 4.34  ...         2.73   0.450       46.66

[308 rows x 7 columns]

from interpretableai import iai
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
(train_X, train_y), (test_X, test_y) = iai.split_data('regression', X, y,
                                                      seed=1)
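To illustrate what a seeded split means, here is a minimal sketch in plain Python. It is not the algorithm used by iai.split_data, which the guide does not describe. The helper name seeded_split is hypothetical, and the 70% training fraction is an assumption inferred from the 216 training points (out of 308 rows) reported at the root of the fitted trees.

```python
import random

def seeded_split(n_rows, train_fraction=0.7, seed=1):
    """Return (train_idx, test_idx) from a reproducible shuffle of row indices.

    Hypothetical helper for illustration only; iai.split_data may work differently.
    """
    rng = random.Random(seed)      # local RNG, so the global random state is untouched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_rows)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_idx, test_idx = seeded_split(308)
print(len(train_idx), len(test_idx))   # 216 92
```

Calling the function again with the same seed returns the same indices, which is what makes the rest of the guide reproducible.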

Optimal Regression Trees

We will use a GridSearch to fit an OptimalTreeRegressor:

grid = iai.GridSearch(
    iai.OptimalTreeRegressor(
        random_seed=123,
    ),
    max_depth=range(1, 6),
)
grid.fit(train_X, train_y)
grid.get_learner()
[Optimal Trees Visualization: interactive tree diagram. Root node: mean 10.32, n = 216, split on froude; further splits use froude, prismatic, beam_draught, length_displacement and position.]
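The idea behind searching over max_depth can be sketched without the library: fit one model per candidate value, score each on held-out data, and keep the best. The sketch below is a generic hold-out grid search under stated assumptions, not the procedure inside iai.GridSearch. The toy model (2**depth equal-width bins, each predicting its mean) only stands in for a tree of a given depth, and the data are synthetic.

```python
import random

def fit_bins(xs, ys, depth):
    """Toy piecewise-constant model on [0, 1): 2**depth equal-width bins."""
    n_bins = 2 ** depth
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Empty bins fall back to the overall training mean
    return [s / c if c else overall for s, c in zip(sums, counts)]

def predict_bins(model, x):
    n_bins = len(model)
    return model[min(int(x * n_bins), n_bins - 1)]

def mse(model, xs, ys):
    return sum((predict_bins(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

def grid_search(train, valid, depths):
    """Fit one model per depth and keep the depth with the lowest validation MSE."""
    results = {d: mse(fit_bins(*train, d), *valid) for d in depths}
    best = min(results, key=results.get)
    return best, results

rng = random.Random(0)
xs = [rng.random() for _ in range(400)]
ys = [x ** 3 + rng.gauss(0, 0.02) for x in xs]   # synthetic data, not the yacht dataset
train, valid = (xs[:300], ys[:300]), (xs[300:], ys[300:])

best_depth, results = grid_search(train, valid, range(1, 6))
print(best_depth, results[best_depth])
```

Scoring on data that was not used for fitting is the key design choice: training error alone would always favour the deepest candidate.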

We can make predictions on new data using predict:

grid.predict(test_X)
array([ 0.56551282,  0.56551282,  0.56551282, ..., 13.35666667,
       33.305     , 48.89555556])

We can evaluate the quality of the tree using score with any of the supported loss functions. For example, the R^2 on the training set:

grid.score(train_X, train_y, criterion='mse')
0.996553877459003

Or on the test set:

grid.score(test_X, test_y, criterion='mse')
0.9923405056982038
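For reference, R^2 relates to squared error as one minus the ratio of the model's squared error to that of a constant baseline. The sketch below uses the textbook definition with made-up numbers. The function name r_squared is hypothetical, and which baseline mean the library uses is not stated in this guide, so the default here (the mean of the supplied targets) is an assumption.

```python
def r_squared(y_true, y_pred, baseline=None):
    """R^2 = 1 - SSE(model) / SSE(constant baseline).

    Assumption: the baseline defaults to the mean of y_true.
    """
    if baseline is None:
        baseline = sum(y_true) / len(y_true)
    sse_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sse_base = sum((t - baseline) ** 2 for t in y_true)
    return 1.0 - sse_model / sse_base

y_true = [1.0, 2.0, 3.0, 4.0]                  # made-up values for illustration
perfect = r_squared(y_true, y_true)            # 1.0: predictions match exactly
mean_only = r_squared(y_true, [2.5] * 4)       # 0.0: no better than the baseline
print(perfect, mean_only)
```

A score close to 1, like the values above, means the tree explains nearly all of the variation in resistance.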

Optimal Regression Trees with Hyperplanes

To use Optimal Regression Trees with hyperplane splits (ORT-H), you should set the hyperplane_config parameter:

grid = iai.GridSearch(
    iai.OptimalTreeRegressor(
        random_seed=12345,
        hyperplane_config={'sparsity': 'all'},
    ),
    max_depth=range(1, 5),
)
grid.fit(train_X, train_y)
grid.get_learner()
[Optimal Trees Visualization: interactive tree diagram. Root node: mean 10.32, n = 216, split on froude. Hyperplane splits in the tree: 0.806 * prismatic + 1.215 * froude; 0.1456 * position - 0.9761 * beam_draught; 0.07936 * position - 1.495 * prismatic.]

Now we can find the performance on the test set with hyperplanes:

grid.score(test_X, test_y, criterion='mse')
0.9869719326990959

The hyperplane splits did not help here: the test R^2 of 0.987 is slightly below the 0.992 achieved with parallel splits. The main variable affecting the target appears to be froude, so allowing multiple variables per split adds little for this dataset.
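The difference between the two kinds of split can be sketched in a few lines. A parallel split compares one feature against a threshold, while a hyperplane split compares a weighted sum of features. The weights below are taken from one split reported for the fitted ORT-H tree (0.806 * prismatic + 1.215 * froude). The value 0.8495 also appears among the thresholds of that tree, but pairing it with this particular split is an assumption. The function names are hypothetical.

```python
def axis_split(row, feature, threshold):
    """Parallel split: True if the row goes to the '<' branch."""
    return row[feature] < threshold

def hyperplane_split(row, weights, threshold):
    """Hyperplane split: True if the weighted sum goes to the '<' branch."""
    return sum(w * row[f] for f, w in weights.items()) < threshold

# Feature values from the first and last rows shown in the data preview
low_speed = {'prismatic': 0.568, 'froude': 0.125}
high_speed = {'prismatic': 0.600, 'froude': 0.450}

weights = {'prismatic': 0.806, 'froude': 1.215}   # from the fitted ORT-H tree
threshold = 0.8495                                # assumed pairing, see above

print(hyperplane_split(low_speed, weights, threshold))    # True
print(hyperplane_split(high_speed, weights, threshold))   # False
print(axis_split(low_speed, 'froude', 0.3625))            # True
```

Because froude dominates the weighted sum over the range of the data, this hyperplane split separates rows in much the same way a parallel split on froude would.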

Optimal Regression Trees with Linear Predictions

To use Optimal Regression Trees with linear regression in the leaves (ORT-L), you should set the regression_features parameter to {'All'} and use the regression_lambda parameter to control the degree of regularization.

grid = iai.GridSearch(
    iai.OptimalTreeRegressor(
        random_seed=123,
        max_depth=2,
        regression_features={'All'},
    ),
    regression_lambda=[0.005, 0.01, 0.05],
)
grid.fit(train_X, train_y)
grid.get_learner()
[Optimal Trees Visualization: interactive tree diagram. A single split on froude at 0.3625 leads to two regression leaves, with means 2.508 (n = 157) and 31.09 (n = 59).]

We can find the performance on the test set:

grid.score(test_X, test_y, criterion='mse')
0.98425278605254

We can see that the ORT-L model is much smaller than the models with constant predictions (a single split, compared with over a dozen) and has similar performance (test R^2 of 0.984, compared with 0.992 and 0.987).
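To make the structure of such a model concrete, here is a minimal sketch of how a tree with one split and a linear model in each leaf produces a prediction. The split (froude at 0.3625) matches the fitted tree, but the leaf coefficients are hypothetical placeholders, since the guide does not report the fitted values. The function name predict_ortl is also hypothetical.

```python
def predict_ortl(row, split_feature, threshold, lower_leaf, upper_leaf):
    """Route the row to a leaf, then apply that leaf's linear model."""
    leaf = lower_leaf if row[split_feature] < threshold else upper_leaf
    return leaf['intercept'] + sum(w * row[f] for f, w in leaf['weights'].items())

# Hypothetical coefficients, chosen only to illustrate the mechanics
lower_leaf = {'intercept': -2.0, 'weights': {'froude': 20.0}}
upper_leaf = {'intercept': -100.0, 'weights': {'froude': 320.0}}

slow = predict_ortl({'froude': 0.25}, 'froude', 0.3625, lower_leaf, upper_leaf)
fast = predict_ortl({'froude': 0.375}, 'froude', 0.3625, lower_leaf, upper_leaf)
print(slow, fast)   # 3.0 20.0
```

Each leaf can follow a trend within its region, which is why a single split can compete with a much deeper tree of constant predictions.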