Quick Start Guide: Optimal Regression Trees

This is a Python version of the corresponding OptimalTrees quick start guide.

In this example we will use Optimal Regression Trees (ORT) on the yacht hydrodynamics dataset. First we load the data and split it into training and test datasets:

import pandas as pd
df = pd.read_csv(
    "yacht_hydrodynamics.data",
    sep=r"\s+",  # whitespace-delimited file with no header row
    header=None,
    names=['position', 'prismatic', 'length_displacement', 'beam_draught',
           'length_beam', 'froude', 'resistance'],
)
df
     position  prismatic  length_displacement  ...  length_beam  froude  resistance
0        -2.3      0.568                 4.78  ...         3.17   0.125        0.11
1        -2.3      0.568                 4.78  ...         3.17   0.150        0.27
2        -2.3      0.568                 4.78  ...         3.17   0.175        0.47
3        -2.3      0.568                 4.78  ...         3.17   0.200        0.78
4        -2.3      0.568                 4.78  ...         3.17   0.225        1.18
5        -2.3      0.568                 4.78  ...         3.17   0.250        1.82
6        -2.3      0.568                 4.78  ...         3.17   0.275        2.61
..        ...        ...                  ...  ...          ...     ...         ...
301      -2.3      0.600                 4.34  ...         2.73   0.300        4.15
302      -2.3      0.600                 4.34  ...         2.73   0.325        6.00
303      -2.3      0.600                 4.34  ...         2.73   0.350        8.47
304      -2.3      0.600                 4.34  ...         2.73   0.375       12.27
305      -2.3      0.600                 4.34  ...         2.73   0.400       19.59
306      -2.3      0.600                 4.34  ...         2.73   0.425       30.48
307      -2.3      0.600                 4.34  ...         2.73   0.450       46.66

[308 rows x 7 columns]
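
Before moving on, it can be worth a quick sanity check that the file parsed as expected. This optional step is not part of the original guide and uses only standard pandas functionality:

df.shape
(308, 7)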
from interpretableai import iai
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
(train_X, train_y), (test_X, test_y) = iai.split_data('regression', X, y,
                                                      seed=1)
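
The returned splits are ordinary pandas objects, so if you want to confirm how much data was held out for testing you can inspect their sizes directly (an optional check, not part of the original guide; the exact counts depend on the proportion used by split_data):

len(train_X), len(test_X)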

Optimal Regression Trees

We will use a GridSearch to fit an OptimalTreeRegressor:

grid = iai.GridSearch(
    iai.OptimalTreeRegressor(
        random_seed=123,
    ),
    max_depth=range(1, 6),
)
grid.fit(train_X, train_y)
grid.get_learner()
[Optimal Trees Visualization of the fitted tree]
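
In addition to get_learner, the fitted GridSearch can report which parameter combination won the search. Assuming the installed interpretableai version exposes get_best_params (as recent versions of the interface do), this returns the parameter values, such as the chosen max_depth, selected during validation:

grid.get_best_params()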

We can make predictions on new data using predict:

grid.predict(test_X)
array([ 0.56551282,  0.56551282,  0.56551282, ..., 13.35666667,
       33.305     , 48.89555556])
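
To see how these predictions line up with the actual resistance values, the returned array can be combined with the held-out targets using ordinary pandas/numpy code. This is an optional illustration rather than part of the original guide:

import numpy as np
# Side-by-side view of true vs. predicted resistance on the test set
comparison = pd.DataFrame({
    'actual': np.asarray(test_y),
    'predicted': np.asarray(grid.predict(test_X)),
})
comparison.head()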

We can evaluate the quality of the tree using score with any of the supported loss functions. For example, the $R^2$ on the training set:

grid.score(train_X, train_y, criterion='mse')
0.996553877459003

Or on the test set:

grid.score(test_X, test_y, criterion='mse')
0.9923405056982038
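
If you want to verify this number independently, $R^2$ can be recomputed by hand from the predictions as $1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. The sketch below is an optional cross-check, not part of the original guide, and should agree closely with the score reported above (up to how the baseline term is defined):

import numpy as np
y_true = np.asarray(test_y)
y_pred = np.asarray(grid.predict(test_X))
# Coefficient of determination: 1 minus residual sum of squares over total sum of squares
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
r2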

Optimal Regression Trees with Hyperplanes

To use Optimal Regression Trees with hyperplane splits (ORT-H), you should set the hyperplane_config parameter:

grid = iai.GridSearch(
    iai.OptimalTreeRegressor(
        random_seed=123,
        hyperplane_config={'sparsity': 'all'},
    ),
    max_depth=range(1, 5),
)
grid.fit(train_X, train_y)
grid.get_learner()
[Optimal Trees Visualization of the fitted tree with hyperplane splits]
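
The ORT-H learner is used in exactly the same way as the axis-aligned tree, so once fitting finishes it can be evaluated with the same predict and score calls shown earlier, for example:

grid.predict(test_X)
grid.score(test_X, test_y, criterion='mse')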