Quick Start Guide: Optimal Regression Trees

In this example we will use Optimal Regression Trees (ORT) on the yacht hydrodynamics dataset. First we load the data and split it into training and test datasets:

using CSV, DataFrames
df = CSV.read(
    "yacht_hydrodynamics.data", DataFrame,
    delim=' ',            # file uses ' ' as separators rather than ','
    ignorerepeated=true,  # sometimes columns are separated by more than one ' '
    header=[:position, :prismatic, :length_displacement, :beam_draught,
            :length_beam, :froude, :resistance],
)
308×7 DataFrame
 Row │ position  prismatic  length_displacement  beam_draught  length_beam  fr ⋯
     │ Float64   Float64    Float64              Float64       Float64      Fl ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │     -2.3      0.568                 4.78          3.99         3.17     ⋯
   2 │     -2.3      0.568                 4.78          3.99         3.17
   3 │     -2.3      0.568                 4.78          3.99         3.17
   4 │     -2.3      0.568                 4.78          3.99         3.17
   5 │     -2.3      0.568                 4.78          3.99         3.17     ⋯
   6 │     -2.3      0.568                 4.78          3.99         3.17
   7 │     -2.3      0.568                 4.78          3.99         3.17
   8 │     -2.3      0.568                 4.78          3.99         3.17
  ⋮  │    ⋮          ⋮               ⋮                ⋮             ⋮          ⋱
 302 │     -2.3      0.6                   4.34          4.23         2.73     ⋯
 303 │     -2.3      0.6                   4.34          4.23         2.73
 304 │     -2.3      0.6                   4.34          4.23         2.73
 305 │     -2.3      0.6                   4.34          4.23         2.73
 306 │     -2.3      0.6                   4.34          4.23         2.73     ⋯
 307 │     -2.3      0.6                   4.34          4.23         2.73
 308 │     -2.3      0.6                   4.34          4.23         2.73
                                                  2 columns and 293 rows omitted
X = df[:, 1:(end - 1)]
y = df[:, end]
(train_X, train_y), (test_X, test_y) = IAI.split_data(:regression, X, y,
                                                      seed=12345)
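
As a quick sanity check, the training and test sets have the expected sizes (split_data holds out 30% of the data for testing by default, giving 216 training and 92 test points here):

size(train_X, 1), size(test_X, 1)
(216, 92)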

Optimal Regression Trees

We will use a GridSearch to fit an OptimalTreeRegressor:

grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=123,
    ),
    max_depth=1:5,
)
IAI.fit!(grid, train_X, train_y)
IAI.get_learner(grid)
(Optimal Trees visualization: the fitted tree splits primarily on froude, with secondary splits on prismatic, and predicts a constant value in each leaf.)
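
The grid search fits the learner for each max_depth in the supplied range and keeps the best; we can query the chosen parameters with get_best_params:

IAI.get_best_params(grid)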

We can make predictions on new data using predict:

IAI.predict(grid, test_X)
92-element Vector{Float64}:
  0.5705063291139242
  0.5705063291139242
 13.007272727272728
  2.2612121212121212
  4.5268000000000015
  8.163333333333332
 20.91692307692308
  0.5705063291139242
  0.5705063291139242
  2.2612121212121212
  ⋮
  0.5705063291139242
  2.2612121212121212
  4.5268000000000015
  8.163333333333332
 13.007272727272728
 40.353333333333346
  0.5705063291139242
  4.5268000000000015
 13.007272727272728

We can evaluate the quality of the tree using score with any of the supported loss functions. For example, the R² on the training set:

IAI.score(grid, train_X, train_y, criterion=:mse)
0.9960433580045623

Or on the test set:

IAI.score(grid, test_X, test_y, criterion=:mse)
0.986575389647429
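
As a sanity check, the score under criterion=:mse is the out-of-sample R², which we can reproduce by hand from the predictions (a minimal sketch using the Statistics standard library):

using Statistics
pred = IAI.predict(grid, test_X)
1 - sum(abs2, test_y .- pred) / sum(abs2, test_y .- mean(test_y))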

Optimal Regression Trees with Hyperplanes

To use Optimal Regression Trees with hyperplane splits (ORT-H), you should set the hyperplane_config parameter:

grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=12345,
        hyperplane_config=(sparsity=:all,)
    ),
    max_depth=1:4,
)
IAI.fit!(grid, train_X, train_y)
IAI.get_learner(grid)
(Optimal Trees visualization: the ORT-H tree again splits mostly on froude, with several hyperplane splits combining features such as position, prismatic, length_displacement, and beam_draught.)

Now we can find the performance on the test set with hyperplanes:

IAI.score(grid, test_X, test_y, criterion=:mse)
0.9829672227461212

It looks like the addition of hyperplane splits did not help much here. The main variable affecting the target appears to be froude, so allowing multiple variables per split offers little benefit on this dataset.
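
One way to confirm that froude dominates is to inspect the variable importance of the fitted learner, which ranks each feature's contribution to the splits:

IAI.variable_importance(IAI.get_learner(grid))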

Optimal Regression Trees with Linear Predictions

To use Optimal Regression Trees with linear regression in the leaves (ORT-L), you should set the regression_features parameter to All() and use the regression_lambda parameter to control the degree of regularization:

grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=123,
        max_depth=2,
        regression_features=All(),
    ),
    regression_lambda=[0.005, 0.01, 0.05],
)
IAI.fit!(grid, train_X, train_y)
IAI.get_learner(grid)
(Optimal Trees visualization: the ORT-L tree makes a single split on froude at 0.3625, fitting a linear regression model in each of the two leaves, nodes 2 and 3.)

We can find the performance on the test set:

IAI.score(grid, test_X, test_y, criterion=:mse)
0.982097391565015

We can see that the ORT-L model is much smaller than the models with constant predictions and has similar performance.
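
We can also inspect the linear model fitted in each leaf. For example, the constant term and feature weights for the first leaf (node 2 in the tree above) can be queried by node index with get_regression_constant and get_regression_weights:

lnr = IAI.get_learner(grid)
IAI.get_regression_constant(lnr, 2)
IAI.get_regression_weights(lnr, 2)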