Quick Start Guide: Optimal Regression Trees

This is an R version of the corresponding OptimalTrees quick start guide.

In this example we will use Optimal Regression Trees (ORT) on the yacht hydrodynamics dataset. First we load the data and split it into training and test sets:

df <- read.table(
    "yacht_hydrodynamics.data",
    col.names = c("position", "prismatic", "length_displacement",
                  "beam_draught", "length_beam", "froude", "resistance"),
)
  position prismatic length_displacement beam_draught length_beam froude
1     -2.3     0.568                4.78         3.99        3.17  0.125
2     -2.3     0.568                4.78         3.99        3.17  0.150
3     -2.3     0.568                4.78         3.99        3.17  0.175
4     -2.3     0.568                4.78         3.99        3.17  0.200
5     -2.3     0.568                4.78         3.99        3.17  0.225
6     -2.3     0.568                4.78         3.99        3.17  0.250
7     -2.3     0.568                4.78         3.99        3.17  0.275
8     -2.3     0.568                4.78         3.99        3.17  0.300
  resistance
1       0.11
2       0.27
3       0.47
4       0.78
5       1.18
6       1.82
7       2.61
8       3.76
 [ reached 'max' / getOption("max.print") -- omitted 300 rows ]
X <- df[, 1:6]
y <- df[, 7]
split <- iai::split_data("regression", X, y, seed = 1)
train_X <- split$train$X
train_y <- split$train$y
test_X <- split$test$X
test_y <- split$test$y
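
To make the splitting step concrete, here is a minimal base-R sketch of a random train/test split. It is not the iai implementation: the toy data frame reuses the eight froude and resistance values printed above, and the 70/30 proportion is an assumption (it is consistent with the 92 test predictions out of 308 rows shown in this guide). The names toy, toy_train and toy_test are hypothetical.

```r
# Base-R sketch of a random train/test split (not the iai implementation).
# The 70/30 proportion is an assumption made for illustration.
toy <- data.frame(
    froude = c(0.125, 0.150, 0.175, 0.200, 0.225, 0.250, 0.275, 0.300),
    resistance = c(0.11, 0.27, 0.47, 0.78, 1.18, 1.82, 2.61, 3.76)
)

set.seed(1)                                    # make the shuffle reproducible
n <- nrow(toy)
train_idx <- sample(n, size = round(0.7 * n))  # row indices kept for training

toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]                  # every remaining row is held out

nrow(toy_train)  # 6
nrow(toy_test)   # 2
```

Fixing the seed plays the same role as the seed argument of split_data: the same rows land in the training set on every run.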

Optimal Regression Trees

We will use a grid_search to fit an optimal_tree_regressor:

grid <- iai::grid_search(
    iai::optimal_tree_regressor(
        random_seed = 123,
    ),
    max_depth = 1:5,
)
iai::fit(grid, train_X, train_y)
iai::get_learner(grid)
[Output: interactive Optimal Trees visualization of the fitted tree]
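
The idea behind searching over max_depth can be sketched in base R. This is only an illustration of validation-based parameter selection, not what grid_search does internally: it swaps the tree for a polynomial fitted with lm, uses the polynomial degree as a hypothetical stand-in for max_depth, and assumes a simple hold-out validation set followed by a refit on all the data.

```r
# Sketch of validation-based tuning; polynomial degree stands in for max_depth.
# Deterministic toy data: a quadratic trend plus a small fixed wiggle.
x <- seq(-1, 1, length.out = 40)
toy <- data.frame(x = x, y = x^2 + 0.05 * sin(25 * x))

# Hold out every third row for validation.
is_valid <- seq_len(nrow(toy)) %% 3 == 0
fit_part <- toy[!is_valid, ]
valid_part <- toy[is_valid, ]

# Fit one model per candidate value and record its validation error.
degrees <- 1:4
valid_mse <- sapply(degrees, function(d) {
    model <- lm(y ~ poly(x, d), data = fit_part)
    mean((valid_part$y - predict(model, newdata = valid_part))^2)
})

# Keep the candidate with the lowest validation error, then refit on all rows.
best_degree <- degrees[which.min(valid_mse)]
final_model <- lm(y ~ poly(x, best_degree), data = toy)
```

A straight line cannot follow the quadratic trend, so the degree-1 candidate has a clearly larger validation error than the others, in the same way that a tree that is too shallow would underfit.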

We can make predictions on new data using predict:

iai::predict(grid, test_X)
 [1]  0.7884043  0.7884043  0.7884043  3.9095556  3.9095556 13.3566667
 [7] 22.0722222  0.7884043  0.7884043  0.7884043  0.7884043  3.9095556
[13]  3.9095556 13.3566667 57.0700000  0.7884043  0.7884043  3.9095556
[19] 22.0722222 34.5753846  0.7884043  0.7884043  0.7884043  3.9095556
[25]  3.9095556  7.9833333  0.7884043  3.9095556 34.5753846  0.7884043
[31]  3.9095556  3.9095556 13.3566667 34.5753846  3.9095556  3.9095556
[37]  0.7884043  0.7884043  0.7884043 13.3566667 34.5753846 49.9158333
[43]  0.7884043  0.7884043  3.9095556  3.9095556 34.5753846  7.9833333
[49] 13.3566667  0.7884043  3.9095556 13.3566667  0.7884043  0.7884043
[55]  0.7884043  0.7884043  3.9095556 34.5753846  0.7884043 13.3566667
 [ reached getOption("max.print") -- omitted 32 entries ]

We can evaluate the quality of the tree using score with any of the supported loss functions. For example, the $R^2$ on the training set (the "mse" criterion is reported as $R^2$, so higher is better):

iai::score(grid, train_X, train_y, criterion = "mse")
[1] 0.991294

Or on the test set:

iai::score(grid, test_X, test_y, criterion = "mse")
[1] 0.9885238
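
The relationship between mean squared error and $R^2$ can be written out in a few lines of base R. This is a sketch with made-up numbers, not the iai scoring code; in particular, the choice of baseline (here the mean of the labels being scored, by default) is an assumption, and r_squared is a hypothetical helper.

```r
# R^2 expressed through MSE: 1 - MSE(model) / MSE(constant baseline).
# The baseline defaults to the mean of y_true, which is an assumption here.
r_squared <- function(y_true, y_pred, baseline = mean(y_true)) {
    mse_model <- mean((y_true - y_pred)^2)
    mse_baseline <- mean((y_true - baseline)^2)
    1 - mse_model / mse_baseline
}

y_true <- c(1, 2, 3, 4, 5)
y_pred <- c(1.1, 1.9, 3.2, 3.8, 5.0)

r_squared(y_true, y_pred)  # approximately 0.99
```

A value of 1 means the predictions are exact, and a value of 0 means the model does no better than always predicting the baseline.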

Optimal Regression Trees with Hyperplanes

To use Optimal Regression Trees with hyperplane splits (ORT-H), you should set the hyperplane_config parameter:

grid <- iai::grid_search(
    iai::optimal_tree_regressor(
        random_seed = 123,
        hyperplane_config = list(sparsity = "all"),
    ),
    max_depth = 1:4,
)
iai::fit(grid, train_X, train_y)
iai::get_learner(grid)
[Output: interactive Optimal Trees visualization of the fitted tree]

Now we can find the performance on the test set with hyperplanes:

iai::score(grid, test_X, test_y, criterion = "mse")
[1] 0.9861183

The addition of hyperplane splits did not help here: the test score of 0.9861 is slightly below the 0.9885 obtained without them. The main variable affecting the target seems to be froude, so allowing multiple variables per split is probably not that useful for this dataset.
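
The difference between the two kinds of split can be illustrated in base R. This sketch is not taken from the iai package: the helper functions, the weights and the thresholds are all hypothetical, chosen only to show that a standard split compares a single feature with a threshold while a hyperplane split compares a weighted sum of several features with a threshold.

```r
# A standard (parallel) split tests one feature against a threshold.
parallel_split <- function(X, feature, threshold) {
    X[[feature]] < threshold
}

# A hyperplane split tests a weighted sum of several features.
hyperplane_split <- function(X, weights, threshold) {
    features <- as.matrix(X[, names(weights), drop = FALSE])
    as.vector(features %*% weights) < threshold
}

# Toy rows; values are illustrative only.
toy_X <- data.frame(
    froude = c(0.125, 0.300, 0.450),
    prismatic = c(0.568, 0.568, 0.530)
)

# TRUE means the row is sent to the lower branch of the split.
goes_lower_parallel <- parallel_split(toy_X, "froude", 0.25)
goes_lower_hyper <- hyperplane_split(
    toy_X, c(froude = 1.0, prismatic = -0.5), 0.1
)

goes_lower_parallel  # TRUE FALSE FALSE
goes_lower_hyper     # TRUE TRUE FALSE
```

When a single variable such as froude carries most of the signal, the extra weights in a hyperplane split have little to add, which matches the scores reported above.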

Optimal Regression Trees with Linear Predictions

To use Optimal Regression Trees with linear regression in the leaves (ORT-L), set the regression_sparsity parameter to "all" and use the regression_lambda parameter to control the degree of regularization:

grid <- iai::grid_search(
    iai::optimal_tree_regressor(
        random_seed = 123,
        max_depth = 2,
        regression_sparsity = "all",
    ),
    regression_lambda = c(0.005, 0.01, 0.05),
)
iai::fit(grid, train_X, train_y)
iai::get_learner(grid)
[Output: interactive Optimal Trees visualization of the fitted tree]

We can find the performance on the test set:

iai::score(grid, test_X, test_y, criterion = "mse")
[1] 0.9842225

We can see that the ORT-L model, whose depth is fixed at 2, is much smaller than the models with constant predictions and has similar performance (a test score of 0.9842 compared with 0.9885).
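
To see why linear predictions allow a smaller tree, the following base-R sketch compares constant and linear leaves on a single fixed split. It is an illustration under stated assumptions, not the ORT-L algorithm: the toy data are piecewise linear by construction, the split point is chosen by hand, and the leaves use ordinary least squares from lm rather than the regularized regression controlled by regression_lambda.

```r
# Toy data that is exactly piecewise linear, with a break at x = 0.5.
toy <- data.frame(x = (1:20) / 20)
toy$y <- ifelse(toy$x <= 0.5, 2 * toy$x, 10 * toy$x - 4)

# One hand-picked split, giving two leaves.
in_left <- toy$x <= 0.5

# Constant leaves: each leaf predicts the mean of its own targets.
pred_const <- ifelse(in_left, mean(toy$y[in_left]), mean(toy$y[!in_left]))

# Linear leaves: each leaf fits its own least-squares line (no regularization).
fit_left <- lm(y ~ x, data = toy[in_left, ])
fit_right <- lm(y ~ x, data = toy[!in_left, ])
pred_linear <- numeric(nrow(toy))
pred_linear[in_left] <- predict(fit_left, newdata = toy[in_left, ])
pred_linear[!in_left] <- predict(fit_right, newdata = toy[!in_left, ])

mse_const <- mean((toy$y - pred_const)^2)
mse_linear <- mean((toy$y - pred_linear)^2)
```

With the same two leaves, the linear version reproduces the toy data almost exactly, whereas the constant version would need many more splits to approximate the slope inside each leaf.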