Parameter Tuning
Like most machine learning methods, you will likely need to tune parameters of optimal tree learners through validation to get the best results. This page discusses which parameters to validate and some suggested approaches for validation.
Refer to the IAIBase documentation on parameter tuning for a general description on the tuning interface.
It is highly recommended that you use the GridSearch interface whenever you are fitting optimal tree models, as it will automatically tune the complexity parameter cp using a method that is significantly stronger than manual tuning.
General approach to parameter tuning
First, we outline a strategy for parameter tuning that should provide a good start for most applications. These suggestions are based on our experiences, gained through tests with synthetic and real-world datasets as well as many applications.
For most problems, the key parameters that affect the quality of the generated trees are:
cp
max_depth
minbucket
criterion
As mentioned above, when fitting using a GridSearch, the value of cp will be automatically tuned with high precision. We recommend tuning the rest in the following steps:
Step 1: Tune max_depth
We recommend tuning max_depth as a basic first step. Values for max_depth in the range of 5–10 are typically sufficient, but it is often worth trying deeper trees until the performance stops improving significantly.

The following code tunes an Optimal Classification Tree with max_depth between 1 and 10 and with cp automatically tuned:
grid = IAI.GridSearch(
IAI.OptimalTreeClassifier(
random_seed=1,
),
max_depth=1:10,
)
IAI.fit!(grid, X, y)
All Grid Results:
Row │ max_depth cp train_score valid_score rank_valid_score
│ Int64 Float64 Float64 Float64 Int64
─────┼───────────────────────────────────────────────────────────────────
1 │ 1 0.345433 0.8625 0.830097 10
2 │ 2 0.0351288 0.932292 0.907767 9
3 │ 3 0.0210773 0.976042 0.976699 8
4 │ 4 0.00117096 0.994792 0.985194 6
5 │ 5 0.00136612 0.998958 0.984466 7
6 │ 6 0.00136612 1.0 0.991019 1
7 │ 7 0.00136612 1.0 0.989078 2
8 │ 8 0.00078064 1.0 0.98835 3
9 │ 9 0.00117096 1.0 0.98568 4
10 │ 10 0.00058548 1.0 0.98568 5
Best Params:
cp => 0.0013661202185792352
max_depth => 6
Best Model - Fitted OptimalTreeClassifier:
1) Split: skewness < 5.161
2) Split: variance < 0.3258
3) Split: curtosis < 3.064
4) Predict: 1 (100.00%), [0,305], 305 points, error 0
5) Split: variance < -0.4948
6) Split: skewness < -0.3414
7) Predict: 1 (99.47%), [1,187], 188 points, error 0.005319
8) Predict: 0 (100.00%), [14,0], 14 points, error 0
9) Split: entropy < 1.228
10) Predict: 0 (100.00%), [13,0], 13 points, error 0
11) Predict: 1 (100.00%), [0,1], 1 points, error 0
12) Split: curtosis < -1.44
13) Split: variance < 3.48
14) Predict: 1 (100.00%), [0,66], 66 points, error 0
15) Predict: 0 (100.00%), [5,0], 5 points, error 0
16) Split: variance < 0.7508
17) Split: curtosis < 1.881
18) Split: entropy < -0.2078
19) Predict: 0 (100.00%), [10,0], 10 points, error 0
20) Predict: 1 (100.00%), [0,10], 10 points, error 0
21) Predict: 0 (100.00%), [19,0], 19 points, error 0
22) Predict: 0 (100.00%), [301,0], 301 points, error 0
23) Split: variance < -3.368
24) Split: entropy < -1.916
25) Predict: 1 (100.00%), [0,40], 40 points, error 0
26) Predict: 0 (100.00%), [1,0], 1 points, error 0
27) Split: curtosis < -5.079
28) Predict: 1 (100.00%), [0,1], 1 points, error 0
29) Predict: 0 (100.00%), [398,0], 398 points, error 0
We can see that the performance on the validation set levels out around depth 5, so we don't need to push for deeper trees with this data. Also note that for each depth, cp has indeed been automatically tuned very precisely during the validation process.
Step 2: Change or tune criterion if splits are not sufficient
Depending on your problem, you may find it beneficial to use different values for criterion. For example, Optimal Classification Trees use :misclassification as the default training criterion, which works well in most cases where the goal is to predict the correct class. However, this criterion may not give the best solution if the goal of the model is to predict probabilities as accurately as possible. For more information on how the training criterion affects Optimal Classification Trees, please refer to our worked example that compares behavior under different criteria.
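As an illustrative sketch, switching the training criterion only requires passing criterion to the learner. The example below uses :gini in place of the default :misclassification, and assumes the same X and y as in the earlier example:

```julia
# Sketch: fit with the Gini impurity training criterion instead of the
# default :misclassification (assumes X and y are defined as above)
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
        criterion=:gini,
    ),
    max_depth=1:6,
)
IAI.fit!(grid, X, y)
```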
Step 3: Change validation_criterion
We also recommend changing validation_criterion during a grid search to be consistent with how you intend to evaluate the model. Additionally, any criterion can be used during validation (unlike training), so there are more options to consider, such as :auc for classification and :harrell_c_statistic for survival. Validating with these can often give better model selection results than the criteria available for training.
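For example, a sketch of selecting parameters by AUC rather than the training criterion might look as follows (assuming the same X and y as above; the keyword placement on fit! follows the IAIBase parameter tuning documentation):

```julia
# Sketch: choose the best parameter combination by validation AUC
# rather than the training criterion (assumes X and y are defined)
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(random_seed=1),
    max_depth=1:6,
)
IAI.fit!(grid, X, y, validation_criterion=:auc)
```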
Step 4: Change or tune minbucket if leaves are too small
If you notice leaves with small numbers of points, you might need to increase or tune minbucket to make such solutions infeasible. Note that a leaf with a small number of points is not inherently undesirable - you should only do this when the splits in the tree give evidence that a small leaf may be overfitting to the training data.
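As a sketch of this, minbucket can be tuned in the grid search like any other parameter, here with max_depth fixed to the value found in Step 1 and the candidate values chosen purely for illustration:

```julia
# Sketch: tune minbucket alongside the automatically-tuned cp, with
# max_depth fixed to the value found in Step 1 (here 6); the candidate
# values [5, 10, 20, 50] are illustrative, not a recommendation
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
        max_depth=6,
    ),
    minbucket=[5, 10, 20, 50],
)
IAI.fit!(grid, X, y)
```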
Optimal Regression Trees with Linear Predictions
When using Optimal Regression Trees with linear predictions in the leaves, it is crucial to tune regression_lambda, the amount of regularization applied to the linear regression equations. We strongly recommend tuning both max_depth and regression_lambda to get the best results, but tuning these simultaneously in the same grid search can be computationally expensive. Instead, we suggest the following three-step process that tunes the parameters in an alternating fashion. We have found that this is much faster and typically performs very similarly to the full grid search.
Step 1: Get a starting estimate for regression_lambda
We need to choose a starting value for regression_lambda. You can either use the default value, or find a good starting estimate yourself.
One cheap approach is to validate over regression_lambda with max_depth fixed to zero - this is effectively just fitting a linear regression to the data, and allows you to find a good baseline level of regularization:
grid = IAI.GridSearch(
IAI.OptimalTreeRegressor(
random_seed=2,
max_depth=0,
regression_features=All(),
),
regression_lambda=[0.0001, 0.001, 0.01, 0.1],
)
IAI.fit!(grid, X, y)
starting_lambda = IAI.get_best_params(grid)[:regression_lambda]
0.1
Step 2: Tune max_depth with regression_lambda fixed
Using the starting estimate for regression_lambda from Step 1, we now tune max_depth:
grid = IAI.GridSearch(
IAI.OptimalTreeRegressor(
random_seed=1,
regression_features=All(),
regression_lambda=starting_lambda,
),
max_depth=1:5,
)
IAI.fit!(grid, X, y)
best_depth = IAI.get_best_params(grid)[:max_depth]
5
Step 3: Fix max_depth and tune regression_lambda
Finally, we fix max_depth to the value found in Step 2, and tune regression_lambda to get the final result:
grid = IAI.GridSearch(
IAI.OptimalTreeRegressor(
random_seed=1,
max_depth=best_depth,
regression_features=All(),
),
regression_lambda=[0.0001, 0.001, 0.01, 0.1],
)
IAI.fit!(grid, X, y)
IAI.get_best_params(grid)
Dict{Symbol, Any} with 2 entries:
:regression_lambda => 0.01
:cp => 0.000418484