Parameter Tuning
Like most machine learning methods, you will likely need to tune parameters of optimal tree learners through validation to get the best results. This page discusses which parameters to validate and some suggested approaches for validation.
Refer to the IAIBase documentation on parameter tuning for a general description on the tuning interface.
It is highly recommended that you use the GridSearch interface whenever you are fitting optimal tree models, as it will automatically tune the complexity parameter cp using a method that is significantly stronger than manual tuning.
General approach to parameter tuning
First, we outline a strategy for parameter tuning that should provide a good start for most applications. These suggestions are based on our experiences, gained through tests with synthetic and real-world datasets as well as many applications.
For most problems, the key parameters that affect the quality of the generated trees are:
cp
max_depth
minbucket
criterion
As mentioned above, when fitting using a GridSearch, the value of cp will be automatically tuned with high precision. We recommend tuning the rest in the following steps:
Step 1: Tune max_depth
We recommend tuning max_depth as a basic first step. Values for max_depth in the range of 5–10 are typically sufficient, but it is often worth trying deeper trees until the performance stops improving significantly.

The following code tunes an Optimal Classification Tree with max_depth between 1 and 10 and with cp automatically tuned:
grid = IAI.GridSearch(
IAI.OptimalTreeClassifier(
random_seed=1,
),
max_depth=1:10,
)
IAI.fit!(grid, X, y)
All Grid Results:
Row │ max_depth cp train_score valid_score rank_valid_score
│ Int64 Float64 Float64 Float64 Int64
─────┼───────────────────────────────────────────────────────────────────
1 │ 1 0.345433 0.8625 0.830097 10
2 │ 2 0.0351288 0.932292 0.907767 9
3 │ 3 0.0210773 0.976042 0.976699 8
4 │ 4 0.00117096 0.994792 0.985194 6
5 │ 5 0.00136612 0.998958 0.984466 7
6 │ 6 0.00136612 1.0 0.991019 1
7 │ 7 0.00136612 1.0 0.989078 2
8 │ 8 0.00078064 1.0 0.98835 3
9 │ 9 0.00117096 1.0 0.98568 4
10 │ 10 0.00058548 1.0 0.98568 5
Best Params:
cp => 0.0013661202185792352
max_depth => 6
Best Model - Fitted OptimalTreeClassifier:
1) Split: skewness < 5.161
2) Split: variance < 0.3258
3) Split: curtosis < 3.064
4) Predict: 1 (100.00%), [0,305], 305 points, error 0
5) Split: variance < -0.4948
6) Split: skewness < -0.3414
7) Predict: 1 (99.47%), [1,187], 188 points, error 0.005319
8) Predict: 0 (100.00%), [14,0], 14 points, error 0
9) Split: entropy < 1.228
10) Predict: 0 (100.00%), [13,0], 13 points, error 0
11) Predict: 1 (100.00%), [0,1], 1 points, error 0
12) Split: curtosis < -1.44
13) Split: variance < 3.48
14) Predict: 1 (100.00%), [0,66], 66 points, error 0
15) Predict: 0 (100.00%), [5,0], 5 points, error 0
16) Split: variance < 0.7508
17) Split: curtosis < 1.881
18) Split: entropy < -0.2078
19) Predict: 0 (100.00%), [10,0], 10 points, error 0
20) Predict: 1 (100.00%), [0,10], 10 points, error 0
21) Predict: 0 (100.00%), [19,0], 19 points, error 0
22) Predict: 0 (100.00%), [301,0], 301 points, error 0
23) Split: variance < -3.368
24) Split: entropy < -1.916
25) Predict: 1 (100.00%), [0,40], 40 points, error 0
26) Predict: 0 (100.00%), [1,0], 1 points, error 0
27) Split: curtosis < -5.079
28) Predict: 1 (100.00%), [0,1], 1 points, error 0
29) Predict: 0 (100.00%), [398,0], 398 points, error 0
We can see that the performance on the validation set levels out around depth 5, so we don't need to push for deeper trees with this data. Also note that for each depth, cp has indeed been automatically tuned very precisely during the validation process.
Step 2: Change or tune criterion if splits are not sufficient
Depending on your problem, you may find it beneficial to use different values for criterion. For example, Optimal Classification Trees use :misclassification as the default training criterion, which works well in most cases where the goal is to predict the correct class. However, this criterion may not give the best solution if the goal of the model is to predict probabilities as accurately as possible. For more information on how the training criterion affects Optimal Classification Trees, please refer to our worked example that compares behavior under different criteria.
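As an illustrative sketch, switching the training criterion only requires passing criterion to the learner. The example below uses :gini in place of the default :misclassification, and assumes the same X and y as in the earlier example:

```julia
# Sketch: fit with the Gini impurity training criterion instead of the
# default :misclassification (assumes X and y are defined as above)
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
        criterion=:gini,
    ),
    max_depth=1:6,
)
IAI.fit!(grid, X, y)
```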
Step 3: Change validation_criterion
We also recommend changing validation_criterion during a grid search to be consistent with how you intend to evaluate the model. Additionally, any criterion can be used during validation (unlike training), so there are more options to consider, such as :auc for classification and :harrell_c_statistic for survival. Validating with these can often give better model selection results than the criteria available for training.
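For example, a sketch of selecting parameters by AUC rather than the training criterion might look as follows (assuming the same X and y as above; the keyword placement on fit! follows the IAIBase parameter tuning documentation):

```julia
# Sketch: choose the best parameter combination by validation AUC
# rather than the training criterion (assumes X and y are defined)
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(random_seed=1),
    max_depth=1:6,
)
IAI.fit!(grid, X, y, validation_criterion=:auc)
```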
Step 4: Change or tune minbucket if leaves are too small
If you notice leaves with small numbers of points, you might need to increase or tune minbucket to make such solutions infeasible. Note that a leaf with a small number of points is not inherently undesirable - you should only do this when the splits in the tree give evidence that a small leaf may be overfitting to the training data.
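As a sketch of this, minbucket can be tuned in the grid search like any other parameter, here with max_depth fixed to the value found in Step 1 and the candidate values chosen purely for illustration:

```julia
# Sketch: tune minbucket alongside the automatically-tuned cp, with
# max_depth fixed to the value found in Step 1 (here 6); the candidate
# values [5, 10, 20, 50] are illustrative, not a recommendation
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
        max_depth=6,
    ),
    minbucket=[5, 10, 20, 50],
)
IAI.fit!(grid, X, y)
```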
Optimal Regression Trees with Linear Predictions
When using Optimal Regression Trees with linear predictions in the leaves, it is crucial to tune regression_lambda, the amount of regularization applied to the linear regression equations. We strongly recommend tuning both max_depth and regression_lambda to get the best results, but tuning these simultaneously in the same grid search can be computationally expensive. Instead, we suggest the following three-step process that tunes the parameters in an alternating fashion. We have found that this is much faster and typically performs very similarly to the full grid search.
Step 1: Get a starting estimate for regression_lambda
We need to choose a starting value for regression_lambda. You can either use the default value, or find a good starting estimate yourself.
One cheap approach is to validate over regression_lambda with max_depth fixed to zero - this is effectively just fitting a linear regression to the data, and allows you to find a good baseline level of regularization:
grid = IAI.GridSearch(
IAI.OptimalTreeRegressor(
random_seed=2,
max_depth=0,
regression_features=All(),
),
regression_lambda=[0.0001, 0.001, 0.01, 0.1],
)
IAI.fit!(grid, X, y)
starting_lambda = IAI.get_best_params(grid)[:regression_lambda]
0.1
Step 2: Tune max_depth with regression_lambda fixed
Using the starting estimate for regression_lambda from Step 1, we now tune max_depth:
grid = IAI.GridSearch(
IAI.OptimalTreeRegressor(
random_seed=1,
regression_features=All(),
regression_lambda=starting_lambda,
),
max_depth=1:5,
)
IAI.fit!(grid, X, y)
best_depth = IAI.get_best_params(grid)[:max_depth]
5
Step 3: Fix max_depth and tune regression_lambda
Finally, we fix max_depth to the value found in Step 2, and tune regression_lambda to get the final result:
grid = IAI.GridSearch(
IAI.OptimalTreeRegressor(
random_seed=1,
max_depth=best_depth,
regression_features=All(),
),
regression_lambda=[0.0001, 0.001, 0.01, 0.1],
)
IAI.fit!(grid, X, y)
IAI.get_best_params(grid)
Dict{Symbol, Any} with 2 entries:
:regression_lambda => 0.01
:cp => 0.000418484