Parameter Tuning

Like most machine learning methods, you will likely need to tune parameters of optimal tree learners through validation to get the best results. This page discusses which parameters to validate and some suggested approaches for validation.

Refer to the IAIBase documentation on parameter tuning for a general description on the tuning interface.

Warning

It is highly recommended that you use the GridSearch interface whenever you are fitting optimal tree models, as it will automatically tune the complexity parameter cp using a method that is significantly stronger than manual tuning.
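
As a quick illustration (a minimal sketch, assuming the feature matrix X and labels y are already loaded), even a grid with a single candidate value of max_depth still has cp tuned automatically during validation:

# With only one candidate value of max_depth, the GridSearch still
# tunes cp automatically as part of the validation process
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(random_seed=1),
    max_depth=5,
)
IAI.fit!(grid, X, y)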

General approach to parameter tuning

First, we outline a strategy for parameter tuning that should provide a good starting point for most applications. These suggestions are based on our experience, gained through testing on synthetic and real-world datasets as well as many practical applications.

For most problems, the key parameters that affect the quality of the generated trees are:

- cp: the complexity parameter controlling the tradeoff between training accuracy and tree complexity (tuned automatically by GridSearch)
- max_depth: the maximum depth of the tree
- criterion: the scoring criterion used to train the tree

As mentioned above, when fitting using a GridSearch, the value of cp will be automatically tuned with high precision. We recommend tuning max_depth and trying different values of criterion as appropriate for your problem. Typically, values for max_depth in the range 5–10 are sufficient, but it is often valuable to keep trying deeper trees until performance stops improving significantly.

The following code tunes an Optimal Classification Tree with max_depth between 1 and 10 and with cp automatically tuned:

grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
    ),
    max_depth=1:10,
)
IAI.fit!(grid, X, y)
All Grid Results:

│ Row │ max_depth │ cp         │ train_score │ valid_score │ rank_valid_score │
│     │ Int64     │ Float64    │ Float64     │ Float64     │ Int64            │
├─────┼───────────┼────────────┼─────────────┼─────────────┼──────────────────┤
│ 1   │ 1         │ 0.323185   │ 0.842708    │ 0.864078    │ 10               │
│ 2   │ 2         │ 0.0257611  │ 0.919792    │ 0.940291    │ 9                │
│ 3   │ 3         │ 0.00468384 │ 0.98125     │ 0.96068     │ 8                │
│ 4   │ 4         │ 0.00702576 │ 0.996875    │ 0.975485    │ 7                │
│ 5   │ 5         │ 0.00058548 │ 1.0         │ 0.987136    │ 2                │
│ 6   │ 6         │ 0.0019516  │ 1.0         │ 0.987379    │ 1                │
│ 7   │ 7         │ 0.00039032 │ 1.0         │ 0.984466    │ 3                │
│ 8   │ 8         │ 0.00058548 │ 1.0         │ 0.983252    │ 6                │
│ 9   │ 9         │ 0.00058548 │ 1.0         │ 0.983495    │ 5                │
│ 10  │ 10        │ 0.00058548 │ 1.0         │ 0.983738    │ 4                │

Best Params:
  cp => 0.0019516003122560495
  max_depth => 6

Best Model - Fitted OptimalTreeClassifier:
  1) Split: skewness < 5.21
    2) Split: variance < -1.3
      3) Split: curtosis < 2.705
        4) Predict: 1 (100.00%), [0,162], 162 points, error 0
        5) Split: skewness < 0.02172
          6) Predict: 1 (100.00%), [0,172], 172 points, error 0
          7) Predict: 0 (100.00%), [9,0], 9 points, error 0
      8) Split: variance < 0.846
        9) Split: skewness < -0.3114
          10) Split: entropy < -0.5563
            11) Predict: 0 (100.00%), [13,0], 13 points, error 0
            12) Split: curtosis < 6.824
              13) Predict: 1 (100.00%), [0,61], 61 points, error 0
              14) Predict: 0 (100.00%), [11,0], 11 points, error 0
          15) Split: curtosis < 0.7473
            16) Predict: 1 (100.00%), [0,135], 135 points, error 0
            17) Predict: 0 (100.00%), [27,0], 27 points, error 0
        18) Split: curtosis < -1.807
          19) Split: variance < 3.299
            20) Predict: 1 (100.00%), [0,40], 40 points, error 0
            21) Predict: 0 (100.00%), [3,0], 3 points, error 0
          22) Predict: 0 (100.00%), [302,0], 302 points, error 0
    23) Split: variance < -3.368
      24) Predict: 1 (97.56%), [1,40], 41 points, error 1
      25) Predict: 0 (100.00%), [396,0], 396 points, error 0

We can see that the performance on the validation set levels out around depth 5, so we don't need to push for deeper trees with this data. Also note that for each depth, cp has indeed been automatically tuned very precisely during the validation process.
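
Once the search is complete, you can inspect the chosen parameters and evaluate the tuned model. A short sketch (test_X and test_y are hypothetical held-out data you would need to supply):

# Inspect the best parameter combination found during the search
IAI.get_best_params(grid)

# Evaluate the best model on held-out data (test_X and test_y assumed)
IAI.score(grid, test_X, test_y, criterion=:misclassification)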

Depending on your problem, you might also find it beneficial to tune the parameters discussed below.

Optimal Regression Trees with Linear Predictions

When using Optimal Regression Trees with linear predictions in the leaves, it is crucial to tune regression_lambda, which controls the amount of regularization applied to the linear regression equations in the leaves. We strongly recommend tuning both max_depth and regression_lambda for best results, but tuning them simultaneously in the same grid search can be computationally expensive. Instead, we suggest the following three-step process, which tunes the parameters in an alternating fashion. We have found this to be much faster than the full grid search while typically achieving very similar performance.

Step 1: Get a starting estimate for regression_lambda

We need to choose a starting value for regression_lambda. You can either use the default value, or find a good starting estimate yourself.

One cheap way to do this yourself is to validate over regression_lambda with max_depth fixed to zero. This effectively fits a regularized linear regression to the data, allowing you to find a good baseline level of regularization:

grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=1,
        max_depth=0,
        regression_sparsity=:all,
    ),
    regression_lambda=[0.0001, 0.001, 0.01, 0.1],
)
IAI.fit!(grid, X, y)
starting_lambda = IAI.get_best_params(grid)[:regression_lambda]
0.1

Step 2: Tune max_depth with regression_lambda fixed

Using the starting estimate from Step 1 for regression_lambda, we now tune max_depth:

grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=1,
        regression_sparsity=:all,
        regression_lambda=starting_lambda,
    ),
    max_depth=1:5,
)
IAI.fit!(grid, X, y)
best_depth = IAI.get_best_params(grid)[:max_depth]
4

Step 3: Fix max_depth and tune regression_lambda

Finally, we fix max_depth to the value found in Step 2, and tune regression_lambda to get the final result:

grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=1,
        max_depth=best_depth,
        regression_sparsity=:all,
    ),
    regression_lambda=[0.0001, 0.001, 0.01, 0.1],
)
IAI.fit!(grid, X, y)
IAI.get_best_params(grid)
Dict{Symbol,Any} with 2 entries:
  :regression_lambda => 0.001
  :cp                => 1.57856e-5
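
For convenience, the three steps can be wrapped into a single helper. This is only a sketch of the alternating procedure described above (tune_regression_tree is our own hypothetical name, and the parameter grids are illustrative defaults):

# Hypothetical helper wrapping the three-step alternating procedure above
function tune_regression_tree(X, y; lambdas=[0.0001, 0.001, 0.01, 0.1],
                              depths=1:5)
    # Step 1: baseline regression_lambda from a depth-0 (linear) model
    grid = IAI.GridSearch(
        IAI.OptimalTreeRegressor(
            random_seed=1,
            max_depth=0,
            regression_sparsity=:all,
        ),
        regression_lambda=lambdas,
    )
    IAI.fit!(grid, X, y)
    starting_lambda = IAI.get_best_params(grid)[:regression_lambda]

    # Step 2: tune max_depth with regression_lambda fixed
    grid = IAI.GridSearch(
        IAI.OptimalTreeRegressor(
            random_seed=1,
            regression_sparsity=:all,
            regression_lambda=starting_lambda,
        ),
        max_depth=depths,
    )
    IAI.fit!(grid, X, y)
    best_depth = IAI.get_best_params(grid)[:max_depth]

    # Step 3: re-tune regression_lambda at the chosen depth
    grid = IAI.GridSearch(
        IAI.OptimalTreeRegressor(
            random_seed=1,
            max_depth=best_depth,
            regression_sparsity=:all,
        ),
        regression_lambda=lambdas,
    )
    IAI.fit!(grid, X, y)
    grid
end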