# Parameter Tuning

Like most machine learning methods, you will likely need to tune parameters of optimal tree learners through validation to get the best results. This page discusses which parameters to validate and some suggested approaches for validation.

Refer to the IAIBase documentation on parameter tuning for a general description of the tuning interface.

It is **highly recommended** that you use the `GridSearch` interface whenever you are fitting optimal tree models, as it will automatically tune the complexity parameter `cp` using a method that is **significantly stronger** than manual tuning.

## General approach to parameter tuning

First, we outline a strategy for parameter tuning that should provide a good start for most applications. These suggestions are based on our experiences, gained through tests with synthetic and real-world datasets as well as many applications.

For most problems, the key parameters that affect the quality of the generated trees are:

- `cp`
- `max_depth`
- `minbucket`
- `criterion`

As mentioned above, when fitting using a `GridSearch`, the value of `cp` will be automatically tuned with high precision. We recommend tuning the rest in the following steps:

#### Step 1: Tune `max_depth`

We recommend tuning `max_depth` as a basic first step. Values for `max_depth` in the range of 5–10 are typically sufficient, but it is often worth trying deeper trees until the performance stops improving significantly.

The following code tunes an Optimal Classification Tree with `max_depth` between 1 and 10 and with `cp` automatically tuned:

```
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
    ),
    max_depth=1:10,
)
IAI.fit!(grid, X, y)
```

```
All Grid Results:
│ Row │ max_depth │ cp │ train_score │ valid_score │ rank_valid_score │
│ │ Int64 │ Float64 │ Float64 │ Float64 │ Int64 │
├─────┼───────────┼────────────┼─────────────┼─────────────┼──────────────────┤
│ 1 │ 1 │ 0.34192 │ 0.859375 │ 0.837379 │ 10 │
│ 2 │ 2 │ 0.0351288 │ 0.934375 │ 0.901456 │ 9 │
│ 3 │ 3 │ 0.0245902 │ 0.978125 │ 0.971845 │ 8 │
│ 4 │ 4 │ 0.00936768 │ 0.995833 │ 0.974757 │ 7 │
│ 5 │ 5 │ 0.00078064 │ 1.0 │ 0.983738 │ 4 │
│ 6 │ 6 │ 0.00175644 │ 1.0 │ 0.985437 │ 1 │
│ 7 │ 7 │ 0.00175644 │ 1.0 │ 0.983981 │ 3 │
│ 8 │ 8 │ 0.00058548 │ 1.0 │ 0.983495 │ 5 │
│ 9 │ 9 │ 0.00058548 │ 1.0 │ 0.983252 │ 6 │
│ 10 │ 10 │ 0.00058548 │ 1.0 │ 0.984466 │ 2 │
Best Params:
cp => 0.0017564402810304445
max_depth => 6
Best Model - Fitted OptimalTreeClassifier:
1) Split: skewness < 5.21
2) Split: variance < -0.9483
3) Split: skewness < -3.412
4) Predict: 1 (100.00%), [0,168], 168 points, error 0
5) Split: curtosis < 4.74
6) Predict: 1 (100.00%), [0,197], 197 points, error 0
7) Predict: 0 (100.00%), [14,0], 14 points, error 0
8) Split: variance < 0.846
9) Split: skewness < -0.6891
10) Split: entropy < -0.5563
11) Predict: 0 (100.00%), [13,0], 13 points, error 0
12) Split: curtosis < 6.824
13) Predict: 1 (100.00%), [0,40], 40 points, error 0
14) Predict: 0 (100.00%), [7,0], 7 points, error 0
15) Split: curtosis < 0.7473
16) Predict: 1 (100.00%), [0,125], 125 points, error 0
17) Predict: 0 (100.00%), [26,0], 26 points, error 0
18) Split: curtosis < -1.807
19) Split: variance < 3.299
20) Predict: 1 (100.00%), [0,40], 40 points, error 0
21) Predict: 0 (100.00%), [3,0], 3 points, error 0
22) Predict: 0 (100.00%), [302,0], 302 points, error 0
23) Split: variance < -3.368
24) Predict: 1 (97.56%), [1,40], 41 points, error 0.02439
25) Predict: 0 (100.00%), [396,0], 396 points, error 0
```

We can see that the performance on the validation set levels out around depth 5, so we don't need to push for deeper trees with this data. Also note that for each depth, `cp` has indeed been automatically tuned very precisely during the validation process.

#### Step 2: Change or tune `criterion` if splits are not sufficient

Depending on your problem, you may find it beneficial to use different values for `criterion`. For example, Optimal Classification Trees use `:misclassification` as the default training criterion, which works well in most cases where the goal is to predict the correct class. However, this criterion may not give the best solution if the goal of the model is to predict probabilities as accurately as possible. For more information on how the training criterion affects Optimal Classification Trees, please refer to our worked example that compares behavior under different criteria.
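As a sketch of how this fits into the workflow above, `criterion` can be included in the grid search alongside `max_depth` (the candidate values `:gini` and `:entropy` shown here are assumptions about which classification criteria your version supports):

```
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
    ),
    max_depth=1:6,
    # candidate training criteria to validate over (illustrative choices)
    criterion=[:misclassification, :gini, :entropy],
)
IAI.fit!(grid, X, y)
IAI.get_best_params(grid)
```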

#### Step 3: Change `validation_criterion`

We also recommend changing `validation_criterion` during a grid search to be consistent with how you intend to evaluate the model. Additionally, any criterion can be used during validation (unlike training), so there are more options to consider, such as `:auc` for classification and `:harrell_c_statistic` for survival. Validating with these criteria can often give better model selection results than the criteria available for training.
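For instance, a sketch of selecting models by AUC while still training on the default misclassification criterion (assuming your version accepts `validation_criterion` as a learner parameter):

```
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
        # rank candidate models by AUC during validation
        validation_criterion=:auc,
    ),
    max_depth=1:6,
)
IAI.fit!(grid, X, y)
```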

#### Step 4: Change or tune `minbucket` if leaves are too small

If you notice leaves with small numbers of points, you may need to increase or tune `minbucket` to make such solutions infeasible. Note that a leaf with a small number of points is not inherently undesirable; you should only do this when the splits in the tree give evidence that a small leaf may be overfitting to the training data.
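A sketch of tuning `minbucket` through the same grid-search interface (the candidate values are illustrative, not recommendations):

```
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=1,
    ),
    max_depth=1:6,
    # minimum number of points allowed in each leaf (illustrative values)
    minbucket=[5, 10, 20],
)
IAI.fit!(grid, X, y)
IAI.get_best_params(grid)
```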

## Optimal Regression Trees with Linear Predictions

When using Optimal Regression Trees with linear predictions in the leaves, it is crucial to tune `regression_lambda`, the amount of regularization applied to the linear regression equations. We strongly recommend tuning both `max_depth` and `regression_lambda` to get the best results, but it can be computationally expensive to tune these simultaneously in the same grid search. Instead, we suggest the following three-step process that tunes the parameters in an alternating fashion. We have found that this is much faster and typically gives very similar performance to the full grid search.

#### Step 1: Get a starting estimate for `regression_lambda`

We need to choose a starting value for `regression_lambda`. You can either use the default value or find a good starting estimate yourself.

One cheap way to do this yourself is to validate over `regression_lambda` with `max_depth` fixed to zero. This is effectively just fitting a linear regression to the data, and allows you to find a good baseline level of regularization:

```
grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=2,
        max_depth=0,
        regression_sparsity=:all,
    ),
    regression_lambda=[0.0001, 0.001, 0.01, 0.1],
)
IAI.fit!(grid, X, y)
starting_lambda = IAI.get_best_params(grid)[:regression_lambda]
```

`0.1`

#### Step 2: Tune `max_depth` with `regression_lambda` fixed

Using the starting estimate for `regression_lambda` from Step 1, we now tune `max_depth`:

```
grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=1,
        regression_sparsity=:all,
        regression_lambda=starting_lambda,
    ),
    max_depth=1:5,
)
IAI.fit!(grid, X, y)
best_depth = IAI.get_best_params(grid)[:max_depth]
```

`4`

#### Step 3: Fix `max_depth` and tune `regression_lambda`

Finally, we fix `max_depth` to the value found in Step 2 and tune `regression_lambda` to get the final result:

```
grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=1,
        max_depth=best_depth,
        regression_sparsity=:all,
    ),
    regression_lambda=[0.0001, 0.001, 0.01, 0.1],
)
IAI.fit!(grid, X, y)
IAI.get_best_params(grid)
```

```
Dict{Symbol,Any} with 2 entries:
  :regression_lambda => 0.001
  :cp                => 0.000258181
```