# Optimal Tree Learners

OptimalTrees provides many learners for training optimal tree models, which we describe on this page along with a guide to their parameters.

## Shared Parameters

All of the learners provided by OptimalTrees are `OptimalTreeLearner`s. In addition to the shared learner parameters, these learners support the following parameters to control the behavior of the OptimalTrees algorithm.

### Commonly-used shared parameters

#### `max_depth`

`max_depth` accepts a non-negative `Integer` to control the maximum depth of the fitted tree. This parameter must always be explicitly set or tuned. We recommend tuning this parameter using the grid search process described in the guide to parameter tuning.
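As a sketch of what this grid search might look like using the `IAI` interface (the training data `X` and `y` and the depth range `1:5` are illustrative):

```
grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(),
    max_depth=1:5,
)
IAI.fit!(grid, X, y)
```

The grid search fits a tree at each candidate depth and selects the depth with the best validation performance.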

#### `minbucket`

`minbucket` controls the minimum number of points that must be present in every leaf node of the fitted tree. There are two ways to specify the `minbucket`:

- a positive `Integer` specifying the minimum number of points for each leaf
- a `Real` between 0 and 1 indicating the minimum proportion of the points that must be present in each leaf (e.g. 0.01 means each leaf must have at least 1% of the points)

The default value is `1`, effectively applying no restriction. If you notice that some leaves in the tree have relatively few points, you might like to increase `minbucket` to prevent these leaves from occurring.
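To illustrate the two ways of specifying this parameter (the values here are purely illustrative):

```
minbucket=20    # each leaf must contain at least 20 points
minbucket=0.05  # each leaf must contain at least 5% of the points
```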

#### `cp`

`cp` is a `Real` known as the *complexity parameter* that determines the tradeoff between the accuracy and complexity of the tree to control overfitting. When training the model, it tries to optimize the following objective function:

```
min  error(T, D) + cp * complexity(T)
```

where `error(T, D)` is the error of the tree `T` on the training data `D`, and `complexity(T)` is the number of splits in the tree.

A higher value of `cp` increases the penalty on each split in the tree, leading to shallower trees. This parameter must always be explicitly set or tuned, and should almost always be chosen using the autotuning procedure described in the guide to parameter tuning.

#### `missingdatamode`

`missingdatamode` specifies the method to use for handling missing values in the features. The following options are supported:

- `:none`: The default option; an error will be thrown if missing data is encountered
- `:separate_class`: The separate class algorithm treats the missing values as a separate class when creating a split, so in addition to deciding how to split the non-missing values as normal, the tree will decide to send all missing values to either the lower or upper child of the split
- `:always_right`: Always send missing values to the upper child of the split

More details on each method are available in the paper by Ding and Simonoff (2010).
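For example, to opt in to the separate class method rather than erroring on missing data (a parameter fragment in the same style as the examples above):

```
missingdatamode=:separate_class
```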

#### `ls_num_tree_restarts`

`ls_num_tree_restarts` is an `Integer` specifying the number of random restarts to use in the local search algorithm. Must be positive and defaults to `100`. The performance of the tree typically increases as this value is increased, but with quickly diminishing returns. The computational cost of training increases linearly with this value. You might like to try increasing this value if you are seeing instability in your results, but our experience is that there is not much gain to be had increasing beyond `1000`.

#### `split_features`

`split_features` specifies the set of features in the data that are allowed to be used by the splits in the tree. It accepts the following options:

- `:all`, meaning all features can be used in the splits, which is the default
- a set of feature indices specifying which features can be used in the splits

This parameter only needs to be specified if you want to restrict the splits in the tree to a subset of the features.
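For instance, to permit splits only on the first and third features (the indices here are illustrative):

```
split_features=[1, 3]
```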

### Hyperplane-related parameters

#### `hyperplane_config`

`hyperplane_config` controls the behavior of hyperplane splits in the tree fitting process. To simply enable standard hyperplane splits in the tree, you should pass

`hyperplane_config=(sparsity=:all,)`

For more advanced control of the hyperplane splits, there are additional options you can specify. You must pass a `NamedTuple` or a vector of `NamedTuple`s. Each `NamedTuple` can contain one or more of the following keys:

- `sparsity` controls the maximum number of features used in each hyperplane split. The possible options are:
  - `:all`: allowed to use all features
  - an `Integer`: the maximum number of features
  - a `Real` between 0 and 1: use this proportion of the total features
  - `:sqrt`: use the square root of the total number of features
  - `:log2`: use the base-2 logarithm of the total number of features
- `feature_set` specifies the set of potential features used in each hyperplane split. Defaults to `:all`, allowing all features to be considered for hyperplane splits. You can also specify a set of features to restrict which features are allowed to be used in the hyperplane splits.
- `values` specifies the values that hyperplane weights can take:
  - `:continuous`: the default option, allowing any continuous values to be used for the weights
  - `:discrete`: restricts the weights to be integer-valued
  - a set of real values: restricts all hyperplane weights to be chosen from this set of possible values

A different hyperplane split search is conducted for each `NamedTuple` that is passed. For instance, the following parameter setting specifies that we want to consider hyperplane splits on the first three features with any continuous weights, and also hyperplane splits on features 6-9 with integer-valued weights:

```
hyperplane_config=[
(sparsity=:all, feature_set=[1, 2, 3]),
(sparsity=:all, feature_set=6:9, values=:discrete),
]
```

#### `ls_num_hyper_restarts`

`ls_num_hyper_restarts` is an `Integer` controlling the number of random restarts to use when optimizing hyperplane splits. Must be positive and defaults to `5`. If you are noticing that your hyperplane splits are taking a long time to train, you might like to try decreasing this value towards `1` to speed up the training without having a significant effect on the final performance.

### Rarely-used shared parameters

#### `ls_ignore_errors`

`ls_ignore_errors` is a `Bool` that controls whether to ignore any errors that arise in the local search procedure. The only reason to enable this is to ignore any errors resulting from an edge-case bug before the bug is fixed. Defaults to `false`.

#### `localsearch`

`localsearch` is a `Bool` that determines whether to use the local search procedure to train the tree. Defaults to `true`. When set to `false`, a greedy algorithm similar to CART will be used to train the tree.

#### `ls_num_categoric_restarts`

`ls_num_categoric_restarts` is an `Integer` controlling the number of random restarts to use when optimizing categoric splits. Must be positive and defaults to `10`. There is no need to change this parameter.

#### `ls_warmstart_criterion`

`ls_warmstart_criterion` is a `Symbol` that specifies which criterion to use when generating the random restarts for the local search (see Scoring Criteria). The default value depends on the problem type and does not need to be changed.

## Classification Learners

The `OptimalTreeClassifier` is used for training Optimal Classification Trees. There are no additional parameters beyond the shared parameters.

## Regression Learners

The `OptimalTreeRegressor` is used for training Optimal Regression Trees. In addition to the shared parameters, these learners also support the shared regression parameters as well as the following parameters.

#### `regression_sparsity`

`regression_sparsity` specifies whether to use linear regression predictions in the leaves of the tree. The default value is `0`, indicating only a constant term will be used for prediction. Set to `:all` to fit a linear regression in each leaf.
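For example, a sketch of constructing a regression learner with linear predictions in the leaves (the depth and regularization values here are illustrative):

```
lnr = IAI.OptimalTreeRegressor(
    max_depth=3,
    regression_sparsity=:all,   # fit a linear regression in each leaf
    regression_lambda=0.01,     # regularization applied to the leaf regressions
)
```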

#### `regression_lambda`

`regression_lambda` is a non-negative `Real` that controls the amount of regularization used when fitting the regression equations in each leaf. The default value is `0.01`.

#### `regression_features`

`regression_features` specifies the set of features that can be used for the linear regression in each leaf. With the default value of `:all`, there is no restriction on which features can be included. If you would like to restrict the linear regression to a subset of the features in the data, you can specify the set of feature indices that are permitted in the regression.

#### `regression_cd_algorithm`

`regression_cd_algorithm` controls the coordinate-descent algorithm used to fit the linear regression equations in each leaf. The default value is `:covariance`, but in scenarios where the number of features is small you might achieve faster training by setting this parameter to `:naive`.

#### `regression_weighted_betas`

`regression_weighted_betas` specifies the regularization scheme to use for the linear regression equations. With the default value of `false`, the regularization penalty is simply added across each leaf in the tree, meaning that smaller leaves are more likely to have simpler regression equations. When set to `true`, the regularization penalty is weighted by the number of points in each leaf, so that the complexity of the linear regressions is not restricted by the size of the leaf.

It is recommended to set this parameter to `true` in prescription problems with linear regression in the leaves so that the tree is not penalized for any additional splitting that is needed to refine prescriptions. In other applications, the default value is usually fine.

## Survival Learners

The `OptimalTreeSurvivor` is used for training Optimal Survival Trees. In addition to the shared parameters, these learners also support the following parameters.

#### `death_minbucket`

`death_minbucket` is similar to `minbucket`, except it specifies the number of non-censored observations that are required in each leaf.

The default value is `1`. As with `minbucket`, we recommend raising this value only if you notice undesirable behavior in the fitted trees.

## Prescription Learners

`OptimalTreePrescriptionMinimizer` and `OptimalTreePrescriptionMaximizer` are used to train Optimal Prescriptive Trees. The learner you should select depends on whether the goal of your prescriptive problem is to minimize or maximize outcomes.

These learners support all parameters of `OptimalTreeRegressor`, the shared prescription parameters, and the following parameters.

#### `treatment_minbucket`

`treatment_minbucket` is similar to `minbucket`, except it specifies the number of points of a given treatment that must be in a leaf in order for that leaf to prescribe this treatment. For instance, if `treatment_minbucket` is 10, then there must be 10 points of treatment A in a leaf before the leaf is allowed to consider treatment A for prescription.

The default value is `1`. As with `minbucket`, we recommend raising this value only if you notice undesirable behavior in the fitted trees.