# Working with Tree Learners

Tree learners support all of the core learner functionality provided by IAIBase, along with a number of additional tree-specific functions.

## General Functions

The examples in this section use the following learner:

```julia
using CSV, DataFrames

# Load the data; the original listing does not show this step, so the
# file name here is an assumption (the examples use the iris dataset).
df = CSV.read("iris.csv", DataFrame)

X = df[:, 1:4]
y = df[:, 5]
lnr = IAI.OptimalTreeClassifier(max_depth=2, cp=0, random_seed=15)
IAI.fit!(lnr, X, y)
```

*(Output: an interactive Optimal Trees visualization of the fitted tree, omitted here.)*

We can use `apply` to find the index of the leaf that contains each point in our data:

```julia
IAI.apply(lnr, X)
```
```
150-element Vector{Int64}:
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 ⋮
 5
 5
 5
 5
 5
 5
 5
 5
 5
```

We can get the set of points that fall into each node with `apply_nodes`:

```julia
IAI.apply_nodes(lnr, X)
```
```
5-element Vector{Vector{Int64}}:
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60  …  141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60  …  95, 96, 97, 98, 99, 100, 120, 130, 134, 135]
 [71, 78, 101, 102, 103, 104, 105, 106, 107, 108  …  141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
```
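Conceptually, `apply_nodes` can be thought of as inverting the leaf assignments returned by `apply`: it groups the sample indices by the node they reach. A minimal sketch of that grouping in plain Python (an illustration of the idea, not the IAI API; the leaf assignments below are hypothetical):

```python
# Sketch: invert a vector of per-sample leaf assignments into a mapping
# from each leaf index to the (1-based) sample indices it contains.
from collections import defaultdict

def group_by_leaf(leaf_ids):
    """Map each leaf index to the 1-based sample indices assigned to it."""
    groups = defaultdict(list)
    for i, leaf in enumerate(leaf_ids, start=1):
        groups[leaf].append(i)
    return dict(groups)

leaf_ids = [2, 2, 5, 4, 5]       # hypothetical output of apply
print(group_by_leaf(leaf_ids))   # {2: [1, 2], 5: [3, 5], 4: [4]}
```

`apply_nodes` additionally reports interior nodes, where each node's set is the union of its children's sets (as in the output above, where node 1 contains all 150 points).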

To obtain the path of each point through the tree, use `decision_path`, which returns a sparse matrix indicating which nodes each point passes through:

```julia
IAI.decision_path(lnr, X)
```
```
150×5 SparseArrays.SparseMatrixCSC{Bool, Int64} with 400 stored entries:
⋮
```
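To make the structure of this matrix concrete, here is a small sketch in plain Python (an illustration, not the IAI API) that builds a dense decision-path matrix from parent pointers, marking every node on each sample's root-to-leaf walk. The parent map below describes a hypothetical tree with the same shape as the one in this section (root 1 with children 2 and 3; node 3 with children 4 and 5):

```python
# Sketch: entry (i, j) of a decision-path matrix is True if sample i
# passes through node j. We recover each sample's path by walking up
# from its leaf to the root via parent pointers.
def decision_path_matrix(leaf_ids, parent, num_nodes):
    """parent[j] gives the parent of node j; the root's parent is 0."""
    rows = []
    for leaf in leaf_ids:
        row = [False] * num_nodes
        node = leaf
        while node != 0:            # walk leaf -> root
            row[node - 1] = True    # nodes are 1-indexed
            node = parent[node]
        rows.append(row)
    return rows

parent = {1: 0, 2: 1, 3: 1, 4: 3, 5: 3}
mat = decision_path_matrix([2, 5], parent, 5)
print(mat[0])  # [True, True, False, False, False]
print(mat[1])  # [True, False, True, False, True]
```

A sample in leaf 2 touches two nodes and a sample in leaf 5 touches three, which is why the 150×5 matrix above stores 400 entries: 50 points take the two-node path and 100 take a three-node path.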

Alternatively, we can see a textual representation of the path of a point through the tree with `print_path`:

```julia
IAI.print_path(lnr, X, 1)
```
```
Rules used to predict sample 1:
  1) Split: PetalLength (=1.4) < 2.45
  2) Predict: setosa (100.00%), [50,0,0], 50 points, error 0
```

The importance of each feature in the overall tree can be summarized with `variable_importance`:

```julia
IAI.variable_importance(lnr)
```
```
4×2 DataFrame
 Row │ Feature      Importance
     │ Symbol       Float64
─────┼─────────────────────────
   1 │ PetalWidth     0.685106
   2 │ PetalLength    0.314894
   3 │ SepalLength    0.0
   4 │ SepalWidth     0.0
```
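The scores above are non-negative and sum to 1, with features that are never used in a split scoring 0. As a sketch of that normalization convention only (an assumption about the presentation, not IAI's importance algorithm), in plain Python with hypothetical raw scores:

```python
# Sketch: normalize raw per-feature scores so they sum to 1; unused
# features keep a score of 0.
def normalize_importance(raw_scores):
    total = sum(raw_scores.values())
    return {f: s / total for f, s in raw_scores.items()}

raw = {"PetalWidth": 3.0, "PetalLength": 1.0,
       "SepalLength": 0.0, "SepalWidth": 0.0}  # hypothetical raw scores
print(normalize_importance(raw))
# {'PetalWidth': 0.75, 'PetalLength': 0.25, 'SepalLength': 0.0, 'SepalWidth': 0.0}
```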

In binary classification problems, the label predicted in a leaf is typically the label with a predicted probability above 50%. It is possible to control this behavior and choose a different threshold using `set_threshold!`. When a threshold is specified for a label, that label is predicted whenever its predicted probability in the leaf is at least the threshold; otherwise, the other label is predicted. For example, to predict the label `"B"` whenever its predicted probability is at least 25%:

```julia
IAI.set_threshold!(lnr, "B", 0.25)
```
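The thresholding rule itself can be sketched in plain Python (an illustration of the logic only, not the IAI API; the binary labels `"A"` and `"B"` are hypothetical):

```python
# Sketch: with a threshold set for label "B", a leaf predicts "B" whenever
# its predicted probability of "B" is at least the threshold; otherwise it
# predicts the other label. The default 50% rule is the special case
# threshold=0.5.
def predict_with_threshold(prob_b, threshold=0.25):
    return "B" if prob_b >= threshold else "A"

print(predict_with_threshold(0.30))                 # "B": 0.30 >= 0.25
print(predict_with_threshold(0.30, threshold=0.5))  # "A" under the default 50% rule
print(predict_with_threshold(0.10))                 # "A": below the threshold
```

Lowering the threshold for a label trades precision for recall on that label, which is useful when misclassifying it is especially costly.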