# Working with Tree Learners

Tree learners support all of the core learner functionality provided by IAIBase. They also support a number of additional functions related to trees.

## General Functions

The examples in this section use the following learner:

```
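using CSV, DataFrames

# Load the iris data: the first four columns are features, the fifth is the label
df = CSV.read("iris.csv", DataFrame)
X = df[:, 1:4]
y = df[:, 5]

# Fit a depth-2 classification tree with a fixed seed for reproducibility
lnr = IAI.OptimalTreeClassifier(max_depth=2, cp=0, random_seed=15)
IAI.fit!(lnr, X, y)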
```

We can use `apply` to find the index of the leaf that contains each point in our data:

`IAI.apply(lnr, X)`

```
150-element Vector{Int64}:
2
2
2
2
2
2
2
2
2
2
⋮
5
5
5
5
5
5
5
5
5
```
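A vector of leaf indices like this is often summarized by counting how many points land in each leaf. The following is a minimal sketch in plain Julia; `leaf_index` is a small hypothetical vector standing in for the output of `apply`, and `count_per_leaf` is an illustrative helper, not part of IAI.

```julia
# Hypothetical leaf assignments, standing in for the vector returned by `apply`
leaf_index = [2, 2, 2, 4, 4, 5, 5, 5, 5]

# Illustrative helper (not an IAI function): count the points in each leaf
function count_per_leaf(leaves)
    counts = Dict{Int,Int}()
    for leaf in leaves
        counts[leaf] = get(counts, leaf, 0) + 1
    end
    return counts
end

leaf_counts = count_per_leaf(leaf_index)

# Print the counts in order of leaf index
for leaf in sort(collect(keys(leaf_counts)))
    println("leaf ", leaf, ": ", leaf_counts[leaf], " points")
end
```

The same helper should work unchanged on the real output, since it only assumes a vector of integer leaf indices.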

We can get the set of points that fall into each node with `apply_nodes`:

`IAI.apply_nodes(lnr, X)`

```
5-element Vector{Vector{Int64}}:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60 … 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60 … 95, 96, 97, 98, 99, 100, 120, 130, 134, 135]
[71, 78, 101, 102, 103, 104, 105, 106, 107, 108 … 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
```

To obtain the path of each point through the tree, use `decision_path`, which returns a sparse matrix indicating which nodes each point passes through:

`IAI.decision_path(lnr, X)`

```
150×5 SparseArrays.SparseMatrixCSC{Bool, Int64} with 400 stored entries:
⋮ (sparsity pattern display omitted)
```
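To make the structure of this matrix concrete, the sketch below builds a matrix of the same kind from per-node point lists shaped like the output of `apply_nodes`, then reads one point's path back out of it. The data is hypothetical (6 points, 5 nodes) and `path_matrix` is an illustrative helper written for this example, not an IAI function. Row `i`, column `t` is `true` when point `i` passes through node `t`.

```julia
using SparseArrays

# Hypothetical node membership for a 5-node tree and 6 points:
# entry t lists the points that reach node t
node_points = [[1, 2, 3, 4, 5, 6], [1, 2], [3, 4, 5, 6], [3, 4], [5, 6]]

# Illustrative helper (not an IAI function): build the point-by-node matrix
function path_matrix(node_points, n_points)
    rows = Int[]
    cols = Int[]
    for (node, points) in enumerate(node_points)
        append!(rows, points)
        append!(cols, fill(node, length(points)))
    end
    vals = fill(true, length(rows))
    return sparse(rows, cols, vals, n_points, length(node_points))
end

paths = path_matrix(node_points, 6)

# Nodes visited by point 3, in order of node index
path_of_point_3 = [node for node in 1:size(paths, 2) if paths[3, node]]
println(path_of_point_3)
```

The number of stored entries equals the total length of all paths. The 400 stored entries reported above are consistent with 50 points following a two-node path and 100 points following a three-node path.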

Alternatively, we can see a textual representation of the path of a point through the tree with `print_path`:

`IAI.print_path(lnr, X, 1)`

```
Rules used to predict sample 1:
1) Split: PetalLength (=1.4) < 2.45
2) Predict: setosa (100.00%), [50,0,0], 50 points, error 0
```

The importance of each feature in the overall tree can be summarized with `variable_importance`:

`IAI.variable_importance(lnr)`

```
4×2 DataFrame
 Row │ Feature      Importance
     │ Symbol       Float64
─────┼─────────────────────────
   1 │ PetalWidth     0.685106
   2 │ PetalLength    0.314894
   3 │ SepalLength    0.0
   4 │ SepalWidth     0.0
```

## Task-specific Functions

### Classification Tree Learners

#### Setting the threshold

In binary classification problems, the label predicted by a leaf is typically the label with a predicted probability over 50%. However, it is possible to choose a different threshold for when a label will be predicted using `set_threshold!`. When a threshold is specified for a label, that label is predicted if its predicted probability in the leaf is at least the threshold; otherwise, the other label is predicted. To illustrate this, we will change the thresholds of a tree trained on a binary classification problem with labels A and B.

First, we specify a threshold of 0.25 for B, meaning that label B will be predicted if the probability of label B in a leaf is at least 25%. This causes all leaves to predict B:

`IAI.set_threshold!(lnr, "B", 0.25)`

Similarly, we can set the threshold for predicting A to 0.4, meaning that label A is predicted if the probability of label A in a leaf is at least 40%. This causes one of the leaves that originally predicted B to now predict A:

`IAI.set_threshold!(lnr, "A", 0.4)`

When using `set_threshold!`, we can also simplify the resulting tree, meaning that any adjacent leaves with the same label prediction will be collapsed into a single leaf. In our case, the two leaves predicting A are merged, leaving just a single leaf predicting A:

`IAI.set_threshold!(lnr, "A", 0.4, simplify=true)`
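The thresholding rule described in this section can be sketched in plain Julia. This is an illustration of the rule only, not the IAI implementation: `predict_with_threshold` is a hypothetical helper, and the leaf probabilities are made-up values chosen to mirror the behavior described above.

```julia
# Illustrative helper (not an IAI function): predict `label` when its
# probability in the leaf is at least `threshold`, otherwise the other label
function predict_with_threshold(prob_label, threshold, label, other_label)
    return prob_label >= threshold ? label : other_label
end

# Hypothetical probabilities of label A in three leaves
prob_A = [0.70, 0.45, 0.10]
prob_B = 1 .- prob_A

# Default behavior: a label needs at least 50% to be predicted
default_labels = [predict_with_threshold(p, 0.5, "A", "B") for p in prob_A]

# Threshold of 0.25 for B: every leaf has P(B) >= 0.25, so all predict B
labels_B_025 = [predict_with_threshold(p, 0.25, "B", "A") for p in prob_B]

# Threshold of 0.4 for A: the second leaf switches from B to A
labels_A_04 = [predict_with_threshold(p, 0.4, "A", "B") for p in prob_A]

println(default_labels)
println(labels_B_025)
println(labels_A_04)
```

In this sketch, adjacent leaves that end up with the same label are the ones that `simplify=true` would be expected to merge.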