Working with Tree Learners

Tree learners support all of the core learner functionality provided by IAIBase, as well as a number of functions specific to trees.

General Functions

The examples in this section use the following learner:

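using CSV, DataFrames
# Load the iris data: columns 1-4 are the features and column 5 is the label
df = CSV.read("iris.csv", DataFrame)
X = df[:, 1:4]
y = df[:, 5]
# Fit a depth-2 classification tree with a fixed seed for reproducibility
lnr = IAI.OptimalTreeClassifier(max_depth=2, cp=0, random_seed=15)
IAI.fit!(lnr, X, y)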
[Figure: Optimal Trees Visualization of the fitted learner]

We can use apply to find the index of the leaf that contains each point in our data:

IAI.apply(lnr, X)
150-element Vector{Int64}:
 2
 2
 2
 2
 2
 2
 2
 2
 2
 2
 ⋮
 5
 5
 5
 5
 5
 5
 5
 5
 5

We can get the indices of the points that fall into each node with apply_nodes, which returns one vector of point indices per node in the tree:

IAI.apply_nodes(lnr, X)
5-element Vector{Vector{Int64}}:
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60  …  141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60  …  95, 96, 97, 98, 99, 100, 120, 130, 134, 135]
 [71, 78, 101, 102, 103, 104, 105, 106, 107, 108  …  141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
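
For leaf nodes, the outputs of apply and apply_nodes carry the same information in two different layouts. The following plain-Julia sketch shows how leaf indices like those returned by apply could be regrouped into per-leaf lists of points. The `leaves` vector is made-up data standing in for real output, and nothing here calls IAI itself.

```julia
# Hypothetical leaf assignments, standing in for the output of IAI.apply(lnr, X)
leaves = [2, 2, 4, 5, 4, 2, 5]

# Group the point indices by the leaf that each point falls into
points_by_leaf = Dict{Int,Vector{Int}}()
for (i, leaf) in enumerate(leaves)
    push!(get!(points_by_leaf, leaf, Int[]), i)
end

# Count how many points each leaf contains
leaf_sizes = Dict(leaf => length(idx) for (leaf, idx) in points_by_leaf)
```

Unlike this sketch, apply_nodes also reports the points in the non-leaf nodes, such as the root, which contains every point.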

To obtain the path of each point through the tree, use decision_path, which returns a sparse matrix with one row per point and one column per node, indicating which nodes each point passes through:

IAI.decision_path(lnr, X)
150×5 SparseArrays.SparseMatrixCSC{Bool, Int64} with 400 stored entries:
⎡⣿⠀⎤
⎢⣿⠀⎥
⎢⣿⠀⎥
⎢⣿⠀⎥
⎢⣿⠀⎥
⎢⣿⠀⎥
⎢⣿⡀⎥
⎢⣿⡇⎥
⎢⣿⡇⎥
⎢⣿⡗⎥
⎢⣿⡗⎥
⎢⣿⡇⎥
⎢⣿⡇⎥
⎢⣿⢱⎥
⎢⣿⢸⎥
⎢⣿⣸⎥
⎢⣿⢸⎥
⎢⣿⡹⎥
⎢⣿⢸⎥
⎣⣿⢸⎦
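
A sparse Boolean matrix of this shape can be queried with standard SparseArrays tools. The sketch below builds a small made-up path matrix (4 points, 5 nodes) in place of the real output, then reads off the nodes visited by one point and the number of points passing through each node. The matrix contents are assumptions chosen only for illustration.

```julia
using SparseArrays

# Hypothetical 4×5 path matrix standing in for IAI.decision_path(lnr, X):
# entry (i, t) is true when point i passes through node t
rows = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
cols = [1, 2, 1, 3, 4, 1, 3, 5, 1, 2]
paths = sparse(rows, cols, fill(true, length(rows)), 4, 5)

# Nodes visited by point 2, in increasing node index
nodes_for_point_2 = findall(paths[2, :])

# Number of points passing through each node (column sums)
points_per_node = vec(sum(paths, dims=1))
```

In this toy matrix every point passes through node 1, mirroring the fully populated first column in the output above.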

Alternatively, we can see a textual representation of the path of a point through the tree with print_path:

IAI.print_path(lnr, X, 1)
Rules used to predict sample 1:
  1) Split: PetalLength (=1.4) < 2.45
    2) Predict: setosa (100.00%), [50,0,0], 50 points, error 0

The importance of each feature in the overall tree can be summarized with variable_importance:

IAI.variable_importance(lnr)
4×2 DataFrame
 Row │ Feature      Importance
     │ Symbol       Float64
─────┼─────────────────────────
   1 │ PetalWidth     0.685106
   2 │ PetalLength    0.314894
   3 │ SepalLength    0.0
   4 │ SepalWidth     0.0

Task-specific Functions

Classification Tree Learners

Setting the threshold

In binary classification problems, the label predicted by a leaf is normally the one with a predicted probability over 50%. We can instead use set_threshold! to choose a different threshold for predicting a label. When a threshold is specified for a label, a leaf predicts that label if its predicted probability for the label is at least the threshold; otherwise, it predicts the other label. To illustrate this, we will change the threshold of the following tree:

[Figure: Optimal Trees Visualization of the original tree]

First, we specify a threshold of 0.25 for B, meaning that label B will be predicted if the probability of label B in a leaf is at least 25%. This causes all leaves to predict B:

IAI.set_threshold!(lnr, "B", 0.25)
[Figure: Optimal Trees Visualization after setting the threshold for B to 0.25]

Similarly, we can set the threshold for predicting A to 0.4, meaning that label A is predicted if the probability of label A in a leaf is at least 40%. This causes one of the leaves that originally predicted B to now predict A:

IAI.set_threshold!(lnr, "A", 0.4)
[Figure: Optimal Trees Visualization after setting the threshold for A to 0.4]
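
The thresholding rule used in these examples can be written out in a few lines of plain Julia. This is a minimal sketch of the rule as described in this section, not IAI's implementation: `predict_with_threshold` is a hypothetical helper and the leaf probabilities are made up.

```julia
# Hypothetical helper, not part of IAI: apply a threshold to one leaf.
# `prob` is the leaf's predicted probability for `label`.
function predict_with_threshold(prob::Real, label, other_label, threshold::Real)
    return prob >= threshold ? label : other_label
end

# Made-up predicted probabilities of label "B" in three leaves
prob_B = [0.30, 0.55, 0.90]

# A threshold of 0.5 roughly corresponds to the default behavior
default_preds = [predict_with_threshold(p, "B", "A", 0.5) for p in prob_B]

# Lowering the threshold for "B" to 0.25 makes every leaf predict "B"
low_preds = [predict_with_threshold(p, "B", "A", 0.25) for p in prob_B]

# A threshold of 0.4 for "A" is applied to the probability of "A" instead
a_preds = [predict_with_threshold(1 - p, "A", "B", 0.4) for p in prob_B]
```

With these made-up probabilities, the second leaf switches from B to A once the threshold for A is set to 0.4, the same kind of change as in the example above.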

When using set_threshold!, we can also simplify the resulting tree by passing simplify=true, meaning that any adjacent leaves with the same label prediction will be collapsed into a single leaf. In our case, the two leaves predicting A are merged into a single leaf predicting A:

IAI.set_threshold!(lnr, "A", 0.4, simplify=true)
[Figure: Optimal Trees Visualization of the simplified tree]
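
The idea behind simplification can be illustrated on a toy tree structure. The types `ToyNode`, `ToyLeaf`, and `ToySplit` and the function `collapse` below are hypothetical and exist only for this sketch. They do not reflect how IAI stores or simplifies trees, and the sketch assumes that "adjacent" means two leaves sharing the same parent split.

```julia
# Toy tree types for illustration only (not IAI's internal representation)
abstract type ToyNode end

struct ToyLeaf <: ToyNode
    label::String
end

struct ToySplit <: ToyNode
    lower::ToyNode
    upper::ToyNode
end

# A leaf is already as simple as it can be
collapse(leaf::ToyLeaf) = leaf

# Simplify both children first, then merge them if they are
# leaves that predict the same label
function collapse(split::ToySplit)
    lower = collapse(split.lower)
    upper = collapse(split.upper)
    if lower isa ToyLeaf && upper isa ToyLeaf && lower.label == upper.label
        return ToyLeaf(lower.label)
    end
    return ToySplit(lower, upper)
end

# Two sibling leaves predicting "A" next to a leaf predicting "B"
tree = ToySplit(ToySplit(ToyLeaf("A"), ToyLeaf("A")), ToyLeaf("B"))
simplified = collapse(tree)
```

Here the two leaves predicting A are merged, so `simplified` is a single split with one leaf predicting A and one predicting B.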