Working with Tree Learners
Tree learners support all of the core learner functionality provided by IAIBase. In addition, they support a number of functions specific to trees.
General Functions
The examples in this section use the following learner:
using CSV, DataFrames
df = CSV.read("iris.csv", DataFrame)
X = df[:, 1:4]  # features: sepal and petal measurements
y = df[:, 5]    # labels: species
lnr = IAI.OptimalTreeClassifier(max_depth=2, cp=0, random_seed=15)
IAI.fit!(lnr, X, y)
We can use apply to find the index of the leaf that contains each point in our data:
IAI.apply(lnr, X)
150-element Vector{Int64}:
2
2
2
2
2
2
2
2
2
2
⋮
5
5
5
5
5
5
5
5
5
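The vector returned by apply can be post-processed with ordinary Julia code. As a sketch, here is how one might count the points per leaf, using a small stand-in vector rather than the real output above:

```julia
# Stand-in for the output of IAI.apply(lnr, X): one leaf index per sample
leaf_ids = [2, 2, 2, 5, 5, 2, 5]

# Tally how many points land in each leaf
counts = Dict{Int,Int}()
for id in leaf_ids
    counts[id] = get(counts, id, 0) + 1
end
counts  # Dict(2 => 4, 5 => 3)
```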
We can get the set of points that fall into each node with apply_nodes:
IAI.apply_nodes(lnr, X)
5-element Vector{Vector{Int64}}:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 … 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60 … 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60 … 95, 96, 97, 98, 99, 100, 120, 130, 134, 135]
[71, 78, 101, 102, 103, 104, 105, 106, 107, 108 … 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
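Note the structure of this output: node 1 (the root) contains every point, each internal node contains the union of its children, and the leaf nodes together partition the samples. A toy sketch of this invariant, with hypothetical index sets for a five-node tree (not the learner's actual output):

```julia
# Hypothetical index sets for a 7-sample, 5-node tree, standing in for
# IAI.apply_nodes(lnr, X); nodes 2, 4 and 5 are the leaves
node_members = [collect(1:7), [1, 2, 7], [3, 4, 5, 6], [3, 6], [4, 5]]
leaves = [2, 4, 5]

# Every sample appears in exactly one leaf
covered = sort(vcat((node_members[i] for i in leaves)...))
covered == collect(1:7)  # true
```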
To obtain the path of each point through the tree, use decision_path, which returns a sparse matrix indicating which nodes each point passes through:
IAI.decision_path(lnr, X)
150×5 SparseArrays.SparseMatrixCSC{Bool, Int64} with 400 stored entries:
⎡⣿⠀⎤
⎢⣿⠀⎥
⎢⣿⠀⎥
⎢⣿⠀⎥
⎢⣿⠀⎥
⎢⣿⠀⎥
⎢⣿⡀⎥
⎢⣿⡇⎥
⎢⣿⡇⎥
⎢⣿⡗⎥
⎢⣿⡗⎥
⎢⣿⡇⎥
⎢⣿⡇⎥
⎢⣿⢱⎥
⎢⣿⢸⎥
⎢⣿⣸⎥
⎢⣿⢸⎥
⎢⣿⡹⎥
⎢⣿⢸⎥
⎣⣿⢸⎦
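Each row of this matrix corresponds to a sample and each column to a node; a true entry means the sample passes through that node. Recovering a single sample's path is then a findall over its row. A toy sketch with a hand-built matrix (not the learner's actual output):

```julia
using SparseArrays

# Hand-built stand-in for IAI.decision_path(lnr, X): 3 samples, 5 nodes.
# Sample 1 visits nodes 1 and 2; sample 2 visits 1, 3, 4; sample 3 visits 1, 3, 5.
path = sparse(Bool[1 1 0 0 0; 1 0 1 1 0; 1 0 1 0 1])

# Nodes visited by sample 2, in node order
findall(path[2, :])  # [1, 3, 4]
```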
Alternatively, we can see a textual representation of the path of a point through the tree with print_path:
IAI.print_path(lnr, X, 1)
Rules used to predict sample 1:
1) Split: PetalLength (=1.4) < 2.45
2) Predict: setosa (100.00%), [50,0,0], 50 points, error 0
The importance of each feature in the overall tree can be summarized with variable_importance:
IAI.variable_importance(lnr)
4×2 DataFrame
Row │ Feature Importance
│ Symbol Float64
─────┼─────────────────────────
1 │ PetalWidth 0.685106
2 │ PetalLength 0.314894
3 │ SepalLength 0.0
4 │ SepalWidth 0.0
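In this example the importances sum to one, and features never used in a split receive an importance of zero. A quick sketch of both observations, using the values copied by hand from the table above:

```julia
features   = [:PetalWidth, :PetalLength, :SepalLength, :SepalWidth]
importance = [0.685106, 0.314894, 0.0, 0.0]

# Features that actually appear in splits of the tree
used = features[importance .> 0]  # [:PetalWidth, :PetalLength]

# In this example, the importances sum to one
isapprox(sum(importance), 1.0; atol=1e-6)  # true
```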
Task-specific Functions
Classification Tree Learners
Setting the threshold
In binary classification problems, the label predicted by a leaf is typically the label with a predicted probability above 50%. However, you can control this behavior and choose a different threshold for when a label is predicted using set_threshold!. When a threshold is specified for a label, that label is predicted if its predicted probability in the leaf is at least the threshold; otherwise, the other label is predicted. To illustrate this, we will change the threshold of the following tree:
First, we specify a threshold of 0.25 for B, meaning that label B will be predicted if the probability of label B in a leaf is at least 25%. This causes all leaves to predict B:
IAI.set_threshold!(lnr, "B", 0.25)
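The thresholding rule itself is simple to state in code. Below is a minimal sketch of that rule as our own helper function (hypothetical, not part of the IAI API), where p_B is the predicted probability of label B in a leaf:

```julia
# Predict "B" whenever its leaf probability reaches the threshold,
# otherwise fall back to "A" (hypothetical helper, not an IAI function)
predict_with_threshold(p_B, threshold) = p_B >= threshold ? "B" : "A"

predict_with_threshold(0.30, 0.25)  # "B": 30% clears the 25% threshold
predict_with_threshold(0.10, 0.25)  # "A": 10% does not
```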
Similarly, we can set the threshold for predicting A to 0.4, meaning that label A is predicted if the probability of label A in a leaf is at least 40%. This causes one of the leaves that originally predicted B to now predict A:
IAI.set_threshold!(lnr, "A", 0.4)
When using set_threshold!, we can also simplify the resulting tree by passing simplify=true, which collapses any adjacent leaves with the same label prediction into a single leaf. In our case, the two leaves predicting A are merged, leaving just a single leaf predicting A:
IAI.set_threshold!(lnr, "A", 0.4, simplify=true)