Quick Start Guide: Optimal Classification Trees

This is an R version of the corresponding OptimalTrees quick start guide.

In this example we will use Optimal Classification Trees (OCT) on the banknote authentication dataset. First we load in the data and split it into training and test datasets:

df <- read.table("data_banknote_authentication.txt", sep = ",",
                 col.names = c("variance", "skewness", "curtosis", "entropy",
                               "class"))
   variance skewness curtosis  entropy class
1   3.62160   8.6661 -2.80730 -0.44699     0
2   4.54590   8.1674 -2.45860 -1.46210     0
3   3.86600  -2.6383  1.92420  0.10645     0
4   3.45660   9.5228 -4.01120 -3.59440     0
5   0.32924  -4.4552  4.57180 -0.98880     0
6   4.36840   9.6718 -3.96060 -3.16250     0
7   3.59120   3.0129  0.72888  0.56421     0
8   2.09220  -6.8100  8.46360 -0.60216     0
9   3.20320   5.7588 -0.75345 -0.61251     0
10  1.53560   9.1772 -2.27180 -0.73535     0
11  1.22470   8.7779 -2.21350 -0.80647     0
12  3.98990  -2.7066  2.39460  0.86291     0
 [ reached 'max' / getOption("max.print") -- omitted 1360 rows ]
X <- df[, 1:4]
y <- df[, 5]
split <- iai::split_data("classification", X, y, seed = 1)
train_X <- split$train$X
train_y <- split$train$y
test_X <- split$test$X
test_y <- split$test$y
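
As a quick sanity check (base R only), we can confirm that the split kept the class proportions similar in the training and test sets:

table(train_y) / length(train_y)
table(test_y) / length(test_y)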

Optimal Classification Trees

We will use a grid_search to fit an optimal_tree_classifier:

grid <- iai::grid_search(
    iai::optimal_tree_classifier(
        random_seed = 1,
    ),
    max_depth = 1:5,
)
iai::fit(grid, train_X, train_y)
iai::get_learner(grid)
Optimal Trees Visualization
[Interactive visualization of the fitted tree: the root splits on skewness, with further splits on variance, curtosis, and entropy in deeper nodes]
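
The visualization above renders interactively in a browser. To keep a copy, or to see which parameters the grid search selected, the iai package provides write_html and get_best_params (a small sketch; the output file name learner.html is arbitrary):

iai::write_html("learner.html", iai::get_learner(grid))
iai::get_best_params(grid)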

We can make predictions on new data using predict:

iai::predict(grid, test_X)
 [1] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [ reached getOption("max.print") -- omitted 352 entries ]
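
If we need predicted probabilities for each class rather than hard labels, we can use predict_proba in the same way:

iai::predict_proba(grid, test_X)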

We can evaluate the quality of the tree using score with any of the supported loss functions. For example, the misclassification score on the training set (for this criterion the score is the proportion of points classified correctly, so higher is better):

iai::score(grid, train_X, train_y, criterion = "misclassification")
[1] 0.9989583
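
Assuming the misclassification criterion reports accuracy as above, we should be able to reproduce this number directly from the predictions:

mean(iai::predict(grid, train_X) == train_y)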

Or the AUC on the test set:

iai::score(grid, test_X, test_y, criterion = "auc")
[1] 0.9909562
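
For a more granular view of the errors behind these scores, a confusion matrix on the test set takes one line of base R:

table(predicted = iai::predict(grid, test_X), actual = test_y)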

We can also plot the ROC curve on the test set:

iai::roc_curve(grid, test_X, test_y, positive_label = 1)
[ROC curve for the test set: True Positive Rate against False Positive Rate, AUC: 0.99]

Optimal Classification Trees with Hyperplanes

To use Optimal Classification Trees with hyperplane splits (OCT-H), you should set the hyperplane_config parameter:

grid <- iai::grid_search(
    iai::optimal_tree_classifier(
        random_seed = 1,
        max_depth = 2,
        hyperplane_config = list(sparsity = "all"),
    ),
)
iai::fit(grid, train_X, train_y)
iai::get_learner(grid)
Optimal Trees Visualization
[Interactive visualization of the fitted tree: the root applies the hyperplane split 0.05316 * variance + 0.02776 * skewness + 0.04017 * curtosis, followed by a single split on curtosis]
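
To read the hyperplane coefficients programmatically rather than off the visualization, the get_split_weights function should return the weights used at a given split node (a hedged sketch; node index 1 is the root of this tree, and availability may depend on your iai version):

lnr <- iai::get_learner(grid)
iai::get_split_weights(lnr, 1)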

Now we can find the performance on the test set with hyperplanes:

iai::score(grid, test_X, test_y, criterion = "auc")
[1] 0.9972678

It seems that a very small tree with hyperplane splits is able to model this dataset almost perfectly.
