Quick Start Guide: Optimal Classification Trees

This is a Python version of the corresponding OptimalTrees quick start guide.

In this example we will use Optimal Classification Trees (OCT) on the banknote authentication dataset. First we load the data and split it into training and test sets:

import pandas as pd
df = pd.read_csv("data_banknote_authentication.txt", header=None,
                 names=['variance', 'skewness', 'curtosis', 'entropy', 'class'])
      variance  skewness  curtosis  entropy  class
0      3.62160   8.66610  -2.80730 -0.44699      0
1      4.54590   8.16740  -2.45860 -1.46210      0
2      3.86600  -2.63830   1.92420  0.10645      0
3      3.45660   9.52280  -4.01120 -3.59440      0
4      0.32924  -4.45520   4.57180 -0.98880      0
5      4.36840   9.67180  -3.96060 -3.16250      0
6      3.59120   3.01290   0.72888  0.56421      0
...        ...       ...       ...      ...    ...
1365  -4.50460  -5.81260  10.88670 -0.52846      1
1366  -2.41000   3.74330  -0.40215 -1.29530      1
1367   0.40614   1.34920  -1.45010 -0.55949      1
1368  -1.38870  -4.87730   6.47740  0.34179      1
1369  -3.75030 -13.45860  17.59320 -2.77710      1
1370  -3.56370  -8.38270  12.39300 -1.28230      1
1371  -2.54190  -0.65804   2.68420  1.19520      1

[1372 rows x 5 columns]
from interpretableai import iai
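# Features are the first four columns; the label is the last column
X = df.iloc[:, 0:4]
y = df.iloc[:, 4]
# Split into training and test sets, with a fixed seed for reproducibility
(train_X, train_y), (test_X, test_y) = iai.split_data('classification', X, y,
                                                      seed=1)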
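
iai.split_data handles the split internally. As a rough illustration of what a seeded split that preserves class proportions can look like, here is a minimal sketch in plain Python. The helper name stratified_split, the 70% training proportion, and the stratification itself are assumptions made for this example and are not taken from the guide:

```python
import random
from collections import defaultdict

def stratified_split(labels, train_proportion=0.7, seed=1):
    """Return (train_indices, test_indices) with similar class balance in both.

    Hypothetical helper for illustration; not part of interpretableai.
    """
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(train_proportion * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Toy labels: ten points of each class
labels = [0] * 10 + [1] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 14 6
```

Calling the helper again with the same seed returns the same indices, which is the role seed=1 plays in the call above.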

Optimal Classification Trees

We will use a GridSearch to fit an OptimalTreeClassifier:

grid = iai.GridSearch(
    iai.OptimalTreeClassifier(
        random_seed=1,
    ),
    max_depth=range(1, 6),
)
grid.fit(train_X, train_y)
grid.get_learner()
[Figure: Optimal Trees visualization of the fitted tree]
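
Conceptually, a grid search tries each candidate parameter value and keeps the one that scores best on held-out validation data. The sketch below shows only that selection step in plain Python. The function select_best and the validation scores are hypothetical, and the actual GridSearch may differ in how it validates and refits:

```python
def select_best(candidates, validation_score):
    """Return (candidate, score) for the highest validation score.

    Ties go to the earliest candidate, so listing depths in increasing
    order prefers the shallower tree.
    """
    best, best_score = None, float("-inf")
    for candidate in candidates:
        score = validation_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Hypothetical validation scores per max_depth, for illustration only
toy_scores = {1: 0.85, 2: 0.92, 3: 0.97, 4: 0.97, 5: 0.96}
best_depth, best_score = select_best(range(1, 6), toy_scores.get)
print(best_depth, best_score)  # 3 0.97
```

Depths 3 and 4 tie in this toy example, and the strict comparison keeps the smaller tree.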

We can make predictions on new data using predict:

grid.predict(test_X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
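
A long array of predictions is hard to read at a glance. One way to summarize it is to count how often each actual label is paired with each predicted label. The following is a minimal sketch in plain Python using toy labels. In the guide, the inputs would be test_y and the array returned by grid.predict(test_X); the helper name confusion_counts is an assumption for this example:

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count occurrences of each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))

# Toy labels for illustration only
actual = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 0]
counts = confusion_counts(actual, predicted)

# Keys are (actual, predicted); matching labels are correct predictions
print(counts[(0, 0)], counts[(0, 1)], counts[(1, 0)], counts[(1, 1)])  # 2 1 1 2
```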

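We can evaluate the quality of the tree using score with any of the supported loss functions. For example, the misclassification criterion on the training set (score reports values where higher is better, so the number below is the proportion of points classified correctly):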

grid.score(train_X, train_y, criterion='misclassification')
0.9989583333333333
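
The value above equals 959/960, which would correspond to a single misclassified point if the training set has 960 rows (an assumption; the split sizes are not printed in this guide). The sketch below reproduces that arithmetic in plain Python with hypothetical labels. The function accuracy_score is written for this example and is not the library's implementation:

```python
def accuracy_score(actual, predicted):
    """Proportion of predictions that match the actual labels."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

# Hypothetical data: 960 points with exactly one wrong prediction
actual = [0] * 960
predicted = [0] * 959 + [1]
score = accuracy_score(actual, predicted)
print(score)      # proportion correct, i.e. 959/960
print(1 - score)  # the corresponding misclassification rate
```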

Or the AUC on the test set:

grid.score(test_X, test_y, criterion='auc')
0.9926145989930083
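
AUC can be read as the probability that a randomly chosen point of class 1 receives a higher predicted score than a randomly chosen point of class 0. The following is a minimal sketch of that pairwise definition in plain Python, using toy labels and scores. It is not how the library computes AUC, and the name auc_from_scores is an assumption for this example:

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 3 of the 4 positive/negative pairs are ordered correctly
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_from_scores(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of points, which is fine for a small illustration but not for large datasets.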

We can also plot the ROC curve on the test set:

iai.ROCCurve(grid, test_X, test_y)
[Figure: ROC curve on the test set]

Optimal Classification Trees with Hyperplanes

To use Optimal Classification Trees with hyperplane splits (OCT-H), you should set the hyperplane_config parameter:

grid = iai.GridSearch(
    iai.OptimalTreeClassifier(
        random_seed=1,
        max_depth=2,
        hyperplane_config={'sparsity': 'all'}
    ),
)
grid.fit(train_X, train_y)
grid.get_learner()
[Figure: Optimal Trees visualization of the fitted tree with hyperplane splits]

Now we can find the performance on the test set with hyperplanes:

grid.score(test_X, test_y, criterion='auc')
1.0

It seems that a very small tree with hyperplane splits is able to separate the classes in this dataset perfectly, achieving an AUC of 1.0 on the test set.
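
To illustrate the difference between the two kinds of split: a standard (parallel) split compares a single feature with a threshold, while a hyperplane split compares a weighted sum of several features with a threshold. The sketch below shows both tests in plain Python on two rows from the data above. The function names, weights, and thresholds are hypothetical and are not taken from the fitted trees:

```python
def parallel_split(row, feature, threshold):
    """Axis-aligned split: compare one feature with a threshold."""
    return row[feature] < threshold

def hyperplane_split(row, weights, threshold):
    """Hyperplane split: compare a weighted sum of features with a threshold."""
    total = sum(weight * row[name] for name, weight in weights.items())
    return total < threshold

# Rows 0 and 1369 of the dataset shown earlier
row_0 = {"variance": 3.6216, "skewness": 8.6661,
         "curtosis": -2.8073, "entropy": -0.44699}
row_1369 = {"variance": -3.7503, "skewness": -13.4586,
            "curtosis": 17.5932, "entropy": -2.7771}

# Hypothetical split parameters, for illustration only
weights = {"variance": 1.0, "skewness": 0.5, "curtosis": 0.5}

print(parallel_split(row_0, "variance", 0.3))     # False
print(parallel_split(row_1369, "variance", 0.3))  # True
print(hyperplane_split(row_0, weights, 1.0))      # False
print(hyperplane_split(row_1369, weights, 1.0))   # True
```

Because a hyperplane split can combine features, a single split can draw a boundary that would otherwise need several parallel splits to approximate.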