Quick Start Guide: Heuristic Classifiers

This is a Python version of the corresponding Heuristics quick start guide.

In this example we will use classifiers from Heuristics on the banknote authentication dataset. First we load the data and split it into training and test sets:

import pandas as pd
# The raw file has no header row, so the column names are supplied explicitly
df = pd.read_csv("data_banknote_authentication.txt", header=None,
                 names=['variance', 'skewness', 'curtosis', 'entropy', 'class'])
df
      variance  skewness  curtosis  entropy  class
0      3.62160   8.66610  -2.80730 -0.44699      0
1      4.54590   8.16740  -2.45860 -1.46210      0
2      3.86600  -2.63830   1.92420  0.10645      0
3      3.45660   9.52280  -4.01120 -3.59440      0
4      0.32924  -4.45520   4.57180 -0.98880      0
5      4.36840   9.67180  -3.96060 -3.16250      0
6      3.59120   3.01290   0.72888  0.56421      0
...        ...       ...       ...      ...    ...
1365  -4.50460  -5.81260  10.88670 -0.52846      1
1366  -2.41000   3.74330  -0.40215 -1.29530      1
1367   0.40614   1.34920  -1.45010 -0.55949      1
1368  -1.38870  -4.87730   6.47740  0.34179      1
1369  -3.75030 -13.45860  17.59320 -2.77710      1
1370  -3.56370  -8.38270  12.39300 -1.28230      1
1371  -2.54190  -0.65804   2.68420  1.19520      1

[1372 rows x 5 columns]
from interpretableai import iai
X = df.iloc[:, 0:4]
y = df.iloc[:, 4]
(train_X, train_y), (test_X, test_y) = iai.split_data('classification', X, y,
                                                      seed=1)
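
The guide does not describe how split_data chooses rows, so the following is only a rough sketch of one common approach: a seeded, stratified split that keeps the class proportions the same in both parts. The function name stratified_split and the 70/30 proportion are assumptions made for illustration, not part of the interpretableai API.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_proportion=0.7, seed=1):
    """Return (train_idx, test_idx), splitting each class in the same proportion.

    Hypothetical helper for illustration only; not the iai.split_data algorithm.
    """
    rng = random.Random(seed)

    # Group row indices by class label
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)  # seeded shuffle, so the split is reproducible
        cut = round(train_proportion * len(idx))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels: ten rows of each class
labels = [0] * 10 + [1] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Fixing the seed, as the guide does with seed=1, makes the split repeatable across runs.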

Random Forest Classifier

We will use a GridSearch to fit a RandomForestClassifier, validating over max_depth values from 5 to 10:

grid = iai.GridSearch(
    iai.RandomForestClassifier(
        random_seed=1,
    ),
    max_depth=range(5, 11),
)
grid.fit(train_X, train_y)

We can make predictions on new data using predict:

grid.predict(test_X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

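We can evaluate the quality of the model using score with any of the supported criteria. For example, the misclassification score on the training set (the value of 1.0 returned here suggests that the score is reported so that higher is better, meaning every training point is classified correctly):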

grid.score(train_X, train_y, criterion='misclassification')
1.0
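
To make the meaning of this number concrete, here is a minimal sketch of a misclassification-based score, assuming it is reported as the fraction of correctly classified points (which the value of 1.0 above suggests). The function accuracy is a hypothetical helper, not part of the interpretableai API.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels (1 - error rate)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy example: three of the four predictions are correct
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred))  # 0.75
```

Under this reading, a score of 1.0 corresponds to zero misclassified points.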

Or the AUC on the test set:

grid.score(test_X, test_y, criterion='auc')
0.9995943398477585
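
As a reminder of what the AUC measures, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. This is an illustrative pure-Python version with a hypothetical function name, not the routine the library uses internally.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # Each (positive, negative) pair scores 1 if ranked correctly, 0.5 on a tie
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive/negative pairs is ranked incorrectly
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, scores))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but slower than rank-based implementations on large datasets.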

We can also look at the variable importance:

grid.get_learner().variable_importance()
    Feature  Importance
0  variance    0.554808
1  skewness    0.252052
2  curtosis    0.139902
3   entropy    0.053238
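
The importance values in this table add up to 1, so each can be read as a share of the total. The short check below uses only the numbers printed above; the dictionary is typed in by hand for illustration rather than obtained from the learner.

```python
import math

# Importance values copied from the random forest output above
importance = {
    "variance": 0.554808,
    "skewness": 0.252052,
    "curtosis": 0.139902,
    "entropy": 0.053238,
}

total = sum(importance.values())
ranked = sorted(importance, key=importance.get, reverse=True)

print(math.isclose(total, 1.0, abs_tol=1e-6))  # True
print(ranked)
```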

XGBoost Classifier

We will use a GridSearch to fit an XGBoostClassifier, validating over max_depth values from 2 to 5 and num_round values of 20, 50, and 100:

grid = iai.GridSearch(
    iai.XGBoostClassifier(
        random_seed=1,
    ),
    max_depth=range(2, 6),
    num_round=[20, 50, 100],
)
grid.fit(train_X, train_y)
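
To see how many candidate models this grid contains, the combinations can be enumerated by hand. The sketch below assumes that the grid search tries every combination of the supplied values; it uses only the standard library and does not reflect how GridSearch is implemented internally.

```python
from itertools import product

# Same parameter ranges as passed to GridSearch above
param_grid = {
    "max_depth": range(2, 6),
    "num_round": [20, 50, 100],
}

# Cartesian product of all parameter values, one dict per candidate
names = list(param_grid)
candidates = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

print(len(candidates))  # 12
print(candidates[0])
```

Four depths times three round counts gives 12 candidates, so widening either range increases the fitting time multiplicatively.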

We can make predictions on new data using predict:

grid.predict(test_X)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

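We can evaluate the quality of the model using score with any of the supported criteria. For example, the misclassification score on the training set (as before, the value of 1.0 suggests that higher is better and every training point is classified correctly):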

grid.score(train_X, train_y, criterion='misclassification')
1.0

Or the AUC on the test set:

grid.score(test_X, test_y, criterion='auc')
0.9999522752762073

We can also look at the variable importance:

grid.get_learner().variable_importance()
    Feature  Importance
0  variance    0.616981
1  skewness    0.247354
2  curtosis    0.130078
3   entropy    0.005587
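
The two models rank the features in the same order but distribute the importance differently. The comparison below is a small sketch using only the values printed in this guide, entered by hand; it is not output from the library.

```python
# Importance values copied from the two tables in this guide
random_forest = {
    "variance": 0.554808,
    "skewness": 0.252052,
    "curtosis": 0.139902,
    "entropy": 0.053238,
}
xgboost = {
    "variance": 0.616981,
    "skewness": 0.247354,
    "curtosis": 0.130078,
    "entropy": 0.005587,
}

# Change in importance share when moving from random forest to XGBoost
shift = {name: xgboost[name] - random_forest[name] for name in random_forest}

rf_rank = sorted(random_forest, key=random_forest.get, reverse=True)
xgb_rank = sorted(xgboost, key=xgboost.get, reverse=True)

for name in rf_rank:
    print(f"{name:10s} {shift[name]:+.6f}")
print(rf_rank == xgb_rank)  # True
```

In these results, XGBoost places more weight on variance and almost none on entropy, while the ordering of the four features is unchanged.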