Quick Start Guide: Heuristic Classifiers
This is a Python version of the corresponding Heuristics quick start guide.
In this example we will use classifiers from Heuristics on the banknote authentication dataset. First we load in the data and split it into training and test datasets:
import pandas as pd
df = pd.read_csv("data_banknote_authentication.txt", header=None,
                 names=['variance', 'skewness', 'curtosis', 'entropy', 'class'])
variance skewness curtosis entropy class
0 3.62160 8.66610 -2.80730 -0.44699 0
1 4.54590 8.16740 -2.45860 -1.46210 0
2 3.86600 -2.63830 1.92420 0.10645 0
3 3.45660 9.52280 -4.01120 -3.59440 0
4 0.32924 -4.45520 4.57180 -0.98880 0
5 4.36840 9.67180 -3.96060 -3.16250 0
6 3.59120 3.01290 0.72888 0.56421 0
... ... ... ... ... ...
1365 -4.50460 -5.81260 10.88670 -0.52846 1
1366 -2.41000 3.74330 -0.40215 -1.29530 1
1367 0.40614 1.34920 -1.45010 -0.55949 1
1368 -1.38870 -4.87730 6.47740 0.34179 1
1369 -3.75030 -13.45860 17.59320 -2.77710 1
1370 -3.56370 -8.38270 12.39300 -1.28230 1
1371 -2.54190 -0.65804 2.68420 1.19520 1
[1372 rows x 5 columns]
from interpretableai import iai
X = df.iloc[:, 0:4]
y = df.iloc[:, 4]
(train_X, train_y), (test_X, test_y) = iai.split_data('classification', X, y,
                                                      seed=1)
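A 'classification' split aims to preserve the class proportions in both partitions. If you want to see what this entails, here is a minimal stratified-split sketch in plain numpy (the 70/30 proportion and the helper name are assumptions for illustration, not iai's implementation):

```python
import numpy as np

def stratified_split(X, y, train_frac=0.7, seed=1):
    """Sketch of a stratified train/test split: shuffle within each
    class so both partitions keep the original class balance."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        n_train = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:n_train])
    train_mask = np.zeros(len(y), dtype=bool)
    train_mask[train_idx] = True
    return (X[train_mask], y[train_mask]), (X[~train_mask], y[~train_mask])

# Toy arrays standing in for the banknote features and labels
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
(tr_X, tr_y), (te_X, te_y) = stratified_split(X_toy, y_toy)
```

Each class contributes proportionally to the training set, so a 50/50 class balance in the full data stays 50/50 in both partitions.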
Random Forest Classifier
We will use a GridSearch to fit a RandomForestClassifier with some basic parameter validation:
grid = iai.GridSearch(
    iai.RandomForestClassifier(
        random_seed=1,
    ),
    max_depth=range(5, 11),
)
grid.fit(train_X, train_y)
We can make predictions on new data using predict:
grid.predict(test_X)
array([0, 0, 0, ..., 1, 1, 1])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the misclassification on the training set:
grid.score(train_X, train_y, criterion='misclassification')
1.0
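A score of 1.0 means every training point is classified correctly. Conceptually, this criterion reduces to the fraction of correct predictions, which can be sketched as (a hypothetical helper, not iai's implementation):

```python
import numpy as np

def misclassification_score(y_true, y_pred):
    """Fraction of correctly classified points (1.0 = perfect)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Toy example: 3 of 4 predictions correct
print(misclassification_score([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```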
Or the AUC on the test set:
grid.score(test_X, test_y, criterion='auc')
0.9995943398477585
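AUC measures how well the model's predicted probabilities rank positive points above negative ones. A minimal sketch of the underlying calculation via the Mann-Whitney rank formulation (for intuition only, not iai's implementation):

```python
import numpy as np

def auc(y_true, p_pos):
    """AUC as the probability that a random positive point is
    scored above a random negative point; ties count half."""
    y_true = np.asarray(y_true)
    p_pos = np.asarray(p_pos)
    pos = p_pos[y_true == 1]
    neg = p_pos[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: one positive (0.35) is out-ranked by one negative (0.4)
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```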
We can also look at the variable importance:
grid.get_learner().variable_importance()
Feature Importance
0 variance 0.554808
1 skewness 0.252052
2 curtosis 0.139902
3 entropy 0.053238
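The importances in this table sum to 1, so they can be read as relative shares of the model's predictive signal. As a quick sanity check with pandas (values copied from the table above):

```python
import pandas as pd

imp = pd.DataFrame({
    'Feature': ['variance', 'skewness', 'curtosis', 'entropy'],
    'Importance': [0.554808, 0.252052, 0.139902, 0.053238],
})
# Importances are normalized to (approximately) sum to 1
print(imp['Importance'].sum())
# The table is already sorted, so the top-ranked feature comes first
print(imp.sort_values('Importance', ascending=False).iloc[0]['Feature'])
```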
XGBoost Classifier
We will use a GridSearch to fit an XGBoostClassifier with some basic parameter validation:
grid = iai.GridSearch(
    iai.XGBoostClassifier(
        random_seed=1,
    ),
    max_depth=range(2, 6),
    num_round=[20, 50, 100],
)
grid.fit(train_X, train_y)
We can make predictions on new data using predict:
grid.predict(test_X)
array([0, 0, 0, ..., 1, 1, 1])
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the misclassification on the training set:
grid.score(train_X, train_y, criterion='misclassification')
1.0
Or the AUC on the test set:
grid.score(test_X, test_y, criterion='auc')
0.9999522752762073
We can also look at the variable importance:
grid.get_learner().variable_importance()
Feature Importance
0 variance 0.616981
1 skewness 0.247354
2 curtosis 0.130078
3 entropy 0.005587
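On this dataset the two models are nearly indistinguishable on test AUC (0.99959 for the random forest vs 0.99995 for XGBoost). Collecting the results side by side makes the comparison explicit (values copied from the scores above):

```python
import pandas as pd

results = pd.DataFrame({
    'model': ['RandomForestClassifier', 'XGBoostClassifier'],
    'test_auc': [0.9995943398477585, 0.9999522752762073],
})
# Rank the models by test-set AUC (higher is better)
print(results.sort_values('test_auc', ascending=False))
```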