Quick Start Guide: Heuristic Classifiers

This is an R version of the corresponding Heuristics quick start guide.

In this example we will use classifiers from Heuristics on the banknote authentication dataset. First, we load the data and split it into training and test datasets:

df <- read.table("data_banknote_authentication.txt", sep = ",",
                 col.names = c("variance", "skewness", "curtosis", "entropy",
                               "class"))
   variance skewness curtosis  entropy class
1   3.62160   8.6661 -2.80730 -0.44699     0
2   4.54590   8.1674 -2.45860 -1.46210     0
3   3.86600  -2.6383  1.92420  0.10645     0
4   3.45660   9.5228 -4.01120 -3.59440     0
5   0.32924  -4.4552  4.57180 -0.98880     0
6   4.36840   9.6718 -3.96060 -3.16250     0
7   3.59120   3.0129  0.72888  0.56421     0
8   2.09220  -6.8100  8.46360 -0.60216     0
9   3.20320   5.7588 -0.75345 -0.61251     0
10  1.53560   9.1772 -2.27180 -0.73535     0
11  1.22470   8.7779 -2.21350 -0.80647     0
12  3.98990  -2.7066  2.39460  0.86291     0
 [ reached 'max' / getOption("max.print") -- omitted 1360 rows ]
X <- df[, 1:4]
y <- df[, 5]
split <- iai::split_data("classification", X, y, seed = 1)
train_X <- split$train$X
train_y <- split$train$y
test_X <- split$test$X
test_y <- split$test$y
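For intuition, split_data performs a seeded random split of the rows (stratified by class for classification tasks). The sketch below shows the basic idea in plain R, using the built-in iris data so it runs without the banknote file; the 70/30 proportion is an assumption for illustration and the stratification step is omitted.

```r
# Plain-R sketch of a seeded train/test split. This is only illustrative:
# iai::split_data additionally stratifies by class for classification.
set.seed(1)
n <- nrow(iris)
train_idx <- sample(n, size = floor(0.7 * n))  # assumed 70/30 split
train_X <- iris[train_idx, 1:4]
train_y <- iris$Species[train_idx]
test_X  <- iris[-train_idx, 1:4]
test_y  <- iris$Species[-train_idx]
c(train = nrow(train_X), test = nrow(test_X))
```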

Random Forest Classifier

We will use a grid_search to fit a random_forest_classifier with some basic parameter validation:

grid <- iai::grid_search(
    iai::random_forest_classifier(
        random_seed = 1
    ),
    max_depth = 5:10
)
iai::fit(grid, train_X, train_y)

We can make predictions on new data using predict:

iai::predict(grid, test_X)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [ reached getOption("max.print") -- omitted 352 entries ]

We can evaluate the quality of the model using score with any of the supported loss functions. For example, the misclassification score on the training set:

iai::score(grid, train_X, train_y, criterion = "misclassification")
[1] 1
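A score of 1 here indicates that every training point is classified correctly, consistent with the convention that score reports higher-is-better values: with the misclassification criterion, it corresponds to the fraction of correct predictions. For intuition, a hand-rolled equivalent (the helper function is ours, not part of iai):

```r
# Fraction of points classified correctly (higher is better); this mirrors
# what a misclassification-based score reports under a higher-is-better
# convention. Illustrative helper only, not an iai function.
misclassification_score <- function(pred, truth) mean(pred == truth)
misclassification_score(c(0, 1, 1, 0), c(0, 1, 0, 0))  # 0.75
```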

Or the AUC on the test set:

iai::score(grid, test_X, test_y, criterion = "auc")
[1] 0.9995943
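AUC is the probability that a randomly chosen positive example receives a higher predicted score than a randomly chosen negative one. For reference, a small rank-based implementation of that definition (an illustrative helper, not part of iai):

```r
# AUC via the rank-sum formulation: probability a random positive is ranked
# above a random negative (ties count half). Illustrative helper only.
auc <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  r <- rank(c(pos, neg))
  (sum(r[seq_along(pos)]) - length(pos) * (length(pos) + 1) / 2) /
    (length(pos) * length(neg))
}
auc(c(0.9, 0.8, 0.3, 0.2), c(1, 1, 0, 0))  # 1
```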

We can also look at the variable importance:

iai::variable_importance(iai::get_learner(grid))
   Feature Importance
1 variance 0.55480836
2 skewness 0.25205160
3 curtosis 0.13990229
4  entropy 0.05323775
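Note that the importances in this output sum to one, so each value can be read as that feature's share of the model's total predictive signal:

```r
# The Importance column from the output above sums to one:
sum(c(0.55480836, 0.25205160, 0.13990229, 0.05323775))  # 1
```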

XGBoost Classifier

We will use a grid_search to fit an xgboost_classifier with some basic parameter validation:

grid <- iai::grid_search(
    iai::xgboost_classifier(
        random_seed = 1
    ),
    max_depth = 2:5,
    num_round = c(20, 50, 100)
)
iai::fit(grid, train_X, train_y)
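The grid search should evaluate one candidate model per combination of the listed parameters, so this grid covers 4 × 3 = 12 candidates; expand.grid enumerates them:

```r
# Every (max_depth, num_round) pair covered by the grid above:
candidates <- expand.grid(max_depth = 2:5, num_round = c(20, 50, 100))
nrow(candidates)  # 12
```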

We can make predictions on new data using predict:

iai::predict(grid, test_X)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [ reached getOption("max.print") -- omitted 352 entries ]

We can evaluate the quality of the model using score with any of the supported loss functions. For example, the misclassification score on the training set:

iai::score(grid, train_X, train_y, criterion = "misclassification")
[1] 1

Or the AUC on the test set:

iai::score(grid, test_X, test_y, criterion = "auc")
[1] 0.9999523

We can also look at the variable importance:

iai::variable_importance(iai::get_learner(grid))
   Feature Importance
1 variance 0.61698139
2 skewness 0.24735367
3 curtosis 0.13007769
4  entropy 0.00558724