Quick Start Guide: Heuristic Classifiers
This is an R version of the corresponding Heuristics quick start guide.
In this example we will use classifiers from Heuristics on the banknote authentication dataset. First we load the data and split it into training and test datasets:
df <- read.table("data_banknote_authentication.txt", sep = ",",
                 col.names = c("variance", "skewness", "curtosis", "entropy",
                               "class"))
variance skewness curtosis entropy class
1 3.62160 8.6661 -2.80730 -0.44699 0
2 4.54590 8.1674 -2.45860 -1.46210 0
3 3.86600 -2.6383 1.92420 0.10645 0
4 3.45660 9.5228 -4.01120 -3.59440 0
5 0.32924 -4.4552 4.57180 -0.98880 0
6 4.36840 9.6718 -3.96060 -3.16250 0
7 3.59120 3.0129 0.72888 0.56421 0
8 2.09220 -6.8100 8.46360 -0.60216 0
9 3.20320 5.7588 -0.75345 -0.61251 0
10 1.53560 9.1772 -2.27180 -0.73535 0
11 1.22470 8.7779 -2.21350 -0.80647 0
12 3.98990 -2.7066 2.39460 0.86291 0
[ reached 'max' / getOption("max.print") -- omitted 1360 rows ]
X <- df[, 1:4]
y <- df[, 5]
split <- iai::split_data("classification", X, y, seed = 1)
train_X <- split$train$X
train_y <- split$train$y
test_X <- split$test$X
test_y <- split$test$y
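As a quick sanity check (a minimal sketch using only base R), we can confirm how many observations landed in each split; by default split_data holds out roughly 30% of the rows for testing:
# Number of rows and columns in each split
dim(train_X)
length(train_y)
dim(test_X)
length(test_y)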
Random Forest Classifier
We will use a grid_search to fit a random_forest_classifier with some basic parameter validation:
grid <- iai::grid_search(
    iai::random_forest_classifier(
        random_seed = 1
    ),
    max_depth = 5:10
)
iai::fit(grid, train_X, train_y)
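Before making predictions, it can be useful to see which parameter value the grid search selected. The call below assumes the get_best_params function from the iai package is available in your installed version:
# Parameter combination chosen by the grid search (here, the selected max_depth)
iai::get_best_params(grid)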
We can make predictions on new data using predict:
iai::predict(grid, test_X)
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[ reached getOption("max.print") -- omitted 352 entries ]
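If predicted probabilities are needed instead of hard class labels, the iai package also exposes predict_proba; this is a sketch assuming it accepts a grid search directly (otherwise pass iai::get_learner(grid)):
# Predicted probability of each class for the test observations
iai::predict_proba(grid, test_X)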
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the misclassification on the training set:
iai::score(grid, train_X, train_y, criterion = "misclassification")
[1] 1
Or the AUC on the test set:
iai::score(grid, test_X, test_y, criterion = "auc")
[1] 0.9995943
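Beyond a single AUC value, we can construct the full ROC curve on the test set. This sketch assumes the roc_curve function is available and treats class 1 as the positive label:
# ROC curve for the test set, with class 1 as the positive label
roc <- iai::roc_curve(iai::get_learner(grid), test_X, test_y, positive_label = 1)
roc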
We can also look at the variable importance:
iai::variable_importance(iai::get_learner(grid))
Feature Importance
1 variance 0.55480836
2 skewness 0.25205160
3 curtosis 0.13990229
4 entropy 0.05323775
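Because variable_importance returns an ordinary data frame with Feature and Importance columns, we can visualize it with base R; this is just an illustrative sketch:
# Bar chart of feature importance for the random forest
imp <- iai::variable_importance(iai::get_learner(grid))
barplot(imp$Importance, names.arg = imp$Feature,
        ylab = "Importance", main = "Random forest feature importance")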
XGBoost Classifier
We will use a grid_search to fit an xgboost_classifier with some basic parameter validation:
grid <- iai::grid_search(
    iai::xgboost_classifier(
        random_seed = 1
    ),
    max_depth = 2:5,
    num_round = c(20, 50, 100)
)
iai::fit(grid, train_X, train_y)
We can make predictions on new data using predict:
iai::predict(grid, test_X)
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[ reached getOption("max.print") -- omitted 352 entries ]
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the misclassification on the training set:
iai::score(grid, train_X, train_y, criterion = "misclassification")
[1] 1
Or the AUC on the test set:
iai::score(grid, test_X, test_y, criterion = "auc")
[1] 0.9999523
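To reuse the fitted model later without retraining, the learner can be saved to disk and reloaded. This sketch assumes the write_json and read_json functions from the iai package; the filename is only an example:
# Save the best learner found by the grid search, then load it back
iai::write_json("xgboost_banknote.json", iai::get_learner(grid))
xgb_lnr <- iai::read_json("xgboost_banknote.json")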
We can also look at the variable importance:
iai::variable_importance(iai::get_learner(grid))
Feature Importance
1 variance 0.61698139
2 skewness 0.24735367
3 curtosis 0.13007769
4 entropy 0.00558724
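For per-observation explanations of the XGBoost model, SHAP values can complement the global importance shown above. This is a sketch assuming predict_shap is available for XGBoost learners in your iai version:
# SHAP values for the test set: per-observation feature contributions
shap <- iai::predict_shap(iai::get_learner(grid), test_X)
str(shap)  # inspect the returned structure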