Quick Start Guide: Heuristic Classifiers
In this example we will use classifiers from Heuristics on the banknote authentication dataset. First we load in the data and split it into training and test datasets:
using CSV, DataFrames
df = DataFrame(CSV.File("data_banknote_authentication.txt",
                        header=[:variance, :skewness, :curtosis, :entropy, :class]))
1372×5 DataFrame
  Row │ variance  skewness  curtosis   entropy   class
      │ Float64   Float64   Float64    Float64   Int64
──────┼────────────────────────────────────────────────
    1 │  3.6216    8.6661   -2.8073   -0.44699       0
    2 │  4.5459    8.1674   -2.4586   -1.4621        0
    3 │  3.866    -2.6383    1.9242    0.10645       0
    4 │  3.4566    9.5228   -4.0112   -3.5944        0
    5 │  0.32924  -4.4552    4.5718   -0.9888        0
    6 │  4.3684    9.6718   -3.9606   -3.1625        0
    7 │  3.5912    3.0129    0.72888   0.56421       0
    8 │  2.0922   -6.81      8.4636   -0.60216       0
  ⋮   │    ⋮         ⋮         ⋮         ⋮         ⋮
 1366 │ -4.5046   -5.8126   10.8867   -0.52846       1
 1367 │ -2.41      3.7433   -0.40215  -1.2953        1
 1368 │  0.40614   1.3492   -1.4501   -0.55949       1
 1369 │ -1.3887   -4.8773    6.4774    0.34179       1
 1370 │ -3.7503  -13.4586   17.5932   -2.7771        1
 1371 │ -3.5637   -8.3827   12.393    -1.2823        1
 1372 │ -2.5419   -0.65804   2.6842    1.1952        1
                                     1357 rows omitted
X = df[:, 1:4]
y = df[:, 5]
(train_X, train_y), (test_X, test_y) = IAI.split_data(:classification, X, y,
                                                      seed=1)
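As a quick sanity check on the split, the number of rows in each part can be inspected directly (a sketch; with 1372 rows total, the default split proportion sends roughly 70% of the data to training and 30% to testing):

```julia
# Rows in the training and test splits
size(train_X, 1), size(test_X, 1)
```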
Random Forest Classifier
We will use a GridSearch to fit a RandomForestClassifier with some basic parameter validation:
grid = IAI.GridSearch(
    IAI.RandomForestClassifier(
        random_seed=1,
    ),
    max_depth=5:10,
)
IAI.fit!(grid, train_X, train_y)
All Grid Results:

 Row │ max_depth  train_score  valid_score  rank_valid_score
     │ Int64      Float64      Float64      Int64
─────┼───────────────────────────────────────────────────────
   1 │         5     0.903239     0.871707                 6
   2 │         6     0.938004     0.902482                 5
   3 │         7     0.95689      0.918628                 4
   4 │         8     0.963382     0.923913                 3
   5 │         9     0.965263     0.926137                 1
   6 │        10     0.965293     0.925891                 2

Best Params:
  max_depth => 9

Best Model - Fitted RandomForestClassifier
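The winning parameter combination and the refitted learner can also be retrieved from the grid search programmatically with `get_best_params` and `get_learner`:

```julia
# Parameter combination with the best validation score
IAI.get_best_params(grid)

# The final learner, refitted on the training set using the best parameters
IAI.get_learner(grid)
```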
We can make predictions on new data using predict:
IAI.predict(grid, test_X)
412-element Array{Int64,1}:
0
0
0
0
0
0
0
0
0
0
⋮
1
1
1
1
1
1
1
1
1
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the misclassification score on the training set:
IAI.score(grid, train_X, train_y, criterion=:misclassification)
1.0
Or the AUC on the test set:
IAI.score(grid, test_X, test_y, criterion=:auc)
0.9995943398477585
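The AUC is computed from predicted class probabilities rather than hard labels; these probabilities can be inspected directly with `predict_proba`:

```julia
# Predicted probability of each class label for the test points
IAI.predict_proba(grid, test_X)
```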
We can also look at the variable importance:
IAI.variable_importance(IAI.get_learner(grid))
4×2 DataFrame
 Row │ Feature   Importance
     │ Symbol    Float64
─────┼──────────────────────
   1 │ variance   0.554808
   2 │ skewness   0.252052
   3 │ curtosis   0.139902
   4 │ entropy    0.0532378
XGBoost Classifier
We will use a GridSearch to fit an XGBoostClassifier with some basic parameter validation:
grid = IAI.GridSearch(
    IAI.XGBoostClassifier(
        random_seed=1,
    ),
    max_depth=2:5,
    num_round=[20, 50, 100],
)
IAI.fit!(grid, train_X, train_y)
All Grid Results:

 Row │ num_round  max_depth  train_score  valid_score  rank_valid_score
     │ Int64      Int64      Float64      Float64      Int64
─────┼──────────────────────────────────────────────────────────────────
   1 │        20          2     0.902156     0.877842                12
   2 │        20          3     0.970873     0.944634                10
   3 │        20          4     0.987441     0.952704                 9
   4 │        20          5     0.989803     0.940254                11
   5 │        50          2     0.985077     0.962432                 6
   6 │        50          3     0.995074     0.976761                 3
   7 │        50          4     0.995969     0.967541                 5
   8 │        50          5     0.996379     0.959306                 8
   9 │       100          2     0.995566     0.978982                 2
  10 │       100          3     0.996601     0.979295                 1
  11 │       100          4     0.996853     0.968232                 4
  12 │       100          5     0.997185     0.961939                 7

Best Params:
  num_round => 100
  max_depth => 3

Best Model - Fitted XGBoostClassifier
We can make predictions on new data using predict:
IAI.predict(grid, test_X)
412-element Array{Int64,1}:
0
0
0
0
0
0
0
0
0
0
⋮
1
1
1
1
1
1
1
1
1
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the misclassification score on the training set:
IAI.score(grid, train_X, train_y, criterion=:misclassification)
1.0
Or the AUC on the test set:
IAI.score(grid, test_X, test_y, criterion=:auc)
0.9999522752762073
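Beyond the single AUC number, the full ROC curve underlying it can be constructed with `ROCCurve` (a sketch; depending on the IAI version, the positive class may need to be given explicitly via the `positive_label` keyword):

```julia
# ROC curve for the test set, treating class 1 as the positive label
roc = IAI.ROCCurve(grid, test_X, test_y, positive_label=1)
```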
We can also look at the variable importance:
IAI.variable_importance(IAI.get_learner(grid))
4×2 DataFrame
 Row │ Feature   Importance
     │ Symbol    Float64
─────┼──────────────────────
   1 │ variance   0.616981
   2 │ skewness   0.247354
   3 │ curtosis   0.130078
   4 │ entropy    0.00558724