Quick Start Guide: Heuristic Classifiers

In this example we will use classifiers from Heuristics on the banknote authentication dataset. First we load in the data and split it into training and test datasets:

using CSV, DataFrames
df = DataFrame(CSV.File("data_banknote_authentication.txt",
    header=[:variance, :skewness, :curtosis, :entropy, :class]))
1372×5 DataFrame
  Row │ variance  skewness   curtosis   entropy   class
      │ Float64   Float64    Float64    Float64   Int64
──────┼─────────────────────────────────────────────────
    1 │  3.6216     8.6661   -2.8073    -0.44699      0
    2 │  4.5459     8.1674   -2.4586    -1.4621       0
    3 │  3.866     -2.6383    1.9242     0.10645      0
    4 │  3.4566     9.5228   -4.0112    -3.5944       0
    5 │  0.32924   -4.4552    4.5718    -0.9888       0
    6 │  4.3684     9.6718   -3.9606    -3.1625       0
    7 │  3.5912     3.0129    0.72888    0.56421      0
    8 │  2.0922    -6.81      8.4636    -0.60216      0
  ⋮   │    ⋮          ⋮          ⋮         ⋮        ⋮
 1366 │ -4.5046    -5.8126   10.8867    -0.52846      1
 1367 │ -2.41       3.7433   -0.40215   -1.2953       1
 1368 │  0.40614    1.3492   -1.4501    -0.55949      1
 1369 │ -1.3887    -4.8773    6.4774     0.34179      1
 1370 │ -3.7503   -13.4586   17.5932    -2.7771       1
 1371 │ -3.5637    -8.3827   12.393     -1.2823       1
 1372 │ -2.5419    -0.65804   2.6842     1.1952       1
                                       1357 rows omitted
X = df[:, 1:4]
y = df[:, 5]
(train_X, train_y), (test_X, test_y) = IAI.split_data(:classification, X, y,
                                                      seed=1)
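
If I recall the defaults correctly, split_data holds out roughly 30% of the samples for testing, which is consistent with the 412-element prediction vectors shown later. A quick sanity check on the row counts:

# 1372 rows in total; the test split should hold 412 of them
size(train_X, 1), size(test_X, 1)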

Random Forest Classifier

We will use a GridSearch to fit a RandomForestClassifier with some basic parameter validation:

grid = IAI.GridSearch(
    IAI.RandomForestClassifier(
        random_seed=1,
    ),
    max_depth=5:10,
)
IAI.fit!(grid, train_X, train_y)
All Grid Results:

 Row │ max_depth  train_score  valid_score  rank_valid_score
     │ Int64      Float64      Float64      Int64
─────┼───────────────────────────────────────────────────────
   1 │         5     0.903239     0.871707                 6
   2 │         6     0.938004     0.902482                 5
   3 │         7     0.95689      0.918628                 4
   4 │         8     0.963382     0.923913                 3
   5 │         9     0.965263     0.926137                 1
   6 │        10     0.965293     0.925891                 2

Best Params:
  max_depth => 9

Best Model - Fitted RandomForestClassifier
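
Besides the printed summary, the chosen parameters and the refitted learner can be extracted from the grid programmatically. A minimal sketch, assuming the standard GridSearch accessors get_best_params and get_learner:

# The parameter combination selected by the validation procedure
IAI.get_best_params(grid)

# The learner refitted on the full training set with those parameters
lnr = IAI.get_learner(grid)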

We can make predictions on new data using predict:

IAI.predict(grid, test_X)
412-element Array{Int64,1}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 ⋮
 1
 1
 1
 1
 1
 1
 1
 1
 1
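
If class probabilities are needed instead of hard labels (for example, to apply a custom decision threshold), the classification API also offers predict_proba; a sketch:

# Predicted probability of each label for every test point,
# returned with one column per class (here 0 and 1)
IAI.predict_proba(grid, test_X)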

We can evaluate the quality of the model using score with any of the supported loss functions. Scores are normalized so that higher is better; for example, the misclassification score on the training set, where 1.0 means every training point is classified correctly:

IAI.score(grid, train_X, train_y, criterion=:misclassification)
1.0
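
A perfect training score says little on its own, so it is worth evaluating the same criterion on the held-out data, where a large gap to the training score would indicate overfitting:

# Same criterion on unseen data for comparison with the training score
IAI.score(grid, test_X, test_y, criterion=:misclassification)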

Or the AUC on the test set:

IAI.score(grid, test_X, test_y, criterion=:auc)
0.9995943398477585
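
The AUC condenses the whole ROC curve into a single number; to inspect the curve itself, a ROCCurve can be constructed directly (a sketch, assuming the constructor takes the grid along with a positive_label keyword for numeric labels):

# Build the ROC curve for the positive class; the resulting object
# can be plotted or inspected point by point
IAI.ROCCurve(grid, test_X, test_y, positive_label=1)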

We can also look at the variable importance:

IAI.variable_importance(IAI.get_learner(grid))
4×2 DataFrame
 Row │ Feature   Importance
     │ Symbol    Float64
─────┼──────────────────────
   1 │ variance   0.554808
   2 │ skewness   0.252052
   3 │ curtosis   0.139902
   4 │ entropy    0.0532378
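
The importance scores come back as a plain DataFrame, so they are straightforward to visualize. A quick sketch using Plots.jl (an extra dependency, not part of IAI):

using Plots
imp = IAI.variable_importance(IAI.get_learner(grid))
# Horizontal bar chart with the most important feature on top
bar(String.(imp.Feature), imp.Importance,
    orientation=:h, yflip=true, legend=false, xlabel="Importance")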

XGBoost Classifier

We will use a GridSearch to fit an XGBoostClassifier with some basic parameter validation:

grid = IAI.GridSearch(
    IAI.XGBoostClassifier(
        random_seed=1,
    ),
    max_depth=2:5,
    num_round=[20, 50, 100],
)
IAI.fit!(grid, train_X, train_y)
All Grid Results:

 Row │ num_round  max_depth  train_score  valid_score  rank_valid_score
     │ Int64      Int64      Float64      Float64      Int64
─────┼──────────────────────────────────────────────────────────────────
   1 │        20          2     0.902156     0.877842                12
   2 │        20          3     0.970873     0.944634                10
   3 │        20          4     0.987441     0.952704                 9
   4 │        20          5     0.989803     0.940254                11
   5 │        50          2     0.985077     0.962432                 6
   6 │        50          3     0.995074     0.976761                 3
   7 │        50          4     0.995969     0.967541                 5
   8 │        50          5     0.996379     0.959306                 8
   9 │       100          2     0.995566     0.978982                 2
  10 │       100          3     0.996601     0.979295                 1
  11 │       100          4     0.996853     0.968232                 4
  12 │       100          5     0.997185     0.961939                 7

Best Params:
  num_round => 100
  max_depth => 3

Best Model - Fitted XGBoostClassifier
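
As with the random forest, the refitted learner can be pulled out of the grid, and it can be persisted for later reuse. A sketch, assuming IAI's JSON serialization helpers write_json and read_json are available in your version:

lnr = IAI.get_learner(grid)
# Save the fitted learner to disk and load it back later
IAI.write_json("xgb_learner.json", lnr)
lnr2 = IAI.read_json("xgb_learner.json")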

We can make predictions on new data using predict:

IAI.predict(grid, test_X)
412-element Array{Int64,1}:
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 ⋮
 1
 1
 1
 1
 1
 1
 1
 1
 1

We can evaluate the quality of the model using score with any of the supported loss functions. As before, scores are normalized so that higher is better; for example, the misclassification score on the training set, where 1.0 means every training point is classified correctly:

IAI.score(grid, train_X, train_y, criterion=:misclassification)
1.0

Or the AUC on the test set:

IAI.score(grid, test_X, test_y, criterion=:auc)
0.9999522752762073
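
Recent IAI releases also expose SHAP values for XGBoost learners, which attribute each individual prediction to the input features. The call below is an assumption (the function name and the field layout of its result may differ by version), so check the API reference for your release:

lnr = IAI.get_learner(grid)
# Assumed API: per-sample, per-feature SHAP contributions
s = IAI.predict_shap(lnr, test_X)
s.shap_values  # assumed field name for the contribution matrix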

We can also look at the variable importance:

IAI.variable_importance(IAI.get_learner(grid))
4×2 DataFrame
 Row │ Feature   Importance
     │ Symbol    Float64
─────┼──────────────────────
   1 │ variance  0.616981
   2 │ skewness  0.247354
   3 │ curtosis  0.130078
   4 │ entropy   0.00558724