Quick Start Guide: Optimal Feature Selection for Classification

This is an R version of the corresponding OptimalFeatureSelection quick start guide.

In this example we will use Optimal Feature Selection on the Mushroom dataset, where the goal is to distinguish poisonous from edible mushrooms.

First, we load the data and split it into training and test sets:

# The data file has no header row, so the column names are supplied explicitly
df <- read.table(
    "agaricus-lepiota.data",
    sep = ",",
    col.names = c("target", "cap_shape", "cap_surface", "cap_color",
                  "bruises", "odor", "gill_attachment", "gill_spacing",
                  "gill_size", "gill_color", "stalk_shape", "stalk_root",
                  "stalk_surface_above", "stalk_surface_below",
                  "stalk_color_above", "stalk_color_below", "veil_type",
                  "veil_color", "ring_number", "ring_type", "spore_color",
                  "population", "habitat"),
    stringsAsFactors = TRUE
)
  target cap_shape cap_surface cap_color bruises odor gill_attachment
1      p         x           s         n       t    p               f
2      e         x           s         y       t    a               f
  gill_spacing gill_size gill_color stalk_shape stalk_root stalk_surface_above
1            c         n          k           e          e                   s
2            c         b          k           e          c                   s
  stalk_surface_below stalk_color_above stalk_color_below veil_type veil_color
1                   s                 w                 w         p          w
2                   s                 w                 w         p          w
  ring_number ring_type spore_color population habitat
1           o         p           k          s       u
2           o         p           n          n       g
 [ reached 'max' / getOption("max.print") -- omitted 8122 rows ]
# Column 1 is the class label (p = poisonous, e = edible);
# columns 2 to 23 are the 22 features
X <- df[, 2:23]
y <- df[, 1]
split <- iai::split_data("classification", X, y, seed = 1)
train_X <- split$train$X
train_y <- split$train$y
test_X <- split$test$X
test_y <- split$test$y
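iai::split_data takes care of the split for us. To show the general idea in base R, the sketch below performs a seeded, stratified split in which each class is sampled separately so both sets keep roughly the original class balance. The helper name stratified_split and the 70/30 proportion are assumptions for illustration; this is not the iai implementation and will not select the same rows, even with the same seed.

```r
# Hypothetical helper: seeded, stratified train/test split in base R.
# Not the iai implementation; the rows chosen will differ from split_data.
stratified_split <- function(X, y, train_proportion = 0.7, seed = 1) {
  set.seed(seed)  # note: this also changes the global RNG state
  # Sample training rows separately within each class of y
  by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(by_class, function(idx) {
    n_train <- round(length(idx) * train_proportion)
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  train_idx <- sort(train_idx)
  # Every row not used for training goes to the test set
  test_idx <- setdiff(seq_along(y), train_idx)
  list(
    train = list(X = X[train_idx, , drop = FALSE], y = y[train_idx]),
    test  = list(X = X[test_idx, , drop = FALSE], y = y[test_idx])
  )
}

# Toy usage with made-up data: 6 "e" rows and 4 "p" rows
X_toy <- data.frame(a = 1:10)
y_toy <- factor(c(rep("e", 6), rep("p", 4)))
s <- stratified_split(X_toy, y_toy)
table(s$train$y)
```

The returned list mirrors the train/test layout of split above, so on the mushroom data it could be called as stratified_split(X, y).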

Model Fitting

We will use a grid_search to fit an optimal_feature_selection_classifier:

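# Tune the sparsity (how many features the model may use) over 1 to 10
grid <- iai::grid_search(
    iai::optimal_feature_selection_classifier(
        random_seed = 1
    ),
    sparsity = 1:10
)
# Choose the best sparsity by AUC on the validation data
iai::fit(grid, train_X, train_y, validation_criterion = "auc")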
Julia Object of type GridSearch{OptimalFeatureSelectionClassifier,IAIBase.NullGridResult}.
All Grid Results:

│ Row │ sparsity │ train_score │ valid_score │ rank_valid_score │
│     │ Int64    │ Float64     │ Float64     │ Int64            │
├─────┼──────────┼─────────────┼─────────────┼──────────────────┤
│ 1   │ 1        │ 0.512834    │ 0.888682    │ 10               │
│ 2   │ 2        │ 0.720156    │ 0.969598    │ 9                │
│ 3   │ 3        │ 0.776552    │ 0.982218    │ 8                │
│ 4   │ 4        │ 0.803455    │ 0.985475    │ 7                │
│ 5   │ 5        │ 0.80721     │ 0.988876    │ 5                │
│ 6   │ 6        │ 0.84452     │ 0.98828     │ 6                │
│ 7   │ 7        │ 0.828293    │ 0.989336    │ 4                │
│ 8   │ 8        │ 0.849972    │ 0.992436    │ 3                │
│ 9   │ 9        │ 0.934533    │ 0.999783    │ 1                │
│ 10  │ 10       │ 0.923272    │ 0.9996      │ 2                │

Best Params:
  sparsity => 9

Best Model - Fitted OptimalFeatureSelectionClassifier:
  Constant: 0.133011
  Weights:
    gill_color==b:   1.55261
    gill_size==n:    1.86241
    odor==a:        -3.74467
    odor==f:         2.96065
    odor==l:        -3.75129
    odor==n:        -3.64143
    odor==p:         1.68068
    spore_color==r:  5.93176
    stalk_root==c:   0.167442
  (Higher score indicates stronger prediction for class `p`)
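The choice of sparsity 9 follows from the table: each parameter value is ranked by its validation score and the top-ranked value is kept. The base-R sketch below re-derives the rank_valid_score column and the best parameter from the scores copied out of the table above; it only reproduces the selection step and does not refit any model.

```r
# Validation scores copied from the grid search table above
results <- data.frame(
  sparsity = 1:10,
  valid_score = c(0.888682, 0.969598, 0.982218, 0.985475, 0.988876,
                  0.98828, 0.989336, 0.992436, 0.999783, 0.9996)
)

# Rank so that 1 = highest validation score
results$rank_valid_score <- rank(-results$valid_score, ties.method = "min")

# Keep the sparsity with the best validation score
best_sparsity <- results$sparsity[which.max(results$valid_score)]
best_sparsity
```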

The grid search selected a sparsity of 9 as the best parameter, but the validation scores are close for many of the parameter values. We can use the results of the grid search to explore the tradeoff between the complexity of the model and the quality of its predictions:

results <- iai::get_grid_result_summary(grid)
plot(results$sparsity, results$valid_score, type = "l", xlab = "Sparsity",
     ylab = "Validation AUC")

We see that the quality of the model increases quickly as features are added, reaching an AUC of 0.98 with 3 features. After this, additional features improve the quality more slowly, eventually reaching an AUC close to 1 with 9 features. Depending on the application, we might choose a lower sparsity for the final model than the value selected by the grid search.
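One simple way to pick a lower sparsity is to take the smallest value whose validation AUC is within some tolerance of the best. The helper smallest_within_tolerance and the tolerance values used here are hypothetical choices for illustration, not part of the iai package; the scores are copied from the grid search table.

```r
# Validation AUC for each sparsity, copied from the grid search table
sparsity <- 1:10
valid_auc <- c(0.888682, 0.969598, 0.982218, 0.985475, 0.988876,
               0.98828, 0.989336, 0.992436, 0.999783, 0.9996)

# Hypothetical rule: smallest sparsity whose validation AUC is within
# `tolerance` of the best validation AUC
smallest_within_tolerance <- function(sparsity, score, tolerance) {
  ok <- score >= max(score) - tolerance
  min(sparsity[ok])
}

loose <- smallest_within_tolerance(sparsity, valid_auc, tolerance = 0.02)
strict <- smallest_within_tolerance(sparsity, valid_auc, tolerance = 0.01)
c(loose = loose, strict = strict)
```

With a tolerance of 0.02 this picks 3 features, while 0.01 picks 8. The chosen value could then be passed as the sparsity parameter of iai::optimal_feature_selection_classifier when refitting the final model.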

We can see the relative importance of the selected features with variable_importance:

iai::variable_importance(iai::get_learner(grid))
         Feature  Importance
1         odor_n 0.253546903
2         odor_f 0.183224178
3    gill_size_n 0.121381181
4         odor_l 0.115966717
5         odor_a 0.112102542
6   gill_color_b 0.089695729
7  spore_color_r 0.076257534
8         odor_p 0.041899568
9   stalk_root_c 0.005925647
10     bruises_t 0.000000000
11   cap_color_b 0.000000000
12   cap_color_c 0.000000000
13   cap_color_e 0.000000000
14   cap_color_g 0.000000000
15   cap_color_n 0.000000000
16   cap_color_p 0.000000000
17   cap_color_r 0.000000000
18   cap_color_u 0.000000000
19   cap_color_w 0.000000000
20   cap_color_y 0.000000000
21   cap_shape_b 0.000000000
22   cap_shape_c 0.000000000
23   cap_shape_f 0.000000000
24   cap_shape_k 0.000000000
25   cap_shape_s 0.000000000
26   cap_shape_x 0.000000000
27 cap_surface_f 0.000000000
28 cap_surface_g 0.000000000
29 cap_surface_s 0.000000000
30 cap_surface_y 0.000000000
 [ reached 'max' / getOption("max.print") -- omitted 82 rows ]
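Because the categorical features are one-hot encoded, each importance above belongs to a single level (for example odor_n). The sketch below keeps only the features the model uses and then, as a heuristic of our own rather than an iai feature, sums the level importances back to their original column by stripping the "_level" suffix. The values are copied from the output above; in that output the nonzero importances sum to 1.

```r
# Importances copied from the variable_importance output above
imp <- data.frame(
  Feature = c("odor_n", "odor_f", "gill_size_n", "odor_l", "odor_a",
              "gill_color_b", "spore_color_r", "odor_p", "stalk_root_c",
              "bruises_t"),
  Importance = c(0.253546903, 0.183224178, 0.121381181, 0.115966717,
                 0.112102542, 0.089695729, 0.076257534, 0.041899568,
                 0.005925647, 0),
  stringsAsFactors = FALSE
)

# Keep only the features the model actually uses
selected <- imp[imp$Importance > 0, ]

# Heuristic aggregation: drop the trailing "_<level>" added by one-hot
# encoding and sum the importances per original column
base_feature <- sub("_[^_]+$", "", selected$Feature)
by_feature <- sort(tapply(selected$Importance, base_feature, sum),
                   decreasing = TRUE)
by_feature
```

Aggregated this way, odor alone accounts for roughly 71% of the total importance.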

We can make predictions on new data using predict:

iai::predict(grid, test_X)
 [1] "e" "e" "e" "e" "e" "e" "e" "p" "e" "e" "p" "e" "e" "e" "e" "e" "e" "e" "e"
[20] "e" "e" "p" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e"
[39] "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e" "e"
[58] "e" "e" "e"
 [ reached getOption("max.print") -- omitted 2377 entries ]
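The fitted model printed earlier is a constant plus a set of weights, each of which applies when its feature takes the listed value (for example odor==a). The sketch below recomputes that linear score for the first two mushrooms shown when loading the data. Treating a positive score as class p and a negative score as e is an assumption made here for illustration, based on the note that a higher score indicates a stronger prediction for p; in practice, use iai::predict.

```r
# Constant and weights copied from the fitted model printed above
ofs_constant <- 0.133011
ofs_weights <- c(
  "gill_color==b"  =  1.55261,
  "gill_size==n"   =  1.86241,
  "odor==a"        = -3.74467,
  "odor==f"        =  2.96065,
  "odor==l"        = -3.75129,
  "odor==n"        = -3.64143,
  "odor==p"        =  1.68068,
  "spore_color==r" =  5.93176,
  "stalk_root==c"  =  0.167442
)

# Linear score: constant plus every weight whose "feature==value" term
# matches the observation (a named character vector of feature values)
linear_score <- function(obs) {
  active <- paste0(names(obs), "==", obs)
  ofs_constant + sum(ofs_weights[intersect(active, names(ofs_weights))])
}

# Assumed decision rule for illustration: positive score -> "p", else "e"
predict_class <- function(obs) {
  if (linear_score(obs) > 0) "p" else "e"
}

# Relevant feature values of the first two rows of df shown earlier
row1 <- c(odor = "p", gill_size = "n", gill_color = "k",
          stalk_root = "e", spore_color = "k")
row2 <- c(odor = "a", gill_size = "b", gill_color = "k",
          stalk_root = "c", spore_color = "n")
c(linear_score(row1), linear_score(row2))
c(predict_class(row1), predict_class(row2))
```

Both results agree with the true labels in the data: p for the first row and e for the second.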

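We can evaluate the quality of the model using score with any of the supported criteria. For example, the misclassification score on the training set, which is reported so that higher is better (here, the proportion of training points classified correctly rather than the error rate):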

iai::score(grid, train_X, train_y, criterion = "misclassification")
[1] 0.9949007
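A misclassification score close to 1 reads as the fraction of points classified correctly. The sketch below shows that calculation in base R on made-up toy labels; with the real model, the predictions would come from iai::predict(grid, train_X) and the true labels from train_y. The helper name accuracy is our own.

```r
# Fraction of predictions that match the true labels (higher is better)
accuracy <- function(predicted, actual) {
  mean(as.character(predicted) == as.character(actual))
}

# Toy example with made-up labels
predicted <- c("e", "p", "e", "p")
actual <- factor(c("e", "p", "p", "p"))
acc <- accuracy(predicted, actual)
c(accuracy = acc, misclassification_rate = 1 - acc)
```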

Or the AUC on the test set:

iai::score(grid, test_X, test_y, criterion = "auc")
[1] 0.9997498
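AUC can be read as the probability that a randomly chosen poisonous mushroom receives a higher score than a randomly chosen edible one. The base-R sketch below computes it with the rank-based (Mann-Whitney) formula on made-up toy scores, counting ties as one half through average ranks. With the real model, the scores would be predicted probabilities of class p, for example from iai::predict_proba; the helper name auc_rank is our own.

```r
# Rank-based (Mann-Whitney) AUC: probability that a random positive
# outranks a random negative, with ties counted as one half
auc_rank <- function(scores, is_positive) {
  r <- rank(scores)  # average ranks are used for tied scores
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example with made-up scores for the positive class "p"
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c("e", "e", "p", "p")
auc_rank(scores, labels == "p")
```

Here 3 of the 4 positive/negative pairs are ordered correctly, giving an AUC of 0.75.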