Quick Start Guide: Optimal Feature Selection for Regression

This is an R version of the corresponding OptimalFeatureSelection quick start guide.

In this example we will use Optimal Feature Selection on the Ailerons dataset, which addresses a control problem: flying an F16 aircraft. The attributes describe the status of the aircraft, and the goal is to predict the control action on its ailerons.

First we load the data and split it into training and test datasets:

df <- read.table("ailerons.csv", header = TRUE, sep = ",")

  climbRate Sgz     p     q curPitch curRoll absRoll diffClb diffRollRate
1         2 -56 -0.33 -0.09      0.9     0.2     -11      12        0.004
  diffDiffClb SeTime1 SeTime2 SeTime3 SeTime4 SeTime5 SeTime6 SeTime7 SeTime8
1        -0.1   0.032   0.032   0.032   0.032   0.032   0.032   0.032   0.032
  SeTime9 SeTime10 SeTime11 SeTime12 SeTime13 SeTime14 diffSeTime1 diffSeTime2
1   0.032    0.032    0.032    0.032    0.032    0.032           0           0
  diffSeTime3 diffSeTime4 diffSeTime5 diffSeTime6 diffSeTime7 diffSeTime8
1           0           0           0           0           0           0
  diffSeTime9 diffSeTime10 diffSeTime11 diffSeTime12 diffSeTime13 diffSeTime14
1           0            0            0            0            0            0
  alpha    Se   goal
1   0.9 0.032 -9e-04
 [ reached 'max' / getOption("max.print") -- omitted 13749 rows ]

X <- df[, 1:40]
y <- df[, 41]
split <- iai::split_data("regression", X, y, seed = 1)
train_X <- split$train$X
train_y <- split$train$y
test_X <- split$test$X
test_y <- split$test$y
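
Selecting columns by position works here because goal is the last of the 41 columns. As a minimal sketch of a more defensive alternative, the features and target can be selected by name instead. The data frame toy_df below is a hypothetical stand-in with made-up values, not the Ailerons data:

```r
# Hypothetical toy data frame; the values are made up for illustration.
toy_df <- data.frame(
  climbRate = c(2, 470),
  Sgz = c(-56, -39),
  goal = c(-9e-04, -0.0011)
)

target <- "goal"

# Every column except the target becomes a feature.
X <- toy_df[, setdiff(names(toy_df), target), drop = FALSE]
y <- toy_df[[target]]

names(X)
```

This keeps working if the column order of the file changes, at the cost of hard-coding the target name.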

Model Fitting

We will use a grid_search to fit an optimal_feature_selection_regressor:

grid <- iai::grid_search(
    iai::optimal_feature_selection_regressor(
        random_seed = 1,
    ),
    sparsity = 1:10,
)
iai::fit(grid, train_X, train_y)

Julia Object of type GridSearch{OptimalFeatureSelectionRegressor,IAIBase.NullGridResult}.
All Grid Results:

│ Row │ sparsity │ train_score │ valid_score │ rank_valid_score │
│     │ Int64    │ Float64     │ Float64     │ Int64            │
├─────┼──────────┼─────────────┼─────────────┼──────────────────┤
│ 1   │ 1        │ 0.502496    │ 0.469551    │ 10               │
│ 2   │ 2        │ 0.664859    │ 0.661475    │ 9                │
│ 3   │ 3        │ 0.75009     │ 0.746062    │ 8                │
│ 4   │ 4        │ 0.808994    │ 0.800123    │ 7                │
│ 5   │ 5        │ 0.814076    │ 0.803629    │ 6                │
│ 6   │ 6        │ 0.816877    │ 0.807073    │ 5                │
│ 7   │ 7        │ 0.819178    │ 0.809386    │ 3                │
│ 8   │ 8        │ 0.819249    │ 0.809528    │ 2                │
│ 9   │ 9        │ 0.819444    │ 0.809719    │ 1                │
│ 10  │ 10       │ 0.818245    │ 0.808777    │ 4                │

Best Params:
  sparsity => 9

Best Model - Fitted OptimalFeatureSelectionRegressor:
  Constant: 0.000340054
  Weights:
    SeTime6:      -0.00762837
    SeTime7:      -0.00760919
    SeTime8:      -0.00533595
    SeTime9:      -0.00531819
    absRoll:       0.0000577878
    curRoll:      -0.0000863373
    diffClb:      -0.00000357877
    diffRollRate:  0.00253459
    p:            -0.000428755

The grid search selected a sparsity of 9 as the best parameter, but the validation scores are close for many of the sparsity values: every value from 4 to 10 scores between 0.80 and 0.81. We can use the results of the grid search to explore the tradeoff between the complexity of the regression and the quality of predictions:

results <- iai::get_grid_result_summary(grid)
plot(results$sparsity, results$valid_score, type = "l", xlab = "Sparsity",
     ylab = "Validation R-Squared")

The quality of the model increases quickly with additional terms until the sparsity reaches 4, after which the gains are small. Depending on the application, we might decide to choose a lower sparsity for the final model than the value chosen by the grid search.
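
One way to make that choice systematic is to take the smallest sparsity whose validation score is within a tolerance of the best score. The sketch below is an illustration, not part of the iai package: the scores are copied by hand from the grid search output above, and the tolerance of 0.01 is a hypothetical threshold that should be chosen to suit the application.

```r
# Validation scores copied from the grid search output above.
scores <- data.frame(
  sparsity = 1:10,
  valid_score = c(0.469551, 0.661475, 0.746062, 0.800123, 0.803629,
                  0.807073, 0.809386, 0.809528, 0.809719, 0.808777)
)

# Hypothetical threshold: how much validation score we are willing to give up.
tolerance <- 0.01

best_score <- max(scores$valid_score)

# All sparsity values that score within the tolerance of the best.
candidates <- scores$sparsity[scores$valid_score >= best_score - tolerance]

# Prefer the simplest model among them.
chosen <- min(candidates)
chosen
```

In practice the data frame returned by iai::get_grid_result_summary(grid) could take the place of the hand-copied scores.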

We can see the relative importance of the selected features with variable_importance:

iai::variable_importance(iai::get_learner(grid))

        Feature Importance
1       absRoll 0.33980375
2             p 0.18468617
3       curRoll 0.11928783
4       SeTime6 0.07525518
5       SeTime7 0.07507436
6       SeTime8 0.05270638
7       diffClb 0.05262911
8       SeTime9 0.05252926
9  diffRollRate 0.04802796
10           Se 0.00000000
11      SeTime1 0.00000000
12     SeTime10 0.00000000
13     SeTime11 0.00000000
14     SeTime12 0.00000000
15     SeTime13 0.00000000
16     SeTime14 0.00000000
17      SeTime2 0.00000000
18      SeTime3 0.00000000
19      SeTime4 0.00000000
20      SeTime5 0.00000000
21          Sgz 0.00000000
22        alpha 0.00000000
23    climbRate 0.00000000
24     curPitch 0.00000000
25  diffDiffClb 0.00000000
26  diffSeTime1 0.00000000
27 diffSeTime10 0.00000000
28 diffSeTime11 0.00000000
29 diffSeTime12 0.00000000
30 diffSeTime13 0.00000000
 [ reached 'max' / getOption("max.print") -- omitted 10 rows ]
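
Most of the rows above have zero importance because the model uses only nine features. A minimal sketch of filtering the table down to the selected features, using a data frame copied by hand from the first rows of the output above (in practice the return value of iai::variable_importance would be used in its place):

```r
# First eleven rows of the importance table above, copied by hand.
importance <- data.frame(
  Feature = c("absRoll", "p", "curRoll", "SeTime6", "SeTime7", "SeTime8",
              "diffClb", "SeTime9", "diffRollRate", "Se", "SeTime1"),
  Importance = c(0.33980375, 0.18468617, 0.11928783, 0.07525518, 0.07507436,
                 0.05270638, 0.05262911, 0.05252926, 0.04802796, 0, 0)
)

# Keep only the features that the model actually uses.
selected <- importance[importance$Importance > 0, ]

nrow(selected)
sum(selected$Importance)
```

The nonzero importances in this table add up to 1, so each value can be read as a share of the total.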

We can make predictions on new data using predict:

iai::predict(grid, test_X)

 [1] -0.0010327312 -0.0012458850 -0.0012039664 -0.0008067696 -0.0011597699
 [6] -0.0009569567 -0.0008189253 -0.0007689838 -0.0009443703 -0.0008316866
[11] -0.0007186833 -0.0009064927 -0.0010340685 -0.0009037468 -0.0008267376
[16] -0.0006700133 -0.0005571744 -0.0006125905 -0.0009043593 -0.0008719884
[21] -0.0005976167 -0.0009425322 -0.0008874047 -0.0005906548 -0.0008569110
[26] -0.0008991132 -0.0008999666 -0.0006735482 -0.0008148005 -0.0006011243
[31] -0.0007952192 -0.0008250282 -0.0008411108 -0.0006957652 -0.0008232747
[36] -0.0005863755 -0.0007643248 -0.0009687375 -0.0008418961 -0.0009063566
[41] -0.0009060343 -0.0005859594 -0.0007851755 -0.0006899258 -0.0007124347
[46] -0.0008846562 -0.0005256708 -0.0009750276 -0.0009102622 -0.0009793354
[51] -0.0010605685 -0.0008787848 -0.0006554579 -0.0007240455 -0.0010050696
[56] -0.0009717790 -0.0007063349 -0.0008699437 -0.0008648700 -0.0009806303
 [ reached getOption("max.print") -- omitted 4065 entries ]
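
Because the fitted model is linear, a prediction is the constant plus the weighted sum of the selected features. The sketch below reproduces one prediction by hand, using the constant and weights printed for the best model and the feature values of the first row of df printed at the start of this guide. Both are rounded in the printed output, so the result is only approximate.

```r
# Constant and weights copied from the fitted model printed above.
constant <- 0.000340054
weights <- c(
  SeTime6 = -0.00762837, SeTime7 = -0.00760919,
  SeTime8 = -0.00533595, SeTime9 = -0.00531819,
  absRoll = 0.0000577878, curRoll = -0.0000863373,
  diffClb = -0.00000357877, diffRollRate = 0.00253459,
  p = -0.000428755
)

# Values of the selected features in the first row of df, as printed earlier.
row1 <- c(
  SeTime6 = 0.032, SeTime7 = 0.032, SeTime8 = 0.032, SeTime9 = 0.032,
  absRoll = -11, curRoll = 0.2, diffClb = 12, diffRollRate = 0.004,
  p = -0.33
)

# Linear prediction: constant plus the weighted sum of the features.
manual_pred <- constant + sum(weights * row1[names(weights)])
manual_pred
```

The result is about -0.00103, which is close to the first value printed by predict above.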

We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set, which score reports for the "mse" criterion:

iai::score(grid, train_X, train_y, criterion = "mse")

[1] 0.8162746

Or on the test set:

iai::score(grid, test_X, test_y, criterion = "mse")

[1] 0.8201052
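
For reference, $R^2$ compares the squared error of the predictions with that of a constant baseline. The sketch below computes it in base R on made-up numbers. It assumes the baseline is the mean of the supplied targets, which may not match the baseline iai uses internally, so the values it gives on real data could differ slightly from those reported by score.

```r
# R-squared: 1 minus the ratio of model squared error to baseline squared error.
# Assumption: the baseline predicts the mean of `actual`.
r_squared <- function(actual, predicted) {
  sse <- sum((actual - predicted)^2)
  sst <- sum((actual - mean(actual))^2)
  1 - sse / sst
}

# Made-up values for illustration only.
actual <- c(1, 2, 3, 4)
predicted <- c(1.1, 1.9, 3.2, 3.8)

r2 <- r_squared(actual, predicted)
r2
```

On the real data, the same function could be applied to test_y and the output of iai::predict(grid, test_X).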