Quick Start Guide: Optimal Feature Selection for Regression

This is a Python version of the corresponding OptimalFeatureSelection quick start guide.

In this example we will use Optimal Feature Selection on the Ailerons dataset, which addresses a control problem: flying an F16 aircraft. The attributes describe the status of the aircraft, and the goal is to predict the control action on its ailerons.

First we load the data and split it into training and test datasets:

import pandas as pd
df = pd.read_csv("ailerons.csv")
       climbRate  Sgz     p     q  ...  diffSeTime14  alpha     Se    goal
0              2  -56 -0.33 -0.09  ...           0.0    0.9  0.032 -0.0009
1            470  -39  0.02  0.12  ...           0.0    0.9  0.034 -0.0011
2            165    4  0.14  0.14  ...           0.0    1.0  0.034 -0.0012
3           -113    5 -0.12  0.11  ...           0.0    0.9  0.033 -0.0011
4           -411  -21 -0.17  0.07  ...           0.0    0.9  0.032 -0.0008
5           -105  -42  0.23 -0.06  ...           0.0    0.8  0.028 -0.0010
6            144  -40  0.31 -0.01  ...           0.0    0.8  0.029 -0.0012
...          ...  ...   ...   ...  ...           ...    ...    ...     ...
13743       -224  -24 -0.22  0.00  ...           0.0    0.7  0.026 -0.0007
13744       -204  -27 -0.25  0.01  ...           0.0    0.7  0.026 -0.0006
13745        399  -22  0.17  0.20  ...           0.0    0.8  0.027 -0.0008
13746        237   -6  0.26  0.10  ...           0.0    0.8  0.027 -0.0010
13747       -148   -3 -0.37  0.09  ...           0.0    0.7  0.026 -0.0006
13748       -237  -11 -0.47 -0.16  ...           0.0    0.7  0.023 -0.0005
13749        128  -14 -0.07 -0.11  ...           0.0    0.6  0.022 -0.0006

[13750 rows x 41 columns]
from interpretableai import iai
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]
(train_X, train_y), (test_X, test_y) = iai.split_data('regression', X, y, seed=1)
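
To make the idea of a seeded split concrete, here is a minimal standard-library sketch of a reproducible random partition of row indices. It is not how `iai.split_data` is implemented; the helper name `shuffle_split` and the 70/30 proportion are assumptions chosen for illustration.

```python
import random


def shuffle_split(n_rows, train_proportion=0.7, seed=1):
    """Return (train_idx, test_idx), a seeded random partition of range(n_rows).

    Hypothetical helper: the 0.7 proportion is an assumption, not a value
    taken from this guide.
    """
    rng = random.Random(seed)  # local generator, so the same seed gives the same split
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_train = round(n_rows * train_proportion)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# 13750 is the number of rows in the DataFrame shown above.
train_idx, test_idx = shuffle_split(13750)
print(len(train_idx), len(test_idx))
```

Index lists like these could be passed to `X.iloc[...]` and `y.iloc[...]` to build the two datasets by hand. Fixing the seed matters because every score reported later depends on which rows land in the test set.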

Model Fitting

We will use a GridSearch to fit an OptimalFeatureSelectionRegressor:

grid = iai.GridSearch(
    iai.OptimalFeatureSelectionRegressor(
        random_seed=1,
    ),
    sparsity=range(1, 11),
)
grid.fit(train_X, train_y)
All Grid Results:

│ Row │ sparsity │ train_score │ valid_score │ rank_valid_score │
│     │ Int64    │ Float64     │ Float64     │ Int64            │
├─────┼──────────┼─────────────┼─────────────┼──────────────────┤
│ 1   │ 1        │ 0.502496    │ 0.469551    │ 10               │
│ 2   │ 2        │ 0.664859    │ 0.661475    │ 9                │
│ 3   │ 3        │ 0.75009     │ 0.746062    │ 8                │
│ 4   │ 4        │ 0.808994    │ 0.800123    │ 7                │
│ 5   │ 5        │ 0.814076    │ 0.803629    │ 6                │
│ 6   │ 6        │ 0.816877    │ 0.807073    │ 5                │
│ 7   │ 7        │ 0.819178    │ 0.809386    │ 3                │
│ 8   │ 8        │ 0.819249    │ 0.809528    │ 2                │
│ 9   │ 9        │ 0.819444    │ 0.809719    │ 1                │
│ 10  │ 10       │ 0.818245    │ 0.808777    │ 4                │

Best Params:
  sparsity => 9

Best Model - Fitted OptimalFeatureSelectionRegressor:
  Constant: 0.000340054
  Weights:
    SeTime6:      -0.00762837
    SeTime7:      -0.00760919
    SeTime8:      -0.00533595
    SeTime9:      -0.00531819
    absRoll:       0.0000577878
    curRoll:      -0.0000863373
    diffClb:      -0.00000357877
    diffRollRate:  0.00253459
    p:            -0.000428755
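
The printed model is a sparse linear regression, so a prediction can be sketched as the constant plus a weighted sum of the nine selected features. The snippet below copies the constant and weights from the output above and assumes they apply directly to the raw feature values. The function name `predict_row` and the example feature values are hypothetical.

```python
# Constant and weights copied from the fitted model printed above.
CONSTANT = 0.000340054
WEIGHTS = {
    "SeTime6": -0.00762837,
    "SeTime7": -0.00760919,
    "SeTime8": -0.00533595,
    "SeTime9": -0.00531819,
    "absRoll": 0.0000577878,
    "curRoll": -0.0000863373,
    "diffClb": -0.00000357877,
    "diffRollRate": 0.00253459,
    "p": -0.000428755,
}


def predict_row(row):
    """Linear prediction: constant plus the weighted sum of the selected features.

    Assumes the printed weights act on unscaled feature values.
    """
    return CONSTANT + sum(weight * row[name] for name, weight in WEIGHTS.items())


# Hypothetical feature values for illustration only (not a row from the dataset).
example_row = {
    "SeTime6": 0.03, "SeTime7": 0.03, "SeTime8": 0.03, "SeTime9": 0.03,
    "absRoll": -10.0, "curRoll": -0.5, "diffClb": -10.0,
    "diffRollRate": 0.0, "p": 0.1,
}
print(predict_row(example_row))
```

Features that were not selected have no weight and therefore do not affect the prediction, which is what makes the model easy to inspect.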

The grid search selected a sparsity of 9 as the best parameter, but the validation scores are close for many of the parameters: the scores for sparsities 7 through 10 differ by less than 0.001. We can use the results of the grid search to explore the tradeoff between the complexity of the regression and the quality of predictions:

results = grid.get_grid_result_summary()
ax = results.plot(x='sparsity', y='valid_score', legend=False)
ax.set_xlabel('Sparsity')
ax.set_ylabel('Validation R-Squared')

We see that the quality of the model increases quickly with each additional feature up to a sparsity of 4 (validation $R^2$ of about 0.80), and improves only slightly after that (about 0.81 at the selected sparsity of 9). Depending on the application, we might decide to choose a lower sparsity for the final model than the value chosen by the grid search.
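
One way to make this choice systematic is to pick the smallest sparsity whose validation score is within a chosen tolerance of the best score. The sketch below uses the validation scores from the grid results table above. The helper name `smallest_sparsity_within` and the tolerance of 0.01 are assumptions for illustration, not part of the library.

```python
# (sparsity, valid_score) pairs copied from the grid results table above.
VALID_SCORES = [
    (1, 0.469551), (2, 0.661475), (3, 0.746062), (4, 0.800123),
    (5, 0.803629), (6, 0.807073), (7, 0.809386), (8, 0.809528),
    (9, 0.809719), (10, 0.808777),
]


def smallest_sparsity_within(scores, tolerance):
    """Smallest sparsity whose validation score is within `tolerance` of the best."""
    best = max(score for _, score in scores)
    return min(k for k, score in scores if score >= best - tolerance)


print(smallest_sparsity_within(VALID_SCORES, tolerance=0.01))  # prints 4
```

With a tolerance of 0.01 this returns 4, less than half the number of features chosen by the grid search. A tolerance of 0 reproduces the grid search's own choice of 9.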

We can see the relative importance of the selected features with variable_importance:

grid.get_learner().variable_importance()
        Feature  Importance
0       absRoll    0.339804
1             p    0.184686
2       curRoll    0.119288
3       SeTime6    0.075255
4       SeTime7    0.075074
5       SeTime8    0.052706
6       diffClb    0.052629
..          ...         ...
33  diffSeTime4    0.000000
34  diffSeTime5    0.000000
35  diffSeTime6    0.000000
36  diffSeTime7    0.000000
37  diffSeTime8    0.000000
38  diffSeTime9    0.000000
39            q    0.000000

[40 rows x 2 columns]

We can make predictions on new data using predict:

grid.predict(test_X)
array([-0.00103273, -0.00124588, -0.00120397, ..., -0.00114447,
       -0.00082961, -0.00089865])

We can evaluate the quality of the model using score with any of the supported loss functions. For example, with the 'mse' criterion, score reports the $R^2$ on the training set:

grid.score(train_X, train_y, criterion='mse')
0.816274630679

Or on the test set:

grid.score(test_X, test_y, criterion='mse')
0.820105222067
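
For reference, the textbook $R^2$ compares the model's mean squared error with that of a baseline that always predicts the mean of the targets. The sketch below implements that definition in plain Python. The function name `r_squared` and the toy values are hypothetical, and the library may define its baseline differently (for example, using the training-set mean).

```python
def r_squared(y_true, y_pred):
    """Textbook R^2: 1 - MSE(model) / MSE(baseline predicting the mean of y_true)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mse_baseline = sum((t - mean_y) ** 2 for t in y_true) / n
    return 1.0 - mse_model / mse_baseline


# Toy values for illustration only (not taken from the Ailerons data).
y_true = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true, [1.0, 2.0, 3.0, 4.0]))  # perfect predictions give 1.0
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # predicting the mean gives 0.0
```

Under this definition, the test score of about 0.82 means the model's squared error is roughly 18% of the squared error of the mean-only baseline.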