Quick Start Guide: Heuristic Regressors
In this example, we will use regressors from Heuristics on the yacht hydrodynamics dataset. First we load the data and split it into training and test datasets:
using CSV, DataFrames
df = DataFrame(CSV.File(
"yacht_hydrodynamics.data",
delim=' ', # file uses ' ' as separators rather than ','
ignorerepeated=true, # sometimes columns are separated by more than one ' '
header=[:position, :prismatic, :length_displacement, :beam_draught,
:length_beam, :froude, :resistance],
))
308×7 DataFrame
Row │ position prismatic length_displacement beam_draught length_beam fr ⋯
│ Float64 Float64 Float64 Float64 Float64 Fl ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ -2.3 0.568 4.78 3.99 3.17 ⋯
2 │ -2.3 0.568 4.78 3.99 3.17
3 │ -2.3 0.568 4.78 3.99 3.17
4 │ -2.3 0.568 4.78 3.99 3.17
5 │ -2.3 0.568 4.78 3.99 3.17 ⋯
6 │ -2.3 0.568 4.78 3.99 3.17
7 │ -2.3 0.568 4.78 3.99 3.17
8 │ -2.3 0.568 4.78 3.99 3.17
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
302 │ -2.3 0.6 4.34 4.23 2.73 ⋯
303 │ -2.3 0.6 4.34 4.23 2.73
304 │ -2.3 0.6 4.34 4.23 2.73
305 │ -2.3 0.6 4.34 4.23 2.73
306 │ -2.3 0.6 4.34 4.23 2.73 ⋯
307 │ -2.3 0.6 4.34 4.23 2.73
308 │ -2.3 0.6 4.34 4.23 2.73
2 columns and 293 rows omitted
X = df[:, 1:(end - 1)]
y = df[:, end]
(train_X, train_y), (test_X, test_y) = IAI.split_data(:regression, X, y, seed=1)
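As a quick sanity check, we can confirm that the two sets together cover all 308 samples (split_data holds out a fraction of the rows for testing; we assume the default train_proportion of 0.7 here):
# The two row counts should sum to the original 308 samples,
# split roughly 70/30 under the assumed default train_proportion
size(train_X, 1), length(train_y), size(test_X, 1), length(test_y)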
Random Forest Regressor
We will use a GridSearch to fit a RandomForestRegressor with some basic parameter validation:
grid = IAI.GridSearch(
IAI.RandomForestRegressor(
random_seed=1,
),
max_depth=5:10,
)
IAI.fit!(grid, train_X, train_y)
All Grid Results:
Row │ max_depth train_score valid_score rank_valid_score
│ Int64 Float64 Float64 Int64
─────┼───────────────────────────────────────────────────────
1 │ 5 0.998792 0.995212 6
2 │ 6 0.999109 0.995511 5
3 │ 7 0.999189 0.995521 2
4 │ 8 0.999205 0.995522 1
5 │ 9 0.999207 0.995519 3
6 │ 10 0.999207 0.995519 4
Best Params:
max_depth => 8
Best Model - Fitted RandomForestRegressor
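If we want to use the winning parameters programmatically rather than reading them off the printed summary, we can query the grid directly (this sketch assumes the get_best_params accessor from the IAI grid-search API):
# Retrieve the best validated parameters as a Dict,
# e.g. for logging or refitting elsewhere
best_params = IAI.get_best_params(grid)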
We can make predictions on new data using predict:
IAI.predict(grid, test_X)
92-element Array{Float64,1}:
0.096919986981
0.283672772148
1.2729423224
2.773836659452
5.083565
12.912132857143
21.031601666667
0.097826653648
0.281992851513
0.49682970807
⋮
2.874457215007
7.892077467532
12.821452857143
1.272315543901
1.934863766511
2.995196976912
12.93131952381
33.007085
50.495606666667
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
IAI.score(grid, train_X, train_y, criterion=:mse)
0.9993065912313013
Or on the test set:
IAI.score(grid, test_X, test_y, criterion=:mse)
0.9937779440350902
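Note that with criterion=:mse the score is reported as $R^2$, that is, one minus the model's squared error relative to a mean-only baseline. As a minimal sketch of this convention (our assumption, using Statistics from the standard library), we can reproduce the test-set score by hand:
using Statistics
pred_y = IAI.predict(grid, test_X)          # model predictions on the test set
ss_res = sum((test_y .- pred_y) .^ 2)       # residual sum of squares
ss_tot = sum((test_y .- mean(test_y)) .^ 2) # total sum of squares around the mean
1 - ss_res / ss_tot                         # should match the score above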
We can also look at the variable importance:
IAI.variable_importance(IAI.get_learner(grid))
6×2 DataFrame
Row │ Feature Importance
│ Symbol Float64
─────┼──────────────────────────────────
1 │ froude 0.990682
2 │ prismatic 0.00404472
3 │ beam_draught 0.00243067
4 │ position 0.0014151
5 │ length_displacement 0.00122642
6 │ length_beam 0.000201347
XGBoost Regressor
We will use a GridSearch to fit an XGBoostRegressor with some basic parameter validation:
grid = IAI.GridSearch(
IAI.XGBoostRegressor(
random_seed=1,
),
max_depth=2:5,
num_round=[20, 50, 100],
)
IAI.fit!(grid, train_X, train_y)
All Grid Results:
Row │ num_round max_depth train_score valid_score rank_valid_score
│ Int64 Int64 Float64 Float64 Int64
─────┼──────────────────────────────────────────────────────────────────
1 │ 20 2 0.997817 0.995225 6
2 │ 20 3 0.999371 0.9953 5
3 │ 20 4 0.999748 0.992537 7
4 │ 20 5 0.999816 0.992295 8
5 │ 50 2 0.999118 0.99551 4
6 │ 50 3 0.999904 0.995645 3
7 │ 50 4 0.999953 0.991627 9
8 │ 50 5 0.999966 0.990115 11
9 │ 100 2 0.999632 0.9962 1
10 │ 100 3 0.999904 0.995646 2
11 │ 100 4 0.999953 0.991627 10
12 │ 100 5 0.999966 0.990115 12
Best Params:
num_round => 100
max_depth => 2
Best Model - Fitted XGBoostRegressor
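The grid results printed above can also be retrieved as a DataFrame for further analysis (this sketch assumes the get_grid_result_summary accessor; older IAI versions may use a different name):
# Pull the grid summary as a DataFrame and sort by validation rank
results = IAI.get_grid_result_summary(grid)
sort(results, :rank_valid_score)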
We can make predictions on new data using predict:
IAI.predict(grid, test_X)
92-element Array{Float64,1}:
0.23346328735351562
0.3780546188354492
1.265873908996582
2.8110404014587402
5.415190696716309
12.598990440368652
20.525022506713867
0.2109689712524414
0.25713634490966797
0.42427921295166016
⋮
2.819088935852051
7.738818168640137
12.533449172973633
1.4572010040283203
2.0904359817504883
3.0023674964904785
11.986350059509277
33.26702117919922
49.76500701904297
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
IAI.score(grid, train_X, train_y, criterion=:mse)
0.9995068453374839
Or on the test set:
IAI.score(grid, test_X, test_y, criterion=:mse)
0.9973451304965533
We can also look at the variable importance:
IAI.variable_importance(IAI.get_learner(grid))
6×2 DataFrame
Row │ Feature Importance
│ Symbol Float64
─────┼──────────────────────────────────
1 │ froude 0.993665
2 │ prismatic 0.00254683
3 │ beam_draught 0.00137171
4 │ length_displacement 0.00115397
5 │ position 0.000648411
6 │ length_beam 0.000614171
GLMNet Regressor
We can use a GLMNetCVRegressor to fit a GLMNet model using cross-validation:
lnr = IAI.GLMNetCVRegressor(
random_seed=1,
nfolds=10,
)
IAI.fit!(lnr, train_X, train_y)
Fitted GLMNetCVRegressor:
Constant: -22.0757
Weights:
froude: 113.256
We can access the coefficients from the fitted model with get_prediction_weights and get_prediction_constant:
numeric_weights, categoric_weights = IAI.get_prediction_weights(lnr)
numeric_weights
Dict{Symbol,Float64} with 1 entry:
:froude => 113.256
categoric_weights
Dict{Symbol,Dict{Any,Float64}}()
IAI.get_prediction_constant(lnr)
-22.07569550760808
We can make predictions on new data using predict:
IAI.predict(lnr, test_X)
92-element Array{Float64,1}:
-7.9186331246794115
-5.087220648093677
3.4070167816635255
9.069841734834995
14.73266668800646
20.395491641177927
23.226904117763667
-7.9186331246794115
-5.087220648093677
-2.255808171507944
⋮
9.069841734834995
17.564079164592194
20.395491641177927
3.4070167816635255
6.238429258249258
9.069841734834995
20.395491641177927
26.058316594349392
28.889729070935132
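Because the fitted model is linear, each of these predictions is just the constant plus the weighted sum of the features. As a minimal sketch (not part of the IAI API), we can reconstruct them by hand from the weights and constant retrieved earlier:
# Reconstruct the linear prediction for each test row:
# constant + sum of weight * feature value
constant = IAI.get_prediction_constant(lnr)
manual_pred = [
    constant + sum(w * row[f] for (f, w) in numeric_weights)
    for row in eachrow(test_X)
]
# manual_pred should agree with IAI.predict(lnr, test_X)
# up to floating-point error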
We can evaluate the quality of the model using score with any of the supported loss functions. For example, the $R^2$ on the training set:
IAI.score(lnr, train_X, train_y, criterion=:mse)
0.6541519917396235
Or on the test set:
IAI.score(lnr, test_X, test_y, criterion=:mse)
0.6504195810342512
We can also look at the variable importance:
IAI.variable_importance(lnr)
6×2 DataFrame
Row │ Feature Importance
│ Symbol Float64
─────┼─────────────────────────────────
1 │ froude 1.0
2 │ beam_draught 0.0
3 │ length_beam 0.0
4 │ length_displacement 0.0
5 │ position 0.0
6 │ prismatic 0.0