Quick Start Guide: Optimal Policy Trees with Numeric Treatment
In this guide we will demonstrate how to use Optimal Policy Trees with numeric treatment options. For this example, we will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.
Note: this case is not intended as a practical application of policy trees, but rather as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.
First we load in the data and drop 6 rows with missing values:
using CSV, DataFrames
df = CSV.read("auto-mpg.csv", DataFrame, missingstring="?")
dropmissing!(df)
392×9 DataFrame
Row │ mpg cylinders displacement horsepower weight acceleration mod ⋯
│ Float64 Int64 Float64 Int64 Int64 Float64 Int ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 18.0 8 307.0 130 3504 12.0 ⋯
2 │ 15.0 8 350.0 165 3693 11.5
3 │ 18.0 8 318.0 150 3436 11.0
4 │ 16.0 8 304.0 150 3433 12.0
5 │ 17.0 8 302.0 140 3449 10.5 ⋯
6 │ 15.0 8 429.0 198 4341 10.0
7 │ 14.0 8 454.0 220 4354 9.0
8 │ 14.0 8 440.0 215 4312 8.5
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
386 │ 36.0 4 135.0 84 2370 13.0 ⋯
387 │ 27.0 4 151.0 90 2950 17.3
388 │ 27.0 4 140.0 86 2790 15.6
389 │ 44.0 4 97.0 52 2130 24.6
390 │ 32.0 4 135.0 84 2295 11.6 ⋯
391 │ 28.0 4 120.0 79 2625 18.6
392 │ 31.0 4 119.0 82 2720 19.4
3 columns and 377 rows omitted
Policy trees are trained using a features matrix/dataframe X as usual, along with a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.
There are two ways to get this rewards matrix:
- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment
Refer to the documentation on data preparation for more information on the data format.
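As a purely illustrative sketch (with made-up numbers, not taken from the dataset), a rewards matrix for two candidate acceleration values could look like the following, with one row per car and one column per treatment:
using DataFrames
# Made-up example: each column holds the (estimated) MPG for every car under
# the corresponding candidate acceleration value.
example_rewards = DataFrame(
    "11" => [15.5, 26.0],  # estimated MPG if acceleration were set to 11
    "14" => [14.3, 36.0],  # estimated MPG if acceleration were set to 14
)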
In this case, the dataset is observational, and so we will use reward estimation to estimate our rewards matrix.
Reward Estimation
Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.
First, we split into training and testing:
X = select(df, Not(["mpg", "acceleration", "car name"]))
treatments = df.acceleration
outcomes = df.mpg
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
IAI.split_data(:policy_maximize, X, treatments, outcomes, seed=123, train_proportion=0.5)
Note that we have used an even training/test split of 50%/50%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 11 to 20, in steps of 3:
treatment_candidates = 11:3:20
The outcome is continuous, so we use a NumericRegressionRewardEstimator to estimate the MPG under our candidate acceleration values, with Random Forest models for the internal propensity and outcome estimation:
reward_lnr = IAI.NumericRegressionRewardEstimator(
propensity_estimator=IAI.RandomForestRegressor(),
outcome_estimator=IAI.RandomForestRegressor(),
reward_estimator=:doubly_robust,
propensity_min_value=0.1,
random_seed=1,
)
train_predictions, train_reward_score = IAI.fit_predict!(
reward_lnr, train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards = train_predictions[:reward]
196×4 DataFrame
Row │ 11 14 17 20
│ Float64 Float64 Float64 Float64
─────┼────────────────────────────────────
1 │ 15.5192 14.3211 15.9095 15.7161
2 │ 17.9843 17.0684 16.1826 16.6844
3 │ 15.1993 11.2804 14.1719 15.8209
4 │ 13.6143 11.0216 14.1719 15.8209
5 │ 14.0478 11.2476 14.5093 16.1168
6 │ 15.0572 15.5953 16.1991 16.6844
7 │ 15.0314 13.373 14.793 16.0183
8 │ 13.3217 15.0754 18.3585 18.2202
⋮ │ ⋮ ⋮ ⋮ ⋮
190 │ 21.6221 23.0741 22.5122 26.9959
191 │ 26.0113 36.0119 34.4337 30.4533
192 │ 26.4589 34.6858 42.8009 29.7727
193 │ 26.3356 37.3628 32.1039 30.9279
194 │ 26.4589 39.36 34.6957 31.9937
195 │ 19.6138 18.9977 35.9689 19.9677
196 │ 24.0463 26.3079 26.1804 39.972
181 rows omitted
train_reward_score[:propensity]
Dict{String, Float64} with 4 entries:
"17" => 0.120533
"20" => 0.200617
"11" => 0.287145
"14" => 0.165352
train_reward_score[:outcome]
Dict{String, Float64} with 4 entries:
"17" => 0.864211
"20" => 0.250276
"11" => 0.653976
"14" => 0.833718
We can see that the R2 values of the internal outcome models for the candidate treatments range from 0.25 to 0.86, whereas the propensity models have R2 between 0.12 and 0.29. The non-zero propensity scores tell us that there is a small amount of treatment assignment bias in this data (keeping in mind that we arbitrarily selected a feature to use as the treatment column). It seems we can predict the outcomes reasonably well, so we have a good basis for the reward estimates.
Optimal Policy Trees
Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use an OptimalTreePolicyMinimizer):
grid = IAI.GridSearch(
IAI.OptimalTreePolicyMaximizer(
random_seed=1,
minbucket=15,
),
max_depth=4:5,
)
IAI.fit!(grid, train_X, train_rewards)
The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.
We can see that a variety of our candidate acceleration values are prescribed by the tree. For instance, older cars with higher horsepower receive the highest acceleration.
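The tree itself is easiest to inspect with the interactive visualization. As a minimal sketch (assuming the IAI utilities get_learner and write_html are available in your installation), we can export it to a standalone HTML file for viewing in a browser:
# Export the fitted policy tree for interactive inspection.
best_lnr = IAI.get_learner(grid)              # best learner found by the grid search
IAI.write_html("policy_tree.html", best_lnr)  # open the resulting file in a browser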
We can make treatment prescriptions using predict:
prescriptions = IAI.predict(grid, train_X)
196-element Vector{String}:
"20"
"20"
"20"
"20"
"20"
"20"
"20"
"20"
"20"
"20"
⋮
"14"
"14"
"20"
"14"
"14"
"14"
"14"
"14"
"20"
The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, so if we want them in numeric form for later use, we can convert them with convert_treatments_to_numeric:
IAI.convert_treatments_to_numeric(prescriptions)
196-element Vector{Int64}:
20
20
20
20
20
20
20
20
20
20
⋮
14
14
20
14
14
14
14
14
20
If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:
rank = IAI.predict_treatment_rank(grid, train_X)
196×4 Matrix{String}:
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
⋮
"14" "17" "20" "11"
"14" "17" "20" "11"
"20" "17" "11" "14"
"14" "17" "20" "11"
"14" "17" "20" "11"
"14" "17" "20" "11"
"14" "17" "20" "11"
"14" "11" "17" "20"
"20" "17" "11" "14"
For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert the treatments to numeric values with convert_treatments_to_numeric:
IAI.convert_treatments_to_numeric(rank)
196×4 Matrix{Int64}:
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
⋮
14 17 20 11
14 17 20 11
20 17 11 14
14 17 20 11
14 17 20 11
14 17 20 11
14 17 20 11
14 11 17 20
20 17 11 14
To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:
IAI.predict_treatment_outcome(grid, train_X)
196×4 DataFrame
Row │ 11 14 17 20
│ Float64 Float64 Float64 Float64
─────┼────────────────────────────────────
1 │ 17.021 14.7505 16.9222 18.2866
2 │ 17.021 14.7505 16.9222 18.2866
3 │ 17.021 14.7505 16.9222 18.2866
4 │ 17.021 14.7505 16.9222 18.2866
5 │ 17.021 14.7505 16.9222 18.2866
6 │ 17.021 14.7505 16.9222 18.2866
7 │ 17.021 14.7505 16.9222 18.2866
8 │ 17.021 14.7505 16.9222 18.2866
⋮ │ ⋮ ⋮ ⋮ ⋮
190 │ 22.1578 21.3878 22.3794 27.5812
191 │ 26.3143 31.3847 29.363 28.3271
192 │ 26.3143 31.3847 29.363 28.3271
193 │ 26.3143 31.3847 29.363 28.3271
194 │ 26.3143 31.3847 29.363 28.3271
195 │ 18.6563 19.3162 18.4529 17.3021
196 │ 22.1578 21.3878 22.3794 27.5812
181 rows omitted
We can also extract the standard errors of the outcome estimates with predict_treatment_outcome_standard_error:
IAI.predict_treatment_outcome_standard_error(grid, train_X)
196×4 DataFrame
Row │ 11 14 17 20
│ Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────
1 │ 0.80353 1.06319 0.770092 0.710896
2 │ 0.80353 1.06319 0.770092 0.710896
3 │ 0.80353 1.06319 0.770092 0.710896
4 │ 0.80353 1.06319 0.770092 0.710896
5 │ 0.80353 1.06319 0.770092 0.710896
6 │ 0.80353 1.06319 0.770092 0.710896
7 │ 0.80353 1.06319 0.770092 0.710896
8 │ 0.80353 1.06319 0.770092 0.710896
⋮ │ ⋮ ⋮ ⋮ ⋮
190 │ 0.513389 0.882271 1.02996 1.77642
191 │ 0.948884 1.26322 0.739374 0.448817
192 │ 0.948884 1.26322 0.739374 0.448817
193 │ 0.948884 1.26322 0.739374 0.448817
194 │ 0.948884 1.26322 0.739374 0.448817
195 │ 0.23138 0.407594 0.625738 0.466373
196 │ 0.513389 0.882271 1.02996 1.77642
181 rows omitted
These standard errors can be combined with the outcome estimates to construct confidence intervals in the usual way.
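For example, here is a minimal sketch (not part of the original guide) of approximate 95% confidence intervals using the normal approximation, estimate ± 1.96 standard errors:
# Approximate 95% confidence intervals for the estimated outcome of each
# treatment at each point in the training set.
outcome_est = IAI.predict_treatment_outcome(grid, train_X)
outcome_se = IAI.predict_treatment_outcome_standard_error(grid, train_X)
ci_lower = outcome_est .- 1.96 .* outcome_se
ci_upper = outcome_est .+ 1.96 .* outcome_se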
Evaluating Optimal Policy Trees
For a fair evaluation, it is critical that we do not evaluate the quality of the policy using rewards from our existing reward estimator, which was trained on the training set, as this could leak information from the training set into the out-of-sample evaluation.
Instead, we need to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:
test_predictions, test_reward_score = IAI.fit_predict!(
reward_lnr, test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards = test_predictions[:reward]
196×4 DataFrame
Row │ 11 14 17 20
│ Float64 Float64 Float64 Float64
─────┼────────────────────────────────────
1 │ 18.8265 17.9006 15.2741 16.5961
2 │ 18.5837 17.3373 15.3651 16.8543
3 │ 16.8317 17.7591 15.4516 17.111
4 │ 14.8371 12.4136 14.3317 16.1539
5 │ 15.8857 13.5001 14.0596 16.2479
6 │ 14.9192 16.8369 14.9 16.2122
7 │ 28.6542 23.7261 23.4555 23.7447
8 │ 28.7612 26.9357 25.2422 24.9468
⋮ │ ⋮ ⋮ ⋮ ⋮
190 │ 30.2826 38.5807 28.1592 27.4065
191 │ 27.5855 46.7469 31.719 25.8422
192 │ 28.4471 27.5669 26.3109 25.6221
193 │ 27.1386 22.2712 26.9411 27.5965
194 │ 30.6563 35.0577 35.2216 30.77
195 │ 70.4636 32.0715 31.719 25.7882
196 │ 29.963 31.6384 25.1198 30.191
181 rows omitted
test_reward_score[:propensity]
Dict{String, Float64} with 4 entries:
"17" => 0.158079
"20" => 0.267284
"11" => 0.393047
"14" => 0.217413
test_reward_score[:outcome]
Dict{String, Float64} with 4 entries:
"17" => 0.705786
"20" => 0.770381
"11" => 0.470776
"14" => 0.886658
We see that the internal models of our test reward estimator have scores similar to those on the training set. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.
We can now evaluate the quality of the policy using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes, which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:
policy_outcomes = IAI.predict_outcomes(grid, test_X, test_rewards)
196-element Vector{Float64}:
16.59605211
16.85433253
17.11098041
16.15388911
16.24788885
16.21219682
23.74468928
25.2421742
30.65150663
24.82638225
⋮
19.79697195
24.61787132
27.40654131
46.74687468
25.62213452
27.59645482
35.05765523
32.07152412
30.19096294
We can then get the average estimated MPG under our treatments:
using Statistics
mean(policy_outcomes)
23.93172935
We can compare this number to the average estimated MPG under a baseline policy that assigns the same treatment to all observations regardless of their features. Let us compare to a policy that assigns 17 everywhere:
mean(test_rewards[:, "17"])
23.32036357
We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
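As a further sanity check (a sketch, not part of the original guide), we can repeat the comparison against every constant-treatment baseline:
# Average estimated MPG on the test set under each constant-treatment policy.
baseline_outcomes = Dict(
    t => mean(test_rewards[:, string(t)]) for t in treatment_candidates
)
If the personalization is adding value, the tree policy outcome should exceed each of these single-treatment averages.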