Quick Start Guide: Optimal Policy Trees with Numeric Treatment
In this guide we will demonstrate how to use Optimal Policy Trees with numeric treatment options. For this example, we will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles per gallon, or MPG) from other characteristics of the car. To apply a prescriptive lens to this problem, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.
Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.
First we load in the data and drop 6 rows with missing values:
using CSV, DataFrames
df = CSV.read("auto-mpg.csv", DataFrame, missingstring="?")
dropmissing!(df)
392×9 DataFrame
Row │ mpg cylinders displacement horsepower weight acceleration mod ⋯
│ Float64 Int64 Float64 Int64 Int64 Float64 Int ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 18.0 8 307.0 130 3504 12.0 ⋯
2 │ 15.0 8 350.0 165 3693 11.5
3 │ 18.0 8 318.0 150 3436 11.0
4 │ 16.0 8 304.0 150 3433 12.0
5 │ 17.0 8 302.0 140 3449 10.5 ⋯
6 │ 15.0 8 429.0 198 4341 10.0
7 │ 14.0 8 454.0 220 4354 9.0
8 │ 14.0 8 440.0 215 4312 8.5
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
386 │ 36.0 4 135.0 84 2370 13.0 ⋯
387 │ 27.0 4 151.0 90 2950 17.3
388 │ 27.0 4 140.0 86 2790 15.6
389 │ 44.0 4 97.0 52 2130 24.6
390 │ 32.0 4 135.0 84 2295 11.6 ⋯
391 │ 28.0 4 120.0 79 2625 18.6
392 │ 31.0 4 119.0 82 2720 19.4
3 columns and 377 rows omitted
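To verify how many rows were dropped, one quick check (a sketch, not part of the original workflow; it assumes the same file path) is to compare the row counts before and after removing missing values:
df_raw = CSV.read("auto-mpg.csv", DataFrame, missingstring="?")
# Number of rows removed by dropmissing!; expect 6, leaving the 392 rows shown above
nrow(df_raw) - nrow(dropmissing(df_raw))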
Policy trees are trained using a features matrix/dataframe X as usual, together with a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.
There are two ways to get this rewards matrix:
- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment
Refer to the documentation on data preparation for more information on the data format.
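To illustrate the format, the sketch below constructs a small rewards matrix by hand for three samples and two hypothetical treatments "A" and "B"; these names and values are purely illustrative and are not taken from the auto-mpg data:
using DataFrames
# Each column is a potential treatment; entry (i, t) is the outcome for sample i under treatment t
rewards_example = DataFrame("A" => [10.0, 12.5, 9.0], "B" => [11.0, 11.5, 9.5])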
In this case, the dataset is observational, and so we will use Reward Estimation to estimate our rewards matrix.
Reward Estimation
First, we split into training and testing:
X = select(df, Not(["mpg", "acceleration", "car name"]))
treatments = df.acceleration
outcomes = df.mpg
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
IAI.split_data(:policy_maximize, X, treatments, outcomes, seed=1, train_proportion=0.6)
Note that we have used a training/test split of 60%/40%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 8 to 23, in steps of 3:
treatment_candidates = 8:3:23
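For reference, this range expands to six candidate values, which will become the columns of the rewards matrix below:
collect(treatment_candidates)  # [8, 11, 14, 17, 20, 23]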
We use a NumericRewardEstimator
to estimate the MPG under our candidate acceleration values using an XGBoost model:
reward_lnr = IAI.NumericRewardEstimator(
outcome_estimator=IAI.XGBoostRegressor(),
random_seed=12345,
)
train_rewards, train_reward_score = IAI.fit_predict!(
reward_lnr, train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards
235×6 DataFrame
Row │ 8 11 14 17 20 23
│ Float64 Float64 Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────────
1 │ 17.5649 17.9279 16.553 16.0428 16.1734 16.2504
2 │ 16.2367 15.7576 14.8192 14.8333 14.4565 14.4565
3 │ 18.7436 19.1067 17.3859 16.6543 16.7998 16.8768
4 │ 14.8891 13.05 12.356 12.3016 12.3232 12.3077
5 │ 14.4061 13.2146 12.9017 12.691 12.9269 12.9395
6 │ 14.8891 13.05 12.356 12.3016 12.3232 12.3077
7 │ 13.1329 11.9413 11.6595 11.4393 11.6752 11.6878
8 │ 14.8969 13.133 12.7138 12.5202 12.6249 12.5813
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
229 │ 34.7262 34.7262 34.8821 31.2253 35.1343 35.1532
230 │ 21.9048 21.9048 22.2574 21.6184 21.2935 21.1384
231 │ 21.8871 21.8871 22.2052 21.4101 21.4086 21.2832
232 │ 27.8975 27.8975 29.6809 28.4205 27.6739 27.5188
233 │ 30.0696 30.0696 30.1089 29.2705 30.9828 30.957
234 │ 26.1627 26.1627 26.3938 25.314 24.8147 24.8284
235 │ 28.9793 28.9793 29.2104 28.1389 27.8686 27.8822
220 rows omitted
train_reward_score
0.8657202076286428
We can see that the R2 of the internal regression model is 0.87, which gives us good confidence that the reward estimates are of high quality and a good basis for training.
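As an optional sanity check (a sketch, not part of the standard workflow), we can compare the estimated rewards against the observed outcomes at each sample's observed treatment, by mapping each observed acceleration to the nearest candidate value and computing the correlation between the corresponding reward estimate and the observed MPG. A correlation close to 1 would support the score above:
using Statistics
# Index of the candidate treatment closest to each observed acceleration
nearest_idx = [argmin(abs.(treatment_candidates .- t)) for t in train_treatments]
# Estimated reward at (approximately) the observed treatment for each sample
est_at_observed = [train_rewards[i, nearest_idx[i]] for i in 1:nrow(train_rewards)]
# Correlation between these estimates and the observed outcomes
cor(est_at_observed, train_outcomes)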
Optimal Policy Trees
Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a GridSearch
to fit an OptimalTreePolicyMaximizer
(note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer
):
grid = IAI.GridSearch(
IAI.OptimalTreePolicyMaximizer(
random_seed=1,
),
max_depth=4:5,
)
IAI.fit!(grid, train_X, train_rewards)
Fitted OptimalTreePolicyMaximizer:
1) Split: displacement < 264.5
2) Split: weight < 2162
3) Split: model year < 78.5
4) Prescribe: 17, 27 points, error 10.33
5) Prescribe: 23, 17 points, error 3.412
6) Split: model year < 72.5
7) Prescribe: 8, 16 points, error 18.05
8) Prescribe: 14, 108 points, error 15.89
9) Prescribe: 8, 67 points, error 24.89
The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.
We can see that both extremes of our candidate acceleration values are prescribed by the tree. The cars with the greatest displacement receive the lowest acceleration in order to maximize MPG, whereas for lighter, more recent cars with smaller engines, the highest acceleration value is the best option.
We can make treatment prescriptions using predict
:
prescriptions = IAI.predict(grid, train_X)
235-element Array{String,1}:
"8"
"8"
"8"
"8"
"8"
"8"
"8"
"8"
"8"
"8"
⋮
"23"
"14"
"23"
"14"
"14"
"14"
"14"
"14"
"14"
The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case, the treatments are numeric values, so if we want them in numeric form for later use we can convert them with convert_treatments_to_numeric
:
IAI.convert_treatments_to_numeric(prescriptions)
235-element Array{Int64,1}:
8
8
8
8
8
8
8
8
8
8
⋮
23
14
23
14
14
14
14
14
14
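As a quick way to summarize these prescriptions (a sketch, not part of the original guide), we can count how many training points are assigned each candidate acceleration; the counts should be consistent with the leaf sizes shown in the tree above:
prescriptions_numeric = IAI.convert_treatments_to_numeric(prescriptions)
# Number of training points prescribed each candidate value
Dict(t => count(==(t), prescriptions_numeric) for t in treatment_candidates)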
If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank
:
rank = IAI.predict_treatment_rank(grid, train_X)
235×6 Array{String,2}:
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
"8" "11" "14" "20" "23" "17"
⋮ ⋮
"23" "20" "14" "17" "8" "11"
"14" "8" "11" "17" "20" "23"
"23" "20" "14" "17" "8" "11"
"14" "8" "11" "17" "20" "23"
"14" "8" "11" "17" "20" "23"
"14" "8" "11" "17" "20" "23"
"14" "8" "11" "17" "20" "23"
"14" "8" "11" "17" "20" "23"
"14" "8" "11" "17" "20" "23"
For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert the treatments to numeric values with convert_treatments_to_numeric
:
IAI.convert_treatments_to_numeric(rank)
235×6 Array{Int64,2}:
8 11 14 20 23 17
8 11 14 20 23 17
8 11 14 20 23 17
8 11 14 20 23 17
8 11 14 20 23 17
8 11 14 20 23 17
8 11 14 20 23 17
8 11 14 20 23 17
8 11 14 20 23 17
8 11 14 20 23 17
⋮ ⋮
23 20 14 17 8 11
14 8 11 17 20 23
23 20 14 17 8 11
14 8 11 17 20 23
14 8 11 17 20 23
14 8 11 17 20 23
14 8 11 17 20 23
14 8 11 17 20 23
14 8 11 17 20 23
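The first column of this ranking is the top-ranked treatment for each point, so as a rough consistency check (not part of the original guide) it should agree with the prescriptions returned by predict above:
# The top-ranked treatment should match the prescription for each point
all(rank[:, 1] .== IAI.predict(grid, train_X))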
To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome
to extract the estimated quality of each treatment for each point:
IAI.predict_treatment_outcome(grid, train_X)
235×6 DataFrame
Row │ 8 11 14 17 20 23
│ Float64 Float64 Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────────
1 │ 15.2432 14.6476 14.178 13.9031 13.9395 13.9393
2 │ 15.2432 14.6476 14.178 13.9031 13.9395 13.9393
3 │ 15.2432 14.6476 14.178 13.9031 13.9395 13.9393
4 │ 15.2432 14.6476 14.178 13.9031 13.9395 13.9393
5 │ 15.2432 14.6476 14.178 13.9031 13.9395 13.9393
6 │ 15.2432 14.6476 14.178 13.9031 13.9395 13.9393
7 │ 15.2432 14.6476 14.178 13.9031 13.9395 13.9393
8 │ 15.2432 14.6476 14.178 13.9031 13.9395 13.9393
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
229 │ 33.4467 33.4467 35.1713 34.3107 36.6724 36.7188
230 │ 23.9372 23.9348 24.2459 23.6747 23.2465 23.201
231 │ 23.9372 23.9348 24.2459 23.6747 23.2465 23.201
232 │ 23.9372 23.9348 24.2459 23.6747 23.2465 23.201
233 │ 23.9372 23.9348 24.2459 23.6747 23.2465 23.201
234 │ 23.9372 23.9348 24.2459 23.6747 23.2465 23.201
235 │ 23.9372 23.9348 24.2459 23.6747 23.2465 23.201
220 rows omitted
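These estimated outcomes also let us compute the gap between the best and second-best treatment for each point, the quantity reflected by the leaf colors mentioned earlier. The following is a small sketch (not part of the original guide), assuming the column layout shown above:
using Statistics
# Gap between the best and second-best estimated outcome for each training point
outcome_matrix = Matrix(IAI.predict_treatment_outcome(grid, train_X))
gaps = map(eachrow(outcome_matrix)) do row
    sorted = sort(collect(row), rev=true)
    sorted[1] - sorted[2]
end
# Average gap across the training set
mean(gaps)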
Evaluating Optimal Policy Trees
It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.
Instead, we need to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:
test_rewards, test_reward_score = IAI.fit_predict!(
reward_lnr, test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards
157×6 DataFrame
Row │ 8 11 14 17 20 23
│ Float64 Float64 Float64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────────
1 │ 14.6129 13.8657 13.044 13.7969 13.4663 13.4663
2 │ 15.3354 15.6313 14.7516 15.3333 13.8839 13.9302
3 │ 14.3205 14.935 14.2997 14.8618 13.2641 13.9901
4 │ 14.6497 13.9025 13.044 13.7969 13.4663 13.4663
5 │ 23.5706 23.8552 23.6826 24.5507 23.7003 23.8979
6 │ 22.3839 22.8751 21.7781 21.7699 20.2799 20.333
7 │ 21.596 21.596 21.0442 21.9311 21.6337 21.711
8 │ 32.7976 32.9805 32.3563 30.4403 28.8442 29.2377
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
151 │ 35.5521 35.7302 35.7228 37.1855 36.3021 42.1312
152 │ 26.0676 26.0676 25.8442 25.957 26.4951 26.5813
153 │ 31.1706 31.2501 31.2614 27.4992 27.1757 27.2066
154 │ 28.5079 28.6777 27.5714 27.4361 25.8763 25.9877
155 │ 38.871 38.9028 38.8375 37.9445 38.0162 38.0376
156 │ 30.5398 30.6071 30.6185 28.6492 28.3257 28.3542
157 │ 29.0757 29.143 29.1544 26.8099 26.4595 26.4803
142 rows omitted
test_reward_score
0.7929507332526184
We see that the internal model of our test reward estimator has an R2 of 0.79. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.
We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes
which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:
policy_outcomes = IAI.predict_outcomes(grid, test_X, test_rewards)
157-element Array{Float64,1}:
14.612932205200195
15.335400581359863
14.320535659790039
14.649663925170898
23.570629119873047
22.38387680053711
21.59600257873535
30.440296173095703
24.090347290039062
14.063908576965332
⋮
33.2572021484375
42.182464599609375
42.13115692138672
25.844161987304688
31.261442184448242
27.57144546508789
38.037628173828125
30.61852264404297
29.154422760009766
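To see what predict_outcomes is doing here, we can reproduce it manually (a sketch based on the lookup semantics described above): make prescriptions with predict and look up the estimated test reward for each prescription:
# Prescribe with the tree, then look up the reward for that prescription
test_prescriptions = IAI.predict(grid, test_X)
manual_outcomes = [test_rewards[i, test_prescriptions[i]] for i in 1:nrow(test_rewards)]
# Expect true if the manual lookup matches predict_outcomes
all(manual_outcomes .≈ policy_outcomes)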
We can then get the average estimated MPG under our treatments:
using Statistics
mean(policy_outcomes)
25.186681510536534
We can compare this number to the average MPG observed:
mean(test_outcomes)
24.687261146496816
We see a small improvement over the real accelerations, although it is important to point out that the real accelerations were not restricted to the same discrete increments that we used.
Another point of comparison is a baseline policy that assigns the treatment that is best on average across all training points, which we can see from the root node of the tree is an acceleration of 14:
mean(test_rewards[:, "14"])
24.680347600560278
We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
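For a broader comparison (a quick sketch, not part of the original guide), we can also compute the average estimated test reward under every static policy that always assigns the same treatment, to see how the tree policy and the acceleration-14 baseline compare against the other candidates:
# Mean estimated test reward when every car receives the same treatment
Dict(t => mean(test_rewards[:, t]) for t in names(test_rewards))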