Quick Start Guide: Optimal Policy Trees with Numeric Treatment
This is a Python version of the corresponding OptimalTrees quick start guide.
In this example we will demonstrate how to use Optimal Policy Trees with numeric treatment options. For this example, we will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.
Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.
First we load in the data and drop 6 rows with missing values:
import pandas as pd
df = pd.read_csv("auto-mpg.csv", na_values='?').dropna()
mpg cylinders ... origin car name
0 18.0 8 ... 1 chevrolet chevelle malibu
1 15.0 8 ... 1 buick skylark 320
2 18.0 8 ... 1 plymouth satellite
3 16.0 8 ... 1 amc rebel sst
4 17.0 8 ... 1 ford torino
5 15.0 8 ... 1 ford galaxie 500
6 14.0 8 ... 1 chevrolet impala
.. ... ... ... ... ...
391 36.0 4 ... 1 dodge charger 2.2
392 27.0 4 ... 1 chevrolet camaro
393 27.0 4 ... 1 ford mustang gl
394 44.0 4 ... 2 vw pickup
395 32.0 4 ... 1 dodge rampage
396 28.0 4 ... 1 ford ranger
397 31.0 4 ... 1 chevy s-10
[392 rows x 9 columns]
Policy trees are trained using a features matrix/dataframe X as usual, and a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.
There are two ways to get this rewards matrix:
- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment
Refer to the documentation on data preparation for more information on the data format.
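For illustration, a rewards matrix with three candidate treatments could look like the small sketch below (the values here are made up; we will estimate the real matrix for this dataset in the next section):
# Hypothetical sketch of the rewards matrix format: one row per sample,
# one column per candidate treatment, each entry the outcome (here, MPG)
# for that sample under that treatment. The values are made up.
rewards_sketch = pd.DataFrame({
    '8':  [17.6, 16.2, 18.7],
    '11': [17.9, 15.8, 19.1],
    '14': [16.6, 14.8, 17.4],
})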
In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.
Reward Estimation
Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.
First, we split into training and testing:
from interpretableai import iai
X = df.drop(columns=['mpg', 'acceleration', 'car name'])
treatments = df.acceleration
outcomes = df.mpg
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
iai.split_data('policy_maximize', X, treatments, outcomes, seed=1, train_proportion=0.6))
Note that we have used a training/test split of 60%/40%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
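As an optional sanity check, we can confirm the sizes of the resulting splits (392 samples split 60%/40% gives 235 for training and 157 for testing):
# Confirm the 60%/40% split of the 392 samples
print(len(train_X), len(test_X))  # 235 157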
The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 8 to 23, in steps of 3:
treatment_candidates = range(8, 26, 3)
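This range expands to six candidate acceleration values:
list(treatment_candidates)
[8, 11, 14, 17, 20, 23]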
The outcome is continuous, so we use a NumericRewardEstimator to estimate the MPG under our candidate acceleration values using an XGBoost model:
reward_lnr = iai.NumericRewardEstimator(
outcome_estimator=iai.XGBoostRegressor(),
random_seed=12345,
)
train_rewards, train_reward_score = reward_lnr.fit_predict(
train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards
8 11 14 17 20 23
0 17.564890 17.927940 16.553040 16.042833 16.173355 16.250404
1 16.236668 15.757603 14.819180 14.833300 14.456474 14.456474
2 18.743624 19.106674 17.385897 16.654261 16.799778 16.876827
3 14.889137 13.050015 12.355983 12.301568 12.323156 12.307738
4 14.406099 13.214581 12.901717 12.691013 12.926937 12.939469
5 14.889137 13.050015 12.355983 12.301568 12.323156 12.307738
6 13.132867 11.941333 11.659549 11.439299 11.675222 11.687755
.. ... ... ... ... ... ...
228 34.726242 34.726242 34.882076 31.225338 35.134323 35.153179
229 21.904802 21.904802 22.257420 21.618423 21.293514 21.138433
230 21.887117 21.887117 22.205151 21.410133 21.408575 21.283152
231 27.897453 27.897453 29.680870 28.420534 27.673887 27.518806
232 30.069603 30.069603 30.108929 29.270477 30.982830 30.957016
233 26.162659 26.162659 26.393810 25.313986 24.814707 24.828367
234 28.979275 28.979275 29.210428 28.138910 27.868574 27.882235
[235 rows x 6 columns]
train_reward_score
0.8657202076286428
We can see that the R2 of the internal regression model is 0.87, which gives us confidence that the reward estimates are of high quality and form a sound basis for training.
Optimal Policy Trees
Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):
grid = iai.GridSearch(
iai.OptimalTreePolicyMaximizer(
random_seed=1,
),
max_depth=range(4, 6),
)
grid.fit(train_X, train_rewards)
grid.get_learner()
The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.
We can see that both extremes of our candidate acceleration values are prescribed by the tree. The cars with the greatest displacement receive the lowest acceleration in order to maximize the MPG, whereas for lower-powered, recent, heavy cars, the highest acceleration value is the best option.
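If you would like to inspect the tree outside of an interactive session, the fitted learner can also be exported as an interactive visualization (a minimal sketch, assuming the learner's standard write_html export is available):
# Export the fitted policy tree to an interactive HTML file for inspection
lnr = grid.get_learner()
lnr.write_html("policy_tree.html")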
We can make treatment prescriptions using predict:
prescriptions = grid.predict(train_X)
The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, so if we want to use them in numeric form later we can convert them with convert_treatments_to_numeric:
iai.convert_treatments_to_numeric(prescriptions)
array([ 8, 8, 8, ..., 14, 14, 14], dtype=int64)
If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:
rank = grid.predict_treatment_rank(train_X)
array([['8', '11', '14', '20', '23', '17'],
['8', '11', '14', '20', '23', '17'],
['8', '11', '14', '20', '23', '17'],
...,
['14', '8', '11', '17', '20', '23'],
['14', '8', '11', '17', '20', '23'],
['14', '8', '11', '17', '20', '23']], dtype='<U2')
For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert the treatments to numeric values with convert_treatments_to_numeric:
iai.convert_treatments_to_numeric(rank)
array([[ 8, 11, 14, 20, 23, 17],
[ 8, 11, 14, 20, 23, 17],
[ 8, 11, 14, 20, 23, 17],
...,
[14, 8, 11, 17, 20, 23],
[14, 8, 11, 17, 20, 23],
[14, 8, 11, 17, 20, 23]], dtype=int64)
To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:
grid.predict_treatment_outcome(train_X)
8 11 14 17 20 23
0 15.243216 14.647557 14.178015 13.903116 13.939541 13.939264
1 15.243216 14.647557 14.178015 13.903116 13.939541 13.939264
2 15.243216 14.647557 14.178015 13.903116 13.939541 13.939264
3 15.243216 14.647557 14.178015 13.903116 13.939541 13.939264
4 15.243216 14.647557 14.178015 13.903116 13.939541 13.939264
5 15.243216 14.647557 14.178015 13.903116 13.939541 13.939264
6 15.243216 14.647557 14.178015 13.903116 13.939541 13.939264
.. ... ... ... ... ... ...
228 33.446706 33.446706 35.171294 34.310683 36.672391 36.718783
229 23.937211 23.934765 24.245889 23.674675 23.246540 23.200968
230 23.937211 23.934765 24.245889 23.674675 23.246540 23.200968
231 23.937211 23.934765 24.245889 23.674675 23.246540 23.200968
232 23.937211 23.934765 24.245889 23.674675 23.246540 23.200968
233 23.937211 23.934765 24.245889 23.674675 23.246540 23.200968
234 23.937211 23.934765 24.245889 23.674675 23.246540 23.200968
[235 rows x 6 columns]
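Relating this back to the leaf colors mentioned earlier, we can use this matrix to compute the gap between the best and second-best treatment for each point (a small sketch using pandas on the output of predict_treatment_outcome):
treatment_outcomes = grid.predict_treatment_outcome(train_X)
# For each row, the difference between the largest and second-largest
# estimated outcome, i.e. the edge of the best treatment over the runner-up
best = treatment_outcomes.max(axis=1)
second_best = treatment_outcomes.apply(lambda row: row.nlargest(2).iloc[-1], axis=1)
best_vs_second = best - second_best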
Evaluating Optimal Policy Trees
For a fair evaluation, it is critical that we do not judge the quality of the policy using rewards from our existing reward estimator, which was trained on the training set. This would allow information from the training set to leak into the out-of-sample evaluation.
Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:
test_rewards, test_reward_score = reward_lnr.fit_predict(
test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards
8 11 14 17 20 23
0 14.612932 13.865725 13.044035 13.796869 13.466252 13.466344
1 15.335401 15.631332 14.751638 15.333280 13.883915 13.930205
2 14.320536 14.935003 14.299742 14.861835 13.264057 13.990059
3 14.649664 13.902456 13.044035 13.796869 13.466252 13.466344
4 23.570629 23.855164 23.682589 24.550678 23.700291 23.897943
5 22.383877 22.875113 21.778091 21.769884 20.279921 20.333008
6 21.596003 21.596003 21.044224 21.931101 21.633659 21.711040
.. ... ... ... ... ... ...
150 35.552063 35.730183 35.722778 37.185501 36.302055 42.131157
151 26.067631 26.067631 25.844162 25.956953 26.495090 26.581295
152 31.170589 31.250059 31.261442 27.499216 27.175722 27.206593
153 28.507931 28.677679 27.571445 27.436069 25.876339 25.987709
154 38.870956 38.902817 38.837486 37.944462 38.016224 38.037628
155 30.539764 30.607140 30.618523 28.649202 28.325710 28.354242
156 29.075665 29.143040 29.154423 26.809921 26.459511 26.480341
[157 rows x 6 columns]
test_reward_score
0.7929507332526184
We see the internal model of our test reward estimator has an R2 of 0.79. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.
We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes, which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:
policy_outcomes = grid.predict_outcomes(test_X, test_rewards)
array([14.61293221, 15.33540058, 14.32053566, ..., 38.03762817,
30.61852264, 29.15442276])
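To make the lookup concrete, these values are just the entries of test_rewards in the columns prescribed for each sample (a sketch, assuming test_rewards shares the row order of test_X):
# Reproduce predict_outcomes by hand: prescribe a treatment for each sample,
# then look up that sample's estimated reward under the prescribed treatment
test_prescriptions = grid.predict(test_X)
manual_outcomes = [test_rewards[t].iloc[i]
                   for i, t in enumerate(test_prescriptions)]
# manual_outcomes should match policy_outcomes above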
We can then get the average estimated MPG under our treatments:
policy_outcomes.mean()
np.float64(25.18668151)
We can compare this number to the average estimated MPG under a baseline policy that assigns the treatment that is best on average across all training points, which we can see from the root node of the tree is an acceleration of 14:
test_rewards['14'].mean()
np.float64(24.6803476)
We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
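We can compute this improvement directly from the values above:
# Average improvement of the tree policy over the best constant-treatment baseline
improvement = policy_outcomes.mean() - test_rewards['14'].mean()
# roughly 0.5 MPG on average across the test set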