Quick Start Guide: Optimal Policy Trees with Categorical Treatment
This is a Python version of the corresponding OptimalTrees quick start guide.
In this example we will give a demonstration of how to use Optimal Policy Trees. We will examine the impact of job training on annual earnings using the Lalonde sample from the National Supported Work Demonstration dataset. This is the same dataset used in the quick start guide for Optimal Prescriptive Trees but showcases a different approach to prescription.
First we load in the data:
import pandas as pd
colnames = ['treatment', 'age', 'education', 'black', 'hispanic', 'married',
            'nodegree', 'earnings_1975', 'earnings_1978']
df_control = pd.read_csv('nsw_control.txt', names=colnames, sep='\s+')
df_treated = pd.read_csv('nsw_treated.txt', names=colnames, sep='\s+')
df = pd.concat([df_control, df_treated])
treatment age education ... nodegree earnings_1975 earnings_1978
0 0.0 23.0 10.0 ... 1.0 0.000 0.000
1 0.0 26.0 12.0 ... 0.0 0.000 12383.680
2 0.0 22.0 9.0 ... 1.0 0.000 0.000
3 0.0 34.0 9.0 ... 1.0 4368.413 14051.160
4 0.0 18.0 9.0 ... 1.0 0.000 10740.080
5 0.0 45.0 11.0 ... 1.0 0.000 11796.470
6 0.0 18.0 9.0 ... 1.0 0.000 9227.052
.. ... ... ... ... ... ... ...
290 1.0 25.0 14.0 ... 0.0 11536.570 36646.950
291 1.0 26.0 10.0 ... 1.0 0.000 0.000
292 1.0 20.0 9.0 ... 1.0 0.000 8881.665
293 1.0 31.0 4.0 ... 1.0 4023.211 7382.549
294 1.0 24.0 10.0 ... 1.0 4078.152 0.000
295 1.0 33.0 11.0 ... 1.0 25142.240 4181.942
296 1.0 33.0 12.0 ... 0.0 10941.350 15952.600
[722 rows x 9 columns]
Policy trees are trained using a features matrix/dataframe X as usual, and a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.
There are two ways to get this rewards matrix:
- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment
Refer to the documentation on data preparation for more information on the data format.
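For illustration, here is a small, hypothetical rewards matrix for the full-information case (the treatment names match this example, but the values are made up):
import pandas as pd

# Hypothetical full-information rewards matrix: one row per sample,
# one column per treatment, holding the outcome each sample would
# attain under that treatment (values are purely illustrative)
toy_rewards = pd.DataFrame({
    'no training': [5000.0, 8200.0, 3100.0],
    'training': [6500.0, 7900.0, 4800.0],
})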
In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.
Reward Estimation
The treatment is whether or not the subject received job training, and the outcome is their 1978 earnings (which we are trying to maximize).
First, we extract this information and split into training and testing:
from interpretableai import iai
X = df.iloc[:, 1:-1]
treatments = df.treatment.map({1: "training", 0: "no training"})
outcomes = df.earnings_1978
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('prescription_maximize', X, treatments, outcomes, seed=2))
Note that we have used the default 70%/30% split, but in many prescriptive problems it is desirable to save more data for testing to ensure high-quality reward estimation on the test set.
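For example, a larger test set could be reserved by lowering the training proportion (a sketch only; the 60%/40% split is illustrative, and train_proportion is assumed to be an accepted keyword argument of split_data):
# Reserve 40% of the data for testing instead of the default 30%
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('prescription_maximize', X, treatments, outcomes, seed=2,
                   train_proportion=0.6))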
We will use a CategoricalRewardEstimator to estimate the 1978 earnings for each participant in the study under each of the treatment options:
reward_lnr = iai.CategoricalRewardEstimator(
    propensity_estimation_method='random_forest',
    outcome_estimation_method='random_forest',
    reward_estimation_method='direct_method',
    random_seed=123,
)
train_rewards = reward_lnr.fit_predict(train_X, train_treatments,
                                       train_outcomes)
no training training
0 2285.855381 5478.406659
1 5110.322947 5288.380815
2 10112.899233 3752.631732
3 5144.866533 5475.134910
4 8585.753646 3806.614232
5 5144.866533 5475.134910
6 4325.421920 9195.874487
.. ... ...
499 4015.435727 4941.580227
500 4510.511407 6869.504678
501 4542.340423 6353.019829
502 7281.530896 5726.990139
503 3136.035717 4247.089859
504 8103.522340 6597.851004
505 5805.665336 8545.415283
[506 rows x 2 columns]
Optimal Policy Trees
Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes 1978 earnings. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):
grid = iai.GridSearch(
    iai.OptimalTreePolicyMaximizer(
        random_seed=1,
        minbucket=10,
    ),
    max_depth=range(1, 6),
)
grid.fit(train_X, train_rewards, train_proportion=0.5)
grid.get_learner()
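In a notebook, the returned learner displays as an interactive tree visualization. It can also be saved to a standalone HTML file for sharing (a sketch, assuming the write_html method available on IAI learners; the filename is arbitrary):
# Save an interactive visualization of the fitted policy tree
lnr = grid.get_learner()
lnr.write_html('policy_tree.html')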
The resulting tree recommends training based on three variables: age, education and 1975 earnings. The intensity of the color in each leaf shows the strength of the treatment effect. We make the following observations:
- Training is least effective in node 6, which contains older people with low 1975 earnings. This is understandable, as unskilled people at this age may have trouble picking up new skills and benefiting from the training.
- Training is most effective in node 5, which contains younger, educated people with low 1975 earnings. In contrast to node 6, these people seem to benefit from the training due to their youth and education.
- There is also a training benefit in node 11, which contains those with high 1975 earnings and age over 26. It seems the training also benefits those who are slightly older and have a higher baseline level of earnings.
We can make treatment prescriptions using predict:
grid.predict(train_X)
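To get a quick sense of how often each treatment is prescribed on the training set, the predictions can be tabulated with pandas (a sketch that simply treats the result of predict as an array of treatment labels, reusing the pandas import from above):
# Count how many training samples are prescribed each treatment
prescriptions = pd.Series(grid.predict(train_X))
print(prescriptions.value_counts())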
We can also use our estimated rewards to look up the predicted outcomes under these prescriptions with predict_outcomes:
grid.predict_outcomes(train_X, train_rewards)
array([5478.40665904, 5288.3808148 , 3752.63173195, ..., 4247.0898589 ,
6597.85100371, 8545.41528283])
Evaluating Optimal Policy Trees
It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.
Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:
test_reward_lnr = iai.CategoricalRewardEstimator(
    propensity_estimation_method='random_forest',
    outcome_estimation_method='random_forest',
    reward_estimation_method='doubly_robust',
    random_seed=1,
)
test_rewards = test_reward_lnr.fit_predict(test_X, test_treatments,
                                           test_outcomes)
no training training
0 13466.146182 5840.713692
1 2950.447056 5562.140020
2 10780.625029 4377.646614
3 17391.485616 5864.284600
4 -599.990036 5225.244544
5 6346.626635 15981.109548
6 -847.699124 3445.858808
.. ... ...
209 9441.837867 13602.548048
210 2275.944765 10463.247238
211 5524.173342 -1373.690742
212 4849.778366 4586.343908
213 3339.218108 -366.204516
214 9404.629238 42383.012909
215 3758.271583 -4818.630926
[216 rows x 2 columns]
We can then evaluate the quality using these new estimated rewards. First, we will calculate the average predicted 1978 earnings under the treatments prescribed by the tree for the test set:
def evaluate_reward_mean(treatments, rewards):
    total = 0.0
    for i in range(len(treatments)):
        total += rewards[treatments[i]][i]
    return total / len(treatments)
evaluate_reward_mean(grid.predict(test_X), test_rewards)
6047.52316269659
We can compare this number to the average earnings under the treatment assignments that were actually observed:
evaluate_reward_mean(test_treatments, test_rewards)
5494.877142584506
We see a significant improvement in our prescriptions over the baseline.
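As a rough check on the size of this improvement, we can compare the two means directly (using the numbers reported above):
# Relative improvement of the tree's prescriptions over the observed assignments
policy_mean = 6047.52316269659
observed_mean = 5494.877142584506
improvement = (policy_mean - observed_mean) / observed_mean
print(f"{improvement:.1%}")  # roughly a 10% increase in mean estimated earnings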