Quick Start Guide: Optimal Policy Trees with Categorical Treatment
This is a Python version of the corresponding OptimalTrees quick start guide.
In this example we will give a demonstration of how to use Optimal Policy Trees. We will examine the impact of job training on annual earnings using the Lalonde sample from the National Supported Work Demonstration dataset. This is the same dataset used in the quick start guide for Optimal Prescriptive Trees but showcases a different approach to prescription.
First we load in the data:
import pandas as pd
colnames = ['treatment', 'age', 'education', 'black', 'hispanic', 'married',
            'nodegree', 'earnings_1975', 'earnings_1978']
df_control = pd.read_csv('nsw_control.txt', names=colnames, sep='\s+')
df_treated = pd.read_csv('nsw_treated.txt', names=colnames, sep='\s+')
df = pd.concat([df_control, df_treated])
treatment age education ... nodegree earnings_1975 earnings_1978
0 0.0 23.0 10.0 ... 1.0 0.000 0.000
1 0.0 26.0 12.0 ... 0.0 0.000 12383.680
2 0.0 22.0 9.0 ... 1.0 0.000 0.000
3 0.0 34.0 9.0 ... 1.0 4368.413 14051.160
4 0.0 18.0 9.0 ... 1.0 0.000 10740.080
5 0.0 45.0 11.0 ... 1.0 0.000 11796.470
6 0.0 18.0 9.0 ... 1.0 0.000 9227.052
.. ... ... ... ... ... ... ...
290 1.0 25.0 14.0 ... 0.0 11536.570 36646.950
291 1.0 26.0 10.0 ... 1.0 0.000 0.000
292 1.0 20.0 9.0 ... 1.0 0.000 8881.665
293 1.0 31.0 4.0 ... 1.0 4023.211 7382.549
294 1.0 24.0 10.0 ... 1.0 4078.152 0.000
295 1.0 33.0 11.0 ... 1.0 25142.240 4181.942
296 1.0 33.0 12.0 ... 0.0 10941.350 15952.600
[722 rows x 9 columns]
Policy trees are trained using a features matrix/dataframe X as usual, and a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.
There are two ways to get this rewards matrix:
- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment
Refer to the documentation on data preparation for more information on the data format.
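For illustration, here is a small, hypothetical rewards matrix for the full-information case (the treatment names match this example, but the values are made up):
import pandas as pd

# Hypothetical full-information rewards matrix: one row per sample,
# one column per treatment, holding the outcome each sample would
# attain under that treatment (values are purely illustrative)
toy_rewards = pd.DataFrame({
    'no training': [5000.0, 8200.0, 3100.0],
    'training': [6500.0, 7900.0, 4800.0],
})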
In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.
Reward Estimation
The treatment is whether or not the subject received job training, and the outcome is their 1978 earnings (which we are trying to maximize).
First, we extract this information and split into training and testing:
from interpretableai import iai
X = df.iloc[:, 1:-1]
treatments = df.treatment.map({1: "training", 0: "no training"})
outcomes = df.earnings_1978
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('prescription_maximize', X, treatments, outcomes, seed=2))
Note that we have used the default 70%/30% split, but in many prescriptive problems it is desirable to save more data for testing to ensure high-quality reward estimation on the test set.
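For example, a larger test set could be reserved by lowering the training proportion (a sketch only; the 60%/40% split is illustrative, and train_proportion is assumed to be an accepted keyword argument of split_data):
# Reserve 40% of the data for testing instead of the default 30%
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('prescription_maximize', X, treatments, outcomes, seed=2,
                   train_proportion=0.6))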
We will use a CategoricalRewardEstimator to estimate the 1978 earnings for each participant in the study under each of the treatment options:
reward_lnr = iai.CategoricalRewardEstimator(
    propensity_estimation_method='random_forest',
    outcome_estimation_method='random_forest',
    reward_estimation_method='direct_method',
    random_seed=123,
)
train_rewards = reward_lnr.fit_predict(train_X, train_treatments,
                                       train_outcomes)
no training training
0 2285.855381 5478.406659
1 5110.322947 5288.380815
2 10112.899233 3752.631732
3 5144.866533 5475.134910
4 8585.753646 3806.614232
5 5144.866533 5475.134910
6 4325.421920 9195.874487
.. ... ...
499 4015.435727 4941.580227
500 4510.511407 6869.504678
501 4542.340423 6353.019829
502 7281.530896 5726.990139
503 3136.035717 4247.089859
504 8103.522340 6597.851004
505 5805.665336 8545.415283
[506 rows x 2 columns]
Optimal Policy Trees
Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes 1978 earnings. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):
grid = iai.GridSearch(
    iai.OptimalTreePolicyMaximizer(
        random_seed=1,
        minbucket=10,
    ),
    max_depth=range(1, 6),
)
grid.fit(train_X, train_rewards, train_proportion=0.5)
grid.get_learner()
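In a notebook, the returned learner displays as an interactive tree visualization. It can also be saved to a standalone HTML file for sharing (a sketch, assuming the write_html method available on IAI learners; the filename is arbitrary):
# Save an interactive visualization of the fitted policy tree
lnr = grid.get_learner()
lnr.write_html('policy_tree.html')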
The resulting tree recommends training based on three variables: age, education and 1975 earnings. The intensity of the color in each leaf shows the strength of the treatment effect. We make the following observations:
- Training is least effective in node 6, which contains older people with low 1975 earnings. This is understandable, as unskilled people at this age may have trouble picking up new skills and benefiting from the training.
- Training is most effective in node 5, which contains younger, educated people with low 1975 earnings. In contrast to node 6, these people seem to benefit from the training due to their youth and education.
- There is also a training benefit in node 11, which contains those with high 1975 earnings and age over 26. It seems the training also benefits those who are slightly older and have a higher baseline level of earnings.
We can make treatment prescriptions using predict:
grid.predict(train_X)
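To get a quick sense of how often each treatment is prescribed on the training set, the predictions can be tabulated with pandas (a sketch that simply treats the result of predict as an array of treatment labels, reusing the pandas import from above):
# Count how many training samples are prescribed each treatment
prescriptions = pd.Series(grid.predict(train_X))
print(prescriptions.value_counts())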
We can also use our estimated rewards to look up the predicted outcomes under these prescriptions with predict_outcomes:
grid.predict_outcomes(train_X, train_rewards)
array([5478.40665904, 5288.3808148 , 3752.63173195, ..., 4247.0898589 ,
6597.85100371, 8545.41528283])
Evaluating Optimal Policy Trees
It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.
Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:
test_reward_lnr = iai.CategoricalRewardEstimator(
    propensity_estimation_method='random_forest',
    outcome_estimation_method='random_forest',
    reward_estimation_method='doubly_robust',
    random_seed=1,
)
test_rewards = test_reward_lnr.fit_predict(test_X, test_treatments,
                                           test_outcomes)
no training training
0 13466.146182 5840.713692
1 2950.447056 5562.140020
2 10780.625029 4377.646614
3 17391.485616 5864.284600
4 -599.990036 5225.244544
5 6346.626635 15981.109548
6 -847.699124 3445.858808
.. ... ...
209 9441.837867 13602.548048
210 2275.944765 10463.247238
211 5524.173342 -1373.690742
212 4849.778366 4586.343908
213 3339.218108 -366.204516
214 9404.629238 42383.012909
215 3758.271583 -4818.630926
[216 rows x 2 columns]
We can then evaluate the quality using these new estimated rewards. First, we will calculate the average predicted 1978 earnings under the treatments prescribed by the tree for the test set:
def evaluate_reward_mean(treatments, rewards):
    total = 0.0
    for i in range(len(treatments)):
        total += rewards[treatments[i]][i]
    return total / len(treatments)
evaluate_reward_mean(grid.predict(test_X), test_rewards)
6047.52316269659
We can compare this number to the average earnings under the treatment assignments that were actually observed:
evaluate_reward_mean(test_treatments, test_rewards)
5494.877142584506
We see a significant improvement in our prescriptions over the baseline.
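As a rough check on the size of this improvement, we can compare the two means directly (using the numbers reported above):
# Relative improvement of the tree's prescriptions over the observed assignments
policy_mean = 6047.52316269659
observed_mean = 5494.877142584506
improvement = (policy_mean - observed_mean) / observed_mean
print(f"{improvement:.1%}")  # roughly a 10% increase in mean estimated earnings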