Quick Start Guide: Optimal Policy Trees
In this example we will demonstrate how to use Optimal Policy Trees. We will examine the impact of job training on annual earnings using the Lalonde sample from the National Supported Work Demonstration dataset. This is the same dataset used in the quick start guide for Optimal Prescriptive Trees, but here we showcase a different approach to prescription.
First we load in the data:
using CSV, DataFrames
colnames = [:treatment, :age, :education, :black, :hispanic, :married,
            :nodegree, :earnings_1975, :earnings_1978]
df_control = DataFrame(CSV.File("nsw_control.txt", header=colnames, delim=' ',
                                ignorerepeated=true))
df_treated = DataFrame(CSV.File("nsw_treated.txt", header=colnames, delim=' ',
                                ignorerepeated=true))
df = [df_control; df_treated]
722×9 DataFrame. Omitted printing of 3 columns
│ Row │ treatment │ age │ education │ black │ hispanic │ married │
│ │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │ Float64 │
├─────┼───────────┼─────────┼───────────┼─────────┼──────────┼─────────┤
│ 1 │ 0.0 │ 23.0 │ 10.0 │ 1.0 │ 0.0 │ 0.0 │
│ 2 │ 0.0 │ 26.0 │ 12.0 │ 0.0 │ 0.0 │ 0.0 │
│ 3 │ 0.0 │ 22.0 │ 9.0 │ 1.0 │ 0.0 │ 0.0 │
│ 4 │ 0.0 │ 34.0 │ 9.0 │ 1.0 │ 0.0 │ 0.0 │
│ 5 │ 0.0 │ 18.0 │ 9.0 │ 1.0 │ 0.0 │ 0.0 │
│ 6 │ 0.0 │ 45.0 │ 11.0 │ 1.0 │ 0.0 │ 0.0 │
│ 7 │ 0.0 │ 18.0 │ 9.0 │ 1.0 │ 0.0 │ 0.0 │
⋮
│ 715 │ 1.0 │ 26.0 │ 10.0 │ 1.0 │ 0.0 │ 0.0 │
│ 716 │ 1.0 │ 25.0 │ 14.0 │ 1.0 │ 0.0 │ 1.0 │
│ 717 │ 1.0 │ 26.0 │ 10.0 │ 1.0 │ 0.0 │ 1.0 │
│ 718 │ 1.0 │ 20.0 │ 9.0 │ 1.0 │ 0.0 │ 0.0 │
│ 719 │ 1.0 │ 31.0 │ 4.0 │ 1.0 │ 0.0 │ 0.0 │
│ 720 │ 1.0 │ 24.0 │ 10.0 │ 1.0 │ 0.0 │ 1.0 │
│ 721 │ 1.0 │ 33.0 │ 11.0 │ 1.0 │ 0.0 │ 1.0 │
│ 722 │ 1.0 │ 33.0 │ 12.0 │ 1.0 │ 0.0 │ 1.0 │
Policy trees are trained using a features matrix/dataframe X as usual, along with a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.
There are two ways to get this rewards matrix:
- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment
Refer to the documentation on data preparation for more information on the data format.
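As a purely illustrative sketch, the rewards matrix in this problem is simply a table with one row per sample and one column per treatment ("no training" and "training"), where each entry gives the outcome (1978 earnings) for that sample under that treatment. The values and the toy_rewards name below are made up for illustration only:
# Purely illustrative rewards matrix: one row per sample, one column per
# treatment, with each entry giving that sample's outcome under that treatment
# (DataFrames is already loaded above)
toy_rewards = DataFrame(
    "no training" => [1000.0, 2500.0, 1800.0],
    "training"    => [1500.0, 2400.0, 2600.0],
)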
In this case, the dataset is observational, and so we will use reward estimation to estimate our rewards matrix.
Reward Estimation
The treatment is whether or not the subject received job training, and the outcome is their 1978 earnings (which we are trying to maximize).
First, we extract this information and split into training and testing:
X = df[:, 2:(end - 1)]
treatments = [t == 1 ? "training" : "no training" for t in df.treatment]
outcomes = df.earnings_1978
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
    IAI.split_data(:prescription_maximize, X, treatments, outcomes, seed=2)
Note that we have used the default 70%/30% split, but in many prescriptive problems it is desirable to save more data for testing to ensure high-quality reward estimation on the test set.
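For example, to hold out half of the data for testing instead, we could request a different split fraction. This is a sketch only (we keep the default split in this guide), and it assumes split_data accepts a train_proportion keyword in your IAI version:
# Hypothetical 50%/50% split to reserve more data for test-set reward estimation
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
    IAI.split_data(:prescription_maximize, X, treatments, outcomes, seed=2,
                   train_proportion=0.5)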
We will use a RewardEstimator to estimate the 1978 earnings for each participant in the study if they had received the opposite treatment to the one in the data:
reward_lnr = IAI.RewardEstimator(
    propensity_estimation_method=:random_forest,
    outcome_estimation_method=:random_forest,
    reward_estimation_method=:direct_method,
    random_seed=123,
)
train_rewards = IAI.fit_predict!(reward_lnr, train_X, train_treatments,
                                 train_outcomes)
506×2 DataFrame
│ Row │ no training │ training │
│ │ Float64 │ Float64 │
├─────┼─────────────┼──────────┤
│ 1 │ 2285.86 │ 5478.41 │
│ 2 │ 5110.32 │ 5288.38 │
│ 3 │ 10112.9 │ 3752.63 │
│ 4 │ 5144.87 │ 5475.13 │
│ 5 │ 8585.75 │ 3806.61 │
│ 6 │ 5144.87 │ 5475.13 │
│ 7 │ 4325.42 │ 9195.87 │
⋮
│ 499 │ 8894.42 │ 5051.02 │
│ 500 │ 4015.44 │ 4941.58 │
│ 501 │ 4510.51 │ 6869.5 │
│ 502 │ 4542.34 │ 6353.02 │
│ 503 │ 7281.53 │ 5726.99 │
│ 504 │ 3136.04 │ 4247.09 │
│ 505 │ 8103.52 │ 6597.85 │
│ 506 │ 5805.67 │ 8545.42 │
Optimal Policy Trees
Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes 1978 earnings. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):
grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=1,
        minbucket=10,
    ),
    max_depth=1:5,
)
IAI.fit!(grid, train_X, train_rewards, train_proportion=0.5)
Fitted OptimalTreePolicyMaximizer:
  1) Split: earnings_1975 < 5359
    2) Split: age < 42.5
      3) Split: education < 8.5
        4) Prescribe: no training, 54 points, error 14564
        5) Prescribe: training, 340 points, error 13278.4
      6) Prescribe: no training, 12 points, error 13202.4
    7) Split: age < 26.5
      8) Prescribe: no training, 64 points, error 11677.4
      9) Split: earnings_1975 < 6912.2
        10) Prescribe: no training, 11 points, error 13965.4
        11) Prescribe: training, 25 points, error 13058.4
The resulting tree recommends training based on three variables: age, education, and 1975 earnings. In the interactive tree visualization, the intensity of the color in each leaf shows the strength of the treatment effect. We make the following observations (a short snippet after this list shows one way to inspect the leaf assignments directly):
- Training is least effective in node 6, which contains older people with low 1975 earnings. This is understandable, as unskilled people at this age may have trouble picking up new skills and benefiting from the training.
- Training is most effective in node 5, which contains younger, educated people with low 1975 earnings. In contrast to node 6, these people seem to benefit from the training due to their youth and education.
- There is also a training benefit in node 11, which contains those with high 1975 earnings and age over 26. It seems the training also benefits those who are slightly older and have a higher baseline level of earnings.
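To inspect these groups directly, one option is to look up the leaf that each training sample falls into and tally the counts. This is a sketch that assumes the apply and write_html functions are available in your IAI version:
# Leaf index that each training point falls into (assumed API: IAI.apply)
leaf_index = IAI.apply(grid, train_X)

# Number of training points in each leaf, to compare against the counts shown above
[(leaf, count(==(leaf), leaf_index)) for leaf in sort(unique(leaf_index))]

# Optionally save the interactive visualization with the colored leaves
# IAI.write_html("policy_tree.html", grid)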
We can make treatment prescriptions using predict:
IAI.predict(grid, train_X)
506-element Array{String,1}:
"training"
"training"
"training"
"training"
"no training"
"training"
"training"
"no training"
"no training"
"training"
⋮
"training"
"training"
"training"
"training"
"training"
"no training"
"training"
"training"
"training"
We can also use our estimated rewards to look up the predicted outcomes under these prescriptions with predict_outcomes:
IAI.predict_outcomes(grid, train_X, train_rewards)
506-element Array{Float64,1}:
5478.40665904332
5288.38081480028
3752.63173194986
5475.13491034988
8585.75364633975
5475.13491034988
9195.87448716382
8388.73697496597
5074.72757498081
6452.28253923294
⋮
5868.71916244308
5051.02180960689
4941.58022746884
6869.50467782097
6353.01982895157
7281.53089567268
4247.08985889609
6597.85100370879
8545.41528283379
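To summarize these in-sample results, we can count how many training samples are prescribed each treatment and average the predicted outcomes. This is plain Julia and requires no additional IAI functionality:
using Statistics

prescriptions = IAI.predict(grid, train_X)
pred_outcomes = IAI.predict_outcomes(grid, train_X, train_rewards)

# Tally the number of training samples prescribed each treatment
Dict(t => count(==(t), prescriptions) for t in unique(prescriptions))

# Average predicted 1978 earnings under the prescribed treatments (in-sample,
# so this should not be used as an unbiased estimate of policy quality)
mean(pred_outcomes)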
Evaluating Optimal Policy Trees
It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.
Instead, we need to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:
test_reward_lnr = IAI.RewardEstimator(
    propensity_estimation_method=:random_forest,
    outcome_estimation_method=:random_forest,
    reward_estimation_method=:doubly_robust,
    random_seed=1,
)
test_rewards = IAI.fit_predict!(test_reward_lnr, test_X, test_treatments,
                                test_outcomes)
216×2 DataFrame
│ Row │ no training │ training │
│ │ Float64 │ Float64 │
├─────┼─────────────┼──────────┤
│ 1 │ 13466.1 │ 5840.71 │
│ 2 │ 2950.45 │ 5562.14 │
│ 3 │ 10780.6 │ 4377.65 │
│ 4 │ 17391.5 │ 5864.28 │
│ 5 │ -599.99 │ 5225.24 │
│ 6 │ 6346.63 │ 15981.1 │
│ 7 │ -847.699 │ 3445.86 │
⋮
│ 209 │ 7298.05 │ -1284.03 │
│ 210 │ 9441.84 │ 13602.5 │
│ 211 │ 2275.94 │ 10463.2 │
│ 212 │ 5524.17 │ -1373.69 │
│ 213 │ 4849.78 │ 4586.34 │
│ 214 │ 3339.22 │ -366.205 │
│ 215 │ 9404.63 │ 42383.0 │
│ 216 │ 3758.27 │ -4818.63 │
We can then evaluate the quality using these new estimated rewards. First, we will calculate the average predicted 1978 earnings under the treatments prescribed by the tree for the test set:
using Statistics
mean(IAI.predict_outcomes(grid, test_X, test_rewards))
6047.523162696585
We can compare this number to the average earnings under the treatment assignments that were actually observed:
mean(test_rewards[i, test_treatments[i]] for i = 1:size(test_X, 1))
5494.877142584506
We see a substantial improvement in our prescriptions over the observed treatment assignments.
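For context, we can express this as a relative improvement by reusing the two means computed above. With the values shown, this works out to an increase of roughly 10% in mean predicted 1978 earnings:
policy_earnings   = mean(IAI.predict_outcomes(grid, test_X, test_rewards))
observed_earnings = mean(test_rewards[i, test_treatments[i]] for i = 1:size(test_X, 1))

# Relative improvement of the tree policy over the observed assignments
(policy_earnings - observed_earnings) / observed_earnings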