Quick Start Guide: Optimal Policy Trees

This is an R version of the corresponding OptimalTrees quick start guide.

In this example we will give a demonstration of how to use Optimal Policy Trees. We will examine the impact of job training on annual earnings using the Lalonde sample from the National Supported Work Demonstration dataset. This is the same dataset used in the quick start guide for Optimal Prescriptive Trees but showcases a different approach to prescription.

First we load in the data:

colnames <- c("treatment", "age", "education", "black", "hispanic", "married",
              "nodegree", "earnings_1975", "earnings_1978")
df_control <- read.table('nsw_control.txt', col.names=colnames)
df_treated <- read.table('nsw_treated.txt', col.names=colnames)
df <- rbind(df_control, df_treated)
  treatment age education black hispanic married nodegree earnings_1975
1         0  23        10     1        0       0        1         0.000
2         0  26        12     0        0       0        0         0.000
3         0  22         9     1        0       0        1         0.000
4         0  34         9     1        0       0        1      4368.413
5         0  18         9     1        0       0        1         0.000
6         0  45        11     1        0       0        1         0.000
  earnings_1978
1          0.00
2      12383.68
3          0.00
4      14051.16
5      10740.08
6      11796.47
 [ reached 'max' / getOption("max.print") -- omitted 716 rows ]

Policy trees are trained using a features matrix/dataframe X as usual and a rewards matrix that has one column for each potential treatment that contains the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

  • in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
  • more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

Reward Estimation

The treatment is whether or not the subject received job training, and the outcome is their 1978 earnings (which we are trying to maximize).

First, we extract this information and split into training and testing:

X <- df[, 2:(ncol(df) - 1)]
treatments <- ifelse(df$treatment == 1, "training", "no training")
outcomes <- df$earnings_1978

split <- iai::split_data("prescription_maximize", X, treatments, outcomes,
                         seed = 2)
train_X <- split$train$X
train_treatments <- split$train$treatments
train_outcomes <- split$train$outcomes
test_X <- split$test$X
test_treatments <- split$test$treatments
test_outcomes <- split$test$outcomes

Note that we have used the default 70%/30% split, but in many prescriptive problems it is desirable to save more data for testing to ensure high-quality reward estimation on the test set.

We will use a reward_estimator to estimate the 1978 earnings for each participant in the study if they had received the opposite treatment to the one in the data:

reward_lnr <- iai::reward_estimator(
    propensity_estimation_method = "random_forest",
    outcome_estimation_method = "random_forest",
    reward_estimation_method = "direct_method",
    random_seed = 123,
)
train_rewards <- iai::fit_predict(reward_lnr, train_X, train_treatments,
                                  train_outcomes)
   no training training
1     2285.855 5478.407
2     5110.323 5288.381
3    10112.899 3752.632
4     5144.867 5475.135
5     8585.754 3806.614
6     5144.867 5475.135
7     4325.422 9195.874
8     8388.737 6035.564
9     5074.728 7009.062
10    8355.831 6452.283
11   12731.860 6345.318
12    5312.552 6736.182
13    7536.869 8873.670
14    6699.292 6355.603
15    3943.038 5181.536
16    2213.007 4896.049
17    8284.399 4018.888
18    1723.284 6707.135
19    2165.784 8741.794
20    3937.834 4957.077
21    4206.931 4634.368
22    3517.213 3103.345
23    2900.661 4125.286
24    7231.313 7227.731
25    5835.166 4616.612
26    3337.765 7009.737
27    4358.759 5381.365
28    6602.972 3230.551
29    4071.950 2721.125
30    3306.070 9826.666
 [ reached 'max' / getOption("max.print") -- omitted 476 rows ]

Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes 1978 earnings. We will use a grid_search to fit an optimal_tree_policy_maximizer (note that if we were trying to minimize the outcomes, we would use optimal_tree_policy_minimizer):

grid <- iai::grid_search(
    iai::optimal_tree_policy_maximizer(
        random_seed = 1,
        minbucket = 10,
    ),
    max_depth = 1:5,
)
iai::fit(grid, train_X, train_rewards, train_proportion=0.5)
iai::get_learner(grid)
Optimal Trees Visualization

The resulting tree recommends training based on three variables: age, education and 1975 earnings. The intensity of the color in each leaf shows the strength of the treatment effect. We make the following observations:

  • Training is least effective in node 6, which are older people with low 1975 earnings. This is understandable, as unskilled people at this age may have trouble picking up new skills and benefiting from the training.
  • Training is most effective in node 5, which are younger, educated people with low 1975 earnings. In contrast to node 6, these people seem to benefit from the training due to their youth and education.
  • There is also a training benefit in node 11, which are those with high 1975 earnings and age over 26. It seems the training also benefits those that are slightly older and with a higher baseline level of earnings.

We can make treatment prescriptions using predict:

iai::predict(grid, train_X)

We can also use our estimated rewards to look up the predicted outcomes under these prescriptions with predict_outcomes:

iai::predict_outcomes(grid, train_X, train_rewards)
 [1]  5478.407  5288.381  3752.632  5475.135  8585.754  5475.135  9195.874
 [8]  8388.737  5074.728  6452.283  6345.318  5312.552  7536.869  6355.603
[15]  5181.536  2213.007  4018.888  1723.284  8741.794  3937.834  4634.368
[22]  3103.345  4125.286  7227.731  4616.612  7009.737  5381.365  6602.972
[29]  4071.950  9826.666  7842.395  4542.637  6277.027  9007.217  2842.298
[36] 13293.527  8828.324  4250.640  7437.878  8255.703  6285.853  9145.151
[43]  6685.974  5006.211  6071.785  3774.492  6033.673  4125.286  3381.507
[50]  7630.163  3273.474  4713.087  6612.745  4048.027 10391.200  6440.708
[57]  4991.416  4273.344  6972.514  4554.249
 [ reached getOption("max.print") -- omitted 446 entries ]

Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

test_reward_lnr <- iai::reward_estimator(
    propensity_estimation_method = "random_forest",
    outcome_estimation_method = "random_forest",
    reward_estimation_method = "doubly_robust",
    random_seed = 1,
)
test_rewards <- iai::fit_predict(test_reward_lnr, test_X, test_treatments,
                                 test_outcomes)
   no training  training
1  13466.14618  5840.714
2   2950.44706  5562.140
3  10780.62503  4377.647
4  17391.48562  5864.285
5   -599.99004  5225.245
6   6346.62663 15981.110
7   -847.69912  3445.859
8   -469.11793  5503.624
9  -1836.01236  7905.532
10  1586.34820  5100.385
11 17416.11235  4879.462
12 10378.60362  6200.316
13  -337.29477  7499.461
14 17117.64083  5864.029
15  5589.84647  4173.271
16    31.59942  2100.046
17  6388.21546  4067.652
18  1685.01892  5225.815
19  6488.22346  5763.884
20 -2086.81416  2603.910
21  8173.40516 14579.700
22 -2236.90502  6278.348
23   616.79681  4985.877
24 10378.53776 10015.613
25  7603.21607  2371.283
26  -257.77528  2887.120
27 -2080.50426  3777.181
28  8433.54180  7665.401
29  -263.00484  4314.367
30  -941.05120  2423.764
 [ reached 'max' / getOption("max.print") -- omitted 186 rows ]

We can then evaluate the quality using these new estimated rewards. First, we will calculate the average predicted 1978 earnings under the treatments prescribed by the tree for the test set:

evaluate_reward_mean <- function(treatments, rewards) {
  total <- 0.0
  for (i in seq(treatments)) {
    total <- total + rewards[i, treatments[i]]
  }
  total / length(treatments)
}

evaluate_reward_mean(iai::predict(grid, test_X), test_rewards)
[1] 6047.523

We can compare this number to the average earnings under the treatment assignments that were actually observed:

evaluate_reward_mean(test_treatments, test_rewards)
[1] 5494.877

We see a significant improvement in our prescriptions over the baseline.