# Quick Start Guide: Optimal Policy Trees

*This is an R version of the corresponding OptimalTrees quick start guide.*

In this example we will give a demonstration of how to use Optimal Policy Trees. We will examine the impact of job training on annual earnings using the Lalonde sample from the National Supported Work Demonstration dataset. This is the same dataset used in the quick start guide for Optimal Prescriptive Trees but showcases a different approach to prescription.

First we load in the data:

```
colnames <- c("treatment", "age", "education", "black", "hispanic", "married",
"nodegree", "earnings_1975", "earnings_1978")
df_control <- read.table('nsw_control.txt', col.names=colnames)
df_treated <- read.table('nsw_treated.txt', col.names=colnames)
df <- rbind(df_control, df_treated)
```

```
treatment age education black hispanic married nodegree earnings_1975
1 0 23 10 1 0 0 1 0.000
2 0 26 12 0 0 0 0 0.000
3 0 22 9 1 0 0 1 0.000
4 0 34 9 1 0 0 1 4368.413
5 0 18 9 1 0 0 1 0.000
6 0 45 11 1 0 0 1 0.000
earnings_1978
1 0.00
2 12383.68
3 0.00
4 14051.16
5 10740.08
6 11796.47
[ reached 'max' / getOption("max.print") -- omitted 716 rows ]
```

Policy trees are trained using a features matrix/dataframe `X`

as usual and a rewards matrix that has one column for each potential treatment that contains the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

## Reward Estimation

The treatment is whether or not the subject received job training, and the outcome is their 1978 earnings (which we are trying to maximize).

First, we extract this information and split into training and testing:

```
X <- df[, 2:(ncol(df) - 1)]
treatments <- ifelse(df$treatment == 1, "training", "no training")
outcomes <- df$earnings_1978
split <- iai::split_data("prescription_maximize", X, treatments, outcomes,
seed = 2)
train_X <- split$train$X
train_treatments <- split$train$treatments
train_outcomes <- split$train$outcomes
test_X <- split$test$X
test_treatments <- split$test$treatments
test_outcomes <- split$test$outcomes
```

Note that we have used the default 70%/30% split, but in many prescriptive problems it is desirable to save more data for testing to ensure high-quality reward estimation on the test set.

We will use a `reward_estimator`

to estimate the 1978 earnings for each participant in the study if they had received the opposite treatment to the one in the data:

```
reward_lnr <- iai::reward_estimator(
propensity_estimation_method = "random_forest",
outcome_estimation_method = "random_forest",
reward_estimation_method = "direct_method",
random_seed = 123,
)
train_rewards <- iai::fit_predict(reward_lnr, train_X, train_treatments,
train_outcomes)
```

```
no training training
1 2285.855 5478.407
2 5110.323 5288.381
3 10112.899 3752.632
4 5144.867 5475.135
5 8585.754 3806.614
6 5144.867 5475.135
7 4325.422 9195.874
8 8388.737 6035.564
9 5074.728 7009.062
10 8355.831 6452.283
11 12731.860 6345.318
12 5312.552 6736.182
13 7536.869 8873.670
14 6699.292 6355.603
15 3943.038 5181.536
16 2213.007 4896.049
17 8284.399 4018.888
18 1723.284 6707.135
19 2165.784 8741.794
20 3937.834 4957.077
21 4206.931 4634.368
22 3517.213 3103.345
23 2900.661 4125.286
24 7231.313 7227.731
25 5835.166 4616.612
26 3337.765 7009.737
27 4358.759 5381.365
28 6602.972 3230.551
29 4071.950 2721.125
30 3306.070 9826.666
[ reached 'max' / getOption("max.print") -- omitted 476 rows ]
```

## Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes 1978 earnings. We will use a `grid_search`

to fit an `optimal_tree_policy_maximizer`

(note that if we were trying to minimize the outcomes, we would use `optimal_tree_policy_minimizer`

):

```
grid <- iai::grid_search(
iai::optimal_tree_policy_maximizer(
random_seed = 1,
minbucket = 10,
),
max_depth = 1:5,
)
iai::fit(grid, train_X, train_rewards, train_proportion=0.5)
iai::get_learner(grid)
```

The resulting tree recommends training based on three variables: age, education and 1975 earnings. The intensity of the color in each leaf shows the strength of the treatment effect. We make the following observations:

- Training is least effective in node 6, which are older people with low 1975 earnings. This is understandable, as unskilled people at this age may have trouble picking up new skills and benefiting from the training.
- Training is most effective in node 5, which are younger, educated people with low 1975 earnings. In contrast to node 6, these people seem to benefit from the training due to their youth and education.
- There is also a training benefit in node 11, which are those with high 1975 earnings and age over 26. It seems the training also benefits those that are slightly older and with a higher baseline level of earnings.

We can make treatment prescriptions using `predict`

:

```
iai::predict(grid, train_X)
```

We can also use our estimated rewards to look up the predicted outcomes under these prescriptions with `predict_outcomes`

:

```
iai::predict_outcomes(grid, train_X, train_rewards)
```

```
[1] 5478.407 5288.381 3752.632 5475.135 8585.754 5475.135 9195.874
[8] 8388.737 5074.728 6452.283 6345.318 5312.552 7536.869 6355.603
[15] 5181.536 2213.007 4018.888 1723.284 8741.794 3937.834 4634.368
[22] 3103.345 4125.286 7227.731 4616.612 7009.737 5381.365 6602.972
[29] 4071.950 9826.666 7842.395 4542.637 6277.027 9007.217 2842.298
[36] 13293.527 8828.324 4250.640 7437.878 8255.703 6285.853 9145.151
[43] 6685.974 5006.211 6071.785 3774.492 6033.673 4125.286 3381.507
[50] 7630.163 3273.474 4713.087 6612.745 4048.027 10391.200 6440.708
[57] 4991.416 4273.344 6972.514 4554.249
[ reached getOption("max.print") -- omitted 446 entries ]
```

## Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

```
test_reward_lnr <- iai::reward_estimator(
propensity_estimation_method = "random_forest",
outcome_estimation_method = "random_forest",
reward_estimation_method = "doubly_robust",
random_seed = 1,
)
test_rewards <- iai::fit_predict(test_reward_lnr, test_X, test_treatments,
test_outcomes)
```

```
no training training
1 13466.14618 5840.714
2 2950.44706 5562.140
3 10780.62503 4377.647
4 17391.48562 5864.285
5 -599.99004 5225.245
6 6346.62663 15981.110
7 -847.69912 3445.859
8 -469.11793 5503.624
9 -1836.01236 7905.532
10 1586.34820 5100.385
11 17416.11235 4879.462
12 10378.60362 6200.316
13 -337.29477 7499.461
14 17117.64083 5864.029
15 5589.84647 4173.271
16 31.59942 2100.046
17 6388.21546 4067.652
18 1685.01892 5225.815
19 6488.22346 5763.884
20 -2086.81416 2603.910
21 8173.40516 14579.700
22 -2236.90502 6278.348
23 616.79681 4985.877
24 10378.53776 10015.613
25 7603.21607 2371.283
26 -257.77528 2887.120
27 -2080.50426 3777.181
28 8433.54180 7665.401
29 -263.00484 4314.367
30 -941.05120 2423.764
[ reached 'max' / getOption("max.print") -- omitted 186 rows ]
```

We can then evaluate the quality using these new estimated rewards. First, we will calculate the average predicted 1978 earnings under the treatments prescribed by the tree for the test set:

```
evaluate_reward_mean <- function(treatments, rewards) {
total <- 0.0
for (i in seq(treatments)) {
total <- total + rewards[i, treatments[i]]
}
total / length(treatments)
}
evaluate_reward_mean(iai::predict(grid, test_X), test_rewards)
```

`[1] 6047.523`

We can compare this number to the average earnings under the treatment assignments that were actually observed:

```
evaluate_reward_mean(test_treatments, test_rewards)
```

`[1] 5494.877`

We see a significant improvement in our prescriptions over the baseline.