# Quick Start Guide: Optimal Policy Trees with Categorical Treatment

*This is an R version of the corresponding OptimalTrees quick start guide.*

In this guide we will give a demonstration of how to use Optimal Policy Trees with categoric treatment options. For this example, we will use the Credit Approval dataset, where the task is to predict the approval outcome of credit card application. Because the features have been anonymized for confidentiality reasons, we will arbitrarily select one of the categoric variables `A1`

to be the treatment.

*Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process.*

First we load in the data and drop 37 rows with missing values:

```
df <- read.csv("crx.data", na.strings = "?", stringsAsFactors = T, header = F,
col.names = paste("A", as.character(seq(1, 16)), sep=""))
df <- na.omit(df)
```

```
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16
1 b 30.83 0.00 u g w v 1.25 t t 1 f g 202 0 +
2 a 58.67 4.46 u g q h 3.04 t t 6 f g 43 560 +
3 a 24.50 0.50 u g q h 1.50 t f 0 f g 280 824 +
[ reached 'max' / getOption("max.print") -- omitted 650 rows ]
```

Policy trees are trained using a features matrix/dataframe `X`

as usual and a rewards matrix that has one column for each potential treatment that contains the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

## Reward Estimation

First, we split into training and testing:

```
X <- subset(df, select = -c(A1, A16))
treatments <- df$A1
outcomes <- df$A16 == "+"
split <- iai::split_data("policy_maximize", X, treatments, outcomes, seed = 1,
train_proportion = 0.6)
train_X <- split$train$X
train_treatments <- split$train$treatments
train_outcomes <- split$train$outcomes
test_X <- split$test$X
test_treatments <- split$test$treatments
test_outcomes <- split$test$outcomes
```

Note that we have used a training/test split of 60%/40% split, so that we save more data for testing to ensure high-quality reward estimation on the test set.

The treatment, `A1`

, is a categoric variable, so we follow the process for estimating rewards with categorical treatments.

We use a `categorical_reward_estimator`

to estimate the approval outcome under each option with a doubly-robust reward estimation method using random forests to estimate both propensity scores and outcomes:

```
reward_lnr <- iai::categorical_reward_estimator(
propensity_estimator = iai::random_forest_classifier(),
outcome_estimator = iai::random_forest_classifier(),
reward_estimator = "doubly_robust",
random_seed = 1,
)
train_rewards <- iai::fit_predict(
reward_lnr, train_X, train_treatments, train_outcomes,
propensity_score_criterion = "auc", outcome_score_criterion = "auc")
train_rewards$rewards
```

```
a b
1 0.5072970 1.1270442
2 1.4303011 0.7177813
3 0.9175000 1.2379856
4 0.4383333 1.2738960
5 0.6677778 1.0459521
6 2.0996227 0.7122323
7 0.3933333 1.8834391
8 1.0841345 0.8942857
9 0.5368507 1.2786780
10 1.1221707 0.8344643
11 0.8853595 1.0108452
12 0.2970255 1.0661191
13 1.0853732 0.9914444
14 1.6498251 0.7410098
15 1.3489717 0.5743182
16 1.6933752 0.6030278
17 1.2812771 0.8478636
18 0.9011569 1.0852148
19 0.9500000 1.0489109
20 1.5396342 0.8261215
21 0.7506013 1.0386142
22 1.2528396 0.8644118
23 0.6437729 1.1185766
24 0.6251538 1.1336791
25 0.8475817 1.0348805
26 0.6850000 1.0676756
27 0.9140000 1.0235274
28 0.2306026 1.2261112
29 0.6972719 1.1379483
30 0.3861438 1.1175471
[ reached 'max' / getOption("max.print") -- omitted 362 rows ]
```

```
train_rewards$score
```

```
$propensity
[1] 0.6160215
$outcome
$outcome$b
[1] 0.9134508
$outcome$a
[1] 0.9431854
```

We can see that the internal outcome estimation models have AUCs of 0.91 and 0.94, which gives us confidence that the reward estimates are of decent quality, and good to base our training on. The AUC for the propensity model is lower at 0.62, which is not particularly high, and suggests difficulty in reliably estimating the propensity. The doubly-robust estimation method should help to alleviate this problem, as it is designed to deliver good results if either propensity scores or outcomes are estimated well. However, we should always pay attention to these scores and proceed with caution if the estimation quality is low.

## Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes credit approval outcome. We will use a `grid_search`

to fit an `optimal_tree_policy_maximizer`

(note that if we were trying to minimize the outcomes, we would use `optimal_tree_policy_minimizer`

):

```
grid <- iai::grid_search(
iai::optimal_tree_policy_maximizer(
random_seed = 123,
max_categoric_levels_before_warning = 20,
),
max_depth = 1:5,
)
iai::fit(grid, train_X, train_rewards$rewards)
iai::get_learner(grid)
```

```
Fitted OptimalTreePolicyMaximizer:
1) Split: A6 in [aa,e,i,k,m,q,r,x] or is missing
2) Split: A14 < 162
3) Prescribe: a, 101 points, error 3.475
4) Prescribe: b, 89 points, error 3.721
5) Split: A2 < 28.63
6) Prescribe: b, 95 points, error 3.813
7) Prescribe: a, 107 points, error 3.706
```

The resulting tree recommends different values for `A1`

based on other characteristics: namely, the values of `A2`

, `A6`

, and `A14`

. The intensity of the color in each leaf shows the difference in quality between the best and second-best values. We can see that both `a`

and `b`

values are prescribed by the tree, implying each treatment is better suited to certain subgroups.

We can make treatment prescriptions using `predict`

:

```
iai::predict(grid, train_X)
```

```
[1] "a" "b" "b" "b" "a" "a" "a" "a" "b" "b" "b" "b" "a" "a" "a" "b" "a" "a" "a"
[20] "a" "b" "b" "b" "a" "a" "a" "a" "b" "a" "a" "a" "b" "b" "b" "a" "b" "a" "a"
[39] "b" "b" "b" "a" "a" "a" "b" "a" "a" "a" "a" "a" "a" "a" "b" "b" "a" "a" "a"
[58] "a" "a" "a"
[ reached getOption("max.print") -- omitted 332 entries ]
```

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with `predict_treatment_rank`

:

```
iai::predict_treatment_rank(grid, train_X)
```

```
[,1] [,2]
[1,] "a" "b"
[2,] "b" "a"
[3,] "b" "a"
[4,] "b" "a"
[5,] "a" "b"
[6,] "a" "b"
[7,] "a" "b"
[8,] "a" "b"
[9,] "b" "a"
[10,] "b" "a"
[11,] "b" "a"
[12,] "b" "a"
[13,] "a" "b"
[14,] "a" "b"
[15,] "a" "b"
[16,] "b" "a"
[17,] "a" "b"
[18,] "a" "b"
[19,] "a" "b"
[20,] "a" "b"
[21,] "b" "a"
[22,] "b" "a"
[23,] "b" "a"
[24,] "a" "b"
[25,] "a" "b"
[26,] "a" "b"
[27,] "a" "b"
[28,] "b" "a"
[29,] "a" "b"
[30,] "a" "b"
[ reached getOption("max.print") -- omitted 362 rows ]
```

To quantify the difference in performance behind the treatment rankings, we can use `predict_treatment_outcome`

to extract the estimated quality of each treatment for each point:

```
iai::predict_treatment_outcome(grid, train_X)
```

```
a b
1 0.5104822 0.3522495
2 0.3175447 0.4955646
3 0.3175447 0.4955646
4 0.3175447 0.4955646
5 0.5104822 0.3522495
6 0.7412159 0.5613346
7 0.7412159 0.5613346
8 0.7412159 0.5613346
9 0.2519006 0.4029430
10 0.2519006 0.4029430
11 0.2519006 0.4029430
12 0.2519006 0.4029430
13 0.5104822 0.3522495
14 0.7412159 0.5613346
15 0.7412159 0.5613346
16 0.2519006 0.4029430
17 0.7412159 0.5613346
18 0.5104822 0.3522495
19 0.5104822 0.3522495
20 0.7412159 0.5613346
21 0.2519006 0.4029430
22 0.3175447 0.4955646
23 0.2519006 0.4029430
24 0.5104822 0.3522495
25 0.5104822 0.3522495
26 0.5104822 0.3522495
27 0.5104822 0.3522495
28 0.3175447 0.4955646
29 0.5104822 0.3522495
30 0.5104822 0.3522495
[ reached 'max' / getOption("max.print") -- omitted 362 rows ]
```

## Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

```
test_rewards <- iai::fit_predict(
reward_lnr, test_X, test_treatments, test_outcomes, propensity_score_criterion = "auc", outcome_score_criterion = "auc")
test_rewards$rewards
```

```
a b
1 1.05874091 0.9589181
2 0.69471041 1.0404291
3 0.53809524 1.4717999
4 0.49954545 1.2965528
5 2.21111945 0.5886364
6 0.23000000 1.2790227
7 0.32408541 1.0823379
8 1.50441067 0.9214676
9 0.61850000 1.0825273
10 0.34881297 1.1061304
11 0.79583333 1.0271951
12 0.86009868 1.0567228
13 0.93245098 1.0073525
14 0.83729187 1.0203060
15 1.05977627 0.8547368
16 0.84566667 1.0336681
17 0.79666667 1.0249298
18 0.79199415 1.0021401
19 0.91333333 1.0041882
20 0.90827273 1.0126069
21 1.80807455 0.7938889
22 0.09090909 1.2545612
23 4.47151165 0.6182483
24 0.21229293 1.3720624
25 0.50138889 1.1985341
26 0.43333333 1.1443273
27 0.79699415 1.0877025
28 0.87276535 1.0095916
29 1.62058188 0.7776111
30 0.96000000 1.0741291
[ reached 'max' / getOption("max.print") -- omitted 231 rows ]
```

```
test_rewards$score
```

```
$propensity
[1] 0.5759206
$outcome
$outcome$b
[1] 0.9409791
$outcome$a
[1] 0.9158503
```

We see the scores are similar to those on the training set, giving us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation. The AUC for the propensity model is again on the low side, but should be compensated for by means of the doubly-robust estimator.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted credict approval probability under the treatments prescribed by the tree for the test set. To do this, we use `predict_outcomes`

which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

```
policy_outcomes <- iai::predict_outcomes(grid, test_X, test_rewards$rewards)
```

```
[1] 1.05874091 1.04042909 1.47179994 1.29655277 0.58863636 1.27902272
[7] 0.32408541 1.50441067 1.08252732 1.10613036 0.79583333 1.05672279
[13] 0.93245098 0.83729187 1.05977627 1.03366810 0.79666667 0.79199415
[19] 0.91333333 0.90827273 0.79388889 1.25456124 0.61824830 1.37206239
[25] 1.19853414 1.14432734 0.79699415 0.87276535 1.62058188 1.07412913
[31] -0.24684800 -0.10636450 -1.94628834 -0.09251022 0.36883333 -0.05582385
[37] -0.10618322 -0.07382335 -0.10850852 -0.03776958 0.71911187 -0.03429615
[43] 0.80150267 -0.24473192 1.01788302 0.48500000 1.04214179 0.97876374
[49] 0.44064394 0.75052264 0.89193381 1.02229824 0.87697368 1.15891552
[55] 1.11517368 0.87832945 1.90936380 0.62908541 0.54750000 1.84163959
[ reached getOption("max.print") -- omitted 201 entries ]
```

We can then get the average estimated approval probability under our treatments:

```
mean(policy_outcomes)
```

`[1] 0.444351`

We can compare this number to the average approval probability under the treatment assignments that were actually observed:

```
evaluate_reward_mean <- function(treatments, rewards) {
total <- 0.0
for (i in seq(treatments)) {
total <- total + rewards[i, treatments[i]]
}
total / length(treatments)
}
evaluate_reward_mean(test_treatments, test_rewards$rewards)
```

`[1] 0.4234183`

We see a small improvement over the real treatments.