# Quick Start Guide: Optimal Policy Trees with Numeric Treatment

*This is an R version of the corresponding OptimalTrees quick start guide.*

In this example we will give a demonstration of how to use Optimal Policy Trees with numeric treatment options. For this example, we will the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

*Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.*

First we load in the data and drop 6 rows with missing values:

```
df <- na.omit(read.csv("auto-mpg.csv", na.strings = "?"))
```

```
mpg cylinders displacement horsepower weight acceleration model.year origin
1 18 8 307 130 3504 12.0 70 1
2 15 8 350 165 3693 11.5 70 1
3 18 8 318 150 3436 11.0 70 1
4 16 8 304 150 3433 12.0 70 1
5 17 8 302 140 3449 10.5 70 1
6 15 8 429 198 4341 10.0 70 1
car.name
1 chevrolet chevelle malibu
2 buick skylark 320
3 plymouth satellite
4 amc rebel sst
5 ford torino
6 ford galaxie 500
[ reached 'max' / getOption("max.print") -- omitted 386 rows ]
```

Policy trees are trained using a features matrix/dataframe `X`

as usual and a rewards matrix that has one column for each potential treatment that contains the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

## Reward Estimation

Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.

First, we split into training and testing:

```
X <- subset(df, select = -c(mpg, acceleration, car.name))
treatments <- df$acceleration
outcomes <- df$mpg
split <- iai::split_data("policy_maximize", X, treatments, outcomes,
seed = 123, train_proportion = 0.5)
train_X <- split$train$X
train_treatments <- split$train$treatments
train_outcomes <- split$train$outcomes
test_X <- split$test$X
test_treatments <- split$test$treatments
test_outcomes <- split$test$outcomes
```

Note that we have used a training/test split of 60%/40%, so that we save more data for testing to ensure high-quality reward estimation on the test set.

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 11 to 20, in steps of 3:

```
treatment_candidates <- seq(11, 20, 3)
```

The outcome is continuous, so we use a `numeric_regression_reward_estimator`

to estimate the MPG under our candidate acceleration values using a random forest model:

```
reward_lnr = iai::numeric_regression_reward_estimator(
propensity_estimator = iai::random_forest_regressor(),
outcome_estimator = iai::random_forest_regressor(),
reward_estimator = "doubly_robust",
propensity_min_value = 0.1,
random_seed = 1,
)
train_rewards <- iai::fit_predict(reward_lnr, train_X, train_treatments,
train_outcomes, treatment_candidates)
train_rewards$rewards
```

```
11 14 17 20
1 15.61183 14.324114 14.98341 15.88650
2 17.89016 16.706663 15.34452 16.37598
3 15.21088 11.313360 11.79762 16.05733
4 13.55233 11.037557 11.79762 16.05733
5 14.02919 11.451840 11.65667 16.34512
6 15.04257 15.661554 14.97618 16.30873
7 15.03297 13.564238 13.49279 16.25742
8 13.27892 15.067589 18.12993 17.50950
9 22.60668 22.551235 21.33069 18.97877
10 22.12423 17.811266 18.46614 20.78981
11 24.25679 23.646085 20.63887 19.28444
12 27.32696 27.999439 32.29332 24.77390
13 23.34823 23.554402 25.16990 25.22115
14 12.82628 -2.076632 13.48320 15.39619
15 14.03892 9.442752 12.84378 16.25809
[ reached 'max' / getOption("max.print") -- omitted 181 rows ]
```

```
train_rewards$score$propensity
```

```
$`17`
[1] 0.1205333
$`20`
[1] 0.200617
$`11`
[1] 0.2871446
$`14`
[1] 0.1653518
```

```
train_rewards$score$outcome
```

```
$`17`
[1] 0.8727428
$`20`
[1] 0.3228448
$`11`
[1] 0.6603072
$`14`
[1] 0.8352894
```

We can see that the R2 of the internal outcome models for each candidate treatment are between 0.32 and 0.87, whereas the propensity models have R2 between 0.12 and 0.29. The non-zero propensity scores tell us that there is a small amount of treatment assignment bias in this data (keeping in mind that we arbitrarily selected a feature to use as the treatment column). It seems we can predict the outcomes reasonably well, so we have a good base for the reward estimates.

## Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a `grid_search`

to fit an `optimal_tree_policy_maximizer`

(note that if we were trying to minimize the outcomes, we would use `optimal_tree_policy_minimizer`

):

```
grid <- iai::grid_search(
iai::optimal_tree_policy_maximizer(
random_seed = 1,
minbucket = 15,
),
max_depth = 4:5,
)
iai::fit(grid, train_X, train_rewards$rewards)
iai::get_learner(grid)
```

The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that both extremes of our candidate acceleration values are prescribed by the tree. Older cars with lower horsepower receive the lowest acceleration, whereas the highest acceleration is best for many different groups of cars.

We can make treatment prescriptions using `predict`

:

```
prescriptions <- iai::predict(grid, train_X)
```

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, and if we want them in numeric form to use later we can convert them to numeric treatments using `convert_treatments_to_numeric`

:

```
iai::convert_treatments_to_numeric(prescriptions)
```

```
[1] 20 20 20 20 20 20 20 20 20 20 11 11 20 20 20 11 20 20 11 20 20 20 11 20 11
[26] 11 11 11 11 11 11 11 20 20 20 20 20 20 20 20 11 11 20 20 11 20 20 20 20 20
[51] 20 20 11 11 20 20 11 11 20 20
[ reached getOption("max.print") -- omitted 136 entries ]
```

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with `predict_treatment_rank`

:

```
rank <- iai::predict_treatment_rank(grid, train_X)
```

```
[,1] [,2] [,3] [,4]
[1,] "20" "11" "17" "14"
[2,] "20" "11" "17" "14"
[3,] "20" "11" "17" "14"
[4,] "20" "11" "17" "14"
[5,] "20" "11" "17" "14"
[6,] "20" "11" "17" "14"
[7,] "20" "11" "17" "14"
[8,] "20" "11" "17" "14"
[9,] "20" "11" "17" "14"
[10,] "20" "11" "17" "14"
[11,] "11" "17" "14" "20"
[12,] "11" "17" "14" "20"
[13,] "20" "11" "17" "14"
[14,] "20" "11" "17" "14"
[15,] "20" "11" "17" "14"
[ reached getOption("max.print") -- omitted 181 rows ]
```

For each point in the data, this gives the treatments in order of effectiveness. As before, this are returned as strings, but we can convert the treatments to numeric values with `convert_treatments_to_numeric`

:

```
iai::convert_treatments_to_numeric(rank)
```

```
[,1] [,2] [,3] [,4]
[1,] 20 11 17 14
[2,] 20 11 17 14
[3,] 20 11 17 14
[4,] 20 11 17 14
[5,] 20 11 17 14
[6,] 20 11 17 14
[7,] 20 11 17 14
[8,] 20 11 17 14
[9,] 20 11 17 14
[10,] 20 11 17 14
[11,] 11 17 14 20
[12,] 11 17 14 20
[13,] 20 11 17 14
[14,] 20 11 17 14
[15,] 20 11 17 14
[ reached getOption("max.print") -- omitted 181 rows ]
```

To quantify the difference in performance behind the treatment rankings, we can use `predict_treatment_outcome`

to extract the estimated quality of each treatment for each point:

```
iai::predict_treatment_outcome(grid, train_X)
```

```
11 14 17 20
1 16.95464 15.73563 16.44407 18.33659
2 16.95464 15.73563 16.44407 18.33659
3 16.95464 15.73563 16.44407 18.33659
4 16.95464 15.73563 16.44407 18.33659
5 16.95464 15.73563 16.44407 18.33659
6 16.95464 15.73563 16.44407 18.33659
7 16.95464 15.73563 16.44407 18.33659
8 16.95464 15.73563 16.44407 18.33659
9 16.95464 15.73563 16.44407 18.33659
10 16.95464 15.73563 16.44407 18.33659
11 27.01673 24.65935 26.18041 23.08957
12 27.01673 24.65935 26.18041 23.08957
13 16.95464 15.73563 16.44407 18.33659
14 16.95464 15.73563 16.44407 18.33659
15 16.95464 15.73563 16.44407 18.33659
[ reached 'max' / getOption("max.print") -- omitted 181 rows ]
```

## Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

```
test_rewards <- iai::fit_predict(reward_lnr, test_X, test_treatments,
test_outcomes, treatment_candidates)
test_rewards$rewards
```

```
11 14 17 20
1 18.87610 17.348683 15.442553 13.02367
2 18.54885 17.177284 16.064991 13.10194
3 16.90637 17.195855 16.408816 13.26537
4 14.70057 12.306050 11.470901 12.95174
5 15.65078 13.336453 11.553937 12.99407
6 15.09902 16.819106 13.569338 12.67351
7 24.33651 23.789861 23.599930 22.83901
8 26.44638 27.176601 25.299428 24.02699
9 23.16950 23.695650 30.648817 23.61563
10 23.68355 21.405816 24.498837 20.54564
11 31.47751 25.039407 22.125471 20.36907
12 18.56323 21.176058 21.268225 21.27784
13 14.18914 4.528854 14.285364 17.85454
14 14.04011 13.358946 -7.535757 -18.70225
15 24.73810 26.717745 27.413140 27.73378
[ reached 'max' / getOption("max.print") -- omitted 181 rows ]
```

```
test_rewards$score$propensity
```

```
$`17`
[1] 0.1580786
$`20`
[1] 0.2672843
$`11`
[1] 0.3930468
$`14`
[1] 0.217413
```

```
test_rewards$score$outcome
```

```
$`17`
[1] 0.7164027
$`20`
[1] 0.7470992
$`11`
[1] 0.5085259
$`14`
[1] 0.8895272
```

We see the internal models of our test reward estimator have scores that are similar to those on the test set. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use `predict_outcomes`

which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

```
policy_outcomes <- iai::predict_outcomes(grid, test_X, test_rewards$rewards)
```

```
[1] 13.02367 13.10194 13.26537 12.95174 12.99407 12.67351 22.83901
[8] 26.44638 23.16950 23.68355 20.36907 18.56323 17.85454 -18.70225
[15] 24.73810 19.84227 18.41289 18.02786 17.85454 12.66219 12.66219
[22] 12.99407 18.75970 17.48188 26.20989 28.31080 23.51552 25.02196
[29] 25.42858 17.85454 14.89811 12.74661 17.85454 14.02707 15.65969
[36] 15.65969 26.84839 28.07555 27.50332 15.65969 12.88554 14.38841
[43] 14.84884 14.84884 12.88554 15.65969 19.30518 20.20801 14.84884
[50] 14.84884 19.30518 24.62845 27.34884 22.65649 25.94121 20.78518
[57] 15.65969 30.14560 28.72838 22.97014
[ reached getOption("max.print") -- omitted 136 entries ]
```

We can then get the average estimated MPG under our treatments:

```
mean(policy_outcomes)
```

`[1] 23.98876`

We can compare this number to the average estimated MPG under a baseline policy that assigns the same treatment to all observations regardless of their features. Let us compare to a policy that assigns 17 everywhere:

```
mean(test_rewards$rewards[,"17"])
```

`[1] 23.1136`

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.