# Quick Start Guide: Optimal Policy Trees with Numeric Treatment

*This is an R version of the corresponding OptimalTrees quick start guide.*

In this example we demonstrate how to use Optimal Policy Trees with numeric treatment options. We will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

*Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.*

First we load in the data and drop 6 rows with missing values:

```
df <- na.omit(read.csv("auto-mpg.csv", na.strings = "?"))
```

```
mpg cylinders displacement horsepower weight acceleration model.year origin
1 18 8 307 130 3504 12.0 70 1
2 15 8 350 165 3693 11.5 70 1
3 18 8 318 150 3436 11.0 70 1
4 16 8 304 150 3433 12.0 70 1
5 17 8 302 140 3449 10.5 70 1
6 15 8 429 198 4341 10.0 70 1
car.name
1 chevrolet chevelle malibu
2 buick skylark 320
3 plymouth satellite
4 amc rebel sst
5 ford torino
6 ford galaxie 500
[ reached 'max' / getOption("max.print") -- omitted 386 rows ]
```

Policy trees are trained using a features matrix/dataframe `X` as usual, together with a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.
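As a toy illustration of this format (with hypothetical values, not taken from the auto-mpg data), a rewards matrix for three samples and two candidate treatments could be built like this:

```
# Hypothetical rewards matrix: one row per sample, one column per
# candidate treatment; each entry is the estimated outcome (e.g. MPG)
# for that sample under that treatment
rewards <- data.frame(
  `8`  = c(17.6, 16.2, 18.7),
  `17` = c(16.0, 14.8, 16.7),
  check.names = FALSE
)
rewards
```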

In this case, the dataset is observational, and so we will use reward estimation to estimate our rewards matrix.

## Reward Estimation

First, we split into training and testing:

```
X <- subset(df, select = -c(mpg, acceleration, car.name))
treatments <- df$acceleration
outcomes <- df$mpg
split <- iai::split_data("policy_maximize", X, treatments, outcomes, seed = 1,
                         train_proportion = 0.6)
train_X <- split$train$X
train_treatments <- split$train$treatments
train_outcomes <- split$train$outcomes
test_X <- split$test$X
test_treatments <- split$test$treatments
test_outcomes <- split$test$outcomes
```

Note that we have used a 60%/40% training/test split, saving more data than usual for testing to ensure high-quality reward estimation on the test set.
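For intuition, the effect of a 60/40 split can be sketched in base R (this is illustrative only and is not how `iai::split_data` is implemented internally; it also does not reproduce its stratification for policy problems):

```
# Sketch of a 60/40 random train/test split on row indices
set.seed(1)
n <- 100
train_idx <- sample(n, size = floor(0.6 * n))
test_idx <- setdiff(seq_len(n), train_idx)
length(train_idx)  # 60 rows for training
length(test_idx)   # 40 rows for testing
```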

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 8 to 23, in steps of 3:

```
treatment_candidates <- seq(8, 23, 3)
```

We use a `numeric_reward_estimator` to estimate the MPG under our candidate acceleration values using an XGBoost model:

```
reward_lnr <- iai::numeric_reward_estimator(
  outcome_estimator = iai::xgboost_regressor(),
  random_seed = 12345
)
train_rewards <- iai::fit_predict(reward_lnr, train_X, train_treatments,
                                  train_outcomes, treatment_candidates)
train_rewards$rewards
```

```
8 11 14 17 20 23
1 17.56489 17.92794 16.55304 16.04283 16.17336 16.25040
2 16.23667 15.75760 14.81918 14.83330 14.45647 14.45647
3 18.74362 19.10667 17.38590 16.65426 16.79978 16.87683
4 14.88914 13.05002 12.35598 12.30157 12.32316 12.30774
5 14.40610 13.21458 12.90172 12.69101 12.92694 12.93947
6 14.88914 13.05002 12.35598 12.30157 12.32316 12.30774
7 13.13287 11.94133 11.65955 11.43930 11.67522 11.68775
8 14.89688 13.13301 12.71378 12.52015 12.62485 12.58135
9 15.66143 15.18316 13.93992 13.97646 14.06758 14.02408
10 16.85309 15.78537 15.54899 14.56264 14.69318 14.77023
[ reached 'max' / getOption("max.print") -- omitted 225 rows ]
```

```
train_rewards$score
```

`[1] 0.8657202`

We can see that the R2 of the internal regression model is 0.87, which gives us good confidence that the reward estimates are of high quality and a good basis for training.
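For reference, the score reported here is a coefficient of determination; a minimal sketch of the usual R2 definition follows (the reward estimator's internal computation may differ in details such as how out-of-fold predictions are combined):

```
# R^2 = 1 - (residual sum of squares) / (total sum of squares)
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}
r_squared(c(3, 5, 7), c(2.8, 5.1, 7.2))  # close to 1 for a good fit
```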

## Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a `grid_search` to fit an `optimal_tree_policy_maximizer` (note that if we were trying to minimize the outcomes, we would use an `optimal_tree_policy_minimizer`):

```
grid <- iai::grid_search(
  iai::optimal_tree_policy_maximizer(
    random_seed = 1
  ),
  max_depth = 4:5
)
iai::fit(grid, train_X, train_rewards$rewards)
iai::get_learner(grid)
```

The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that both extremes of our candidate acceleration values are prescribed by the tree. The cars with the greatest displacement receive the lowest acceleration in order to maximize the MPG, whereas for lower-powered, recent, heavy cars, the highest acceleration value is the best option.

We can make treatment prescriptions using `predict`:

```
prescriptions <- iai::predict(grid, train_X)
```

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, so if we want to use them in numeric form later, we can convert them with `convert_treatments_to_numeric`:

```
iai::convert_treatments_to_numeric(prescriptions)
```

```
[1] 8 8 8 8 8 8 8 8 8 8 8 17 8 8 8 8 8 8 17 8 8 8 8 8 8
[26] 8 8 8 8 17 17 17 17 8 17 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
[51] 8 17 8 8 8 8 8 8 8 8
[ reached getOption("max.print") -- omitted 175 entries ]
```

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with `predict_treatment_rank`:

```
rank <- iai::predict_treatment_rank(grid, train_X)
```

```
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "8" "11" "14" "20" "23" "17"
[2,] "8" "11" "14" "20" "23" "17"
[3,] "8" "11" "14" "20" "23" "17"
[4,] "8" "11" "14" "20" "23" "17"
[5,] "8" "11" "14" "20" "23" "17"
[6,] "8" "11" "14" "20" "23" "17"
[7,] "8" "11" "14" "20" "23" "17"
[8,] "8" "11" "14" "20" "23" "17"
[9,] "8" "11" "14" "20" "23" "17"
[10,] "8" "11" "14" "20" "23" "17"
[ reached getOption("max.print") -- omitted 225 rows ]
```

For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert them to numeric values with `convert_treatments_to_numeric`:

```
iai::convert_treatments_to_numeric(rank)
```

```
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 8 11 14 20 23 17
[2,] 8 11 14 20 23 17
[3,] 8 11 14 20 23 17
[4,] 8 11 14 20 23 17
[5,] 8 11 14 20 23 17
[6,] 8 11 14 20 23 17
[7,] 8 11 14 20 23 17
[8,] 8 11 14 20 23 17
[9,] 8 11 14 20 23 17
[10,] 8 11 14 20 23 17
[ reached getOption("max.print") -- omitted 225 rows ]
```

To quantify the difference in performance behind the treatment rankings, we can use `predict_treatment_outcome` to extract the estimated quality of each treatment for each point:

```
iai::predict_treatment_outcome(grid, train_X)
```

```
8 11 14 17 20 23
1 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
2 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
3 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
4 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
5 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
6 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
7 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
8 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
9 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
10 15.24322 14.64756 14.17802 13.90312 13.93954 13.93926
[ reached 'max' / getOption("max.print") -- omitted 225 rows ]
```

## Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set, as this would allow information from the training set to leak through to the out-of-sample evaluation.

Instead, we estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

```
test_rewards <- iai::fit_predict(reward_lnr, test_X, test_treatments,
                                 test_outcomes, treatment_candidates)
test_rewards$rewards
```

```
8 11 14 17 20 23
1 14.61293 13.86572 13.04403 13.79687 13.46625 13.46634
2 15.33540 15.63133 14.75164 15.33328 13.88391 13.93021
3 14.32054 14.93500 14.29974 14.86184 13.26406 13.99006
4 14.64966 13.90246 13.04403 13.79687 13.46625 13.46634
5 23.57063 23.85516 23.68259 24.55068 23.70029 23.89794
6 22.38388 22.87511 21.77809 21.76988 20.27992 20.33301
7 21.59600 21.59600 21.04422 21.93110 21.63366 21.71104
8 32.79764 32.98054 32.35630 30.44030 28.84416 29.23769
9 24.09035 24.09035 23.66892 24.04285 23.19636 28.68661
10 14.06391 14.63017 13.67169 13.73958 12.26338 12.98938
[ reached 'max' / getOption("max.print") -- omitted 147 rows ]
```

```
test_rewards$score
```

`[1] 0.7929507`

We see that the internal model of our test reward estimator has an R2 of 0.79. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality and will serve as a good basis for evaluation.

We can now evaluate the quality of the tree policy using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use `predict_outcomes`, which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

```
policy_outcomes <- iai::predict_outcomes(grid, test_X, test_rewards$rewards)
```

```
[1] 14.61293 15.33540 14.32054 14.64966 23.57063 22.38388 21.59600 30.44030
[9] 24.09035 14.06391 14.55532 27.24055 24.37529 19.57981 17.29555 14.12698
[17] 11.37780 27.15902 17.24721 27.36848 28.05967 30.71244 26.85830 23.88128
[25] 14.78551 15.03412 23.73075 20.64142 21.65401 23.17074 22.03889 28.26242
[33] 15.15779 15.08659 13.59195 20.07967 12.97886 13.55759 20.04034 24.24864
[41] 26.06670 12.94424 13.77090 31.67544 18.24339 18.89537 18.53356 28.06723
[49] 29.41886 16.81780 14.91063 14.53209 31.02500 22.91093 29.22216 20.25741
[57] 16.55737 21.86179 14.28482 19.90385
[ reached getOption("max.print") -- omitted 97 entries ]
```

We can then get the average estimated MPG under our treatments:

```
mean(policy_outcomes)
```

`[1] 25.18668`

We can compare this number to the average MPG observed:

```
mean(test_outcomes)
```

`[1] 24.68726`

We see a small improvement over the real accelerations, although it is important to point out that the real accelerations were not restricted to the same discrete increments that we used.

Another point of comparison is a baseline policy that assigns the treatment that is best on average across all training points, which we can see from the root node of the tree is an acceleration of 14:

```
mean(test_rewards$rewards[,"14"])
```

`[1] 24.68035`

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
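The "best on average" baseline used above corresponds to taking column means of the training rewards matrix and picking the column with the highest mean; a toy sketch with hypothetical values:

```
# Hypothetical training rewards matrix; the best-on-average treatment
# is the column with the highest mean reward
rewards <- data.frame(`8` = c(20, 22), `14` = c(25, 27), `23` = c(21, 20),
                      check.names = FALSE)
best <- names(which.max(colMeans(rewards)))
best  # "14"
```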