# Quick Start Guide: Optimal Policy Trees with Numeric Treatment

In this guide we will give a demonstration of how to use Optimal Policy Trees with numeric treatment options. For this example, we will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

*Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.*

First we load in the data and drop 6 rows with missing values:

```
using CSV, DataFrames
df = CSV.read("auto-mpg.csv", DataFrame, missingstring="?")
dropmissing!(df)
```

```
392×9 DataFrame
 Row │ mpg      cylinders  displacement  horsepower  weight  acceleration  mod ⋯
     │ Float64  Int64      Float64       Int64       Int64   Float64       Int ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │    18.0          8         307.0         130    3504          12.0      ⋯
   2 │    15.0          8         350.0         165    3693          11.5
   3 │    18.0          8         318.0         150    3436          11.0
   4 │    16.0          8         304.0         150    3433          12.0
   5 │    17.0          8         302.0         140    3449          10.5      ⋯
   6 │    15.0          8         429.0         198    4341          10.0
   7 │    14.0          8         454.0         220    4354           9.0
   8 │    14.0          8         440.0         215    4312           8.5
  ⋮  │    ⋮         ⋮           ⋮            ⋮         ⋮          ⋮         ⋱
 386 │    36.0          4         135.0          84    2370          13.0      ⋯
 387 │    27.0          4         151.0          90    2950          17.3
 388 │    27.0          4         140.0          86    2790          15.6
 389 │    44.0          4          97.0          52    2130          24.6
 390 │    32.0          4         135.0          84    2295          11.6      ⋯
 391 │    28.0          4         120.0          79    2625          18.6
 392 │    31.0          4         119.0          82    2720          19.4
                                                  3 columns and 377 rows omitted
```

Policy trees are trained using a features matrix/dataframe `X` as usual, and a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.
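
As a purely illustrative sketch (hypothetical values, not from this dataset), a rewards matrix for three samples under two candidate treatments `A` and `B` would look like the following, with each entry holding the outcome for that sample under that treatment:

```
using DataFrames

# Hypothetical rewards matrix: one row per sample, one column per treatment
rewards = DataFrame("A" => [18.2, 22.5, 30.1],
                    "B" => [19.0, 21.7, 33.4])
```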

In this case, the dataset is observational, so we will use reward estimation to estimate our rewards matrix.

## Reward Estimation

Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.

First, we split into training and testing:

```
X = select(df, Not(["mpg", "acceleration", "car name"]))
treatments = df.acceleration
outcomes = df.mpg

(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
    IAI.split_data(:policy_maximize, X, treatments, outcomes, seed=123,
                   train_proportion=0.5)
```

Note that we have used a training/test split of 50%/50%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
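
We can confirm the even split by checking the number of rows in each part:

`nrow(train_X), nrow(test_X)`

`(196, 196)`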

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 11 to 20, in steps of 3:

`treatment_candidates = 11:3:20`
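
For reference, this range expands to the four candidate values:

`collect(treatment_candidates)`

```
4-element Vector{Int64}:
 11
 14
 17
 20
```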

The outcome is continuous, so we use a `NumericRegressionRewardEstimator` to estimate the MPG under our candidate acceleration values using a Random Forest model:

```
reward_lnr = IAI.NumericRegressionRewardEstimator(
    propensity_estimator=IAI.RandomForestRegressor(),
    outcome_estimator=IAI.RandomForestRegressor(),
    reward_estimator=:doubly_robust,
    propensity_min_value=0.1,
    random_seed=1,
)
train_rewards, train_reward_score = IAI.fit_predict!(
    reward_lnr, train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards
```

```
196×4 DataFrame
 Row │ 11       14       17       20
     │ Float64  Float64  Float64  Float64
─────┼────────────────────────────────────
   1 │ 15.6118  14.3241  14.9834  15.8865
   2 │ 17.8902  16.7067  15.3445  16.376
   3 │ 15.2109  11.3134  11.7976  16.0573
   4 │ 13.5523  11.0376  11.7976  16.0573
   5 │ 14.0292  11.4518  11.6567  16.3451
   6 │ 15.0426  15.6616  14.9762  16.3087
   7 │ 15.033   13.5642  13.4928  16.2574
   8 │ 13.2789  15.0676  18.1299  17.5095
  ⋮  │    ⋮        ⋮        ⋮        ⋮
 190 │ 25.0931  23.2679  21.857   27.8922
 191 │ 32.061   36.0154  34.4012  31.4615
 192 │ 36.3476  34.7843  42.7636  29.9423
 193 │ 31.9912  39.3689  32.1998  31.1547
 194 │ 35.6538  40.1882  34.9135  32.5599
 195 │ 19.6372  19.0582  35.7369  20.3832
 196 │ 28.3762  26.4423  26.6039  38.98
                          181 rows omitted
```

`train_reward_score[:propensity]`

```
Dict{String, Float64} with 4 entries:
  "17" => 0.120533
  "20" => 0.200617
  "11" => 0.287145
  "14" => 0.165352
```

`train_reward_score[:outcome]`

```
Dict{String, Float64} with 4 entries:
  "17" => 0.872743
  "20" => 0.322845
  "11" => 0.660307
  "14" => 0.835289
```

We can see that the R² values of the internal outcome models for each candidate treatment range from 0.32 to 0.87, whereas the propensity models have R² between 0.12 and 0.29. The non-zero propensity scores indicate a small amount of treatment assignment bias in this data (keeping in mind that we arbitrarily selected a feature to use as the treatment column). Since we can predict the outcomes reasonably well, we have a good basis for the reward estimates.
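
If we would rather check these ranges programmatically than read them off the dictionaries, one option is to take the extrema of each score dictionary:

```
# Smallest and largest R² across the internal models for each treatment
extrema(values(train_reward_score[:outcome]))     # (0.322845, 0.872743)
extrema(values(train_reward_score[:propensity]))  # (0.120533, 0.287145)
```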

## Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a `GridSearch` to fit an `OptimalTreePolicyMaximizer` (note that if we were trying to minimize the outcomes, we would use `OptimalTreePolicyMinimizer`):

```
grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=1,
        minbucket=15,
    ),
    max_depth=4:5,
)
IAI.fit!(grid, train_X, train_rewards)
```

```
Fitted OptimalTreePolicyMaximizer:
  1) Split: model year < 75.5
    2) Split: horsepower < 91
      3) Prescribe: 11, 34 points, error 41.72
      4) Prescribe: 20, 63 points, error 50.4
    5) Split: weight < 2710
      6) Split: displacement < 90.5
        7) Prescribe: 20, 15 points, error 31.02
        8) Prescribe: 14, 41 points, error 36.18
      9) Prescribe: 20, 43 points, error 47.2
```

The resulting tree recommends different accelerations based on the characteristics of the car. In the interactive tree visualization, the intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that both extremes of our candidate acceleration values are prescribed by the tree. Older cars with lower horsepower receive the lowest acceleration, whereas the highest acceleration is best for many different groups of cars.

We can make treatment prescriptions using `predict`:

`prescriptions = IAI.predict(grid, train_X)`

```
196-element Vector{String}:
"20"
"20"
"20"
"20"
"20"
"20"
"20"
"20"
"20"
"20"
⋮
"14"
"14"
"20"
"14"
"14"
"14"
"14"
"20"
"20"
```
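
To see how often each treatment is prescribed across the training set, one option is to tally the prescriptions (this sketch uses `countmap` from the StatsBase package, an extra dependency not otherwise used in this guide):

```
using StatsBase

# Dict mapping each prescribed treatment to the number of samples receiving it
countmap(prescriptions)
```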

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, so if we want them in numeric form for later use we can convert them with `convert_treatments_to_numeric`:

`IAI.convert_treatments_to_numeric(prescriptions)`

```
196-element Vector{Int64}:
20
20
20
20
20
20
20
20
20
20
⋮
14
14
20
14
14
14
14
20
20
```
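
Once in numeric form, the prescriptions can be used in ordinary arithmetic; for example, the average prescribed acceleration across the training set:

```
using Statistics

mean(IAI.convert_treatments_to_numeric(prescriptions))
```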

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with `predict_treatment_rank`:

`rank = IAI.predict_treatment_rank(grid, train_X)`

```
196×4 Matrix{String}:
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
"20" "11" "17" "14"
⋮
"14" "17" "11" "20"
"14" "17" "11" "20"
"20" "11" "17" "14"
"14" "17" "11" "20"
"14" "17" "11" "20"
"14" "17" "11" "20"
"14" "17" "11" "20"
"20" "11" "17" "14"
"20" "11" "17" "14"
```

For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert the treatments to numeric values with `convert_treatments_to_numeric`:

`IAI.convert_treatments_to_numeric(rank)`

```
196×4 Matrix{Int64}:
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
20 11 17 14
⋮
14 17 11 20
14 17 11 20
20 11 17 14
14 17 11 20
14 17 11 20
14 17 11 20
14 17 11 20
20 11 17 14
20 11 17 14
```

To quantify the difference in performance behind the treatment rankings, we can use `predict_treatment_outcome` to extract the estimated quality of each treatment for each point:

`IAI.predict_treatment_outcome(grid, train_X)`

```
196×4 DataFrame
 Row │ 11       14       17       20
     │ Float64  Float64  Float64  Float64
─────┼────────────────────────────────────
   1 │ 16.9546  15.7356  16.4441  18.3366
   2 │ 16.9546  15.7356  16.4441  18.3366
   3 │ 16.9546  15.7356  16.4441  18.3366
   4 │ 16.9546  15.7356  16.4441  18.3366
   5 │ 16.9546  15.7356  16.4441  18.3366
   6 │ 16.9546  15.7356  16.4441  18.3366
   7 │ 16.9546  15.7356  16.4441  18.3366
   8 │ 16.9546  15.7356  16.4441  18.3366
  ⋮  │    ⋮        ⋮        ⋮        ⋮
 190 │ 19.4255  18.6651  19.2955  21.5347
 191 │ 29.8578  32.561   30.1077  28.93
 192 │ 29.8578  32.561   30.1077  28.93
 193 │ 29.8578  32.561   30.1077  28.93
 194 │ 29.8578  32.561   30.1077  28.93
 195 │ 19.4255  18.6651  19.2955  21.5347
 196 │ 19.4255  18.6651  19.2955  21.5347
                          181 rows omitted
```
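
As a small sketch, we can use these estimated outcomes to compute, for each point, how much better the best treatment is than the second-best:

```
# Convert the estimated outcomes to a matrix of Float64
outcome_mat = Matrix(IAI.predict_treatment_outcome(grid, train_X))

# For each row, the difference between the best and second-best outcome
gaps = map(eachrow(outcome_mat)) do row
    sorted = sort(row, rev=true)
    sorted[1] - sorted[2]
end
```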

## Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, we need to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

```
test_rewards, test_reward_score = IAI.fit_predict!(
    reward_lnr, test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards
```

```
196×4 DataFrame
 Row │ 11       14       17       20
     │ Float64  Float64  Float64  Float64
─────┼────────────────────────────────────
   1 │ 18.8761  17.3487  15.4426  13.0237
   2 │ 18.5489  17.1773  16.065   13.1019
   3 │ 16.9064  17.1959  16.4088  13.2654
   4 │ 14.7006  12.306   11.4709  12.9517
   5 │ 15.6508  13.3365  11.5539  12.9941
   6 │ 15.099   16.8191  13.5693  12.6735
   7 │ 24.3365  23.7899  23.5999  22.839
   8 │ 26.4464  27.1766  25.2994  24.027
  ⋮  │    ⋮        ⋮        ⋮        ⋮
 190 │ 29.7355  38.5779  28.4061  27.4744
 191 │ 28.3353  46.667   32.2657  28.3823
 192 │ 27.27    26.9207  26.418   26.5607
 193 │ 27.8558  23.5933  26.9591  27.39
 194 │ 33.5529  34.7929  35.2612  30.7266
 195 │ 63.93    32.1007  32.2657  28.3823
 196 │ 32.3054  31.5629  25.0507  27.6828
                          181 rows omitted
```

`test_reward_score[:propensity]`

```
Dict{String, Float64} with 4 entries:
  "17" => 0.158079
  "20" => 0.267284
  "11" => 0.393047
  "14" => 0.217413
```

`test_reward_score[:outcome]`

```
Dict{String, Float64} with 4 entries:
  "17" => 0.716403
  "20" => 0.747099
  "11" => 0.508526
  "14" => 0.889527
```

We see that the internal models of our test reward estimator have scores similar to those on the training set. As before, this gives us confidence that the estimated rewards are a fair reflection of reality and will serve as a good basis for evaluation.

We can now evaluate the quality of the policy using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use `predict_outcomes`, which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

`policy_outcomes = IAI.predict_outcomes(grid, test_X, test_rewards)`

```
196-element Vector{Float64}:
13.02366889
13.1019369
13.26536812
12.9517449
12.99406544
12.67350622
22.83901263
26.44638436
23.16949622
23.68354799
⋮
20.03357808
26.42930003
38.57785239
46.66702847
26.56066784
27.39004761
34.79286695
32.10071157
31.56293754
```

We can then get the average estimated MPG under our treatments:

```
using Statistics
mean(policy_outcomes)
```

`23.98875902`

We can compare this number to the average estimated MPG under a baseline policy that assigns the same treatment to all observations regardless of their features. Let us compare to a policy that assigns 17 everywhere:

`mean(test_rewards[:, "17"])`

`23.11359676`

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment (an average of 23.99 MPG under the tree policy versus 23.11 MPG under the constant-17 baseline).
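
The same comparison can be run against every constant-treatment baseline at once:

```
# Average estimated MPG if all observations received the same treatment
Dict(t => mean(test_rewards[:, t]) for t in names(test_rewards))
```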