# Quick Start Guide: Optimal Policy Trees with Numeric Treatment

*This is a Python version of the corresponding OptimalTrees quick start guide.*

In this example we will give a demonstration of how to use Optimal Policy Trees with numeric treatment options. For this example, we will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

*Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.*

First we load in the data and drop 6 rows with missing values:

```
import pandas as pd
df = pd.read_csv("auto-mpg.csv", na_values='?').dropna()
```

```
mpg cylinders ... origin car name
0 18.0 8 ... 1 chevrolet chevelle malibu
1 15.0 8 ... 1 buick skylark 320
2 18.0 8 ... 1 plymouth satellite
3 16.0 8 ... 1 amc rebel sst
4 17.0 8 ... 1 ford torino
5 15.0 8 ... 1 ford galaxie 500
6 14.0 8 ... 1 chevrolet impala
.. ... ... ... ... ...
391 36.0 4 ... 1 dodge charger 2.2
392 27.0 4 ... 1 chevrolet camaro
393 27.0 4 ... 1 ford mustang gl
394 44.0 4 ... 2 vw pickup
395 32.0 4 ... 1 dodge rampage
396 28.0 4 ... 1 ford ranger
397 31.0 4 ... 1 chevy s-10
[392 rows x 9 columns]
```
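To see how `na_values='?'` and `dropna` interact, here is a self-contained toy example with made-up data (not the auto-mpg file):

```
import io
import pandas as pd

# Toy CSV where '?' marks a missing horsepower value
csv = io.StringIO("mpg,horsepower\n18.0,130\n25.0,?\n15.0,165\n")

# na_values='?' converts '?' to NaN; dropna then removes that row
df_toy = pd.read_csv(csv, na_values='?').dropna()
print(df_toy.shape)  # (2, 2)
```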

Policy trees are trained using a features matrix/dataframe `X` as usual, and a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.
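As a concrete illustration of the format, a rewards matrix for three samples and two hypothetical treatments `A` and `B` might look like the following (all values invented for illustration):

```
import pandas as pd

# One column per candidate treatment; entry (i, t) is the outcome
# for sample i under treatment t (values are invented)
rewards = pd.DataFrame({
    'A': [23.1, 18.4, 30.2],
    'B': [25.7, 17.9, 28.8],
})
print(rewards.shape)  # (3, 2)
```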

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

## Reward Estimation

Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.

First, we split into training and testing:

```
from interpretableai import iai
X = df.drop(columns=['mpg', 'acceleration', 'car name'])
treatments = df.acceleration
outcomes = df.mpg
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('policy_maximize', X, treatments, outcomes, seed=123,
                   train_proportion=0.5))
```

Note that we have used a training/test split of 50%/50%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
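Conceptually, the split shuffles the samples and partitions them in the given proportion. A minimal numpy sketch of a 50/50 random split (an illustration only, not the library's implementation, which also handles stratification):

```
import numpy as np

n = 392                              # samples after dropping missing rows
rng = np.random.default_rng(123)
perm = rng.permutation(n)
n_train = int(0.5 * n)               # train_proportion=0.5
train_idx, test_idx = perm[:n_train], perm[n_train:]
print(len(train_idx), len(test_idx))  # 196 196
```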

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 11 to 20, in steps of 3:

```
treatment_candidates = range(11, 23, 3)
```
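Note that the stop value of 23 is exclusive, so this range produces exactly the candidates 11 through 20 in steps of 3:

```
candidates = list(range(11, 23, 3))
print(candidates)  # [11, 14, 17, 20]
```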

The outcome is continuous, so we use a `NumericRegressionRewardEstimator` to estimate the MPG under our candidate acceleration values using a Random Forest model:

```
reward_lnr = iai.NumericRegressionRewardEstimator(
    propensity_estimator=iai.RandomForestRegressor(),
    outcome_estimator=iai.RandomForestRegressor(),
    reward_estimator='doubly_robust',
    propensity_min_value=0.1,
    random_seed=1,
)
train_rewards, train_reward_score = reward_lnr.fit_predict(
    train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards
```

```
11 14 17 20
0 15.611833 14.324114 14.983413 15.886502
1 17.890162 16.706663 15.344517 16.375983
2 15.210879 11.313360 11.797619 16.057330
3 13.552331 11.037557 11.797619 16.057330
4 14.029191 11.451840 11.656672 16.345123
5 15.042570 15.661554 14.976184 16.308735
6 15.032972 13.564238 13.492789 16.257415
.. ... ... ... ...
189 25.093060 23.267907 21.856997 27.892171
190 32.060958 36.015403 34.401213 31.461481
191 36.347589 34.784259 42.763620 29.942321
192 31.991201 39.368938 32.199807 31.154686
193 35.653778 40.188168 34.913544 32.559909
194 19.637174 19.058223 35.736920 20.383175
195 28.376186 26.442281 26.603851 38.980019
[196 rows x 4 columns]
```

```
train_reward_score['propensity']
```

`{'17': 0.120533, '20': 0.200617, '11': 0.287145, '14': 0.165352}`

```
train_reward_score['outcome']
```

`{'17': 0.872743, '20': 0.322845, '11': 0.660307, '14': 0.835289}`

We can see that the R2 of the internal outcome models for each candidate treatment are between 0.32 and 0.87, whereas the propensity models have R2 between 0.12 and 0.29. The non-zero propensity scores tell us that there is a small amount of treatment assignment bias in this data (keeping in mind that we arbitrarily selected a feature to use as the treatment column). It seems we can predict the outcomes reasonably well, so we have a good base for the reward estimates.
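For intuition, a doubly-robust reward for a candidate treatment combines the outcome model's prediction with a propensity-weighted correction from samples that actually received (approximately) that treatment. The following is a simplified numpy sketch for a single candidate, a conceptual illustration only; the library's numeric-treatment estimator is more involved:

```
import numpy as np

def doubly_robust_rewards(y, received, mu_hat, p_hat, p_min=0.1):
    """Simplified doubly-robust rewards for one candidate treatment.

    y        : observed outcomes
    received : boolean mask, True where the observed treatment matches
               (is binned to) this candidate
    mu_hat   : outcome model's predicted outcome under this candidate
    p_hat    : estimated propensity of receiving this candidate
    p_min    : propensity clipping floor (cf. propensity_min_value=0.1)
    """
    p = np.clip(p_hat, p_min, None)
    correction = received / p * (y - mu_hat)
    return mu_hat + correction

y = np.array([20.0, 30.0, 25.0])
received = np.array([True, False, True])
mu_hat = np.array([22.0, 28.0, 25.0])
p_hat = np.array([0.5, 0.2, 0.05])
print(doubly_robust_rewards(y, received, mu_hat, p_hat))
# [18. 28. 25.]
```

Samples that did not receive the candidate keep the pure model prediction; those that did are corrected by the model's residual, inversely weighted by the (clipped) propensity.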

## Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a `GridSearch` to fit an `OptimalTreePolicyMaximizer` (note that if we were trying to minimize the outcomes, we would use `OptimalTreePolicyMinimizer`):

```
grid = iai.GridSearch(
    iai.OptimalTreePolicyMaximizer(
        random_seed=1,
        minbucket=15,
    ),
    max_depth=range(4, 6),
)
grid.fit(train_X, train_rewards)
grid.get_learner()
```

The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that both extremes of our candidate acceleration values are prescribed by the tree. Older cars with lower horsepower receive the lowest acceleration, whereas the highest acceleration is best for many different groups of cars.

We can make treatment prescriptions using `predict`:

```
prescriptions = grid.predict(train_X)
```

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, and if we want them in numeric form to use later we can convert them to numeric treatments using `convert_treatments_to_numeric`:

```
iai.convert_treatments_to_numeric(prescriptions)
```

`array([20, 20, 20, ..., 14, 20, 20], dtype=int64)`

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with `predict_treatment_rank`:

```
rank = grid.predict_treatment_rank(train_X)
```

```
array([['20', '11', '17', '14'],
       ['20', '11', '17', '14'],
       ['20', '11', '17', '14'],
       ...,
       ['14', '17', '11', '20'],
       ['20', '11', '17', '14'],
       ['20', '11', '17', '14']], dtype='<U2')
```

For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert the treatments to numeric values with `convert_treatments_to_numeric`:

```
iai.convert_treatments_to_numeric(rank)
```

```
array([[20, 11, 17, 14],
       [20, 11, 17, 14],
       [20, 11, 17, 14],
       ...,
       [14, 17, 11, 20],
       [20, 11, 17, 14],
       [20, 11, 17, 14]], dtype=int64)
```
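The ranking for each point is consistent with sorting that point's estimated treatment outcomes in decreasing order. For example, with numpy (outcome values invented, but the treatment labels match this guide):

```
import numpy as np

treatments = np.array(['11', '14', '17', '20'])
# Estimated outcome of each treatment for one sample (invented values)
row = np.array([16.95, 15.74, 16.44, 18.34])

# argsort sorts ascending, so reverse for best-first order
order = np.argsort(row)[::-1]
print(treatments[order])  # ['20' '11' '17' '14']
```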

To quantify the difference in performance behind the treatment rankings, we can use `predict_treatment_outcome` to extract the estimated quality of each treatment for each point:

```
grid.predict_treatment_outcome(train_X)
```

```
11 14 17 20
0 16.954638 15.735627 16.444070 18.336588
1 16.954638 15.735627 16.444070 18.336588
2 16.954638 15.735627 16.444070 18.336588
3 16.954638 15.735627 16.444070 18.336588
4 16.954638 15.735627 16.444070 18.336588
5 16.954638 15.735627 16.444070 18.336588
6 16.954638 15.735627 16.444070 18.336588
.. ... ... ... ...
189 19.425510 18.665101 19.295503 21.534733
190 29.857814 32.560975 30.107723 28.930012
191 29.857814 32.560975 30.107723 28.930012
192 29.857814 32.560975 30.107723 28.930012
193 29.857814 32.560975 30.107723 28.930012
194 19.425510 18.665101 19.295503 21.534733
195 19.425510 18.665101 19.295503 21.534733
[196 rows x 4 columns]
```
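Note the repeated rows in the output: samples falling into the same leaf of the tree share the same estimated treatment outcomes. Conceptually, a leaf's estimate for each treatment is an aggregate (e.g. a mean) of that treatment's rewards over the training points in the leaf; a pandas sketch of this idea, with hypothetical leaf assignments and invented reward values:

```
import pandas as pd

rewards = pd.DataFrame({
    '11': [15.6, 17.9, 25.1, 32.1],
    '20': [15.9, 16.4, 27.9, 31.5],
})
leaf = pd.Series([1, 1, 2, 2])  # hypothetical leaf of each sample

# Per-leaf mean reward for each treatment, broadcast back to samples
leaf_outcomes = rewards.groupby(leaf).transform('mean')
print(leaf_outcomes)
```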

## Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

```
test_rewards, test_reward_score = reward_lnr.fit_predict(
    test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards
```

```
11 14 17 20
0 18.876095 17.348683 15.442553 13.023669
1 18.548851 17.177284 16.064991 13.101937
2 16.906367 17.195855 16.408816 13.265368
3 14.700567 12.306050 11.470901 12.951745
4 15.650784 13.336453 11.553937 12.994065
5 15.099024 16.819106 13.569338 12.673506
6 24.336513 23.789861 23.599930 22.839013
.. ... ... ... ...
189 29.735471 38.577852 28.406107 27.474362
190 28.335337 46.667028 32.265685 28.382276
191 27.269966 26.920733 26.417994 26.560668
192 27.855778 23.593299 26.959093 27.390048
193 33.552941 34.792867 35.261248 30.726645
194 63.929968 32.100712 32.265685 28.382276
195 32.305441 31.562938 25.050652 27.682774
[196 rows x 4 columns]
```

```
test_reward_score['propensity']
```

`{'17': 0.158079, '20': 0.267284, '11': 0.393047, '14': 0.217413}`

```
test_reward_score['outcome']
```

`{'17': 0.716403, '20': 0.747099, '11': 0.508526, '14': 0.889527}`

We see the internal models of our test reward estimator have scores that are similar to those on the training set. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use `predict_outcomes`, which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

```
policy_outcomes = grid.predict_outcomes(test_X, test_rewards)
```

```
array([13.02366889, 13.1019369 , 13.26536812, ..., 34.79286695,
32.10071157, 31.56293754])
```
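Equivalently, `predict_outcomes` amounts to looking up, for each row, the reward in the column named by that row's prescription. A pandas sketch with invented values:

```
import numpy as np
import pandas as pd

rewards = pd.DataFrame({
    '11': [18.9, 18.5],
    '20': [13.0, 13.1],
})
prescriptions = ['20', '11']

# Pick rewards.loc[i, prescriptions[i]] for each row i
cols = rewards.columns.get_indexer(prescriptions)
outcomes = rewards.to_numpy()[np.arange(len(rewards)), cols]
print(outcomes)  # [13.  18.5]
```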

We can then get the average estimated MPG under our treatments:

```
policy_outcomes.mean()
```

`23.98875902`

We can compare this number to the average estimated MPG under a baseline policy that assigns the same treatment to all observations regardless of their features. Let us compare to a policy that assigns 17 everywhere:

```
test_rewards['17'].mean()
```

`23.11359676`

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
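For reference, the relative improvement of the tree policy over this constant baseline works out to roughly 3.8%:

```
policy_mean = 23.98875902    # mean outcome under the tree policy
baseline_mean = 23.11359676  # mean outcome assigning 17 everywhere

improvement = (policy_mean - baseline_mean) / baseline_mean
print(f"{improvement:.1%}")  # 3.8%
```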