Quick Start Guide: Optimal Policy Trees with Numeric Treatment

This is a Python version of the corresponding OptimalTrees quick start guide.

In this example, we demonstrate how to use Optimal Policy Trees with numeric treatment options. We will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this problem, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.

First we load in the data and drop 6 rows with missing values:

import pandas as pd
df = pd.read_csv("auto-mpg.csv", na_values='?').dropna()
      mpg  cylinders  ...  origin                   car name
0    18.0          8  ...       1  chevrolet chevelle malibu
1    15.0          8  ...       1          buick skylark 320
2    18.0          8  ...       1         plymouth satellite
3    16.0          8  ...       1              amc rebel sst
4    17.0          8  ...       1                ford torino
5    15.0          8  ...       1           ford galaxie 500
6    14.0          8  ...       1           chevrolet impala
..    ...        ...  ...     ...                        ...
391  36.0          4  ...       1          dodge charger 2.2
392  27.0          4  ...       1           chevrolet camaro
393  27.0          4  ...       1            ford mustang gl
394  44.0          4  ...       2                  vw pickup
395  32.0          4  ...       1              dodge rampage
396  28.0          4  ...       1                ford ranger
397  31.0          4  ...       1                 chevy s-10

[392 rows x 9 columns]

Policy trees are trained using a features matrix/dataframe X as usual, along with a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

  • in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
  • more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.
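To make the expected format concrete, here is a small hypothetical rewards matrix (the treatment names and values below are made up purely for illustration; they are not part of the auto-mpg example):

import pandas as pd

# Hypothetical rewards matrix: 3 samples, 2 candidate treatments "A" and "B".
# Each entry is the (known or estimated) outcome for that sample under that treatment,
# and each row lines up with the corresponding row of the features matrix X.
example_rewards = pd.DataFrame({
    'A': [18.2, 25.1, 30.4],
    'B': [20.5, 24.3, 28.9],
})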

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

Reward Estimation

Warning

Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.

First, we split into training and testing:

from interpretableai import iai

X = df.drop(columns=['mpg', 'acceleration', 'car name'])
treatments = df.acceleration
outcomes = df.mpg

(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('policy_maximize', X, treatments, outcomes, seed=123, train_proportion=0.5))

Note that we have used a training/test split of 50%/50%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
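As a quick sanity check on the split, we can look at the number of rows in each half (with a 50%/50% split of the 392 rows, we expect 196 in each):

len(train_X), len(test_X)  # expect (196, 196)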

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 11 to 20, in steps of 3:

treatment_candidates = range(11, 23, 3)
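These candidates expand to the four values 11, 14, 17, and 20:

list(treatment_candidates)
[11, 14, 17, 20]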

The outcome is continuous, so we use a NumericRegressionRewardEstimator to estimate the MPG under our candidate acceleration values using a Random Forest model:

reward_lnr = iai.NumericRegressionRewardEstimator(
    propensity_estimator=iai.RandomForestRegressor(),
    outcome_estimator=iai.RandomForestRegressor(),
    reward_estimator='doubly_robust',
    propensity_min_value=0.1,
    random_seed=1,
)
train_rewards, train_reward_score = reward_lnr.fit_predict(
    train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards
            11         14         17         20
0    15.611833  14.324114  14.983413  15.886502
1    17.890162  16.706663  15.344517  16.375983
2    15.210879  11.313360  11.797619  16.057330
3    13.552331  11.037557  11.797619  16.057330
4    14.029191  11.451840  11.656672  16.345123
5    15.042570  15.661554  14.976184  16.308735
6    15.032972  13.564238  13.492789  16.257415
..         ...        ...        ...        ...
189  25.093060  23.267907  21.856997  27.892171
190  32.060958  36.015403  34.401213  31.461481
191  36.347589  34.784259  42.763620  29.942321
192  31.991201  39.368938  32.199807  31.154686
193  35.653778  40.188168  34.913544  32.559909
194  19.637174  19.058223  35.736920  20.383175
195  28.376186  26.442281  26.603851  38.980019

[196 rows x 4 columns]
train_reward_score['propensity']
{'17': 0.120533, '20': 0.200617, '11': 0.287145, '14': 0.165352}
train_reward_score['outcome']
{'17': 0.872743, '20': 0.322845, '11': 0.660307, '14': 0.835289}

We can see that the R² values of the internal outcome models for each candidate treatment are between 0.32 and 0.87, whereas the propensity models have R² between 0.12 and 0.29. The non-zero propensity scores tell us that there is a small amount of treatment assignment bias in this data (keeping in mind that we arbitrarily selected a feature to use as the treatment column). It seems we can predict the outcomes reasonably well, so we have a good base for the reward estimates.

Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):

grid = iai.GridSearch(
    iai.OptimalTreePolicyMaximizer(
        random_seed=1,
        minbucket=15,
    ),
    max_depth=range(4, 6),
)
grid.fit(train_X, train_rewards)
grid.get_learner()
Optimal Trees Visualization

The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that both extremes of our candidate acceleration values are prescribed by the tree. Older cars with lower horsepower receive the lowest acceleration, whereas the highest acceleration is best for many different groups of cars.

We can make treatment prescriptions using predict:

prescriptions = grid.predict(train_X)

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, so if we want to use them in numeric form later, we can convert them with convert_treatments_to_numeric:

iai.convert_treatments_to_numeric(prescriptions)
array([20, 20, 20, ..., 14, 20, 20], dtype=int64)
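If we want a quick summary of how often each acceleration value is prescribed, one option is to tabulate the prescriptions with pandas (a small sketch; the counts themselves depend on the fitted tree):

import pandas as pd

# Count how many training samples are prescribed each acceleration value
pd.Series(iai.convert_treatments_to_numeric(prescriptions)).value_counts()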

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:

rank = grid.predict_treatment_rank(train_X)
rank
array([['20', '11', '17', '14'],
       ['20', '11', '17', '14'],
       ['20', '11', '17', '14'],
       ...,
       ['14', '17', '11', '20'],
       ['20', '11', '17', '14'],
       ['20', '11', '17', '14']], dtype='<U2')

For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert the treatments to numeric values with convert_treatments_to_numeric:

iai.convert_treatments_to_numeric(rank)
array([[20, 11, 17, 14],
       [20, 11, 17, 14],
       [20, 11, 17, 14],
       ...,
       [14, 17, 11, 20],
       [20, 11, 17, 14],
       [20, 11, 17, 14]], dtype=int64)

To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:

grid.predict_treatment_outcome(train_X)
            11         14         17         20
0    16.954638  15.735627  16.444070  18.336588
1    16.954638  15.735627  16.444070  18.336588
2    16.954638  15.735627  16.444070  18.336588
3    16.954638  15.735627  16.444070  18.336588
4    16.954638  15.735627  16.444070  18.336588
5    16.954638  15.735627  16.444070  18.336588
6    16.954638  15.735627  16.444070  18.336588
..         ...        ...        ...        ...
189  19.425510  18.665101  19.295503  21.534733
190  29.857814  32.560975  30.107723  28.930012
191  29.857814  32.560975  30.107723  28.930012
192  29.857814  32.560975  30.107723  28.930012
193  29.857814  32.560975  30.107723  28.930012
194  19.425510  18.665101  19.295503  21.534733
195  19.425510  18.665101  19.295503  21.534733

[196 rows x 4 columns]
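Assuming the prescription in each leaf is the treatment with the highest estimated outcome (which is how we would expect a policy maximizer to behave), we can cross-check these estimates against the prescriptions from earlier:

# Sanity check (under the assumption above): the column with the highest
# estimated outcome for each sample should match the prescribed treatment
outcomes_matrix = grid.predict_treatment_outcome(train_X)
(outcomes_matrix.idxmax(axis=1) == prescriptions).all()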

Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

test_rewards, test_reward_score = reward_lnr.fit_predict(
    test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards
            11         14         17         20
0    18.876095  17.348683  15.442553  13.023669
1    18.548851  17.177284  16.064991  13.101937
2    16.906367  17.195855  16.408816  13.265368
3    14.700567  12.306050  11.470901  12.951745
4    15.650784  13.336453  11.553937  12.994065
5    15.099024  16.819106  13.569338  12.673506
6    24.336513  23.789861  23.599930  22.839013
..         ...        ...        ...        ...
189  29.735471  38.577852  28.406107  27.474362
190  28.335337  46.667028  32.265685  28.382276
191  27.269966  26.920733  26.417994  26.560668
192  27.855778  23.593299  26.959093  27.390048
193  33.552941  34.792867  35.261248  30.726645
194  63.929968  32.100712  32.265685  28.382276
195  32.305441  31.562938  25.050652  27.682774

[196 rows x 4 columns]
test_reward_score['propensity']
{'17': 0.158079, '20': 0.267284, '11': 0.393047, '14': 0.217413}
test_reward_score['outcome']
{'17': 0.716403, '20': 0.747099, '11': 0.508526, '14': 0.889527}

We see that the internal models of our test reward estimator have scores that are similar to those on the training set. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

policy_outcomes = grid.predict_outcomes(test_X, test_rewards)
policy_outcomes
array([13.02366889, 13.1019369 , 13.26536812, ..., 34.79286695,
       32.10071157, 31.56293754])

We can then get the average estimated MPG under our treatments:

policy_outcomes.mean()
23.98875902

We can compare this number to the average estimated MPG under a baseline policy that assigns the same treatment to all observations regardless of their features. Let us compare to a policy that assigns 17 everywhere:

test_rewards['17'].mean()
23.11359676

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
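To make this comparison more complete, we could also compute the average estimated MPG under every uniform baseline policy at once, since each column of the rewards matrix corresponds to assigning that single treatment to all observations (a small pandas sketch):

# Mean estimated MPG if every car were assigned the same acceleration value
test_rewards.mean()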