Quick Start Guide: Optimal Policy Trees with Numeric Treatment

This is a Python version of the corresponding OptimalTrees quick start guide.

In this example we will demonstrate how to use Optimal Policy Trees with numeric treatment options. We will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

Note: this case is not intended as a practical application of policy trees, but rather as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.

First we load in the data and drop 6 rows with missing values:

import pandas as pd
df = pd.read_csv("auto-mpg.csv", na_values='?').dropna()
df
      mpg  cylinders  ...  origin                   car name
0    18.0          8  ...       1  chevrolet chevelle malibu
1    15.0          8  ...       1          buick skylark 320
2    18.0          8  ...       1         plymouth satellite
3    16.0          8  ...       1              amc rebel sst
4    17.0          8  ...       1                ford torino
5    15.0          8  ...       1           ford galaxie 500
6    14.0          8  ...       1           chevrolet impala
..    ...        ...  ...     ...                        ...
391  36.0          4  ...       1          dodge charger 2.2
392  27.0          4  ...       1           chevrolet camaro
393  27.0          4  ...       1            ford mustang gl
394  44.0          4  ...       2                  vw pickup
395  32.0          4  ...       1              dodge rampage
396  28.0          4  ...       1                ford ranger
397  31.0          4  ...       1                 chevy s-10

[392 rows x 9 columns]

Policy trees are trained using a features matrix/dataframe X as usual, along with a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

  • in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
  • more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.
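
As a minimal illustration of the format (with made-up numbers, not taken from the auto-mpg data), a rewards matrix is simply a dataframe aligned row-for-row with X, with one column per candidate treatment:

import pandas as pd

# Illustrative only: three samples and three candidate treatments
X_toy = pd.DataFrame({'cylinders': [8, 4, 6], 'weight': [3504, 2372, 2833]})
rewards_toy = pd.DataFrame({
    '8':  [17.5, 30.2, 21.4],   # estimated outcome if treatment 8 were applied
    '14': [16.5, 31.0, 22.1],   # estimated outcome under treatment 14
    '20': [16.1, 31.8, 22.6],   # estimated outcome under treatment 20
})

Row i of rewards_toy gives the estimated outcome of each candidate treatment for row i of X_toy.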

In this case, the dataset is observational, and so we will use Reward Estimation to estimate our rewards matrix.

Reward Estimation

Warning

Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.

First, we split into training and testing:

from interpretableai import iai

X = df.drop(columns=['mpg', 'acceleration', 'car name'])
treatments = df.acceleration
outcomes = df.mpg

(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('policy_maximize', X, treatments, outcomes, seed=1, train_proportion=0.6))

Note that we have used a training/test split of 60%/40%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
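
As a quick sanity check, we can confirm the split sizes (the reward matrices shown below have 235 training and 157 test rows, consistent with a 60%/40% split of the 392 samples):

# Number of rows in the training and test feature matrices
print(len(train_X), len(test_X))  # 235 157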

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 8 to 23, in steps of 3:

treatment_candidates = range(8, 26, 3)
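
Since the upper bound of range is exclusive, this gives six candidate values:

list(treatment_candidates)  # [8, 11, 14, 17, 20, 23]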

The outcome is continuous, so we use a NumericRewardEstimator to estimate the MPG under our candidate acceleration values using an XGBoost model:

reward_lnr = iai.NumericRewardEstimator(
    outcome_estimator=iai.XGBoostRegressor(),
    random_seed=12345,
)
train_rewards, train_reward_score = reward_lnr.fit_predict(
    train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards
             8         11         14         17         20         23
0    17.564890  17.927940  16.553040  16.042833  16.173355  16.250404
1    16.236668  15.757603  14.819180  14.833300  14.456474  14.456474
2    18.743624  19.106674  17.385897  16.654261  16.799778  16.876827
3    14.889137  13.050015  12.355983  12.301568  12.323156  12.307738
4    14.406099  13.214581  12.901717  12.691013  12.926937  12.939469
5    14.889137  13.050015  12.355983  12.301568  12.323156  12.307738
6    13.132867  11.941333  11.659549  11.439299  11.675222  11.687755
..         ...        ...        ...        ...        ...        ...
228  34.726242  34.726242  34.882076  31.225338  35.134323  35.153179
229  21.904802  21.904802  22.257420  21.618423  21.293514  21.138433
230  21.887117  21.887117  22.205151  21.410133  21.408575  21.283152
231  27.897453  27.897453  29.680870  28.420534  27.673887  27.518806
232  30.069603  30.069603  30.108929  29.270477  30.982830  30.957016
233  26.162659  26.162659  26.393810  25.313986  24.814707  24.828367
234  28.979275  28.979275  29.210428  28.138910  27.868574  27.882235

[235 rows x 6 columns]
train_reward_score
0.8657202076286428

We can see that the R2 of the internal regression model is 0.87, which gives us confidence that the reward estimates are of high quality and a sound basis for training.
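
As an additional, optional sanity check, we can compare the estimated rewards against the observed outcomes for the treatments each car actually received. Note this only validates the factual estimates, not the counterfactuals, and the lookup below assumes the reward columns appear in the same order as treatment_candidates, as in the output above:

import numpy as np

# For each training sample, find the candidate closest to the observed
# acceleration and look up the estimated reward under that candidate
candidates = np.array(list(treatment_candidates))
closest = np.abs(np.subtract.outer(train_treatments.to_numpy(), candidates)).argmin(axis=1)
estimated = train_rewards.to_numpy()[np.arange(len(closest)), closest]

# Mean absolute error between estimated and observed MPG
print(np.abs(train_outcomes.to_numpy() - estimated).mean())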

Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):

grid = iai.GridSearch(
    iai.OptimalTreePolicyMaximizer(
        random_seed=1,
    ),
    max_depth=range(4, 6),
)
grid.fit(train_X, train_rewards)
grid.get_learner()
[Optimal Trees Visualization: the tree splits on displacement, weight, and model year, with leaves prescribing acceleration values from 8 to 23]

The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that both extremes of our candidate acceleration values are prescribed by the tree. The cars with the greatest displacement are prescribed the lowest acceleration in order to maximize the MPG. However, for lower-powered, recent, heavy cars, the highest acceleration value is the best option.
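
If we want to inspect the parameters selected by the grid search or share the tree outside of an interactive session, we can use the learner's export and query helpers (a sketch; write_html and get_best_params are assumed to be available here as in other IAI interfaces):

lnr = grid.get_learner()
# Save the interactive tree visualization to a standalone HTML file
# (assumes write_html is supported by this learner/version)
lnr.write_html("policy_tree.html")
# Show the parameter combination selected by the grid search (e.g. max_depth)
print(grid.get_best_params())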

We can make treatment prescriptions using predict:

prescriptions = grid.predict(train_X)

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, so if we want to use them in numeric form later we can convert them with convert_treatments_to_numeric:

iai.convert_treatments_to_numeric(prescriptions)
array([ 8,  8,  8, ..., 14, 14, 14], dtype=int64)
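
To see how the prescriptions are distributed across the training set, we can tabulate them with pandas (a quick sketch):

import pandas as pd

# Count how many training samples are prescribed each acceleration value
print(pd.Series(iai.convert_treatments_to_numeric(prescriptions)).value_counts().sort_index())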

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:

rank = grid.predict_treatment_rank(train_X)
rank
array([['8', '11', '14', '20', '23', '17'],
       ['8', '11', '14', '20', '23', '17'],
       ['8', '11', '14', '20', '23', '17'],
       ...,
       ['14', '8', '11', '17', '20', '23'],
       ['14', '8', '11', '17', '20', '23'],
       ['14', '8', '11', '17', '20', '23']], dtype='<U2')

For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert the treatments to numeric values with convert_treatments_to_numeric:

iai.convert_treatments_to_numeric(rank)
array([[ 8, 11, 14, 20, 23, 17],
       [ 8, 11, 14, 20, 23, 17],
       [ 8, 11, 14, 20, 23, 17],
       ...,
       [14,  8, 11, 17, 20, 23],
       [14,  8, 11, 17, 20, 23],
       [14,  8, 11, 17, 20, 23]], dtype=int64)
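
The first column of this ranking should agree with the prescriptions returned by predict, which we can verify directly (a sketch):

import numpy as np

numeric_rank = iai.convert_treatments_to_numeric(rank)
# The top-ranked treatment for each sample is the prescribed treatment
print(np.array_equal(numeric_rank[:, 0],
                     iai.convert_treatments_to_numeric(prescriptions)))  # expected: True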

To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:

grid.predict_treatment_outcome(train_X)
             8         11         14         17         20         23
0    15.243216  14.647557  14.178015  13.903116  13.939541  13.939264
1    15.243216  14.647557  14.178015  13.903116  13.939541  13.939264
2    15.243216  14.647557  14.178015  13.903116  13.939541  13.939264
3    15.243216  14.647557  14.178015  13.903116  13.939541  13.939264
4    15.243216  14.647557  14.178015  13.903116  13.939541  13.939264
5    15.243216  14.647557  14.178015  13.903116  13.939541  13.939264
6    15.243216  14.647557  14.178015  13.903116  13.939541  13.939264
..         ...        ...        ...        ...        ...        ...
228  33.446706  33.446706  35.171294  34.310683  36.672391  36.718783
229  23.937211  23.934765  24.245889  23.674675  23.246540  23.200968
230  23.937211  23.934765  24.245889  23.674675  23.246540  23.200968
231  23.937211  23.934765  24.245889  23.674675  23.246540  23.200968
232  23.937211  23.934765  24.245889  23.674675  23.246540  23.200968
233  23.937211  23.934765  24.245889  23.674675  23.246540  23.200968
234  23.937211  23.934765  24.245889  23.674675  23.246540  23.200968

[235 rows x 6 columns]
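
The leaf color intensity described earlier corresponds to the gap between the best and second-best estimated outcomes; we can compute this gap directly from the matrix (a sketch):

import numpy as np

treatment_outcomes = grid.predict_treatment_outcome(train_X)
sorted_outcomes = np.sort(treatment_outcomes.to_numpy(), axis=1)
# Estimated MPG gap between the best and second-best treatment for each sample
gap = sorted_outcomes[:, -1] - sorted_outcomes[:, -2]
print(gap[:5])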

Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, we need to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

test_rewards, test_reward_score = reward_lnr.fit_predict(
    test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards
             8         11         14         17         20         23
0    14.612932  13.865725  13.044035  13.796869  13.466252  13.466344
1    15.335401  15.631332  14.751638  15.333280  13.883915  13.930205
2    14.320536  14.935003  14.299742  14.861835  13.264057  13.990059
3    14.649664  13.902456  13.044035  13.796869  13.466252  13.466344
4    23.570629  23.855164  23.682589  24.550678  23.700291  23.897943
5    22.383877  22.875113  21.778091  21.769884  20.279921  20.333008
6    21.596003  21.596003  21.044224  21.931101  21.633659  21.711040
..         ...        ...        ...        ...        ...        ...
150  35.552063  35.730183  35.722778  37.185501  36.302055  42.131157
151  26.067631  26.067631  25.844162  25.956953  26.495090  26.581295
152  31.170589  31.250059  31.261442  27.499216  27.175722  27.206593
153  28.507931  28.677679  27.571445  27.436069  25.876339  25.987709
154  38.870956  38.902817  38.837486  37.944462  38.016224  38.037628
155  30.539764  30.607140  30.618523  28.649202  28.325710  28.354242
156  29.075665  29.143040  29.154423  26.809921  26.459511  26.480341

[157 rows x 6 columns]
test_reward_score
0.7929507332526184

We see the internal model of our test reward estimator has an R2 of 0.79. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

policy_outcomes = grid.predict_outcomes(test_X, test_rewards)
policy_outcomes
array([14.61293221, 15.33540058, 14.32053566, ..., 38.03762817,
       30.61852264, 29.15442276])

We can then get the average estimated MPG under our treatments:

policy_outcomes.mean()
np.float64(25.18668151)

We can compare this number to the average estimated MPG under a baseline policy that assigns the treatment that is best on average across all training points, which we can see from the root node of the tree is an acceleration of 14:

test_rewards['14'].mean()
np.float64(24.6803476)

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
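
To put a number on this, we can compute the estimated gain from personalization (around 0.5 MPG here), along with a rough upper bound where every car receives its individually best candidate treatment (a sketch):

# Estimated MPG improvement of the tree policy over the one-size-fits-all baseline
print(policy_outcomes.mean() - test_rewards['14'].mean())

# Oracle upper bound: mean estimated MPG if each car got its best candidate
print(test_rewards.max(axis=1).mean())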
