Quick Start Guide: Optimal Policy Trees with Numeric Treatment
This is a Python version of the corresponding OptimalTrees quick start guide.
In this example we will give a demonstration of how to use Optimal Policy Trees with numeric treatment options. For this example, we will the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.
Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.
First we load in the data and drop 6 rows with missing values:
import pandas as pd
df = pd.read_csv("auto-mpg.csv", na_values='?').dropna()
      mpg  cylinders  ...  origin                   car name
0    18.0          8  ...       1  chevrolet chevelle malibu
1    15.0          8  ...       1          buick skylark 320
2    18.0          8  ...       1         plymouth satellite
3    16.0          8  ...       1              amc rebel sst
4    17.0          8  ...       1                ford torino
5    15.0          8  ...       1           ford galaxie 500
6    14.0          8  ...       1           chevrolet impala
..    ...        ...  ...     ...                        ...
391  36.0          4  ...       1          dodge charger 2.2
392  27.0          4  ...       1           chevrolet camaro
393  27.0          4  ...       1            ford mustang gl
394  44.0          4  ...       2                  vw pickup
395  32.0          4  ...       1              dodge rampage
396  28.0          4  ...       1                ford ranger
397  31.0          4  ...       1                 chevy s-10
[392 rows x 9 columns]Policy trees are trained using a features matrix/dataframe X as usual and a rewards matrix that has one column for each potential treatment that contains the outcome for each sample under that treatment.
There are two ways to get this rewards matrix:
- in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
- more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment
Refer to the documentation on data preparation for more information on the data format.
In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.
Reward Estimation
Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.
First, we split into training and testing:
from interpretableai import iai
X = df.drop(columns=['mpg', 'acceleration', 'car name'])
treatments = df.acceleration
outcomes = df.mpg
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('policy_maximize', X, treatments, outcomes, seed=123, train_proportion=0.5))
Note that we have used a training/test split of 60%/40%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 11 to 20, in steps of 3:
treatment_candidates = range(11, 23, 3)
The outcome is continuous, so we use a NumericRegressionRewardEstimator to estimate the MPG under our candidate acceleration values using a Random Forest model:
reward_lnr = iai.NumericRegressionRewardEstimator(
    propensity_estimator=iai.RandomForestRegressor(),
    outcome_estimator=iai.RandomForestRegressor(),
    reward_estimator='doubly_robust',
    propensity_min_value=0.1,
    random_seed=1,
)
train_predictions, train_reward_score = reward_lnr.fit_predict(
    train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards = train_predictions['reward']
            11         14         17         20
0    15.519248  14.321092  15.909451  15.716086
1    17.984288  17.068355  16.182601  16.684424
2    15.199302  11.280400  14.171940  15.820916
3    13.614308  11.021572  14.171940  15.820916
4    14.047812  11.247633  14.509284  16.116765
5    15.057220  15.595349  16.199104  16.684424
6    15.031424  13.373021  14.793007  16.018344
..         ...        ...        ...        ...
189  21.622121  23.074081  22.512198  26.995897
190  26.011327  36.011896  34.433747  30.453314
191  26.458852  34.685835  42.800888  29.772748
192  26.335569  37.362797  32.103874  30.927917
193  26.458852  39.360032  34.695656  31.993735
194  19.613755  18.997685  35.968945  19.967707
195  24.046321  26.307916  26.180446  39.971961
[196 rows x 4 columns]
train_reward_score['propensity']
{'17': 0.120533, '20': 0.200617, '11': 0.287145, '14': 0.165352}train_reward_score['outcome']
{'17': 0.864211, '20': 0.250276, '11': 0.653976, '14': 0.833718}We can see that the R2 of the internal outcome models for each candidate treatment are between 0.25 and 0.86, whereas the propensity models have R2 between 0.12 and 0.29. The non-zero propensity scores tell us that there is a small amount of treatment assignment bias in this data (keeping in mind that we arbitrarily selected a feature to use as the treatment column). It seems we can predict the outcomes reasonably well, so we have a good base for the reward estimates.
Optimal Policy Trees
Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):
grid = iai.GridSearch(
    iai.OptimalTreePolicyMaximizer(
        random_seed=1,
        minbucket=15,
    ),
    max_depth=range(4, 6),
)
grid.fit(train_X, train_rewards)
grid.get_learner()
The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.
We can see that a variety of our candidate acceleration values are prescribed by the tree. For instance, older cars with higher horsepower receive the highest acceleration.
We can make treatment prescriptions using predict:
prescriptions = grid.predict(train_X)
The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, and if we want them in numeric form to use later we can convert them to numeric treatments using convert_treatments_to_numeric:
iai.convert_treatments_to_numeric(prescriptions)
array([20, 20, 20, ..., 14, 14, 20], dtype=int64)If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:
rank = grid.predict_treatment_rank(train_X)
array([['20', '11', '17', '14'],
       ['20', '11', '17', '14'],
       ['20', '11', '17', '14'],
       ...,
       ['14', '17', '20', '11'],
       ['14', '11', '17', '20'],
       ['20', '17', '11', '14']], dtype='<U2')For each point in the data, this gives the treatments in order of effectiveness. As before, this are returned as strings, but we can convert the treatments to numeric values with convert_treatments_to_numeric:
iai.convert_treatments_to_numeric(rank)
array([[20, 11, 17, 14],
       [20, 11, 17, 14],
       [20, 11, 17, 14],
       ...,
       [14, 17, 20, 11],
       [14, 11, 17, 20],
       [20, 17, 11, 14]], dtype=int64)To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:
grid.predict_treatment_outcome(train_X)
            11         14         17         20
0    17.020992  14.750476  16.922193  18.286560
1    17.020992  14.750476  16.922193  18.286560
2    17.020992  14.750476  16.922193  18.286560
3    17.020992  14.750476  16.922193  18.286560
4    17.020992  14.750476  16.922193  18.286560
5    17.020992  14.750476  16.922193  18.286560
6    17.020992  14.750476  16.922193  18.286560
..         ...        ...        ...        ...
189  22.157787  21.387803  22.379440  27.581197
190  26.314332  31.384652  29.362991  28.327086
191  26.314332  31.384652  29.362991  28.327086
192  26.314332  31.384652  29.362991  28.327086
193  26.314332  31.384652  29.362991  28.327086
194  18.656272  19.316212  18.452876  17.302140
195  22.157787  21.387803  22.379440  27.581197
[196 rows x 4 columns]We can also extract the standard errors of the outcome estimates with predict_treatment_outcome_standard_error:
grid.predict_treatment_outcome_standard_error(train_X)
           11        14        17        20
0    0.803530  1.063187  0.770092  0.710896
1    0.803530  1.063187  0.770092  0.710896
2    0.803530  1.063187  0.770092  0.710896
3    0.803530  1.063187  0.770092  0.710896
4    0.803530  1.063187  0.770092  0.710896
5    0.803530  1.063187  0.770092  0.710896
6    0.803530  1.063187  0.770092  0.710896
..        ...       ...       ...       ...
189  0.513389  0.882271  1.029965  1.776420
190  0.948884  1.263225  0.739374  0.448817
191  0.948884  1.263225  0.739374  0.448817
192  0.948884  1.263225  0.739374  0.448817
193  0.948884  1.263225  0.739374  0.448817
194  0.231380  0.407594  0.625738  0.466373
195  0.513389  0.882271  1.029965  1.776420
[196 rows x 4 columns]These standard errors can be combined with the outcome estimates to construct confidence intervals in the usual way.
Evaluating Optimal Policy Trees
It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.
Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:
test_predictions, test_reward_score = reward_lnr.fit_predict(
    test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards = test_predictions['reward']
            11         14         17         20
0    18.826483  17.900647  15.274087  16.596052
1    18.583713  17.337311  15.365068  16.854333
2    16.831678  17.759072  15.451617  17.110980
3    14.837135  12.413611  14.331685  16.153889
4    15.885674  13.500115  14.059616  16.247889
5    14.919250  16.836939  14.900044  16.212197
6    28.654204  23.726073  23.455531  23.744689
..         ...        ...        ...        ...
189  30.282628  38.580671  28.159223  27.406541
190  27.585460  46.746875  31.718984  25.842200
191  28.447141  27.566946  26.310934  25.622135
192  27.138617  22.271248  26.941105  27.596455
193  30.656261  35.057655  35.221635  30.770049
194  70.463595  32.071524  31.718984  25.788215
195  29.962972  31.638435  25.119768  30.190963
[196 rows x 4 columns]
test_reward_score['propensity']
{'17': 0.158079, '20': 0.267284, '11': 0.393047, '14': 0.217413}test_reward_score['outcome']
{'17': 0.705786, '20': 0.770381, '11': 0.470776, '14': 0.886658}We see the internal models of our test reward estimator have scores that are similar to those on the test set. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.
We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:
policy_outcomes = grid.predict_outcomes(test_X, test_rewards)
array([16.59605211, 16.85433253, 17.11098041, ..., 35.05765523,
       32.07152412, 30.19096294])We can then get the average estimated MPG under our treatments:
policy_outcomes.mean()
np.float64(23.93172935)We can compare this number to the average estimated MPG under a baseline policy that assigns the same treatment to all observations regardless of their features. Let us compare to a policy that assigns 17 everywhere:
test_rewards['17'].mean()
np.float64(23.32036357)We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.