Quick Start Guide: Optimal Prescriptive Trees

This is a Python version of the corresponding OptimalTrees quick start guide.

In this guide we will give a demonstration of how to use Optimal Prescriptive Trees (OPT). For this example, we will use the Credit Approval dataset, where the task is to predict the approval outcome of credit card application. Because the features have been anonymized for confidentiality reasons, we will arbitrarily select one of the categoric variables A1 to be the treatment.

Note: this case is not intended to serve as a practical application of prescriptive trees, but rather to serve as an illustration of the training and evaluation process.

First we load in the data and drop 37 rows with missing values:

import pandas as pd
df = pd.read_csv('crx.data', names=['A' + str(i) for i in range(1, 17)],
                 na_values='?').dropna()

    A1     A2      A3 A4 A5  A6  A7  ...  A10 A11 A12  A13    A14    A15  A16
0    b  30.83   0.000  u  g   w   v  ...    t   1   f    g  202.0      0    +
1    a  58.67   4.460  u  g   q   h  ...    t   6   f    g   43.0    560    +
2    a  24.50   0.500  u  g   q   h  ...    f   0   f    g  280.0    824    +
3    b  27.83   1.540  u  g   w   v  ...    t   5   t    g  100.0      3    +
4    b  20.17   5.625  u  g   w   v  ...    f   0   f    s  120.0      0    +
5    b  32.08   4.000  u  g   m   v  ...    f   0   t    g  360.0      0    +
6    b  33.17   1.040  u  g   r   h  ...    f   0   t    g  164.0  31285    +
..  ..    ...     ... .. ..  ..  ..  ...  ...  ..  ..  ...    ...    ...  ...
683  b  36.42   0.750  y  p   d   v  ...    f   0   f    g  240.0      3    -
684  b  40.58   3.290  u  g   m   v  ...    f   0   t    s  400.0      0    -
685  b  21.08  10.085  y  p   e   h  ...    f   0   f    g  260.0      0    -
686  a  22.67   0.750  u  g   c   v  ...    t   2   t    g  200.0    394    -
687  a  25.25  13.500  y  p  ff  ff  ...    t   1   t    g  200.0      1    -
688  b  17.92   0.205  u  g  aa   v  ...    f   0   f    g  280.0    750    -
689  b  35.00   3.375  u  g   c   h  ...    f   0   t    g    0.0      0    -

[653 rows x 16 columns]

Data for prescriptive problems

Prescriptive trees are trained on observational data, and require three distinct types of data:

X: the features for each observation that can be used as the splits in the tree - this can be a matrix or a dataframe as for classification or regression problems
treatments: the treatment applied to each observation - this is a vector of the treatment labels similar to the target in a classification problem
outcomes: the outcome for each observation under the applied treatment - this is a vector of numeric values similar to the target in a regression problem

Refer to the documentation on data preparation for more information on the data format.

In this case, the treatment is the variable A1, and the outcome is whether the application was approved (which we are trying to make as likely as possible):

X = df.drop(columns=['A1', 'A16'])
treatments = df.A1
outcomes = df.A16 == '+'

for col in X.columns:
    if X.dtypes[col] == 'object':
        X[col] = X[col].astype('category')

We can now split into training and test datasets:

from interpretableai import iai
(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) = (
    iai.split_data('policy_maximize', X, treatments, outcomes, seed=123, train_proportion=0.5))

Note that we have used a training/test split of so that we save more data for testing to ensure high-quality reward estimation on the test set.

Fitting Optimal Prescriptive Trees

We will use a GridSearch to fit an OptimalTreePrescriptionMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePrescriptionMinimizer):

grid = iai.GridSearch(
    iai.OptimalTreePrescriptionMaximizer(
        prescription_factor=0.6,
        treatment_minbucket=10,
        random_seed=123,
        max_categoric_levels_before_warning=15
    ),
    max_depth=range(6),
)
grid.fit(train_X, train_treatments, train_outcomes)
grid.get_learner()

Optimal Trees Visualization

Here, we have set prescription_factor=0.6 to strike a balance between maximizing the outcome and estimating the approval probability, and treatment_minbucket=10 so that the tree can only prescribe a treatment in a leaf if there are at least 10 subjects in that leaf that received this treatment. This is to ensure that we have sufficient data on how the treatment affects subjects in this leaf before we can prescribe it.

In the resulting tree, the color in each leaf indicates which treatment is deemed to be stronger in this leaf, and the color intensity indicates the size of the difference. Because the variables are anonymized, we cannot directly interpret the tree, but we can see that both treatments are prescribed based on a variety of the features.

We can make predictions on new data using predict:

pred_treatments, pred_outcomes = grid.predict(test_X)

This returns the treatment prescribed for each subject as well as the outcome predicted for each subject under the prescribed treatment:

pred_treatments

['a', 'b', 'b', 'b', 'a', 'a', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'a', 'b', 'a', 'b', 'a', 'a', 'a', 'b', 'b', 'b', 'a', 'b', 'b', 'a', 'b', 'a', 'a', 'b', 'b', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'a', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'b', 'b', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'b', 'b', 'b', 'a', 'b', 'a', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'a', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'a', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'a', 'b', 'b', 'a', 'b', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'a', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'b', 'b', 'b', 'a', 'a', 'b', 'a', 'a', 'b', 'b', 'a', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'b', 'b', 'a', 'a', 'a', 'b', 'b', 'a', 'a', 'b', 'a', 'b', 'a', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'a', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b', 'b']

pred_outcomes

array([0.90909091, 0.75      , 0.75      , ..., 0.07894737, 0.07894737,
       0.07894737])

You can also use predict_outcomes to get the predicted outcomes for all treatments:

grid.predict_outcomes(test_X)

            a         b
0    0.909091  0.416667
1    0.000000  0.750000
2    0.000000  0.750000
3    0.000000  0.750000
4    0.800000  0.280000
5    0.909091  0.416667
6    1.000000  0.935484
..        ...       ...
320  0.000000  0.078947
321  0.000000  0.078947
322  0.000000  0.078947
323  0.000000  0.078947
324  0.000000  0.078947
325  0.000000  0.078947
326  0.000000  0.078947

[327 rows x 2 columns]

Evaluating Optimal Prescriptive Trees

In prescription problems, it is complicated to evaluate the quality of a prescription policy because our data only contains the outcome for the treatment that was received. Because we don't know the outcomes for the treatments that were not received (known as the counterfactuals), we cannot simply evaluate our prescriptions against the test set as we normally do.

A common approach to resolve this problem is reward estimation, where so-called rewards are estimated for each treatment for each observation. These rewards indicate the relative credit a model should be given for prescribing each treatment to each observation, and thus can be used to evaluate the quality of the prescription policy. For more details on how the reward estimation procedure is conducted, refer to the reward estimation documentation.

We use a CategoricalClassificationRewardEstimator to estimate the approval outcome under each option with a doubly-robust reward estimation method using random forests to estimate both propensity scores and outcomes. Note that we are passing in the test data rather the training data to ensure we get a fair out-of-sample evaluation:

reward_lnr = iai.CategoricalClassificationRewardEstimator(
    propensity_estimator=iai.RandomForestClassifier(),
    outcome_estimator=iai.RandomForestClassifier(),
    reward_estimator='doubly_robust',
    random_seed=123,
)
test_predictions, test_reward_score = reward_lnr.fit_predict(
    test_X, test_treatments, test_outcomes, propensity_score_criterion='auc',
    outcome_score_criterion='auc')
test_rewards = test_predictions['reward']
"""

            a         b
0    0.598000  1.168277
1    1.194955  0.706667
2    0.730000  1.171689
3    0.820000  1.080724
4    0.740000  1.061086
5    0.620000  1.197214
6    0.820000  1.013302
..        ...       ...
320 -0.095825  0.380000
321  0.030000 -0.003974
322  0.040000 -0.029723
323  0.137500 -0.011404
324 -0.907834  0.110000
325  0.100000 -0.143617
326  0.256667 -0.164480

[327 rows x 2 columns]

test_reward_score

{'propensity': 0.6226425120772946, 'outcome': {'b': 0.9240666100198265, 'a': 0.9219693500943501}}

We can see that the internal outcome estimation models have AUCs of 0.92, which gives us confidence that the reward estimates are of decent quality, and good to base our training on. The AUC for the propensity model is lower at 0.62, which is not particularly high, and suggests difficulty in reliably estimating the propensity. The doubly-robust estimation method should help to alleviate this problem, as it is designed to deliver good results if either propensity scores or outcomes are estimated well. However, we should always pay attention to these scores and proceed with caution if the estimation quality is low.

We can now use these reward values to evaluate the prescription in many ways. For example, we might like to see the mean reward achieved across all prescriptions on the test set:

import numpy as np
np.mean([test_rewards[pred_treatments[i]][i]
         for i in range(len(pred_treatments))])

np.float64(0.43780557502511)

For comparison's sake, we can compare this to the mean reward achieved under the actual treatment assignments that were observed in the data:

np.mean([test_rewards[test_treatments[i]][i]
         for i in range(len(test_treatments))])

np.float64(0.4074034869644)

We can see that the prescriptive tree policy indeed improves on the actual assignments.