# Quick Start Guide: Optimal Policy Trees with Survival Outcome

This is a Python version of the corresponding OptimalTrees quick start guide.

In this guide we will give a demonstration of how to use Optimal Policy Trees with survival outcomes. For this example, we will use the AIDS Clinical Trials Group Study 175 dataset, which was a randomized clinical trial examining the effects of four treatments on the survival of patients with HIV.

Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process.

First we load in the data:

import pandas as pd

      pidnum  age      wtkg  hemo  homo  ...  cd80  cd820  cens  days  arms
0      10056   48   89.8128     0     0  ...   566    324     0   948     2
1      10059   61   49.4424     0     0  ...   392    564     1  1002     3
2      10089   45   88.4520     0     1  ...  2063   1893     0   961     3
3      10093   47   85.2768     0     1  ...  1590    966     0  1166     3
4      10124   43   66.6792     0     1  ...   870    782     0  1090     0
5      10140   46   88.9056     0     1  ...   860   1060     0  1181     1
6      10165   31   73.0296     0     1  ...   708    699     1   794     0
...      ...  ...       ...   ...   ...  ...   ...    ...   ...   ...   ...
2132  990018   27   80.2872     1     0  ...   910   1009     0   413     3
2133  990019   39   64.8648     1     0  ...   504    367     1  1041     2
2134  990021   21   53.2980     1     0  ...   561    720     0  1091     3
2135  990026   17  102.9672     1     0  ...  1759   1030     0   395     0
2136  990030   53   69.8544     1     1  ...  1391   1041     0  1104     2
2137  990071   14   60.0000     1     0  ...   999   1838     1   465     0
2138  990077   45   77.3000     1     0  ...   885    526     0  1045     3

[2139 rows x 27 columns]

Policy trees are trained using a features matrix/dataframe X as usual and a rewards matrix that has one column for each potential treatment that contains the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

• in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
• more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

## Reward Estimation

First, we separate the dataset into the various pieces:

• the features (X)
• the treatments observed in the data (treatments)
• whether the patient was known to have died (died)
• the time of last contact with the patient (times) - this is the survival time for patients that died, and a lower bound on the survival time otherwise
X = df[['age', 'wtkg', 'karnof', 'cd40', 'cd420', 'cd80', 'cd820', 'gender',
'homo', 'race', 'symptom', 'drugs', 'hemo', 'str2']]

treatments = df.arms.replace({
0: 'zidovudine',
1: 'zidovudine and didanosine',
2: 'zidovudine and zalcitabine',
3: 'didanosine',
})

died = df.cens.astype(bool)

times = df.days


Next, we split into training and testing:

from interpretableai import iai
(train_X, train_treatments, train_died, train_times), (test_X, test_treatments, test_died, test_times) = (
iai.split_data('prescription_maximize', X, treatments, died, times, seed=2345, train_proportion=0.5))


Note that we have used a training/test split of 50%/50%, so that we save more data for testing to ensure high-quality reward estimation on the test set.

The treatment is a categoric variable with four choices, so we follow the process for estimating rewards with categorical treatments.

Our outcome is the survival time of the patient, so we use a CategoricalSurvivalRewardEstimator to estimate the expected survival under each treatment option with a doubly-robust reward estimation method, using random forests to estimate both propensity scores and outcomes:

reward_lnr = iai.CategoricalSurvivalRewardEstimator(
propensity_estimator=iai.RandomForestClassifier(),
outcome_estimator=iai.RandomForestSurvivalLearner(),
reward_estimator='doubly_robust',
random_seed=1,
)

train_predictions, train_reward_score = reward_lnr.fit_predict(
train_X, train_treatments, train_died, train_times,
propensity_score_criterion='misclassification',
outcome_score_criterion='harrell_c_statistic')
train_rewards = train_predictions['reward']

       didanosine  ...  zidovudine and zalcitabine
0      914.744673  ...                  998.161166
1     1705.351470  ...                  877.228677
2     1079.137929  ...                 1116.155585
3      986.440389  ...                 1004.832842
4    -2805.685659  ...                  778.450793
5      752.110812  ...                  784.451586
6     1204.908181  ...                  207.675444
...           ...  ...                         ...
1062  1196.442029  ...                 1563.924126
1063  1194.018679  ...                 1108.786956
1064  1337.757919  ...                  950.435522
1065  1067.239127  ...                  422.424876
1066   971.123474  ...                 1060.711159
1067   947.833367  ...                 1023.908799
1068  1446.007626  ...                 1149.334155

[1069 rows x 4 columns]
train_reward_score['propensity']

0.26940253
train_reward_score['outcome']

{'zidovudine and zalcitabine': 0.687999, 'zidovudine': 0.702657, 'didanosine': 0.686716, 'zidovudine and didanosine': 0.712789}
train_reward_score['censoring']

{'zidovudine and zalcitabine': 0.569372, 'zidovudine': 0.593207, 'didanosine': 0.560309, 'zidovudine and didanosine': 0.617339}

We can see that the internal outcome estimation models have c-statistics around 0.7, which gives us confidence that the survival time estimates are of decent quality, and good to base our training on. The accuracy of the propensity model is around the same as random guessing at 25%, which is to be expected as the underlying data comes from a randomized trial. Given this, we may not expect the doubly-robust estimation method to perform significantly differently to the direct method, as there is not likely to be much treatment assignment bias to correct for.

## Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes survival time. Note that we exclude two features from our prescription policy (cd420 and cd820) as these are observed after the treatment assignment is decided. We will use a GridSearch to fit an OptimalTreePolicyMaximizer:

grid = iai.GridSearch(
iai.OptimalTreePolicyMaximizer(
random_seed=1,
minbucket=10,
),
max_depth=range(1, 6),
)
grid.fit(train_X.drop(columns=['cd420', 'cd820']), train_rewards)
grid.get_learner()

Optimal Trees Visualization

The resulting tree recommends different treatments based simply on the age of the patient.

We can make treatment prescriptions using predict:

grid.predict(train_X)

array(['zidovudine and didanosine', 'zidovudine and didanosine',
'zidovudine and didanosine', ..., 'zidovudine and zalcitabine',
'zidovudine and zalcitabine', 'zidovudine and didanosine'],
dtype='<U26')

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:

grid.predict_treatment_rank(train_X)

array([['zidovudine and didanosine', 'didanosine',
'zidovudine and zalcitabine', 'zidovudine'],
['zidovudine and didanosine', 'didanosine',
'zidovudine and zalcitabine', 'zidovudine'],
['zidovudine and didanosine', 'didanosine',
'zidovudine and zalcitabine', 'zidovudine'],
...,
['zidovudine and zalcitabine', 'didanosine',
'zidovudine and didanosine', 'zidovudine'],
['zidovudine and zalcitabine', 'didanosine',
'zidovudine and didanosine', 'zidovudine'],
['zidovudine and didanosine', 'didanosine',
'zidovudine and zalcitabine', 'zidovudine']], dtype='<U26')

To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:

grid.predict_treatment_outcome(train_X)

       didanosine  ...  zidovudine and zalcitabine
0     1041.330914  ...                 1007.868451
1     1041.330914  ...                 1007.868451
2     1041.330914  ...                 1007.868451
3     1071.780226  ...                 1105.695838
4     1041.330914  ...                 1007.868451
5     1071.780226  ...                 1105.695838
6     1071.780226  ...                 1105.695838
...           ...  ...                         ...
1062  1071.780226  ...                 1105.695838
1063  1071.780226  ...                 1105.695838
1064  1071.780226  ...                 1105.695838
1065  1041.330914  ...                 1007.868451
1066  1071.780226  ...                 1105.695838
1067  1071.780226  ...                 1105.695838
1068  1041.330914  ...                 1007.868451

[1069 rows x 4 columns]

## Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

test_predictions, test_reward_score = reward_lnr.fit_predict(
test_X, test_treatments, test_died, test_times,
propensity_score_criterion='misclassification',
outcome_score_criterion='harrell_c_statistic')
test_rewards = test_predictions['reward']

       didanosine  ...  zidovudine and zalcitabine
0     1194.018661  ...                 1412.721987
1     1334.037470  ...                 1135.743532
2     1113.530101  ...                 1158.433430
3     1045.897627  ...                 1187.052873
4     -678.638579  ...                  947.514961
5     1178.746839  ...                  940.875650
6      771.323075  ...                  751.247438
...           ...  ...                         ...
1063  1158.644961  ...                 1197.064651
1064  1186.693142  ...                 1180.898392
1065   860.971644  ...                 -383.838537
1066   791.350526  ...                 1089.782113
1067  1727.160890  ...                  898.050692
1068  1708.732983  ...                  935.965987
1069  1049.665627  ...                 1390.239044

[1070 rows x 4 columns]
test_reward_score['propensity']

0.26268495
test_reward_score['outcome']

{'zidovudine and zalcitabine': 0.722348, 'zidovudine': 0.718315, 'didanosine': 0.746346, 'zidovudine and didanosine': 0.738453}
test_reward_score['censoring']

{'zidovudine and zalcitabine': 0.60256, 'zidovudine': 0.561125, 'didanosine': 0.608006, 'zidovudine and didanosine': 0.544843}

We see the scores are similar to those on the training set, giving us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average survival time under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

policy_outcomes = grid.predict_outcomes(test_X, test_rewards)

array([1188.10916664, 1094.20211803, 1177.06657838, ...,  898.05069167,
935.96598699, 1116.67319147])

We can then get the average estimated survival times under our treatment policy:

policy_outcomes.mean()

1060.4842813

We can compare this number to the average estimated survival time under the treatment assignments that were actually observed:

np.array([test_rewards[test_treatments[i]][i]
for i in range(len(test_treatments))]).mean()

931.928055

We see that our policy leads to a sizeable improvement in survival times compared to the randomized assignments.

## Survival Probability as Outcome

In the example above, we addressed the problem with the goal of assigning treatments to maximize the expected survival time of each patient. As an alternative way of looking at the problem, we can also try to assign treatments that maximize each patient's probability of survival at a given point in time.

In this case, we will try to assign treatments to maximize the probability of surviving at least 2.5 years. The only change to the previous workflow is that we need to use the evaluation_time parameter on the reward estimator to specify the time of interest:

reward_lnr = iai.CategoricalSurvivalRewardEstimator(
propensity_estimator=iai.RandomForestClassifier(),
outcome_estimator=iai.RandomForestSurvivalLearner(),
reward_estimator='doubly_robust',
random_seed=1,
evaluation_time=(365 * 2.5),
)


We then proceed to estimate rewards on the training set and train an Optimal Policy Tree as before:

train_predictions, train_reward_score = reward_lnr.fit_predict(
train_X, train_treatments, train_died, train_times,
propensity_score_criterion='misclassification',
outcome_score_criterion='harrell_c_statistic')
train_rewards = train_predictions['reward']

grid = iai.GridSearch(
iai.OptimalTreePolicyMaximizer(
random_seed=1,
minbucket=10,
),
max_depth=range(1, 6),
)
grid.fit(train_X.drop(columns=['cd420', 'cd820']), train_rewards)
grid.get_learner()

Optimal Trees Visualization

We see that the resulting tree finds leaves that have very different responses to the treatments. In particular, Node 7 prescribes zidovudine and didanosine with an estimated 97% 2.5-year survival rate, whereas in the adjacent Node 6, the estimated survival rate under this treatment is only 51%.

We can also proceed to estimate the rewards on the test set in the same way as before:

test_predictions, test_reward_score = reward_lnr.fit_predict(
test_X, test_treatments, test_died, test_times,
propensity_score_criterion='misclassification',
outcome_score_criterion='harrell_c_statistic')
test_rewards = test_predictions['reward']


We can then get the average estimated 2.5-year survival probability under our treatment policy:

policy_outcomes = grid.predict_outcomes(test_X, test_rewards)
policy_outcomes.mean()

0.7423757

We can compare this number to the average estimated 2.5-year survival probability under the treatment assignments that were actually observed:

np.array([test_rewards[test_treatments[i]][i]
for i in range(len(test_treatments))]).mean()

0.564216

We see that our policy leads to a sizeable improvement in survival probability compared to the randomized assignments.