# Quick Start Guide: Optimal Prescriptive Trees

This is a Python version of the corresponding OptimalTrees quick start guide. The results are slightly different to the original guide as the data is randomly generated by Python rather than Julia.

In this example we will give an example of how to use Optimal Prescriptive Trees (OPT). We will use synthetically-generated data to better understand the workings of prescriptive trees and how we can fairly evaluate performance out-of-sample in a prescriptive problem.

## Data for prescriptive problems

Prescriptive trees are trained on observational data, and require three distinct types of data:

• X: the features for each observation that can be used as the splits in the tree - this can be a matrix or a dataframe as for classification or regression problems
• treatments: the treatment applied to each observation - this is a vector of the treatment labels similar to the target in a classification problem
• outcomes: the outcome for each observation under the applied treatment - this is a vector of numeric values similar to the target in a regression problem

Refer to the documentation on data preparation for more information on the data format.

First, we will generate the features X with a mix of binary and numeric data:

import numpy as np
import pandas as pd
np.random.seed(123)
n = 1000
p = 10

def generate_X(n, p):
X = np.zeros((n, p))
for j in range(p):
if j % 2 == 0:
X[:, j] = np.random.randn(n)
else:
X[:, j] = np.random.randint(0, 2, n)
return pd.DataFrame(X, columns=["x" + str(j + 1) for j in range(p)])

X = generate_X(n, p)

           x1   x2        x3   x4        x5   x6        x7   x8        x9  x10
0   -1.085631  0.0  0.547252  1.0 -1.699241  1.0 -0.933232  1.0 -1.765477  1.0
1    0.997345  1.0 -0.717632  1.0  1.190900  0.0  1.446001  0.0 -0.279695  0.0
2    0.282978  0.0  0.837935  0.0  1.640797  0.0  1.050432  1.0  0.818151  0.0
3   -1.506295  1.0  0.740198  0.0 -0.136843  0.0  0.246567  1.0  0.402709  0.0
4   -0.578600  1.0  0.403703  1.0  2.393079  0.0  1.933233  1.0  1.893465  1.0
5    1.651437  0.0 -0.209926  1.0  0.233501  0.0 -1.465329  0.0  0.202776  1.0
6   -2.426679  1.0 -0.526109  1.0 -0.504903  0.0  0.809485  0.0 -0.298224  0.0
..        ...  ...       ...  ...       ...  ...       ...  ...       ...  ...
993 -1.931769  0.0  0.872414  0.0 -2.131140  0.0  0.173978  0.0 -1.497893  1.0
994 -0.419641  0.0  0.201334  1.0 -1.457351  1.0  0.638074  0.0  1.670272  0.0
995  0.634763  1.0 -0.596032  0.0  0.079862  1.0 -0.586725  0.0  0.686771  0.0
996  1.069919  0.0 -0.358594  1.0 -0.739779  1.0  0.348569  1.0 -0.909776  1.0
997 -0.909327  1.0 -0.061931  1.0 -0.246920  1.0 -0.925705  0.0 -1.166603  1.0
998  0.470264  0.0  0.123191  1.0 -0.044476  1.0  0.145194  0.0  0.962710  1.0
999 -1.111430  0.0  1.336415  0.0 -0.033889  1.0  0.661059  1.0 -0.482874  1.0

[1000 rows x 10 columns]

We will consider data with two treatments ($A$ and $B$) where the treatment effect $y(B) - y(A)$ is given by

$4 \Iota\{x_1 > 1\} \Iota\{x_3 > 0\} + 4 \Iota\{x_5 > 1\} \Iota\{x_7 > 0\} + 2 x_8 x_9$

where $\Iota(x)$ is the indicator function. We can then generate the outcomes as follows:

def treatment_effect(x):
return (
4 * (1 if x.x1 > 1 else 0) * (1 if x.x3 > 0 else 0) +
4 * (1 if x.x5 > 1 else 0) * (1 if x.x7 > 0 else 0) +
2 * x.x8 * x.x9
)

def generate_outcomes(x):
baseline = 10
effect = treatment_effect(x)
yA = baseline - 0.5 * effect
yB = baseline + 0.5 * effect
return yA, yB


Let's look at some examples:

[generate_outcomes(X.iloc[i]) for i in range(3)]

[(11.765477348542346, 8.234522651457654), (8.0, 12.0), (7.1818492270089, 12.8181507729911)]

This gives us the ground truth for the data. We will assume that a lower outcome is more desirable, and so the optimal treatment is the one with the lowest outcome.

Now, we will generate some realistic training data. In real observational data, we usually see that the treatments are not assigned optimally nor entirely randomly. Instead, the treatments are somewhere inbetween: the treatment assignments are generally in the direction of optimality but with mistakes as well. To imitate this phenomenon with our synthetic data, we will assign the treatments randomly, but with the chance of receiving the optimal treatment depending on how far apart the outcomes for the two treatments are. In this way, the more obvious the choice of best treatment is in the ground truth, the more likely the training data will assign this treatment. The exact approach we use is:

$P(T = A) = \frac{e^{y(B) - y(A)}}{e^{y(B) - y(A)} + 1}$

In this way, the higher (worse) the impact of the treatment $B$, the lower the probability of being assigned $B$, and vice-versa.

The following code generates the treatment assignments and corresponding outcomes. We also add some small random noise to the outcomes.

from math import exp
def generate_treatment_and_outcome(X):
n = X.shape[0]
treatments = np.zeros(n, dtype='object')
outcomes = np.zeros(n)
for i in range(n):
outcomeA, outcomeB = generate_outcomes(X.iloc[i])
effect = outcomeB - outcomeA
prob = exp(effect) / (exp(effect) + 1)
if np.random.rand() < prob:
treatments[i] = "A"
outcomes[i] = outcomeA
else:
treatments[i] = "B"
outcomes[i] = outcomeB
outcomes[i] += 0.2 * np.random.randn()
return treatments, outcomes

treatments, outcomes = generate_treatment_and_outcome(X)


We can look at some examples:

treatments[0:3], outcomes[0:3]

(array(['B', 'A', 'A'], dtype=object), array([8.17758783, 8.06663497, 7.20219289]))

## Training Optimal Prescriptive Trees

We will use a GridSearch to fit an OptimalTreePrescriptionMinimizer:

from interpretableai import iai
grid = iai.GridSearch(
iai.OptimalTreePrescriptionMinimizer(
random_seed=1,
),
max_depth=range(1, 6),
)
grid.fit(X, treatments, outcomes)
grid.get_learner()

Optimal Trees Visualization

We can see that the structure of this tree is aligned with the synthetic data generation process:

• It makes strongest predictions for treatment $A$ when $x_1 > 1$ and $x_3 > 0$ or when $x_5 > 1$ and $x_7 > 0$, as this is when indicator terms activate and the outcome for treatment $B$ is significantly higher.
• When neither indicator term is active, and $x_8$ is zero, both treatments are the same up to random noise, and so the prescriptions have roughly the same predicted outcome.
• When neither indicator term is active, and $x_8$ is one, the prescribed treatment follows the sign of $x_9$.

Based on this tree, and knowing the underlying generation process, we can be confident in the ability to recover the true mechanism behind the optimal treatment policy.

## Evaluation against synthetic data

Let's evaluate it using the synthetic ground truth. First we generate some new data according to the ground truth:

n_test = 10000
test_X = generate_X(n_test, p)
test_outcomes = np.zeros((n_test, 2))
for i in range(n_test):
test_outcomes[i, :] = generate_outcomes(test_X.iloc[i])


Now we can use our model to make predictions on the new data and compare against the truth. We consider two measures, the accuracy of our treatment predictions against the true treatments, and the R-squared of our outcome predictions against the true best outcomes.

def evaluate_synthetic(grid, X, outcomes):
best_treatment = ['A' if outcomes[i, 0] < outcomes[i, 1] else 'B'
for i in range(X.shape[0])]
best_outcome = [np.min(outcomes[i, :]) for i in range(X.shape[0])]

treatment_predictions, outcome_predictions = grid.predict(X)
treatment_accuracy = np.mean(np.array(best_treatment) ==
np.array(treatment_predictions))
outcome_accuracy = 1 - (np.sum((best_outcome - outcome_predictions) ** 2) /
np.sum((best_outcome - np.mean(best_outcome)) ** 2))

return {
'treatment_accuracy': treatment_accuracy,
'outcome_accuracy': outcome_accuracy,
}

evaluate_synthetic(grid, test_X, test_outcomes)

{'treatment_accuracy': 0.8792, 'outcome_accuracy': 0.6897916323832631}

This says that our model identifies the correct treatment roughly 90% of the time, and the $R^2$ of the predicted outcomes is about 0.7.

We can try validating over prescription_factor to see if that improves the quality of the results. Note that when we validate over prescription_factor we need to specify a validation criterion - here we choose :prediction_accuracy to validate based on the quality of the predictions made by the tree:

grid = iai.GridSearch(
iai.OptimalTreePrescriptionMinimizer(
random_seed=1,
max_depth=5,
),
prescription_factor=np.linspace(0.0, 1.0, 6),
)
grid.fit(X, treatments, outcomes, validation_criterion='prediction_accuracy')
grid.get_learner()

Optimal Trees Visualization
evaluate_synthetic(grid, test_X, test_outcomes)

{'treatment_accuracy': 0.898, 'outcome_accuracy': 0.7213461072163296}

The results improved a little further over the default value of 0.5.

## Evaluation against real data

In real applications, we don't know the counterfactuals for the test data, and so we cannot simply find the best outcome for each test point and use that as the correct treatment. In order to develop a method for fair evaluation, we recommend using the approach from Bertsimas et. al (2016), which is implemented in the PrescriptionUtils.jl package.

Unfortunately this functionality is not yet available in Python