# Quick Start Guide: Optimal Prescriptive Trees

This is an R version of the corresponding OptimalTrees quick start guide. The results are slightly different to the original guide as the data is randomly generated by R rather than Julia.

In this example we will give an example of how to use Optimal Prescriptive Trees (OPT). We will use synthetically-generated data to better understand the workings of prescriptive trees and how we can fairly evaluate performance out-of-sample in a prescriptive problem.

## Data for prescriptive problems

Prescriptive trees are trained on observational data, and require three distinct types of data:

• X: the features for each observation that can be used as the splits in the tree - this can be a matrix or a dataframe as for classification or regression problems
• treatments: the treatment applied to each observation - this is a vector of the treatment labels similar to the target in a classification problem
• outcomes: the outcome for each observation under the applied treatment - this is a vector of numeric values similar to the target in a regression problem

Refer to the documentation on data preparation for more information on the data format.

First, we will generate the features X with a mix of binary and numeric data:

set.seed(1)
n <- 1000
p <- 10

generate_X <- function(n, p) {
X <- matrix(0, n, p)
for (j in 1:p) {
if (j %% 2 == 1) {
X[, j] <- rnorm(n)
} else {
X[, j] <- rbinom(n, 1, 0.5)
}
}
df <- as.data.frame(X)
names(df) <- sapply(1:10, function(i) paste("x", i, sep = ""))
df
}

X <- generate_X(n, p)

          x1 x2         x3 x4         x5 x6         x7 x8          x9 x10
1 -0.6264538  1  0.8500435  0  0.7391149  0  1.5579537  0 -0.61882708   0
2  0.1836433  1 -0.9253130  0  0.3866087  0 -0.7292970  0 -1.10942196   0
3 -0.8356286  1  0.8935812  0  1.2963972  1 -1.5039509  0 -2.17033523   1
4  1.5952808  0 -0.9410097  1 -0.8035584  0 -0.5667870  1 -0.03130307   1
5  0.3295078  0  0.5389521  0 -1.6026257  1 -2.1044536  1 -0.26039848   1
6 -0.8204684  0 -0.1819744  0  0.9332510  0  0.5307319  1  0.53443047   0
[ reached 'max' / getOption("max.print") -- omitted 994 rows ]

We will consider data with two treatments ($A$ and $B$) where the treatment effect $y(B) - y(A)$ is given by:

$4 \Iota\{x_1 > 1\} \Iota\{x_3 > 0\} + 4 \Iota\{x_5 > 1\} \Iota\{x_7 > 0\} + 2 x_8 x_9$

where $\Iota(x)$ is the indicator function. We can then generate the outcomes as follows:

treatment_effect <- function(x) {
return (
4 * ifelse(x$x1 > 1, 1, 0) * ifelse(x$x3 > 0, 1, 0) +
4 * ifelse(x$x5 > 1, 1, 0) * ifelse(x$x7 > 0, 1, 0) +
2 * x$x8 * x$x9
)
}

generate_outcomes <- function(x) {
baseline <- 10
effect <- treatment_effect(x)
yA <- baseline - 0.5 * effect
yB <- baseline + 0.5 * effect
list(A = yA, B = yB)
}


Let's look at some examples:

sapply(1:10, function(i) generate_outcomes(X[i,]))

  [,1] [,2] [,3] [,4]     [,5]     [,6]     [,7]     [,8]     [,9] [,10]
A 10   10   10   10.0313  10.2604  9.46557  8.559439 8.39163  8    10
B 10   10   10   9.968697 9.739602 10.53443 11.44056 11.60837 12   10

This gives us the ground truth for the data. We will assume that a lower outcome is more desirable, and so the optimal treatment is the one with the lowest outcome.

Now, we will generate some realistic training data. In real observational data, we usually see that the treatments are not assigned optimally nor entirely randomly. Instead, the treatments are somewhere inbetween: the treatment assignments are generally in the direction of optimality but with mistakes as well. To imitate this phenomenon with our synthetic data, we will assign the treatments randomly, but with the chance of receiving the optimal treatment depending on how far apart the outcomes for the two treatments are. In this way, the more obvious the choice of best treatment is in the ground truth, the more likely the training data will assign this treatment. The exact approach we use is:

$P(T = A) = \frac{e^{y(B) - y(A)}}{e^{y(B) - y(A)} + 1}$

In this way, the higher (worse) the impact of the treatment $B$, the lower the probability of being assigned $B$, and vice-versa.

The following code generates the treatment assignments and corresponding outcomes. We also add some small random noise to the outcomes.

generate_treatment_and_outcome <- function(X) {
n <- nrow(X)
treatments <- character(n)
outcomes <- numeric(n)
for (i in 1:n) {
outcome <- generate_outcomes(X[i,])
effect <- outcome$B - outcome$A
prob <- exp(effect) / (exp(effect) + 1)
if (runif(1) < prob) {
treatments[i] <- "A"
outcomes[i] <- outcome$A } else { treatments[i] <- "B" outcomes[i] <- outcome$B
}
outcomes[i] = outcomes[i] + 0.2 * rnorm(1)
}
list(treatments = treatments, outcomes = outcomes)
}

target <- generate_treatment_and_outcome(X)


## Training Optimal Prescriptive Trees

We will use a grid_search to fit an optimal_tree_prescription_minimizer:

grid <- iai::grid_search(
iai::optimal_tree_prescription_minimizer(
random_seed = 1,
),
max_depth = 1:5,
)
iai::fit(grid, X, target$treatments, target$outcomes)
iai::get_learner(grid)

Optimal Trees Visualization

We can see that the structure of this tree is aligned with the synthetic data generation process:

• It makes strongest predictions for treatment $A$ when $x_1 > 1$ and $x_3 > 0$ or when $x_5 > 1$ and $x_7 > 0$, as this is when indicator terms activate and the outcome for treatment $B$ is significantly higher.
• When neither indicator term is active, and $x_8$ is zero, both treatments are the same up to random noise, and so the prescriptions have roughly the same predicted outcome.
• When neither indicator term is active, and $x_8$ is one, the prescribed treatment follows the sign of $x_9$.

Based on this tree, and knowing the underlying generation process, we can be confident in the ability to recover the true mechanism behind the optimal treatment policy.

## Evaluation against synthetic data

Let's evaluate it using the synthetic ground truth. First we generate some new data according to the ground truth:

n_test <- 10000
test_X <- generate_X(n_test, p)
test_outcomes <- matrix(0, n_test, 2)
for (i in 1:n_test) {
outcome <- generate_outcomes(test_X[i,])
test_outcomes[i,] <- c(outcome$A, outcome$B)
}
test_outcomes


Now we can use our model to make predictions on the new data and compare against the truth. We consider two measures, the accuracy of our treatment predictions against the true treatments, and the R-squared of our outcome predictions against the true best outcomes.

evaluate_synthetic <- function(grid, X, outcomes) {
n_test <- nrow(X)
best_treatment <- sapply(1:n_test, function(i) {
ifelse(outcomes[i, 1] <= outcomes[i, 2], "A", "B")
})
best_outcome <- apply(outcomes, 1, min)

pred <- iai::predict(grid, X)
treatment_accuracy <- mean(best_treatment == pred$treatments) outcome_accuracy <- 1 - (sum((best_outcome - pred$outcomes) ^ 2) /
sum((best_outcome - mean(best_outcome)) ^ 2))

list(
treatment_accuracy = treatment_accuracy,
outcome_accuracy = outcome_accuracy
)
}

evaluate_synthetic(grid, test_X, test_outcomes)

$treatment_accuracy [1] 0.9269$outcome_accuracy
[1] 0.7268336

This says that our model identifies the correct treatment roughly 90% of the time, and the $R^2$ of the predicted outcomes is about 0.7.

We can try validating over prescription_factor to see if that improves the quality of the results. Note that when we validate over prescription_factor we need to specify a validation criterion - here we choose :prediction_accuracy to validate based on the quality of the predictions made by the tree:

grid <- iai::grid_search(
iai::optimal_tree_prescription_minimizer(
random_seed = 1,
max_depth = 5,
),
prescription_factor = seq(0, 1, 0.2),
)
iai::fit(grid, X, target$treatments, target$outcomes,
validation_criterion = "prediction_accuracy")
iai::get_learner(grid)

Optimal Trees Visualization