Quick Start Guide: Optimal Policy Trees with Numeric Treatment

This is an R version of the corresponding OptimalTrees quick start guide.

In this example we will give a demonstration of how to use Optimal Policy Trees with numeric treatment options. For this example, we will the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.

First we load in the data and drop 6 rows with missing values:

df <- na.omit(read.csv("auto-mpg.csv", na.strings = "?"))

  mpg cylinders displacement horsepower weight acceleration model.year origin
1  18         8          307        130   3504         12.0         70      1
2  15         8          350        165   3693         11.5         70      1
3  18         8          318        150   3436         11.0         70      1
4  16         8          304        150   3433         12.0         70      1
5  17         8          302        140   3449         10.5         70      1
6  15         8          429        198   4341         10.0         70      1
                   car.name
1 chevrolet chevelle malibu
2         buick skylark 320
3        plymouth satellite
4             amc rebel sst
5               ford torino
6          ford galaxie 500
 [ reached 'max' / getOption("max.print") -- omitted 386 rows ]

Policy trees are trained using a features matrix/dataframe X as usual and a rewards matrix that has one column for each potential treatment that contains the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

Reward Estimation

Warning

Please refer to the Reward Estimation documentation for a detailed description on how to perform reward estimation for numeric treatments properly. For simplicity and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.

First, we split into training and testing:

X <- subset(df, select = -c(mpg, acceleration, car.name))
treatments <- df$acceleration
outcomes <- df$mpg

split <- iai::split_data("policy_maximize", X, treatments, outcomes,
                         seed = 123, train_proportion = 0.5)
train_X <- split$train$X
train_treatments <- split$train$treatments
train_outcomes <- split$train$outcomes
test_X <- split$test$X
test_treatments <- split$test$treatments
test_outcomes <- split$test$outcomes

Note that we have used a training/test split of 60%/40%, so that we save more data for testing to ensure high-quality reward estimation on the test set.

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 11 to 20, in steps of 3:

treatment_candidates <- seq(11, 20, 3)

The outcome is continuous, so we use a numeric_regression_reward_estimator to estimate the MPG under our candidate acceleration values using a random forest model:

reward_lnr = iai::numeric_regression_reward_estimator(
    propensity_estimator = iai::random_forest_regressor(),
    outcome_estimator = iai::random_forest_regressor(),
    reward_estimator = "doubly_robust",
    propensity_min_value = 0.1,
    random_seed = 1,
)
train_rewards <- iai::fit_predict(reward_lnr, train_X, train_treatments,
                                  train_outcomes, treatment_candidates)
train_rewards$predictions$reward

         11        14       17       20
1  15.51925 14.321092 15.90945 15.71609
2  17.98429 17.068355 16.18260 16.68442
3  15.19930 11.280400 14.17194 15.82092
4  13.61431 11.021572 14.17194 15.82092
5  14.04781 11.247633 14.50928 16.11677
6  15.05722 15.595349 16.19910 16.68442
7  15.03142 13.373021 14.79301 16.01834
8  13.32171 15.075400 18.35853 18.22019
9  21.83795 22.238509 21.34982 18.91490
10 22.72739 17.816080 18.46572 20.29285
11 22.84085 24.624090 20.63584 19.50574
12 25.97438 28.254161 32.75685 24.65389
13 22.41439 23.645034 22.50530 25.11956
14 12.88894 -1.422647 15.23888 15.95957
15 14.33635  9.438246 14.33904 16.68442
 [ reached 'max' / getOption("max.print") -- omitted 181 rows ]

train_rewards$score$propensity

$`17`
[1] 0.1205333

$`20`
[1] 0.200617

$`11`
[1] 0.2871446

$`14`
[1] 0.1653518

train_rewards$score$outcome

$`17`
[1] 0.8642112

$`20`
[1] 0.2502764

$`11`
[1] 0.6539758

$`14`
[1] 0.8337176

We can see that the R2 of the internal outcome models for each candidate treatment are between 0.25 and 0.86, whereas the propensity models have R2 between 0.12 and 0.29. The non-zero propensity scores tell us that there is a small amount of treatment assignment bias in this data (keeping in mind that we arbitrarily selected a feature to use as the treatment column). It seems we can predict the outcomes reasonably well, so we have a good base for the reward estimates.

Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a grid_search to fit an optimal_tree_policy_maximizer (note that if we were trying to minimize the outcomes, we would use optimal_tree_policy_minimizer):

grid <- iai::grid_search(
    iai::optimal_tree_policy_maximizer(
        random_seed = 1,
        minbucket = 15,
    ),
    max_depth = 4:5,
)
iai::fit(grid, train_X, train_rewards$predictions$reward)
iai::get_learner(grid)

Optimal Trees Visualization

The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that a variety of our candidate acceleration values are prescribed by the tree. For instance, older cars with higher horsepower receive the highest acceleration.

We can make treatment prescriptions using predict:

prescriptions <- iai::predict(grid, train_X)

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, and if we want them in numeric form to use later we can convert them to numeric treatments using convert_treatments_to_numeric:

iai::convert_treatments_to_numeric(prescriptions)

 [1] 20 20 20 20 20 20 20 20 20 20 17 17 20 20 20 17 20 20 17 20 20 20 17 20 17
[26] 17 17 17 17 17 17 17 20 20 20 20 20 20 20 20 17 17 17 20 17 20 20 20 20 20
[51] 14 14 14 14 20 20 20 14 20 20
 [ reached getOption("max.print") -- omitted 136 entries ]

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:

rank <- iai::predict_treatment_rank(grid, train_X)

       [,1] [,2] [,3] [,4]
  [1,] "20" "11" "17" "14"
  [2,] "20" "11" "17" "14"
  [3,] "20" "11" "17" "14"
  [4,] "20" "11" "17" "14"
  [5,] "20" "11" "17" "14"
  [6,] "20" "11" "17" "14"
  [7,] "20" "11" "17" "14"
  [8,] "20" "11" "17" "14"
  [9,] "20" "11" "17" "14"
 [10,] "20" "11" "17" "14"
 [11,] "17" "14" "11" "20"
 [12,] "17" "14" "11" "20"
 [13,] "20" "11" "17" "14"
 [14,] "20" "11" "17" "14"
 [15,] "20" "11" "17" "14"
 [ reached getOption("max.print") -- omitted 181 rows ]

For each point in the data, this gives the treatments in order of effectiveness. As before, this are returned as strings, but we can convert the treatments to numeric values with convert_treatments_to_numeric:

iai::convert_treatments_to_numeric(rank)

       [,1] [,2] [,3] [,4]
  [1,]   20   11   17   14
  [2,]   20   11   17   14
  [3,]   20   11   17   14
  [4,]   20   11   17   14
  [5,]   20   11   17   14
  [6,]   20   11   17   14
  [7,]   20   11   17   14
  [8,]   20   11   17   14
  [9,]   20   11   17   14
 [10,]   20   11   17   14
 [11,]   17   14   11   20
 [12,]   17   14   11   20
 [13,]   20   11   17   14
 [14,]   20   11   17   14
 [15,]   20   11   17   14
 [ reached getOption("max.print") -- omitted 181 rows ]

To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:

iai::predict_treatment_outcome(grid, train_X)

         11       14       17       20
1  17.02099 14.75048 16.92219 18.28656
2  17.02099 14.75048 16.92219 18.28656
3  17.02099 14.75048 16.92219 18.28656
4  17.02099 14.75048 16.92219 18.28656
5  17.02099 14.75048 16.92219 18.28656
6  17.02099 14.75048 16.92219 18.28656
7  17.02099 14.75048 16.92219 18.28656
8  17.02099 14.75048 16.92219 18.28656
9  17.02099 14.75048 16.92219 18.28656
10 17.02099 14.75048 16.92219 18.28656
11 24.90946 27.00216 27.47972 21.37083
12 24.90946 27.00216 27.47972 21.37083
13 17.02099 14.75048 16.92219 18.28656
14 17.02099 14.75048 16.92219 18.28656
15 17.02099 14.75048 16.92219 18.28656
 [ reached 'max' / getOption("max.print") -- omitted 181 rows ]

We can also extract the standard errors of the outcome estimates with predict_treatment_outcome_standard_error:

iai::predict_treatment_outcome_standard_error(grid, train_X)

          11        14        17        20
1  0.8035298 1.0631867 0.7700922 0.7108964
2  0.8035298 1.0631867 0.7700922 0.7108964
3  0.8035298 1.0631867 0.7700922 0.7108964
4  0.8035298 1.0631867 0.7700922 0.7108964
5  0.8035298 1.0631867 0.7700922 0.7108964
6  0.8035298 1.0631867 0.7700922 0.7108964
7  0.8035298 1.0631867 0.7700922 0.7108964
8  0.8035298 1.0631867 0.7700922 0.7108964
9  0.8035298 1.0631867 0.7700922 0.7108964
10 0.8035298 1.0631867 0.7700922 0.7108964
11 0.5065745 0.9721001 1.4637801 3.2545938
12 0.5065745 0.9721001 1.4637801 3.2545938
13 0.8035298 1.0631867 0.7700922 0.7108964
14 0.8035298 1.0631867 0.7700922 0.7108964
15 0.8035298 1.0631867 0.7700922 0.7108964
 [ reached 'max' / getOption("max.print") -- omitted 181 rows ]

These standard errors can be combined with the outcome estimates to construct confidence intervals in the usual way.

Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

test_rewards <- iai::fit_predict(reward_lnr, test_X, test_treatments,
                                 test_outcomes, treatment_candidates)
test_rewards$predictions$reward

         11        14        17        20
1  18.82648 17.900647  15.27409  16.59605
2  18.58371 17.337311  15.36507  16.85433
3  16.83168 17.759072  15.45162  17.11098
4  14.83714 12.413611  14.33168  16.15389
5  15.88567 13.500115  14.05962  16.24789
6  14.91925 16.836939  14.90004  16.21220
7  28.65420 23.726073  23.45553  23.74469
8  28.76123 26.935723  25.24217  24.94676
9  26.26507 23.819701  30.65151  23.93741
10 26.81575 21.021514  24.82638  21.05603
11 25.18304 25.036279  22.28024  20.38562
12 19.58809 21.393769  21.08518  20.55536
13 14.41206  4.447412  15.31462  18.35806
14 14.15102 13.431400 -10.75589 -20.27758
15 27.03075 26.650123  27.35851  28.31963
 [ reached 'max' / getOption("max.print") -- omitted 181 rows ]

test_rewards$score$propensity

$`17`
[1] 0.1580786

$`20`
[1] 0.2672843

$`11`
[1] 0.3930468

$`14`
[1] 0.217413

test_rewards$score$outcome

$`17`
[1] 0.705786

$`20`
[1] 0.7703811

$`11`
[1] 0.4707763

$`14`
[1] 0.8866579

We see the internal models of our test reward estimator have scores that are similar to those on the test set. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

policy_outcomes <- iai::predict_outcomes(grid, test_X,
                                         test_rewards$predictions$reward)

 [1]  16.5960521  16.8543325  17.1109804  16.1538891  16.2478888  16.2121968
 [7]  23.7446893  25.2421742  30.6515066  24.8263822  20.3856155  21.0851790
[13]  18.3580644 -20.2775797  27.3585081  19.5190238  18.1880693  18.5801374
[19]  18.3580644  15.7020594  15.7020594  16.2478888  18.6640016  17.8342597
[25]  22.1556307  30.9552814  23.8479498  22.6997853  -0.2941414  18.3580644
[31]  16.3788466  15.7020594  18.3580644  16.4519758  16.5286969  16.5286969
[37]  19.1528634  16.7992609  25.0225700  17.1740027  15.6819372  13.6194815
[43]  16.2892893  16.2892893  15.6819372  16.5286969  19.7528795  19.9767409
[49]  16.2892893  16.2892893  16.6319255  26.6297141  26.2681754  24.8270815
[55]  24.0698631  19.1903106  16.5286969  26.8281826  24.8245924  13.3755188
 [ reached getOption("max.print") -- omitted 136 entries ]

We can then get the average estimated MPG under our treatments:

mean(policy_outcomes)

[1] 23.93173

We can compare this number to the average estimated MPG under a baseline policy that assigns the same treatment to all observations regardless of their features. Let us compare to a policy that assigns 17 everywhere:

mean(test_rewards$predictions$reward[,"17"])

[1] 23.32036

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.