Quick Start Guide: Optimal Policy Trees with Categorical Treatment

This is an R version of the corresponding OptimalTrees quick start guide.

In this guide we will give a demonstration of how to use Optimal Policy Trees with categoric treatment options. For this example, we will use the Credit Approval dataset, where the task is to predict the approval outcome of credit card application. Because the features have been anonymized for confidentiality reasons, we will arbitrarily select one of the categoric variables A1 to be the treatment.

Note: this case is not intended to serve as a practical application of policy trees, but rather to serve as an illustration of the training and evaluation process.

First we load in the data and drop 37 rows with missing values:

df <- read.csv("crx.data", na.strings = "?", stringsAsFactors = T, header = F,
               col.names = paste("A", as.character(seq(1, 16)), sep = ""))
df <- na.omit(df)
  A1    A2   A3 A4 A5 A6 A7   A8 A9 A10 A11 A12 A13 A14 A15 A16
1  b 30.83 0.00  u  g  w  v 1.25  t   t   1   f   g 202   0   +
2  a 58.67 4.46  u  g  q  h 3.04  t   t   6   f   g  43 560   +
3  a 24.50 0.50  u  g  q  h 1.50  t   f   0   f   g 280 824   +
 [ reached 'max' / getOption("max.print") -- omitted 650 rows ]

Policy trees are trained using a features matrix/dataframe X as usual and a rewards matrix that has one column for each potential treatment that contains the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

  • in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
  • more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.

In this case, the dataset is observational, and so we will use RewardEstimation to estimate our rewards matrix.

Reward Estimation

First, we split into training and testing:

X <- subset(df, select = -c(A1, A16))
treatments <- df$A1
outcomes <- df$A16 == "+"

split <- iai::split_data("policy_maximize", X, treatments, outcomes, seed = 123,
                         train_proportion = 0.5)
train_X <- split$train$X
train_treatments <- split$train$treatments
train_outcomes <- split$train$outcomes
test_X <- split$test$X
test_treatments <- split$test$treatments
test_outcomes <- split$test$outcomes

Note that we have used a training/test split of so that we save more data for testing to ensure high-quality reward estimation on the test set.

The treatment, A1, is a categoric variable, so we follow the process for estimating rewards with categorical treatments.

Our outcome is binary, so we use a categorical_classification_reward_estimator to estimate the approval outcome under each option with a doubly-robust reward estimation method using random forests to estimate both propensity scores and outcomes:

reward_lnr <- iai::categorical_classification_reward_estimator(
    propensity_estimator = iai::random_forest_classifier(),
    outcome_estimator = iai::random_forest_classifier(),
    reward_estimator = "doubly_robust",
    random_seed = 123,

)
train_rewards <- iai::fit_predict(
    reward_lnr, train_X, train_treatments, train_outcomes,
    propensity_score_criterion = "auc", outcome_score_criterion = "auc")
train_rewards$predictions$reward
           a         b
1  1.1609865 1.0000000
2  0.5100000 1.0022072
3  0.4100000 1.2916320
4  0.5900000 1.0829196
5  1.2115021 0.7800000
6  0.3100000 1.0981339
7  0.7000000 1.0522586
8  0.0400000 1.1824658
9  2.8636094 0.5845882
10 0.3133333 1.8828948
11 1.4473910 0.9000000
12 1.4753706 0.9800000
13 0.5500000 1.2755671
14 1.1608481 0.8500000
15 1.7143931 0.9800000
16 1.4989734 0.8650000
17 0.7600000 1.0504100
18 0.7400000 1.0799578
19 2.3958655 0.8466667
20 0.6100000 1.0458142
21 0.5600000 1.1366273
22 0.8600000 1.0093915
23 0.8600000 1.0154764
24 0.6600000 1.0000000
25 1.3401133 0.8600000
26 0.8200000 1.0539866
27 0.1300000 1.4095652
28 0.6100000 1.1105408
29 0.4700000 1.1885233
30 0.5433333 1.0583056
 [ reached 'max' / getOption("max.print") -- omitted 296 rows ]
train_rewards$score
$propensity
[1] 0.6129726

$outcome
$outcome$b
[1] 0.9197003

$outcome$a
[1] 0.9557617

We can see that the internal outcome estimation models have AUCs of 0.92 and 0.96, which gives us confidence that the reward estimates are of decent quality, and good to base our training on. The AUC for the propensity model is lower at 0.61, which is not particularly high, and suggests difficulty in reliably estimating the propensity. The doubly-robust estimation method should help to alleviate this problem, as it is designed to deliver good results if either propensity scores or outcomes are estimated well. However, we should always pay attention to these scores and proceed with caution if the estimation quality is low.

Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes credit approval outcome. We will use a grid_search to fit an optimal_tree_policy_maximizer (note that if we were trying to minimize the outcomes, we would use optimal_tree_policy_minimizer):

grid <- iai::grid_search(
    iai::optimal_tree_policy_maximizer(
        random_seed = 121,
        max_categoric_levels_before_warning = 20,
    ),
    max_depth = 1:5,
)
iai::fit(grid, train_X, train_rewards$predictions$reward)
iai::get_learner(grid)
Fitted OptimalTreePolicyMaximizer:
  1) Split: A8 < 1.438
    2) Split: A6 in [aa,c,cc,ff,j,k,w,x] or is missing
      3) Prescribe: a, 127 points, error 5.262
      4) Prescribe: b, 56 points, error 5.255
    5) Split: A6 in [d,e,x]
      6) Prescribe: a, 21 points, error 4.801
      7) Prescribe: b, 122 points, error 4.942
Optimal Trees Visualization

The resulting tree recommends different values for A1 based on other characteristics: namely, the values of A6 and A8. The intensity of the color in each leaf shows the difference in quality between the best and second-best values. We can see that both a and b values are prescribed by the tree, implying each treatment is better suited to certain subgroups.

We can make treatment prescriptions using predict:

iai::predict(grid, train_X)
 [1] "b" "b" "b" "b" "a" "b" "b" "b" "a" "a" "b" "b" "b" "a" "b" "a" "a" "b" "a"
[20] "b" "a" "b" "b" "b" "b" "b" "b" "a" "a" "b" "b" "a" "b" "a" "b" "b" "b" "a"
[39] "b" "b" "a" "b" "a" "b" "b" "a" "b" "b" "a" "a" "b" "b" "a" "b" "b" "a" "b"
[58] "a" "b" "a"
 [ reached getOption("max.print") -- omitted 266 entries ]

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:

iai::predict_treatment_rank(grid, train_X)
       [,1] [,2]
  [1,] "b"  "a"
  [2,] "b"  "a"
  [3,] "b"  "a"
  [4,] "b"  "a"
  [5,] "a"  "b"
  [6,] "b"  "a"
  [7,] "b"  "a"
  [8,] "b"  "a"
  [9,] "a"  "b"
 [10,] "a"  "b"
 [11,] "b"  "a"
 [12,] "b"  "a"
 [13,] "b"  "a"
 [14,] "a"  "b"
 [15,] "b"  "a"
 [16,] "a"  "b"
 [17,] "a"  "b"
 [18,] "b"  "a"
 [19,] "a"  "b"
 [20,] "b"  "a"
 [21,] "a"  "b"
 [22,] "b"  "a"
 [23,] "b"  "a"
 [24,] "b"  "a"
 [25,] "b"  "a"
 [26,] "b"  "a"
 [27,] "b"  "a"
 [28,] "a"  "b"
 [29,] "a"  "b"
 [30,] "b"  "a"
 [ reached getOption("max.print") -- omitted 296 rows ]

To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:

iai::predict_treatment_outcome(grid, train_X)
           a         b
1  0.4081471 0.7451901
2  0.4081471 0.7451901
3  0.4081471 0.7451901
4  0.4081471 0.7451901
5  0.4257366 0.2611338
6  0.4081471 0.7451901
7  0.4081471 0.7451901
8  0.4081471 0.7451901
9  0.4257366 0.2611338
10 0.4257366 0.2611338
11 0.4081471 0.7451901
12 0.4081471 0.7451901
13 0.1286056 0.4327770
14 0.4257366 0.2611338
15 0.4081471 0.7451901
16 0.8862802 0.6930369
17 0.8862802 0.6930369
18 0.4081471 0.7451901
19 0.8862802 0.6930369
20 0.4081471 0.7451901
21 0.4257366 0.2611338
22 0.4081471 0.7451901
23 0.4081471 0.7451901
24 0.4081471 0.7451901
25 0.1286056 0.4327770
26 0.4081471 0.7451901
27 0.4081471 0.7451901
28 0.4257366 0.2611338
29 0.4257366 0.2611338
30 0.4081471 0.7451901
 [ reached 'max' / getOption("max.print") -- omitted 296 rows ]

We can also extract the standard errors of the outcome estimates with predict_treatment_outcome_standard_error:

iai::predict_treatment_outcome_standard_error(grid, train_X)
            a          b
1  0.07868169 0.04520248
2  0.07868169 0.04520248
3  0.07868169 0.04520248
4  0.07868169 0.04520248
5  0.07341455 0.04453778
6  0.07868169 0.04520248
7  0.07868169 0.04520248
8  0.07868169 0.04520248
9  0.07341455 0.04453778
10 0.07341455 0.04453778
11 0.07868169 0.04520248
12 0.07868169 0.04520248
13 0.09160953 0.08040624
14 0.07341455 0.04453778
15 0.07868169 0.04520248
16 0.12819943 0.11603045
17 0.12819943 0.11603045
18 0.07868169 0.04520248
19 0.12819943 0.11603045
20 0.07868169 0.04520248
21 0.07341455 0.04453778
22 0.07868169 0.04520248
23 0.07868169 0.04520248
24 0.07868169 0.04520248
25 0.09160953 0.08040624
26 0.07868169 0.04520248
27 0.07868169 0.04520248
28 0.07341455 0.04453778
29 0.07341455 0.04453778
30 0.07868169 0.04520248
 [ reached 'max' / getOption("max.print") -- omitted 296 rows ]

These standard errors can be combined with the outcome estimates to construct confidence intervals in the usual way.

Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

test_rewards <- iai::fit_predict(
    reward_lnr, test_X, test_treatments, test_outcomes, propensity_score_criterion = "auc", outcome_score_criterion = "auc")
test_rewards$predictions$reward
           a         b
1  0.5980000 1.1682769
2  1.1949552 0.7066667
3  0.7300000 1.1716895
4  0.8200000 1.0807236
5  0.7400000 1.0610863
6  0.6200000 1.1972140
7  0.8200000 1.0133024
8  0.7000000 1.3082608
9  1.1168547 0.5662857
10 1.4906058 0.6350000
11 1.3933441 0.7600000
12 0.9300000 1.0271075
13 0.8700000 1.0411053
14 0.9000000 1.0109207
15 0.7900000 1.1030189
16 1.0502922 0.7700000
17 0.8100000 1.0516286
18 1.5650567 0.7400000
19 0.8900000 1.0895652
20 1.0000000 1.0479828
21 0.6100000 1.1246575
22 0.8600000 1.0867130
23 0.1666667 1.0987173
24 0.0500000 1.3546777
25 2.2272664 0.5403333
26 0.5700000 1.1915573
27 0.5992308 1.0911696
28 0.8200000 1.1856710
29 1.2418337 0.7200000
30 1.1950315 0.6650000
 [ reached 'max' / getOption("max.print") -- omitted 297 rows ]
test_rewards$score
$propensity
[1] 0.6226425

$outcome
$outcome$b
[1] 0.9240666

$outcome$a
[1] 0.9219694

We see the scores are similar to those on the training set, giving us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation. The AUC for the propensity model is again on the low side, but should be compensated for by means of the doubly-robust estimator.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted credit approval probability under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

policy_outcomes <- iai::predict_outcomes(grid, test_X,
                                         test_rewards$predictions$reward)
 [1]  0.5980000  0.7066667  1.1716895  1.0807236  0.7400000  1.1972140
 [7]  1.0133024  0.7000000  0.5662857  0.6350000  0.7600000  0.9300000
[13]  1.0411053  0.9000000  1.1030189  0.7700000  1.0516286  1.5650567
[19]  1.0895652  1.0479828  0.6100000  1.0867130  1.0987173  0.0500000
[25]  2.2272664  0.5700000  0.5992308  1.1856710  0.7200000  0.6650000
[31]  1.0561451  0.9800000  0.9080000 -0.1526077 -0.6270308 -0.9238978
[37]  0.5992308 -0.3121622 -0.1988931 -0.2028239 -0.1043621 -0.6400294
[43] -0.3122713 -1.3005882 -1.1701211 -0.2432750 -0.9434065  0.6700000
[49]  0.7100000  0.6966190  1.0387439  0.8700000  0.8866667  0.8700000
[55]  0.9600000  1.0514286  1.6871259  3.7718997  1.1604150  1.0239768
 [ reached getOption("max.print") -- omitted 267 entries ]

We can then get the average estimated approval probability under our treatments:

mean(policy_outcomes)
[1] 0.4111989

We can compare this number to the average approval probability under the treatment assignments that were actually observed:

evaluate_reward_mean <- function(treatments, rewards) {
  total <- 0.0
  for (i in seq(treatments)) {
    total <- total + rewards[i, treatments[i]]
  }
  total / length(treatments)
}
evaluate_reward_mean(test_treatments, test_rewards$predictions$reward)
[1] 0.4074035

We see this policy is about the same as the real treatments.