Quick Start Guide: Optimal Policy Trees with Numeric Treatment

In this guide we demonstrate how to use Optimal Policy Trees with numeric treatment options. For this example, we will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

Note: this case is not intended as a practical application of policy trees, but rather as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.

First we load in the data and drop 6 rows with missing values:

using CSV, DataFrames
df = CSV.read("auto-mpg.csv", DataFrame, missingstring="?")
dropmissing!(df)
392×9 DataFrame
 Row │ mpg      cylinders  displacement  horsepower  weight  acceleration  mod ⋯
     │ Float64  Int64      Float64       Int64       Int64   Float64       Int ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │    18.0          8         307.0         130    3504          12.0      ⋯
   2 │    15.0          8         350.0         165    3693          11.5
   3 │    18.0          8         318.0         150    3436          11.0
   4 │    16.0          8         304.0         150    3433          12.0
   5 │    17.0          8         302.0         140    3449          10.5      ⋯
   6 │    15.0          8         429.0         198    4341          10.0
   7 │    14.0          8         454.0         220    4354           9.0
   8 │    14.0          8         440.0         215    4312           8.5
  ⋮  │    ⋮         ⋮           ⋮            ⋮         ⋮          ⋮            ⋱
 386 │    36.0          4         135.0          84    2370          13.0      ⋯
 387 │    27.0          4         151.0          90    2950          17.3
 388 │    27.0          4         140.0          86    2790          15.6
 389 │    44.0          4          97.0          52    2130          24.6
 390 │    32.0          4         135.0          84    2295          11.6      ⋯
 391 │    28.0          4         120.0          79    2625          18.6
 392 │    31.0          4         119.0          82    2720          19.4
                                                  3 columns and 377 rows omitted

Policy trees are trained using a features matrix/dataframe X as usual, along with a rewards matrix that has one column for each potential treatment, containing the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

  • in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
  • more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.
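
To make this format concrete, here is a toy sketch of a rewards matrix for three candidate treatments. The values are invented purely for illustration:

using DataFrames

# Hypothetical rewards matrix: one row per sample, one column per candidate
# treatment; each entry is the (estimated) outcome for that sample under
# that treatment.
toy_rewards = DataFrame(
    "8"  => [17.6, 16.2, 18.7],
    "11" => [17.9, 15.8, 19.1],
    "14" => [16.6, 14.8, 17.4],
)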

In this case, the dataset is observational, and so we will use reward estimation to estimate our rewards matrix.

Reward Estimation

First, we split into training and testing:

X = select(df, Not(["mpg", "acceleration", "car name"]))
treatments = df.acceleration
outcomes = df.mpg

(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
    IAI.split_data(:policy_maximize, X, treatments, outcomes, seed=1, train_proportion=0.6)

Note that we have used a training/test split of 60%/40%, so that we save more data for testing to ensure high-quality reward estimation on the test set.
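
As a quick sanity check, we can confirm the sizes of the resulting splits, which match the rewards matrices shown later:

# Number of training and test points after the 60%/40% split
nrow(train_X), nrow(test_X)
(235, 157)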

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 8 to 23, in steps of 3:

treatment_candidates = 8:3:23
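
This range expands to six candidate values, which will become the columns of the rewards matrix:

collect(treatment_candidates)
6-element Array{Int64,1}:
  8
 11
 14
 17
 20
 23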

We use a NumericRewardEstimator to estimate the MPG under our candidate acceleration values using an XGBoost model:

reward_lnr = IAI.NumericRewardEstimator(
    outcome_estimator=IAI.XGBoostRegressor(),
    random_seed=12345,
)

train_rewards, train_reward_score = IAI.fit_predict!(
    reward_lnr, train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards
235×6 DataFrame
 Row │ 8        11       14       17       20       23
     │ Float64  Float64  Float64  Float64  Float64  Float64
─────┼──────────────────────────────────────────────────────
   1 │ 17.5649  17.9279  16.553   16.0428  16.1734  16.2504
   2 │ 16.2367  15.7576  14.8192  14.8333  14.4565  14.4565
   3 │ 18.7436  19.1067  17.3859  16.6543  16.7998  16.8768
   4 │ 14.8891  13.05    12.356   12.3016  12.3232  12.3077
   5 │ 14.4061  13.2146  12.9017  12.691   12.9269  12.9395
   6 │ 14.8891  13.05    12.356   12.3016  12.3232  12.3077
   7 │ 13.1329  11.9413  11.6595  11.4393  11.6752  11.6878
   8 │ 14.8969  13.133   12.7138  12.5202  12.6249  12.5813
  ⋮  │    ⋮        ⋮        ⋮        ⋮        ⋮        ⋮
 229 │ 34.7262  34.7262  34.8821  31.2253  35.1343  35.1532
 230 │ 21.9048  21.9048  22.2574  21.6184  21.2935  21.1384
 231 │ 21.8871  21.8871  22.2052  21.4101  21.4086  21.2832
 232 │ 27.8975  27.8975  29.6809  28.4205  27.6739  27.5188
 233 │ 30.0696  30.0696  30.1089  29.2705  30.9828  30.957
 234 │ 26.1627  26.1627  26.3938  25.314   24.8147  24.8284
 235 │ 28.9793  28.9793  29.2104  28.1389  27.8686  27.8822
                                            220 rows omitted
train_reward_score
0.8657202076286428

We can see that the R² of the internal regression model is 0.87, which gives us confidence that the reward estimates are of high quality and form a sound basis for training.
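
If this score were low, one option would be to try a different internal outcome estimator. For example, a sketch swapping in a random forest (assuming the random forest learner is available in your IAI installation):

# Alternative reward estimator using a random forest as the outcome model;
# we could compare its reward score against the XGBoost-based estimator
# and keep whichever scores higher, then proceed with fit_predict! as above.
reward_lnr_rf = IAI.NumericRewardEstimator(
    outcome_estimator=IAI.RandomForestRegressor(),
    random_seed=12345,
)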

Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):

grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=1,
    ),
    max_depth=4:5,
)
IAI.fit!(grid, train_X, train_rewards)
Fitted OptimalTreePolicyMaximizer:
  1) Split: displacement < 264.5
    2) Split: weight < 2162
      3) Split: model year < 78.5
        4) Prescribe: 17, 27 points, error 10.33
        5) Prescribe: 23, 17 points, error 3.412
      6) Split: model year < 72.5
        7) Prescribe: 8, 16 points, error 18.05
        8) Prescribe: 14, 108 points, error 15.89
    9) Prescribe: 8, 67 points, error 24.89
[Interactive Optimal Trees visualization of the fitted policy tree]

The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that both extremes of our candidate acceleration values are prescribed by the tree. The cars with the greatest displacement are prescribed the lowest acceleration in order to maximize MPG, whereas for newer, lighter cars with smaller displacement, the highest acceleration value is the best option.
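
To keep a standalone copy of the interactive visualization, we can save the trained model to an HTML file. A sketch, assuming write_html accepts the grid search directly (the filename is arbitrary):

# Save the interactive tree visualization for sharing or later inspection
IAI.write_html("policy_tree.html", grid)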

We can make treatment prescriptions using predict:

prescriptions = IAI.predict(grid, train_X)
235-element Array{String,1}:
 "8"
 "8"
 "8"
 "8"
 "8"
 "8"
 "8"
 "8"
 "8"
 "8"
 ⋮
 "23"
 "14"
 "23"
 "14"
 "14"
 "14"
 "14"
 "14"
 "14"

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, so if we want them in numeric form for later use, we can convert them with convert_treatments_to_numeric:

IAI.convert_treatments_to_numeric(prescriptions)
235-element Array{Int64,1}:
  8
  8
  8
  8
  8
  8
  8
  8
  8
  8
  ⋮
 23
 14
 23
 14
 14
 14
 14
 14
 14

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:

rank = IAI.predict_treatment_rank(grid, train_X)
235×6 Array{String,2}:
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 "8"   "11"  "14"  "20"  "23"  "17"
 ⋮                             ⋮
 "23"  "20"  "14"  "17"  "8"   "11"
 "14"  "8"   "11"  "17"  "20"  "23"
 "23"  "20"  "14"  "17"  "8"   "11"
 "14"  "8"   "11"  "17"  "20"  "23"
 "14"  "8"   "11"  "17"  "20"  "23"
 "14"  "8"   "11"  "17"  "20"  "23"
 "14"  "8"   "11"  "17"  "20"  "23"
 "14"  "8"   "11"  "17"  "20"  "23"
 "14"  "8"   "11"  "17"  "20"  "23"

For each point in the data, this gives the treatments in order of predicted effectiveness. As before, these are returned as strings, but we can convert them to numeric values with convert_treatments_to_numeric:

IAI.convert_treatments_to_numeric(rank)
235×6 Array{Int64,2}:
  8  11  14  20  23  17
  8  11  14  20  23  17
  8  11  14  20  23  17
  8  11  14  20  23  17
  8  11  14  20  23  17
  8  11  14  20  23  17
  8  11  14  20  23  17
  8  11  14  20  23  17
  8  11  14  20  23  17
  8  11  14  20  23  17
  ⋮                   ⋮
 23  20  14  17   8  11
 14   8  11  17  20  23
 23  20  14  17   8  11
 14   8  11  17  20  23
 14   8  11  17  20  23
 14   8  11  17  20  23
 14   8  11  17  20  23
 14   8  11  17  20  23
 14   8  11  17  20  23

To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:

IAI.predict_treatment_outcome(grid, train_X)
235×6 DataFrame
 Row │ 8        11       14       17       20       23
     │ Float64  Float64  Float64  Float64  Float64  Float64
─────┼──────────────────────────────────────────────────────
   1 │ 15.2432  14.6476  14.178   13.9031  13.9395  13.9393
   2 │ 15.2432  14.6476  14.178   13.9031  13.9395  13.9393
   3 │ 15.2432  14.6476  14.178   13.9031  13.9395  13.9393
   4 │ 15.2432  14.6476  14.178   13.9031  13.9395  13.9393
   5 │ 15.2432  14.6476  14.178   13.9031  13.9395  13.9393
   6 │ 15.2432  14.6476  14.178   13.9031  13.9395  13.9393
   7 │ 15.2432  14.6476  14.178   13.9031  13.9395  13.9393
   8 │ 15.2432  14.6476  14.178   13.9031  13.9395  13.9393
  ⋮  │    ⋮        ⋮        ⋮        ⋮        ⋮        ⋮
 229 │ 33.4467  33.4467  35.1713  34.3107  36.6724  36.7188
 230 │ 23.9372  23.9348  24.2459  23.6747  23.2465  23.201
 231 │ 23.9372  23.9348  24.2459  23.6747  23.2465  23.201
 232 │ 23.9372  23.9348  24.2459  23.6747  23.2465  23.201
 233 │ 23.9372  23.9348  24.2459  23.6747  23.2465  23.201
 234 │ 23.9372  23.9348  24.2459  23.6747  23.2465  23.201
 235 │ 23.9372  23.9348  24.2459  23.6747  23.2465  23.201
                                            220 rows omitted
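
As a side note, these estimated outcomes let us recover the quantity behind the leaf color intensity mentioned earlier, namely the gap between the best and second-best treatment for each point. A small sketch in plain Julia:

# Gap between best and second-best estimated outcome for each point
outcome_matrix = Matrix(IAI.predict_treatment_outcome(grid, train_X))
gaps = map(eachrow(outcome_matrix)) do row
    sorted = sort(row, rev=true)   # outcomes from best to worst
    sorted[1] - sorted[2]          # best minus second-best
end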

Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, what we need to do is to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

test_rewards, test_reward_score = IAI.fit_predict!(
    reward_lnr, test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards
157×6 DataFrame
 Row │ 8        11       14       17       20       23
     │ Float64  Float64  Float64  Float64  Float64  Float64
─────┼──────────────────────────────────────────────────────
   1 │ 14.6129  13.8657  13.044   13.7969  13.4663  13.4663
   2 │ 15.3354  15.6313  14.7516  15.3333  13.8839  13.9302
   3 │ 14.3205  14.935   14.2997  14.8618  13.2641  13.9901
   4 │ 14.6497  13.9025  13.044   13.7969  13.4663  13.4663
   5 │ 23.5706  23.8552  23.6826  24.5507  23.7003  23.8979
   6 │ 22.3839  22.8751  21.7781  21.7699  20.2799  20.333
   7 │ 21.596   21.596   21.0442  21.9311  21.6337  21.711
   8 │ 32.7976  32.9805  32.3563  30.4403  28.8442  29.2377
  ⋮  │    ⋮        ⋮        ⋮        ⋮        ⋮        ⋮
 151 │ 35.5521  35.7302  35.7228  37.1855  36.3021  42.1312
 152 │ 26.0676  26.0676  25.8442  25.957   26.4951  26.5813
 153 │ 31.1706  31.2501  31.2614  27.4992  27.1757  27.2066
 154 │ 28.5079  28.6777  27.5714  27.4361  25.8763  25.9877
 155 │ 38.871   38.9028  38.8375  37.9445  38.0162  38.0376
 156 │ 30.5398  30.6071  30.6185  28.6492  28.3257  28.3542
 157 │ 29.0757  29.143   29.1544  26.8099  26.4595  26.4803
                                            142 rows omitted
test_reward_score
0.7929507332526184

We see that the internal model of our test reward estimator has an R² of 0.79. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

policy_outcomes = IAI.predict_outcomes(grid, test_X, test_rewards)
157-element Array{Float64,1}:
 14.612932205200195
 15.335400581359863
 14.320535659790039
 14.649663925170898
 23.570629119873047
 22.38387680053711
 21.59600257873535
 30.440296173095703
 24.090347290039062
 14.063908576965332
  ⋮
 33.2572021484375
 42.182464599609375
 42.13115692138672
 25.844161987304688
 31.261442184448242
 27.57144546508789
 38.037628173828125
 30.61852264404297
 29.154422760009766

We can then get the average estimated MPG under our treatments:

using Statistics
mean(policy_outcomes)
25.186681510536534

We can compare this number to the average MPG observed:

mean(test_outcomes)
24.687261146496816

We see a small improvement over the real accelerations, although it is important to point out that the real accelerations were not restricted to the same discrete increments that we used.
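
We can quantify this improvement directly, which comes out to roughly 0.5 MPG on average:

# Average improvement of the tree policy over the observed outcomes
mean(policy_outcomes) - mean(test_outcomes)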

Another point of comparison is a baseline policy that assigns to every car the treatment that is best on average across all training points, which we can see from the root node of the tree is an acceleration of 14:

mean(test_rewards[:, "14"])
24.680347600560278

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
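
To confirm that 14 is indeed the strongest constant-treatment baseline, we can compare all single-treatment policies on the test rewards. A quick sketch; only the value for treatment 14 appears above, the rest are computed from test_rewards:

# Average test reward for each constant-treatment policy, best first
avg_by_treatment = sort(
    [t => mean(test_rewards[:, t]) for t in names(test_rewards)],
    by=last, rev=true,
)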