Quick Start Guide: Optimal Policy Trees with Numeric Treatment

In this guide we will give a demonstration of how to use Optimal Policy Trees with numeric treatment options. For this example, we will use the auto-mpg dataset, where the task is usually to predict a car's fuel efficiency (in miles-per-gallon, or MPG) based on other characteristics of the car. To apply a prescriptive lens to this case, we will instead treat the amount of acceleration as a treatment that can be controlled, and try to find the value that optimizes the MPG for a given car.

Note: this case is not intended as a practical application of policy trees, but rather as an illustration of the training and evaluation process. For a real-world case study using similar techniques, see the grocery pricing case study.

First we load in the data and drop 6 rows with missing values:

using CSV, DataFrames
df = CSV.read("auto-mpg.csv", DataFrame, missingstring="?")
dropmissing!(df)
392×9 DataFrame
 Row │ mpg      cylinders  displacement  horsepower  weight  acceleration  mod ⋯
     │ Float64  Int64      Float64       Int64       Int64   Float64       Int ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │    18.0          8         307.0         130    3504          12.0      ⋯
   2 │    15.0          8         350.0         165    3693          11.5
   3 │    18.0          8         318.0         150    3436          11.0
   4 │    16.0          8         304.0         150    3433          12.0
   5 │    17.0          8         302.0         140    3449          10.5      ⋯
   6 │    15.0          8         429.0         198    4341          10.0
   7 │    14.0          8         454.0         220    4354           9.0
   8 │    14.0          8         440.0         215    4312           8.5
  ⋮  │    ⋮         ⋮           ⋮            ⋮         ⋮          ⋮            ⋱
 386 │    36.0          4         135.0          84    2370          13.0      ⋯
 387 │    27.0          4         151.0          90    2950          17.3
 388 │    27.0          4         140.0          86    2790          15.6
 389 │    44.0          4          97.0          52    2130          24.6
 390 │    32.0          4         135.0          84    2295          11.6      ⋯
 391 │    28.0          4         120.0          79    2625          18.6
 392 │    31.0          4         119.0          82    2720          19.4
                                                  3 columns and 377 rows omitted

Policy trees are trained using a features matrix/dataframe X as usual, along with a rewards matrix that has one column per potential treatment, containing the outcome for each sample under that treatment.

There are two ways to get this rewards matrix:

  • in rare cases, the problem may have full information about the outcome associated with each treatment for each sample
  • more commonly, we have observational data, and use this partial data to train models to estimate the outcome associated with each treatment

Refer to the documentation on data preparation for more information on the data format.
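For concreteness, here is a minimal sketch (with made-up numbers) of the expected shape: a toy rewards dataframe with one row per sample and one column per hypothetical treatment, where each entry is the known or estimated outcome for that sample under that treatment.

using DataFrames

# Toy rewards matrix: rows are samples, columns are candidate treatments "A"/"B",
# entries are the outcome for each sample under each treatment (illustrative values)
toy_rewards = DataFrame("A" => [1.2, 0.8, 1.5], "B" => [0.9, 1.1, 1.4])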

In this case, the dataset is observational, and so we will use reward estimation to estimate our rewards matrix.

Reward Estimation

Warning

Please refer to the Reward Estimation documentation for a detailed description of how to perform reward estimation for numeric treatments properly. For simplicity, and to keep the focus on Optimal Policy Trees, this quick start guide does not cover tuning the reward estimation parameters, but in real problems this tuning is an important step.

First, we split into training and testing:

X = select(df, Not(["mpg", "acceleration", "car name"]))
treatments = df.acceleration
outcomes = df.mpg

(train_X, train_treatments, train_outcomes), (test_X, test_treatments, test_outcomes) =
    IAI.split_data(:policy_maximize, X, treatments, outcomes, seed=123, train_proportion=0.5)

Note that we have used a training/testing split of 50%/50%, so that we save more data for testing to ensure high-quality reward estimation on the test set.

The treatment, acceleration, is a numeric value, so we follow the process for estimating rewards with numeric treatments. We will consider prescribing acceleration values from 11 to 20, in steps of 3:

treatment_candidates = 11:3:20
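This range expands to four candidate acceleration values, which will become the columns of the rewards matrix:

collect(treatment_candidates)
# 4-element Vector{Int64}: [11, 14, 17, 20]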

The outcome is continuous, so we use a NumericRegressionRewardEstimator to estimate the MPG under our candidate acceleration values using a Random Forest model:

reward_lnr = IAI.NumericRegressionRewardEstimator(
    propensity_estimator=IAI.RandomForestRegressor(),
    outcome_estimator=IAI.RandomForestRegressor(),
    reward_estimator=:doubly_robust,
    propensity_min_value=0.1,
    random_seed=1,
)
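As background, the doubly robust approach combines the outcome model's prediction with an inverse-propensity correction based on the observed data. Below is a minimal sketch of the standard discrete-treatment form of this idea; the numeric-treatment variant used here differs in details (such as how closeness of the observed treatment to each candidate value is handled), so treat this purely as an illustration rather than IAI's internal implementation.

# Sketch of the standard doubly robust reward for one sample and a candidate
# treatment t (illustration only, not IAI's internal code):
#   mu_hat: outcome model's predicted outcome for this sample under t
#   p_hat:  propensity model's estimated probability of receiving t
#   y, T:   observed outcome and observed treatment for this sample
doubly_robust_reward(mu_hat, p_hat, y, T, t) =
    mu_hat + (T == t) / p_hat * (y - mu_hat)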

train_rewards, train_reward_score = IAI.fit_predict!(
    reward_lnr, train_X, train_treatments, train_outcomes, treatment_candidates)
train_rewards
196×4 DataFrame
 Row │ 11       14       17       20
     │ Float64  Float64  Float64  Float64
─────┼────────────────────────────────────
   1 │ 15.6118  14.3241  14.9834  15.8865
   2 │ 17.8902  16.7067  15.3445  16.376
   3 │ 15.2109  11.3134  11.7976  16.0573
   4 │ 13.5523  11.0376  11.7976  16.0573
   5 │ 14.0292  11.4518  11.6567  16.3451
   6 │ 15.0426  15.6616  14.9762  16.3087
   7 │ 15.033   13.5642  13.4928  16.2574
   8 │ 13.2789  15.0676  18.1299  17.5095
  ⋮  │    ⋮        ⋮        ⋮        ⋮
 190 │ 25.0931  23.2679  21.857   27.8922
 191 │ 32.061   36.0154  34.4012  31.4615
 192 │ 36.3476  34.7843  42.7636  29.9423
 193 │ 31.9912  39.3689  32.1998  31.1547
 194 │ 35.6538  40.1882  34.9135  32.5599
 195 │ 19.6372  19.0582  35.7369  20.3832
 196 │ 28.3762  26.4423  26.6039  38.98
                          181 rows omitted
train_reward_score[:propensity]
Dict{String, Float64} with 4 entries:
  "17" => 0.120533
  "20" => 0.200617
  "11" => 0.287145
  "14" => 0.165352
train_reward_score[:outcome]
Dict{String, Float64} with 4 entries:
  "17" => 0.872743
  "20" => 0.322845
  "11" => 0.660307
  "14" => 0.835289

We can see that the R² values of the internal outcome models for each candidate treatment are between 0.32 and 0.87, whereas the propensity models have R² between 0.12 and 0.29. The non-zero propensity scores tell us that there is a small amount of treatment assignment bias in this data (keeping in mind that we arbitrarily selected a feature to use as the treatment column). It seems we can predict the outcomes reasonably well, so we have a good basis for the reward estimates.
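If you want to check these ranges programmatically rather than reading them off the dictionaries, something along the following lines should work (the values correspond to the output shown above):

# Smallest and largest R² across the candidate-treatment models
extrema(values(train_reward_score[:outcome]))     # ≈ (0.32, 0.87)
extrema(values(train_reward_score[:propensity]))  # ≈ (0.12, 0.29)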

Optimal Policy Trees

Now that we have a complete rewards matrix, we can train a tree to learn an optimal prescription policy that maximizes MPG. We will use a GridSearch to fit an OptimalTreePolicyMaximizer (note that if we were trying to minimize the outcomes, we would use OptimalTreePolicyMinimizer):

grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=1,
        minbucket=15,
    ),
    max_depth=4:5,
)
IAI.fit!(grid, train_X, train_rewards)
Fitted OptimalTreePolicyMaximizer:
  1) Split: model year < 75.5
    2) Split: horsepower < 91
      3) Prescribe: 11, 34 points, error 41.72
      4) Prescribe: 20, 63 points, error 50.4
    5) Split: weight < 2710
      6) Split: displacement < 90.5
        7) Prescribe: 20, 15 points, error 31.02
        8) Prescribe: 14, 41 points, error 36.18
      9) Prescribe: 20, 43 points, error 47.2
Optimal Trees Visualization

The resulting tree recommends different accelerations based on the characteristics of the car. The intensity of the color in each leaf shows the difference in quality between the best and second-best acceleration values.

We can see that both extremes of our candidate acceleration values are prescribed by the tree. Older cars with lower horsepower receive the lowest acceleration, whereas the highest acceleration is best for many different groups of cars.

We can make treatment prescriptions using predict:

prescriptions = IAI.predict(grid, train_X)
196-element Vector{String}:
 "20"
 "20"
 "20"
 "20"
 "20"
 "20"
 "20"
 "20"
 "20"
 "20"
 ⋮
 "14"
 "14"
 "20"
 "14"
 "14"
 "14"
 "14"
 "20"
 "20"

The prescriptions are always returned as strings matching the column names of the input rewards matrix. In our case the treatments are numeric values, so if we want them in numeric form to use later, we can convert them with convert_treatments_to_numeric:

IAI.convert_treatments_to_numeric(prescriptions)
196-element Vector{Int64}:
 20
 20
 20
 20
 20
 20
 20
 20
 20
 20
  ⋮
 14
 14
 20
 14
 14
 14
 14
 20
 20

If we want more information about the relative performance of treatments for these points, we can predict the full treatment ranking with predict_treatment_rank:

rank = IAI.predict_treatment_rank(grid, train_X)
196×4 Matrix{String}:
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"
 ⋮
 "14"  "17"  "11"  "20"
 "14"  "17"  "11"  "20"
 "20"  "11"  "17"  "14"
 "14"  "17"  "11"  "20"
 "14"  "17"  "11"  "20"
 "14"  "17"  "11"  "20"
 "14"  "17"  "11"  "20"
 "20"  "11"  "17"  "14"
 "20"  "11"  "17"  "14"

For each point in the data, this gives the treatments in order of effectiveness. As before, these are returned as strings, but we can convert the treatments to numeric values with convert_treatments_to_numeric:

IAI.convert_treatments_to_numeric(rank)
196×4 Matrix{Int64}:
 20  11  17  14
 20  11  17  14
 20  11  17  14
 20  11  17  14
 20  11  17  14
 20  11  17  14
 20  11  17  14
 20  11  17  14
 20  11  17  14
 20  11  17  14
  ⋮
 14  17  11  20
 14  17  11  20
 20  11  17  14
 14  17  11  20
 14  17  11  20
 14  17  11  20
 14  17  11  20
 20  11  17  14
 20  11  17  14

To quantify the difference in performance behind the treatment rankings, we can use predict_treatment_outcome to extract the estimated quality of each treatment for each point:

IAI.predict_treatment_outcome(grid, train_X)
196×4 DataFrame
 Row │ 11       14       17       20
     │ Float64  Float64  Float64  Float64
─────┼────────────────────────────────────
   1 │ 16.9546  15.7356  16.4441  18.3366
   2 │ 16.9546  15.7356  16.4441  18.3366
   3 │ 16.9546  15.7356  16.4441  18.3366
   4 │ 16.9546  15.7356  16.4441  18.3366
   5 │ 16.9546  15.7356  16.4441  18.3366
   6 │ 16.9546  15.7356  16.4441  18.3366
   7 │ 16.9546  15.7356  16.4441  18.3366
   8 │ 16.9546  15.7356  16.4441  18.3366
  ⋮  │    ⋮        ⋮        ⋮        ⋮
 190 │ 19.4255  18.6651  19.2955  21.5347
 191 │ 29.8578  32.561   30.1077  28.93
 192 │ 29.8578  32.561   30.1077  28.93
 193 │ 29.8578  32.561   30.1077  28.93
 194 │ 29.8578  32.561   30.1077  28.93
 195 │ 19.4255  18.6651  19.2955  21.5347
 196 │ 19.4255  18.6651  19.2955  21.5347
                          181 rows omitted

Evaluating Optimal Policy Trees

It is critical for a fair evaluation that we do not evaluate the quality of the policy using rewards from our existing reward estimator trained on the training set. This is to avoid any information from the training set leaking through to the out-of-sample evaluation.

Instead, we need to estimate a new set of rewards using only the test set, and evaluate the policy against these rewards:

test_rewards, test_reward_score = IAI.fit_predict!(
    reward_lnr, test_X, test_treatments, test_outcomes, treatment_candidates)
test_rewards
196×4 DataFrame
 Row │ 11       14       17       20
     │ Float64  Float64  Float64  Float64
─────┼────────────────────────────────────
   1 │ 18.8761  17.3487  15.4426  13.0237
   2 │ 18.5489  17.1773  16.065   13.1019
   3 │ 16.9064  17.1959  16.4088  13.2654
   4 │ 14.7006  12.306   11.4709  12.9517
   5 │ 15.6508  13.3365  11.5539  12.9941
   6 │ 15.099   16.8191  13.5693  12.6735
   7 │ 24.3365  23.7899  23.5999  22.839
   8 │ 26.4464  27.1766  25.2994  24.027
  ⋮  │    ⋮        ⋮        ⋮        ⋮
 190 │ 29.7355  38.5779  28.4061  27.4744
 191 │ 28.3353  46.667   32.2657  28.3823
 192 │ 27.27    26.9207  26.418   26.5607
 193 │ 27.8558  23.5933  26.9591  27.39
 194 │ 33.5529  34.7929  35.2612  30.7266
 195 │ 63.93    32.1007  32.2657  28.3823
 196 │ 32.3054  31.5629  25.0507  27.6828
                          181 rows omitted
test_reward_score[:propensity]
Dict{String, Float64} with 4 entries:
  "17" => 0.158079
  "20" => 0.267284
  "11" => 0.393047
  "14" => 0.217413
test_reward_score[:outcome]
Dict{String, Float64} with 4 entries:
  "17" => 0.716403
  "20" => 0.747099
  "11" => 0.508526
  "14" => 0.889527

We see that the internal models of our test reward estimator have scores similar to those on the training set. As with the training set, this gives us confidence that the estimated rewards are a fair reflection of reality, and will serve as a good basis for evaluation.

We can now evaluate the quality using these new estimated rewards. First, we will calculate the average predicted MPG under the treatments prescribed by the tree for the test set. To do this, we use predict_outcomes which uses the model to make prescriptions and looks up the predicted outcomes under these prescriptions:

policy_outcomes = IAI.predict_outcomes(grid, test_X, test_rewards)
196-element Vector{Float64}:
 13.02366889
 13.1019369
 13.26536812
 12.9517449
 12.99406544
 12.67350622
 22.83901263
 26.44638436
 23.16949622
 23.68354799
  ⋮
 20.03357808
 26.42930003
 38.57785239
 46.66702847
 26.56066784
 27.39004761
 34.79286695
 32.10071157
 31.56293754

We can then get the average estimated MPG under our treatments:

using Statistics
mean(policy_outcomes)
23.98875902

We can compare this number to the average estimated MPG under a baseline policy that assigns the same treatment to all observations regardless of their features. Let us compare to a policy that assigns 17 everywhere:

mean(test_rewards[:, "17"])
23.11359676

We can see that the personalization offered by the tree policy indeed improves upon a baseline where each observation receives the same treatment.
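Using the numbers above, this is roughly a 3.8% improvement over the assign-17-everywhere baseline. If we wanted to run the same comparison against every constant-treatment policy at once, a sketch along these lines should work (reusing the test_rewards columns from above):

# Mean estimated MPG under each constant policy that assigns a single
# candidate acceleration value to every observation in the test set
Dict(t => mean(test_rewards[:, t]) for t in names(test_rewards))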