In this example, we aim to learn an optimal diabetes management policy from observational data. We use a cleaned dataset based on an Electronic Medical Records database (for patient confidentiality reasons, we cannot share the dataset publicly).
We show how Optimal Policy Trees can be used to inform interpretable diabetes management policies. In particular, we demonstrate that by treating treatments as combinations of numeric doses rather than categorical, mutually independent options, we achieve much better prescription outcomes on unseen data.
This case is a simplified version of what is presented in Amram et al. 2020. At the end of the case, we describe some of the modifications made in the paper to make the setting more realistic.
First, we load the prepared data and separate out the features, treatments, and outcomes (HbA1c, where lower is better). The treatments are coded as a vector of three numbers, corresponding to the dose in each of three distinct drug classes: insulin, metformin, and oral drugs. We then split the data into training and testing sets.
using CSV, DataFrames
using CategoricalArrays  # provides `categorical` for encoding categorical columns
using Statistics

df = CSV.read("diabetes.csv", DataFrame)
transform!(df, [:raceBucket, :SEX_CD] .=> categorical, renamecols=false)

train_df = df[df.train, :]
test_df = df[(!).(df.train), :]

train_X = train_df[:, 1:23]
train_T = train_df[:, 24:26]
train_y = train_df[:, :x1]
test_X = test_df[:, 1:23]
test_T = test_df[:, 24:26]
test_y = test_df[:, :x1]
Because not all treatment combinations are feasible from a medical perspective, we need to also load in the candidate prescription options:
candidates = CSV.read("candidates.csv", DataFrame)
13×3 DataFrame
 Row │ ins      met      oral
     │ Float64  Float64  Float64
─────┼───────────────────────────
   1 │     0.0      0.0      2.0
   2 │     0.0      1.0      1.0
   3 │     0.0      0.0      1.0
   4 │     1.0      1.0      1.0
   5 │     0.0      1.0      0.0
   6 │     1.0      0.0      1.0
   7 │     0.0      0.0      3.0
   8 │     0.1      0.1      0.1
   9 │     0.0      0.0      0.0
  10 │     0.0      1.0      2.0
  11 │     1.0      0.0      0.0
  12 │     1.0      1.0      0.0
  13 │     1.0      0.0      2.0
Optimal Policy Trees separate the tasks of estimating the counterfactuals and learning the prescription policy. This means that in order to train the tree, we require as input a reward matrix: the outcome under each of the candidate treatments for each sample in the data.
The first step is to create the reward matrix. One of the key advantages that Optimal Policy Trees have over Optimal Prescriptive Trees and regress-per-treatment-and-compare methods is that they do not need to one-hot encode the treatments into distinct options. In other words, the numeric encoding preserves the ordering of treatment intensities and hence carries a meaningful notion of similarity. For example, "insulin + metformin" (encoded as [1, 1, 0]) is closer to "insulin + metformin + oral drug" ([1, 1, 1]) than to "2 oral drugs" ([0, 0, 2]). This would not be the case if we had to one-hot encode the treatments into 13 distinct options.
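To make the similarity argument concrete, here is a small sketch in plain Julia (standard library only) comparing Euclidean distances between the encoded treatment vectors:

```julia
using LinearAlgebra

# Encoded treatment vectors: [insulin, metformin, oral]
ins_met      = [1.0, 1.0, 0.0]  # insulin + metformin
ins_met_oral = [1.0, 1.0, 1.0]  # insulin + metformin + oral drug
two_oral     = [0.0, 0.0, 2.0]  # 2 oral drugs

# The numeric encoding keeps a meaningful notion of closeness:
norm(ins_met - ins_met_oral)  # 1.0
norm(ins_met - two_oral)      # sqrt(6) ≈ 2.45
```

Under a one-hot encoding into 13 distinct options, every pair of different treatments would be equally far apart, and this structure would be lost.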
Since the treatments are numeric, we use IAI.NumericRewardEstimator to estimate the counterfactual reward matrix under each candidate treatment option:
seed = 1
reward_lnr = IAI.NumericRewardEstimator(
    outcome_estimator=IAI.XGBoostRegressor(),
    random_seed=seed,
)
train_rewards, train_score = IAI.fit_predict!(
    reward_lnr, train_X, train_T, train_y, candidates)
train_score
The R-squared score of the internal estimator is reasonable, giving us confidence that there is sufficient signal in the data to use for making prescriptions.
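To later evaluate prescriptions on unseen data, the same reward-estimation step can be repeated on the test split. A sketch under the same API assumptions as the training step (requires the IAI package):

```julia
# Estimate counterfactual rewards on the test set so that any
# prescription policy can be scored on unseen data
test_rewards, test_score = IAI.fit_predict!(
    reward_lnr, test_X, test_T, test_y, candidates)
test_score
```

Checking the test-set score as well guards against relying on reward estimates that only fit the training data.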
With the reward matrix as the input in addition to the features, we can now learn an Optimal Policy Tree:
policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMinimizer(
        random_seed=seed,
    ),
    max_depth=1:5,
)
IAI.fit!(policy_grid, train_X, train_rewards)
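Once the tree is trained, it can be queried for prescriptions on new patients. A hedged sketch, assuming the standard IAI policy-learner interface where `IAI.predict` returns the prescribed candidate for each sample and `IAI.predict_treatment_rank` orders all candidates from best to worst:

```julia
# Prescribe a treatment for each test patient; each prescription
# corresponds to one of the rows of `candidates` (ins/met/oral doses)
prescriptions = IAI.predict(policy_grid, test_X)

# Rank all candidate treatments from best to worst for each patient,
# useful when the top choice is medically infeasible for an individual
rankings = IAI.predict_treatment_rank(policy_grid, test_X)
```

Looking up the estimated reward of each prescribed treatment in a test-set reward matrix then gives an out-of-sample estimate of the policy's HbA1c outcomes.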