Optimal Prescription for Diabetes Management
In this example, we aim to learn an optimal diabetes management policy from observational data. We use a cleaned dataset based on an Electronic Medical Records database (for patient confidentiality reasons, we cannot share the dataset publicly).
We show how Optimal Policy Trees can be used to inform interpretable diabetes management policies. In particular, we demonstrate that by treating treatments as combinations of numeric doses rather than categorical, mutually independent options, we achieve much better prescription outcomes on unseen data.
This case is a simplified version of what is presented in Amram et al. (2020). At the end of the case, we describe some of the modifications made in the paper to make the setting more realistic.
Preparing the dataset
First, we load the prepared data and separate out the features, treatments, and outcomes (HbA1c, the lower the better). The treatments are coded as a vector of three numbers, corresponding to the dose in each of three distinct drug classes: insulin, metformin, and oral drugs. We then separate the data into training and testing sets.
using CSV, DataFrames
using CategoricalArrays
using Statistics

# Load the prepared dataset and encode the categorical features
df = CSV.read("diabetes.csv", DataFrame)
transform!(df, [:raceBucket, :SEX_CD] .=> categorical, renamecols=false)

# Split into training and testing sets using the predefined indicator
train_df = df[df.train, :]
test_df = df[(!).(df.train), :]

# Features are the first 23 columns, treatments the next 3, and the
# outcome (HbA1c) is stored in column :x1
train_X = train_df[:, 1:23]
train_T = train_df[:, 24:26]
train_y = train_df[:, :x1]
test_X = test_df[:, 1:23]
test_T = test_df[:, 24:26]
test_y = test_df[:, :x1]
Because not all treatment combinations are feasible from a medical perspective, we also need to load in the candidate prescription options:
candidates = CSV.read("candidates.csv", DataFrame)
13×3 DataFrame
 Row │ ins      met      oral
     │ Float64  Float64  Float64
─────┼───────────────────────────
   1 │     0.0      0.0      2.0
   2 │     0.0      1.0      1.0
   3 │     0.0      0.0      1.0
   4 │     1.0      1.0      1.0
   5 │     0.0      1.0      0.0
   6 │     1.0      0.0      1.0
   7 │     0.0      0.0      3.0
   8 │     0.1      0.1      0.1
   9 │     0.0      0.0      0.0
  10 │     0.0      1.0      2.0
  11 │     1.0      0.0      0.0
  12 │     1.0      1.0      0.0
  13 │     1.0      0.0      2.0
Optimal Policy Trees with Multiple Numeric Treatments
Optimal Policy Trees separate the tasks of estimating the counterfactuals and learning the prescription policy. This means that in order to train the tree, we require as input a reward matrix: the estimated outcome under each of the candidate treatments for each sample in the data.
The first step is to create the reward matrix. One of the key advantages that Optimal Policy Trees have over Optimal Prescriptive Trees and regress-per-treatment-and-compare methods is that they do not need to one-hot encode the treatments into distinct options. In other words, they preserve the ordering of treatment intensities and hence have a meaningful sense of similarity between treatments. For example, "insulin + metformin" (encoded as [1,1,0]) is closer to "insulin + metformin + oral drug" ([1,1,1]) than to "2 oral drugs" ([0,0,2]). This would not be the case if we had to one-hot encode the treatments into 13 distinct options.
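To make this concrete, we can compare the Euclidean distances between these dose vectors (a quick illustrative check, not part of the analysis itself):

using LinearAlgebra

# Dose vectors for the three treatments discussed above
ins_met      = [1.0, 1.0, 0.0]  # insulin + metformin
ins_met_oral = [1.0, 1.0, 1.0]  # insulin + metformin + oral drug
two_oral     = [0.0, 0.0, 2.0]  # 2 oral drugs

# The numeric encoding places "insulin + metformin" much closer to
# "insulin + metformin + oral drug" than to "2 oral drugs"
norm(ins_met - ins_met_oral)  # 1.0
norm(ins_met - two_oral)      # ≈ 2.45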
Since the treatments are numeric, we use IAI.NumericRewardEstimator to estimate the counterfactual reward matrix under each candidate treatment option:
seed = 1
reward_lnr = IAI.NumericRewardEstimator(
outcome_estimator=IAI.XGBoostRegressor(),
random_seed=seed,
)
train_rewards, train_score = IAI.fit_predict!(
reward_lnr, train_X, train_T, train_y, candidates)
train_score
0.527520818023596
The R-squared score of the internal estimator is reasonable, giving us confidence that there is sufficient signal in the data to use for making prescriptions.
With the reward matrix as the input in addition to the features, we can now learn an Optimal Policy Tree:
policy_grid = IAI.GridSearch(
IAI.OptimalTreePolicyMinimizer(
random_seed=seed,
),
max_depth=1:5,
)
IAI.fit!(policy_grid, train_X, train_rewards)
We obtain a simple tree with just four splits. If the past HbA1c level is low, only metformin monotherapy is prescribed, consistent with the medical knowledge that this is one of the least intensive treatment options. Otherwise, depending on the years since the diabetes diagnosis (DMRxAge), the 25th percentile of past HbA1c (HbA1c_25th), and the number of visits so far (visitNo), we would prescribe a range of therapies from insulin plus others to purely oral drugs.
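To review the tree in detail, we can extract the fitted learner from the grid search and save an interactive visualization (the filename here is just an example):

# Extract the best learner found by the grid search
policy_lnr = IAI.get_learner(policy_grid)

# Save an interactive visualization of the fitted tree for review
IAI.write_html("diabetes_policy_tree.html", policy_lnr)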
Evaluation in the Test Set
Since we do not know the true outcome under each treatment in the test set, we need another procedure for estimating these outcomes so that we can evaluate performance out of sample. We use the same procedure as before to estimate the reward under each candidate treatment, but now on the test set:
test_rewards, test_score = IAI.fit_predict!(
reward_lnr, test_X, test_T, test_y, candidates)
test_score
0.5501950667768514
Again, the test score is reasonable, allowing us to trust these values for out-of-sample evaluation. We evaluate the method by calculating the predicted HbA1c outcome under the new diabetes treatment strategy:
outcomes = IAI.predict_outcomes(policy_grid, test_X, test_rewards)
mean(outcomes)
7.801433161174445
And compare to what is being prescribed in reality:
mean(test_y)
7.87042669172932
We see that there is a clinically meaningful reduction of 0.07 in HbA1c under the optimal policy.
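We can compute this reduction directly from the two quantities above:

# Estimated mean reduction in HbA1c under the learned policy
mean(test_y) - mean(outcomes)  # ≈ 0.069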
Optimal Policy Trees with Categorical Treatments
We mentioned previously the advantage of not having to one-hot encode the treatments with Optimal Policy Trees. To verify whether there is indeed such an advantage, let's try generating the reward matrix with the treatments encoded as distinct categories using IAI.CategoricalRewardEstimator:
# Encode each treatment combination as a single categorical label, e.g. "[1.0,1.0,0.0]"
train_T_categorical = string.("[", train_T.ins, ",", train_T.met, ",", train_T.oral, "]")
categorical_reward_lnr = IAI.CategoricalRewardEstimator(
outcome_estimator=IAI.RandomForestRegressor(),
propensity_estimator=IAI.RandomForestClassifier(),
reward_estimator=:doubly_robust,
propensity_min_value=0.001,
random_seed=seed,
)
train_rewards, train_score = IAI.fit_predict!(
categorical_reward_lnr, train_X, train_T_categorical, train_y)
train_score
Dict{Symbol,Any} with 2 entries:
:propensity => 0.761337
:outcome => Dict("[0.0,1.0,0.0]"=>0.521647,"[1.0,0.0,1.0]"=>0.489265,"[1.0…
Notice that we have decent scores for both the propensity estimator and the outcome estimators. Let's now train an Optimal Policy Tree:
categorical_policy_grid = IAI.GridSearch(
IAI.OptimalTreePolicyMinimizer(
random_seed=seed,
),
max_depth=1:5,
)
IAI.fit!(categorical_policy_grid, train_X, train_rewards)
Similarly, we can look at the outcomes under these prescriptions in the test data, evaluating against the same test reward matrix as before so that the comparison is consistent:
outcomes = IAI.predict_outcomes(categorical_policy_grid, test_X, test_rewards)
mean(outcomes)
7.871868864082752
The results are worse than the Optimal Policy Tree with numeric treatments, and about the same as the current prescriptions. This suggests that by encoding the treatments as distinct categorical options, we lose the relationship between different treatments when estimating the reward matrix, and therefore we do not obtain as good a policy tree in the end.
Optimal Prescriptive Trees with Categorical Treatments
Similarly, with categorical treatments, we can train an Optimal Prescriptive Tree (which is not possible with numeric treatments):
prescriptive_grid = IAI.GridSearch(
IAI.OptimalTreePrescriptionMinimizer(
random_seed=seed,
),
max_depth=1:5,
)
IAI.fit!(prescriptive_grid, train_X, train_T_categorical, train_y)
We similarly evaluate the outcomes in test data:
pred_treatments, _ = IAI.predict(prescriptive_grid, test_X)
pred_outcomes = [test_rewards[i, pred_treatments[i]] for i in 1:length(pred_treatments)]
mean(pred_outcomes)
7.949112643478508
This is again worse than the results from the initial Optimal Policy Tree with numeric treatments, and worse than the current prescriptions.
Additional Modifications
In the paper, we made some adjustments to adapt to practical considerations in the medical setting. For example, some of the treatments cannot be assigned to a specific patient for medical reasons (some drugs have contraindications, some are not allowed based on kidney conditions, some combinations are too drastic a change from the current therapy, etc.). To incorporate these considerations, we can use the IAI.predict_treatment_rank function to obtain an ordered list of treatments and find the best feasible one from the set, as shown in the sketch below.
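A minimal sketch of this idea, assuming a hypothetical is_feasible(patient, treatment) helper that encodes the medical constraints:

# Rank all candidate treatments for each test patient from best to worst
ranks = IAI.predict_treatment_rank(policy_grid, test_X)

# For each patient, prescribe the highest-ranked treatment that is
# medically feasible (is_feasible is a hypothetical helper encoding
# contraindications, kidney conditions, etc.)
prescriptions = map(1:nrow(test_X)) do i
    first(t for t in ranks[i, :] if is_feasible(test_X[i, :], t))
end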
Another modification was that an adjustment should only be made when we have sufficient confidence that the expected improvement is large. We therefore introduce a threshold delta, and a change is recommended only if the predicted outcome under the best treatment is better than the outcome under the current treatment by at least delta. Otherwise, we recommend that the patient stay on their current regimen.
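A sketch of this rule, building on the prescriptions from the previous sketch and assuming hypothetical current_treatments labels for each patient, with the reward matrix columns keyed by the same treatment labels:

# Hypothetical sketch of the delta rule: switch treatments only when the
# estimated improvement over the current treatment exceeds the threshold.
delta = 0.2  # hypothetical threshold on the HbA1c improvement

final_prescriptions = map(1:nrow(test_X)) do i
    current = current_treatments[i]  # hypothetical: each patient's current treatment label
    best = prescriptions[i]
    # HbA1c is lower-is-better, so improvement is current minus best
    improvement = test_rewards[i, current] - test_rewards[i, best]
    improvement >= delta ? best : current
end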
The paper additionally compares against regress-and-compare approaches and other baselines, and finds similar patterns.
Conclusions
This case shows that by using an Optimal Policy Trees approach, we can learn an interpretable policy to recommend treatments for diabetes patients that significantly improves their HbA1c compared to the standard of care.
Compared to the other approaches, Optimal Policy Trees allow us to use the treatments directly as a combination of numeric doses instead of having to encode them as categorical options. This leads to stronger reward estimates and much better prescribed outcomes on unseen data.