# Optimal Prescription for Diabetes Management

In this example, we aim to learn an optimal diabetes management policy from observational data. We use a cleaned dataset based on an Electronic Medical Records database (for patient confidentiality reasons, we cannot share the dataset publically).

We show how Optimal Policy Trees can be used to inform interpretable diabetes management policies. In particular, we demonstrate that by treating treatments as combination of numeric doses instead of categorical, mutually independent options, we achieve much better prescription outcomes in unseen data.

This case is a simplified version of what is presented in Amram et al. 2020. At the end of the case, we describe some of the modifications done in the paper to make the setting more realistic.

## Preparing the dataset

First, we load the prepared data and separate out the features, treatments, and outcomes (HbA1c, the lower the better). The treatments are coded as a vector of three numbers, corresponding to the amount in each of tree distinct drug classes: insulin, metformin, and oral drugs. We then separate the data into training and testing.

```
using CSV, DataFrames, CategoricalArrays
using Statistics
using CategoricalArrays
df = CSV.read("diabetes.csv", DataFrame)
transform!(df, [:raceBucket, :SEX_CD] .=> categorical, renamecols=false)
train_df = df[df.train, :]
test_df = df[(!).(df.train), :]
train_X = train_df[:, 1:23]
train_T = train_df[:, 24:26]
train_y = train_df[:, :x1]
test_X = test_df[:, 1:23]
test_T = test_df[:, 24:26]
test_y = test_df[:, :x1]
```

Because not all treatment combinations are feasible from a medical perspective, we need to also load in the candidate prescription options:

`candidates = CSV.read("candidates.csv", DataFrame)`

```
13×3 DataFrame
Row │ ins met oral
│ Float64 Float64 Float64
─────┼───────────────────────────
1 │ 0.0 0.0 2.0
2 │ 0.0 1.0 1.0
3 │ 0.0 0.0 1.0
4 │ 1.0 1.0 1.0
5 │ 0.0 1.0 0.0
6 │ 1.0 0.0 1.0
7 │ 0.0 0.0 3.0
8 │ 0.1 0.1 0.1
9 │ 0.0 0.0 0.0
10 │ 0.0 1.0 2.0
11 │ 1.0 0.0 0.0
12 │ 1.0 1.0 0.0
13 │ 1.0 0.0 2.0
```

## Optimal Policy Trees with Multiple Numeric Treatments

Optimal Policy Trees separate the tasks of estimating the counterfactuals and learning the prescription policy. This means that in order to train the tree, we require the input of a reward matrix - that is, the outcome under each of the possible policies for each sample in the data.

The first step is to create the reward matrix. One of the key advantages that Optimal Policy Trees has over Optimal Prescriptive Trees and regress-per-treatment-and-compare methods is that it does not need to onehot-encode treatments into distinct options. In other words, it preserves the order of treatment intensities and hence has a sense of meaningful similarity. For example, we have that "insulin + metformin" (encoded as `[1,1,0]`

) is closer to "insulin + metformin + oral drug" (`[1,1,1]`

) than "2 oral drugs" (`[0,0,2]`

). Note that this is not the case if we have to onehot-encode the treatments into 13 distinct options.

Since the treatments are numeric, we use `IAI.NumericRegressionRewardEstimator`

to estimate the counterfactual reward matrix under each candidate treatment option:

```
seed = 1
reward_lnr = IAI.NumericRegressionRewardEstimator(
outcome_estimator=IAI.RandomForestRegressor(),
outcome_insample_num_folds=2,
propensity_estimator=IAI.RandomForestRegressor(),
propensity_insample_num_folds=2,
propensity_min_value=0.001,
reward_estimator=:doubly_robust,
estimation_kernel_bandwidth=2,
reward_kernel_bandwidth=1,
random_seed=seed,
)
train_rewards, train_score = IAI.fit_predict!(
reward_lnr, train_X, train_T, train_y, candidates)
train_score[:outcome]
```

```
Dict{String, Float64} with 13 entries:
"[1.0,1.0,0.0]" => 0.445348
"[0.0,1.0,0.0]" => 0.548832
"[1.0,0.0,1.0]" => 0.48917
"[1.0,0.0,0.0]" => 0.48427
"[0.0,0.0,1.0]" => 0.534863
"[0.0,1.0,2.0]" => 0.270437
"[1.0,0.0,2.0]" => 0.376345
"[0.0,1.0,1.0]" => 0.442873
"[0.0,0.0,2.0]" => 0.529354
"[0.0,0.0,3.0]" => 0.228115
"[1.0,1.0,1.0]" => 0.187835
"[0.0,0.0,0.0]" => 0.508141
"[0.1,0.1,0.1]" => 0.497965
```

`train_score[:propensity]`

```
Dict{String, Float64} with 13 entries:
"[1.0,1.0,0.0]" => 0.84882
"[0.0,1.0,0.0]" => 0.875854
"[1.0,0.0,1.0]" => 0.760222
"[1.0,0.0,0.0]" => 0.86632
"[0.0,0.0,1.0]" => 0.821752
"[0.0,1.0,2.0]" => 0.820384
"[1.0,0.0,2.0]" => 0.610686
"[0.0,1.0,1.0]" => 0.827628
"[0.0,0.0,2.0]" => 0.663635
"[0.0,0.0,3.0]" => 0.726541
"[1.0,1.0,1.0]" => 0.784475
"[0.0,0.0,0.0]" => 0.886931
"[0.1,0.1,0.1]" => 0.918676
```

The R-squared scores of the internal estimators are reasonable, giving us confidence that the value for `estimation_kernel_bandwidth`

is fine and that there is sufficient signal in the data to use for making prescriptions.

Let's now tune the `reward_kernel_bandwidth`

. Following the guide to bandwidth tuning, we inspect the tuned bandwidth based on a series on input values:

```
using Plots
in = 0.5:0.5:5
out = IAI.tune_reward_kernel_bandwidth(reward_lnr, in)
plot(in, out)
```

The figure does not show a plateau or peak, so there is no real guidance as to the best bandwidth to use. As per the tuning guide, in such cases a good rule of thumb is to set the bandwidth to the same value as the `estimation_kernel_bandwidth`

:

`train_rewards = IAI.set_reward_kernel_bandwidth!(reward_lnr, 2)`

With the reward matrix as the input in addition to the features, we can now learn an Optimal Policy Tree:

```
policy_grid = IAI.GridSearch(
IAI.OptimalTreePolicyMinimizer(
random_seed=seed,
),
max_depth=1:5,
)
IAI.fit!(policy_grid, train_X, train_rewards)
```

We obtain a tree that splits using information such as time since diagnosis, age, BMI, and past HbA1c. For example, we would recommend metformin monotherapy if the time since diagnosis is short, BMI is low, and past HbA1c is low. This is consistent with the medical knowledge that this is one of the least intensive treatment options. In contrast, we would prescribe insulin plus metformin, or a combination of three when past HbA1c is high or BMI is high.

## Evaluation in the Test Set

Since we do not know the true outcome under each treatment in the test set, we need another procedure for estimation so we can evaluate in the test set. We use the same procedure as before to estimate the revenue under each candidate treatment, but now on the test set:

```
test_rewards, test_score = IAI.fit_predict!(
reward_lnr, test_X, test_T, test_y, candidates)
test_score[:outcome]
```

```
Dict{String, Float64} with 13 entries:
"[1.0,1.0,0.0]" => 0.408406
"[0.0,1.0,0.0]" => 0.483051
"[1.0,0.0,1.0]" => 0.286657
"[1.0,0.0,0.0]" => 0.369523
"[0.0,0.0,1.0]" => 0.454017
"[0.0,1.0,2.0]" => 0.297262
"[1.0,0.0,2.0]" => 0.541786
"[0.0,1.0,1.0]" => 0.348789
"[0.0,0.0,2.0]" => 0.467515
"[0.0,0.0,3.0]" => 0.249605
"[1.0,1.0,1.0]" => 0.494716
"[0.0,0.0,0.0]" => 0.496574
"[0.1,0.1,0.1]" => 0.490733
```

Again, the test scores are reasonable, allowing us to trust these values for out-of-sample evaluation. We evaluate the method by calculating the predicted HbA1c outcome under the new diabetes treatment strategy:

```
test_rewards = IAI.set_reward_kernel_bandwidth!(reward_lnr, 2)
outcomes = IAI.predict_outcomes(policy_grid, test_X, test_rewards)
mean(outcomes)
```

`7.68342687604616`

And compare to what is being prescribed in reality:

```
test_T_categorical = Symbol.("[", test_T.ins, ",", test_T.met, ",", test_T.ins, "]")
curr_outcomes = [test_rewards[i, test_T_categorical[i]] for i in 1:length(test_T_categorical)]
mean(curr_outcomes)
```

`7.79197313372671`

We see that there is a significant reduction of 0.11 in HbA1c under the optimal policy, much higher than a clinically meaningful threshold of 0.05.

## Optimal Policy Trees with Categorical Treatments

We mentioned previously the advantage of not having to onehot encode the treatments in Optimal Policy Trees. To verify if there is indeed such an advantage, let's try generating the reward matrix with treatments encoded as distinct categories with `IAI.CategoricalRegressionRewardEstimator`

:

```
train_T_categorical = string.("[", train_T.ins, ",", train_T.met, ",", train_T.ins, "]")
categorical_reward_lnr = IAI.CategoricalRegressionRewardEstimator(
propensity_estimator=IAI.RandomForestClassifier(),
outcome_estimator=IAI.RandomForestRegressor(),
reward_estimator=:doubly_robust,
propensity_min_value=0.001,
random_seed=seed,
)
train_rewards, train_score = IAI.fit_predict!(
categorical_reward_lnr, train_X, train_T_categorical, train_y)
train_score[:outcome]
```

```
Dict{String, Float64} with 5 entries:
"[0.0,1.0,0.0]" => 0.521162
"[1.0,0.0,1.0]" => 0.485017
"[1.0,1.0,1.0]" => 0.371461
"[0.0,0.0,0.0]" => 0.55718
"[0.1,0.1,0.1]" => 0.340644
```

`train_score[:propensity]`

`0.7349132918721478`

Notice that we have decent scores for both the propensity estimator and the outcome estimators. Let's now train an Optimal Policy Tree:

```
categorical_policy_grid = IAI.GridSearch(
IAI.OptimalTreePolicyMinimizer(
random_seed=seed,
),
max_depth=1:5,
)
IAI.fit!(categorical_policy_grid, train_X, train_rewards)
```