Advanced
Learning Treatment Effects Rather Than Policies
Reward Estimation is usually used as part of the policy learning process, where the aim is to prescribe treatments that maximize the rewards estimated from observational data. However, Reward Estimation can also be used for the related task of predicting treatment effects from observational data.
One case where this problem often arises is that of over-treatment (e.g. in medicine), where we may already know that a treatment is generally beneficial in a population (or a subset of the population), but has side effects that may be hard to quantify. In this case, we are not interested in simply learning in which groups the treatment is beneficial; rather, we would like to estimate how beneficial or harmful the treatment is across this population.
In this setting, we are still seeking to learn from observational data, so Reward Estimation still proves useful in dealing with the potential biases and difficulties that we may encounter. However, instead of using Optimal Policy Trees on the estimated rewards to learn a prescription policy, we will now use other models to estimate the treatment effects.
To illustrate this, let us consider a synthetic example with a single treatment whose effect depends simply on one of the features:
\[y_{\text{treatment}} - y_{\text{control}} = 3 x_1 - 1\]
We generate features, randomly assign treatments, and generate outcomes according to this model:
using DataFrames, StableRNGs
n = 500
p = 5
rng = StableRNG(123) # for consistent output across Julia versions
X = DataFrame(randn(rng, n, p), :auto)
T = rand(rng, 'A':'B', n)  # random assignment between control (A) and treatment (B)
y = rand(rng, n) .+ (T .== 'B') .* (3 * X.x1 .- 1)  # outcomes follow the model above
With this data, we can conduct the Reward Estimation process as normal:
reward_lnr = IAI.CategoricalRegressionRewardEstimator(
    propensity_estimator=IAI.RandomForestClassifier(),
    outcome_estimator=IAI.RandomForestRegressor(),
    reward_estimator=:doubly_robust,
    random_seed=123,
)
predictions, reward_score = IAI.fit_predict!(reward_lnr, X, T, y)
rewards = predictions[:reward]
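Because the data here is synthetic, we can run a quick sanity check (our own addition, not part of the standard workflow) to confirm that the estimated reward difference tracks the known treatment effect:
using Statistics

# Compare the estimated treatment effect (difference in doubly-robust rewards)
# against the known synthetic effect 3 * x1 - 1; a strong positive correlation
# suggests the reward estimation has recovered the underlying effect
true_effect = 3 .* X.x1 .- 1
estimated_effect = rewards.B .- rewards.A
cor(true_effect, estimated_effect)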
First, let us train an Optimal Policy Tree on this data:
policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=123,
    ),
    max_depth=1:3,
)
IAI.fit!(policy_grid, X, rewards)
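If we want to examine the learned policy more directly (a usage sketch, assuming the standard predict interface for policy learners), we can look at the treatment the tree prescribes for each observation:
# Prescribed treatment for each observation under the fitted policy tree
prescriptions = IAI.predict(policy_grid, X)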
We can see that the resulting tree indeed has only a single split on x1; from the problem setup we know that the treatment is harmful below this split point and beneficial above it. However, as mentioned above, if our goal is to understand how harmful or beneficial the treatment is for different subsets of the population, a policy tree with just a single split is not very informative.
To address this, we can instead train a different model altogether. We will treat the problem as a regression problem, training a model to predict the difference in rewards between the treated and untreated groups. We can do this with any regression model; in this case we will use an Optimal Regression Tree:
regression_grid = IAI.GridSearch(
    IAI.OptimalTreeRegressor(
        random_seed=123,
    ),
    max_depth=1:3,
)
IAI.fit!(regression_grid, X, rewards.B - rewards.A)
The resulting tree provides a much more refined view of how the treatment effect varies among the population, dividing the population into different groups with similar treatment effects.
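Beyond inspecting the tree itself, the fitted model can be queried like any other regression learner; for instance, a simple usage sketch for obtaining per-observation treatment effect estimates:
# Estimated treatment effect (reward under B minus reward under A) for each
# observation, as predicted by the fitted regression tree
effect_estimates = IAI.predict(regression_grid, X)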
We can also conduct this process in a multi-treatment case using a multi-task regression model to estimate multiple treatment effects at once. Let us illustrate this with the following example with two treatments:
\[\begin{aligned}
y_{\text{treatment 1}} - y_{\text{control}} &= 3 x_1 - 1 \\
y_{\text{treatment 2}} - y_{\text{control}} &= x_1^2 - 1
\end{aligned}\]
As before, we generate data according to this model:
T = rand(rng, 'A':'C', n)  # random assignment among control (A) and two treatments (B, C)
y = rand(rng, n) .+
    (T .== 'B') .* (3 * X.x1 .- 1) .+
    (T .== 'C') .* (X.x1 .^ 2 .- 1)
Next, we conduct Reward Estimation:
reward_lnr = IAI.CategoricalRegressionRewardEstimator(
    propensity_estimator=IAI.RandomForestClassifier(),
    outcome_estimator=IAI.RandomForestRegressor(),
    reward_estimator=:doubly_robust,
    random_seed=123,
)
predictions, reward_score = IAI.fit_predict!(reward_lnr, X, T, y)
rewards = predictions[:reward]
With these rewards, we train an Optimal Policy Tree:
policy_grid = IAI.GridSearch(
    IAI.OptimalTreePolicyMaximizer(
        random_seed=123,
    ),
    max_depth=1:3,
)
IAI.fit!(policy_grid, X, rewards)
As before, we see that the tree is very simple, as only two splits are needed to achieve the optimal policy of treatment assignment.
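To verify this numerically (a quick check of our own, not part of the original walkthrough), we can tabulate how many points the tree assigns to each treatment:
using StatsBase

# Count how many observations the fitted policy tree prescribes to each of
# the treatments A, B and C
countmap(IAI.predict(policy_grid, X))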
In contrast to this, we can frame the problem as a multi-task regression problem. To do this, our target will be to predict the treatment effect of each treatment as a separate task:
treatment_effects = DataFrame(
    EffectB=(rewards.B - rewards.A),
    EffectC=(rewards.C - rewards.A),
)
500×2 DataFrame
 Row │ EffectB    EffectC
     │ Float64    Float64
─────┼──────────────────────
   1 │ -0.700143  -0.436596
   2 │  1.19983   -1.54195
   3 │ -5.7172     0.726624
   4 │ -0.587503  -0.967339
   5 │ -2.45711   -2.37495
   6 │  0.549091  -0.115654
   7 │ -2.7009    -0.541476
   8 │ -2.13606   -0.58574
  ⋮  │     ⋮          ⋮
 494 │  2.96157    1.73995
 495 │ -1.23689   -3.10641
 496 │  6.17125    4.38214
 497 │ -1.3626    -1.74992
 498 │ -3.90075   -0.395094
 499 │  1.85513    0.966435
 500 │ -8.21526    0.568705
              485 rows omitted
We can then train a multi-task Optimal Regression Tree using this target:
regression_grid = IAI.GridSearch(
    IAI.OptimalTreeMultiRegressor(
        random_seed=123,
        minbucket=5,
    ),
    max_depth=1:3,
)
IAI.fit!(regression_grid, X, treatment_effects)
We see that the resulting tree provides a more granular view into how the treatment effects vary across the population, identifying cohorts that exhibit similar responses to the treatments.
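As in the single-treatment case, the fitted tree can also be queried for its estimated effects. Here is a sketch of how this might look, assuming that predict on a multi-task learner returns predictions for all tasks at once:
# Estimated effects for both treatments for each observation; we assume here
# that the multi-task predict covers both tasks (EffectB and EffectC)
effect_estimates = IAI.predict(regression_grid, X)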