Reward Estimation with Numeric Treatments

Numeric Reward Estimation Process

When we have numeric treatments, the reward estimation process consists of two steps:

1. Estimating Outcomes

The predicted outcome $f_i(t)$ is the outcome that we would predict to occur if point $i$ with features $\mathbf{x}_i$ were assigned treatment $t$.

We can estimate the outcomes by training a model that predicts the outcome from both the features and the treatment(s) assigned to each point. This approach handles a single numeric treatment and multiple numeric treatments alike, as each treatment is simply another feature in the estimation process.
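As a rough illustration of treating each treatment as just another feature, the sketch below appends every point's treatment value(s) to its feature vector and fits one model on the combined rows. The 1-nearest-neighbour regressor and the names build_rows and NearestNeighbourRegressor are hypothetical stand-ins written in plain Python for brevity; they are not part of RewardEstimation, and any regression or classification learner could take their place.

```python
import math


def build_rows(features, treatments):
    """Append each point's treatment value(s) to its feature vector."""
    return [list(x) + list(t) for x, t in zip(features, treatments)]


class NearestNeighbourRegressor:
    """Toy outcome model: predicts the outcome of the closest training row."""

    def fit(self, rows, outcomes):
        self.rows = [list(r) for r in rows]
        self.outcomes = list(outcomes)
        return self

    def predict_one(self, row):
        distances = [math.dist(row, r) for r in self.rows]
        return self.outcomes[distances.index(min(distances))]


# Four points, one feature and one numeric treatment each (made-up values).
features = [[0.0], [0.0], [1.0], [1.0]]
treatments = [[0.0], [1.0], [0.0], [1.0]]
outcomes = [1.0, 3.0, 2.0, 6.0]

rows = build_rows(features, treatments)
model = NearestNeighbourRegressor().fit(rows, outcomes)

# Same features, different treatment: the predicted outcome changes.
low = model.predict_one([1.0, 0.0])
high = model.predict_one([1.0, 1.0])
```

With several numeric treatments, each entry of treatments would simply hold more than one value, and the rest of the sketch is unchanged.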

The type of model used for outcome estimation depends on the type of outcome:

  • numeric outcomes are estimated using regression models to predict the outcome directly
  • binary outcomes are estimated using classification models to predict the probability of success

It is important that the predictions of the trained model vary as the treatments change. If the model makes little use of the treatment features, the estimated rewards will barely differ across treatment options, which limits how useful they are in any later analysis. For this reason, we strongly recommend using a complex black-box method for outcome estimation, as this increases the chance that the model learns the nuances of how the outcome varies with treatment. In contrast, a simpler model that uses few variables (like a decision tree or linear regression) may not generate rewards that are as useful, even if its predictive performance is still good.
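One informal way to check whether a fitted outcome model responds to the treatment is to measure how much its predictions move across a range of treatment values while the features are held fixed. The sketch below is a minimal version of such a check; the function treatment_sensitivity and the two hand-written models are hypothetical and only illustrate the idea.

```python
from statistics import mean


def treatment_sensitivity(predict, features, candidates):
    """Mean over points of (max - min) predicted outcome across candidates."""
    spreads = []
    for x in features:
        predictions = [predict(x, t) for t in candidates]
        spreads.append(max(predictions) - min(predictions))
    return mean(spreads)


# Two hypothetical fitted outcome models, written as plain functions.
def uses_treatment(x, t):
    return 1.0 + 0.5 * x[0] + 2.0 * t - 0.1 * t * t


def ignores_treatment(x, t):
    return 1.0 + 0.5 * x[0]


features = [[0.0], [1.0], [2.0]]
candidates = [0.0, 5.0, 10.0]

sensitive = treatment_sensitivity(uses_treatment, features, candidates)
flat = treatment_sensitivity(ignores_treatment, features, candidates)
```

A spread of zero, as for the second model, means every treatment option receives the same estimated reward for a given point, so the rewards carry no information about which treatment is preferable.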

2. Estimating Rewards under Candidate Treatments

Given a model for estimating outcomes as a function of the features and treatments, we can generate rewards by estimating the outcome for each point under a variety of candidate treatments with the features fixed.

The set of candidate treatments for which to generate rewards is determined by the treatment_candidates argument to fit_predict! and predict.
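Conceptually, this step fills a table with one row per point and one column per candidate treatment. The sketch below shows that computation for a single numeric treatment, assuming an outcome model is already available as a function of the features and the treatment; estimate_rewards and toy_outcome_model are hypothetical names for illustration and do not reflect how the library implements this step.

```python
def estimate_rewards(predict, features, treatment_candidates):
    """Predict the outcome of every point under every candidate treatment.

    Returns one row per point and one column per candidate, with the
    features of each point held fixed across its row.
    """
    return [[predict(x, t) for t in treatment_candidates] for x in features]


# Hypothetical outcome model: the outcome rises with the feature and the dose.
def toy_outcome_model(x, t):
    return x[0] + 2.0 * t


features = [[1.0], [3.0]]
treatment_candidates = [0.0, 1.0, 2.0]

rewards = estimate_rewards(toy_outcome_model, features, treatment_candidates)
```

Here rewards[i][j] is the estimated outcome for point i under the j-th candidate treatment.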

Learner for Numeric Reward Estimation

RewardEstimation provides the NumericRewardEstimator learner to easily conduct reward estimation on data with numeric treatments. In addition to the shared learner parameters, the following parameters are used to control the reward estimation procedure.


The learner to use for outcome estimation. The type of learner depends on the type of outcome in the problem: a regression learner for numeric outcomes, or a classification learner for binary outcomes.

To validate parameters while fitting the outcome learner, a GridSearch over the appropriate learner can also be used.


A positive Integer indicating the number of folds to use when estimating the outcomes in-sample during fit_predict!. Defaults to 5. Set to nothing to disable in-sample cross-validation (which should only be done if you are confident the model will not overfit).
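The idea behind in-sample folds can be sketched as out-of-fold prediction: each point's outcome is estimated by a model trained on the other folds, so no point is scored by a model that saw it during fitting. The code below is a minimal illustration under simplifying assumptions; the round-robin fold assignment, the helper names, and the mean-only model are all hypothetical, and the library's actual splitting scheme is not described here.

```python
from statistics import mean


def fold_indices(n, num_folds):
    """Assign point i to fold i % num_folds (a simple deterministic split)."""
    return [[i for i in range(n) if i % num_folds == k] for k in range(num_folds)]


def insample_predictions(fit, rows, outcomes, num_folds=5):
    """Predict each point with a model fitted on the other folds only."""
    n = len(rows)
    predictions = [None] * n
    for held_out in fold_indices(n, num_folds):
        held = set(held_out)
        train_rows = [rows[i] for i in range(n) if i not in held]
        train_outcomes = [outcomes[i] for i in range(n) if i not in held]
        predict = fit(train_rows, train_outcomes)
        for i in held_out:
            predictions[i] = predict(rows[i])
    return predictions


# Hypothetical model: ignores the inputs and predicts the training mean.
def fit_mean(train_rows, train_outcomes):
    average = mean(train_outcomes)
    return lambda row: average


rows = [[float(i)] for i in range(10)]
outcomes = [float(i + 1) for i in range(10)]

predictions = insample_predictions(fit_mean, rows, outcomes, num_folds=5)
```

Disabling the folds corresponds to fitting once on all the data and predicting on that same data, which is why it is only advisable when overfitting is not a concern.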