Reward Estimation Learners

RewardEstimation provides a variety of learners for conducting reward estimation, which we describe on this page along with a guide to the available parameters.

The type of learner to use depends on the characteristics of the problem:

Shared Parameters

A number of parameters (in addition to the standard shared learner parameters) are shared by all flavors of reward estimation learner.


The method to use for reward estimation. The following options are available:

Depending on the selected method, you will also have to specify the following parameters:


The learner to use for propensity score estimation. The type of learner depends on the type of outcome in the problem:

To conduct parameter validation during the propensity score fitting, a GridSearch over the appropriate learner can also be used.


A positive Integer indicating the number of folds to use when estimating the propensity scores in-sample during fit_predict!. Defaults to 5. Set to nothing to disable in-sample cross-validation (which should only be done if you are confident the model will not overfit).
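The idea behind in-sample cross-validation can be illustrated with a minimal sketch in Python. This is not the library's implementation: the round-robin fold assignment, the function names, and the stand-in "model" (the treated fraction of the training folds) are all assumptions made for illustration. The point it shows is that each sample's propensity score comes from a model that never saw that sample.

```python
def make_folds(n, num_folds):
    """Assign indices 0..n-1 to num_folds folds in round-robin order (hypothetical scheme)."""
    return [list(range(k, n, num_folds)) for k in range(num_folds)]


def out_of_fold_propensity(treatments, num_folds=5):
    """Estimate P(treatment = 1) for each sample using only the other folds.

    The "model" is just the treated fraction in the training folds,
    standing in for a real propensity learner.
    """
    n = len(treatments)
    scores = [None] * n
    for held_out in make_folds(n, num_folds):
        held = set(held_out)
        train = [treatments[i] for i in range(n) if i not in held]
        estimate = sum(train) / len(train)  # fit on the training folds only
        for i in held_out:
            scores[i] = estimate  # predict for the held-out fold
    return scores


treatments = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
scores = out_of_fold_propensity(treatments, num_folds=5)
print(scores)
```

Disabling the folds would amount to fitting once on all samples and predicting on the same samples, which is why it is only safe when the model is unlikely to overfit.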


A Real value between 0 and 1 that specifies the minimum propensity score that can be predicted for any treatment. Defaults to 0, which applies no clipping. Raising this value clips small propensity estimates, which makes the estimation more stable.
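To see why clipping helps, consider a minimal Python sketch. It assumes the rewards are adjusted by dividing by the propensity score, as in inverse propensity weighting; the function names and the example scores are hypothetical and not part of the library.

```python
def clip_propensity(score, min_value=0.0):
    """Raise any propensity score below min_value up to min_value."""
    return max(score, min_value)


def inverse_weights(scores, min_value=0.0):
    """Inverse propensity weights after clipping (assumed adjustment scheme)."""
    return [1.0 / clip_propensity(s, min_value) for s in scores]


raw_scores = [0.001, 0.2, 0.5]  # hypothetical propensity estimates

unclipped = inverse_weights(raw_scores, min_value=0.0)
clipped = inverse_weights(raw_scores, min_value=0.05)

print(unclipped)  # the first weight is very large
print(clipped)    # the first weight is capped at 1 / 0.05
```

A single very small propensity estimate can dominate the weighted estimate; a minimum value bounds every weight by its reciprocal, at the cost of some bias.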


The learner to use for outcome estimation. The type of learner depends on the type of outcome in the problem:

To conduct parameter validation during the outcome learner fitting, a GridSearch over the appropriate learner can also be used.


A positive Integer indicating the number of folds to use when estimating the outcomes in-sample during fit_predict!. Defaults to 5. Set to nothing to disable in-sample cross-validation (which should only be done if you are confident the model will not overfit).


For problems with survival outcomes only

The method to use for adjusting rewards to account for censoring. The following options are available:


For problems with survival outcomes only

Controls which survival target is used as the outcome during the estimation process. The following options are available:

  • nothing to use the expected survival time as the outcome, which is the default.
  • a Real to specify a time at which to evaluate the survival probability and use this as the outcome. For example, setting evaluation_time=5 will use the probability of surviving past time 5 as the outcome in the estimation process.
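The difference between the two targets can be sketched in Python on a single predicted survival curve. The step curve below and both function names are made up for illustration, and the expected survival time is computed as the area under the curve up to the last time point, which is an assumption about how such a quantity might be calculated rather than a description of the library.

```python
from bisect import bisect_right

# Hypothetical step survival curve: S(t) = probs[i] for times[i] <= t < times[i + 1]
times = [0.0, 2.0, 4.0, 6.0, 8.0]
probs = [1.0, 0.8, 0.5, 0.2, 0.0]


def survival_probability(t):
    """Probability of surviving past time t, read off the step curve."""
    if t < times[0]:
        return 1.0
    idx = bisect_right(times, t) - 1  # last breakpoint at or before t
    return probs[idx]


def expected_survival_time():
    """Area under the step curve, used here as the expected survival time."""
    total = 0.0
    for i in range(len(times) - 1):
        total += probs[i] * (times[i + 1] - times[i])
    return total


print(expected_survival_time())   # outcome when no evaluation time is given
print(survival_probability(5.0))  # outcome when the evaluation time is 5
```

The first target summarizes the whole curve in time units, while the second is a probability at one chosen time, so the resulting rewards are on different scales.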

Learners for Categorical Reward Estimation

The type of learner to use for conducting reward estimation with categorical treatments depends on the outcome type:

There are no additional parameters beyond the shared parameters.

Learners for Numeric Reward Estimation

The type of learner to use for conducting reward estimation with numeric treatments depends on the outcome type:

In addition to the shared learner parameters, the following parameters are used to control the reward estimation procedure.


A Symbol or String specifying the kernel function to use while estimating the propensity scores and outcomes. Defaults to :gaussian which uses the Gaussian kernel.


The bandwidth to use in conjunction with estimation_kernel. This parameter is a Real scaling factor that is applied to the rule-of-thumb bandwidth estimate. The default value is 1, implying the rule-of-thumb estimate is used as the bandwidth. We recommend tuning this parameter using the procedure outlined in the guide to bandwidth tuning.
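The role of the scaling factor can be sketched in Python. The page does not say which rule of thumb is used, so this example assumes the normal-reference rule (1.06 times the sample standard deviation times n to the power -1/5); the function names and the dose values are likewise hypothetical.

```python
import math
from statistics import stdev


def rule_of_thumb_bandwidth(values):
    """Normal-reference rule of thumb (an assumption, not the library's rule)."""
    n = len(values)
    return 1.06 * stdev(values) * n ** (-1 / 5)


def scaled_bandwidth(values, scale=1.0):
    """Bandwidth after applying the user-supplied scaling factor."""
    return scale * rule_of_thumb_bandwidth(values)


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kernel_weights(values, target, scale=1.0):
    """Weight of each observed value when estimating at the target value."""
    h = scaled_bandwidth(values, scale)
    return [gaussian_kernel((v - target) / h) / h for v in values]


doses = [1.0, 2.0, 2.5, 3.0, 4.5, 5.0]  # hypothetical numeric treatments

narrow = kernel_weights(doses, target=2.5, scale=1.0)
wide = kernel_weights(doses, target=2.5, scale=2.0)

# Relative weight of the farthest dose (5.0) compared with the dose at the target
print(narrow[5] / narrow[2])
print(wide[5] / wide[2])
```

A factor above 1 widens the kernel so that more distant treatment values contribute, giving smoother but potentially more biased estimates; a factor below 1 does the opposite.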


A Symbol or String specifying the kernel function to use while calculating the rewards as part of the propensity score adjustment. Defaults to :epanechnikov which uses the Epanechnikov kernel.
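For reference, the Epanechnikov kernel is K(u) = 0.75 * (1 - u^2) for |u| <= 1 and 0 otherwise. The short Python sketch below evaluates it and checks numerically that it integrates to 1; the function names are illustrative only and how the library applies the kernel internally is not shown here.

```python
def epanechnikov_kernel(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    if abs(u) > 1.0:
        return 0.0
    return 0.75 * (1.0 - u * u)


def integrate_midpoint(f, lower, upper, steps=2000):
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width


area = integrate_midpoint(epanechnikov_kernel, -1.0, 1.0)

print(epanechnikov_kernel(0.0))  # peak of the kernel
print(epanechnikov_kernel(1.5))  # outside the support
print(area)                      # close to 1
```

Unlike the Gaussian kernel, this kernel has bounded support, so observations more than one bandwidth away from the target treatment receive exactly zero weight.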


The bandwidth to use in conjunction with reward_kernel. This parameter is a Real scaling factor that is applied to the rule-of-thumb bandwidth estimate. The default value is 1, implying the rule-of-thumb estimate is used as the bandwidth. We recommend tuning this parameter using the procedure outlined in the guide to bandwidth tuning.