Reward Estimation Learners

RewardEstimation provides a variety of learners for conducting reward estimation, which we describe on this page along with a guide to the available parameters.

The type of learner to use depends on the characteristics of the problem:

for problems with categorical treatments, use one of the learners for categorical reward estimation
for problems with numeric treatments, use one of the learners for numeric reward estimation

Shared Parameters

In addition to the standard shared learner parameters, the following parameters are shared by all flavors of reward estimation learner.

`reward_estimator`

The method to use for reward estimation. The following options are available:

:doubly_robust to use the doubly robust estimator, which is the default
:direct_method to use the direct method estimator
:inverse_propensity_weighting to use the inverse propensity weighting estimator

Depending on the selected method, you will also have to specify the following parameters:

propensity_estimator (for the inverse propensity weighting and doubly robust methods)
outcome_estimator (for the direct method and doubly robust methods)
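To illustrate why each method needs different sub-estimators, here is a minimal Python sketch of the standard textbook forms of the three estimators for categorical treatments. The function name `reward_estimates` and its arguments are hypothetical and not part of the RewardEstimation API, and the library's internal computation may differ in its details.

```python
def reward_estimates(treatments, outcomes, propensities, predicted_outcomes,
                     method="doubly_robust"):
    """Return one dict of estimated rewards per sample, keyed by treatment.

    treatments[i]          observed treatment label for sample i
    outcomes[i]            observed outcome for sample i
    propensities[i][t]     estimated probability that sample i receives t
    predicted_outcomes[i][t]  estimated outcome for sample i under t
    (All names are illustrative assumptions, not library identifiers.)
    """
    rewards = []
    for t_obs, y, p, mu in zip(treatments, outcomes, propensities,
                               predicted_outcomes):
        row = {}
        for t in p:
            observed = 1.0 if t == t_obs else 0.0
            if method == "direct_method":
                # Uses only the outcome model.
                row[t] = mu[t]
            elif method == "inverse_propensity_weighting":
                # Uses only the propensity model.
                row[t] = observed * y / p[t]
            elif method == "doubly_robust":
                # Outcome model plus a propensity-weighted residual correction.
                row[t] = mu[t] + observed * (y - mu[t]) / p[t]
            else:
                raise ValueError(f"unknown method: {method}")
        rewards.append(row)
    return rewards


treatments = ["A", "B"]
outcomes = [10.0, 4.0]
propensities = [{"A": 0.5, "B": 0.5}, {"A": 0.25, "B": 0.75}]
predicted = [{"A": 8.0, "B": 6.0}, {"A": 5.0, "B": 3.0}]
for method in ("direct_method", "inverse_propensity_weighting", "doubly_robust"):
    print(method, reward_estimates(treatments, outcomes, propensities,
                                   predicted, method))
```

Note that the direct method never reads the propensities and inverse propensity weighting never reads the predicted outcomes, which is why each method requires only the corresponding estimator.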

`propensity_estimator`

The learner to use for propensity score estimation. The type of learner depends on the type of treatment in the problem:

if the treatments are categorical, we estimate the propensities using a ClassificationLearner such as RandomForestClassifier or XGBoostClassifier. We can also use EqualPropensityEstimator to estimate equal probabilities for each treatment (for data from randomized experiments where treatments are randomly assigned)
if the treatments are numeric, we estimate the propensities using a RegressionLearner such as RandomForestRegressor or XGBoostRegressor

To conduct parameter validation during the propensity score fitting, a GridSearch over the appropriate learner can also be used.

`propensity_insample_num_folds`

A positive Integer indicating the number of folds to use when estimating the propensity scores in-sample during fit_predict!. Defaults to 5. Set to nothing to disable in-sample cross-validation (which should only be done if you are confident the model will not overfit).
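The idea behind in-sample cross-validation can be sketched as follows: each sample's estimate comes from a model that was trained without that sample, so the estimate is not inflated by overfitting. This is a minimal Python illustration, not the library's implementation; `out_of_fold_predictions`, `fit_mean`, and `predict_mean` are hypothetical names, and the deterministic fold assignment is an assumption (a real implementation would typically shuffle).

```python
def out_of_fold_predictions(xs, ys, fit, predict, num_folds=5):
    """Predict each sample with a model trained on the other folds."""
    n = len(xs)
    if num_folds < 2 or num_folds > n:
        raise ValueError("num_folds must be between 2 and the number of samples")
    predictions = [None] * n
    for k in range(num_folds):
        # Assumption: sample i belongs to fold i % num_folds.
        held_out = [i for i in range(n) if i % num_folds == k]
        training = [i for i in range(n) if i % num_folds != k]
        model = fit([xs[i] for i in training], [ys[i] for i in training])
        for i in held_out:
            predictions[i] = predict(model, xs[i])
    return predictions


# A deliberately trivial stand-in learner: always predict the training mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(out_of_fold_predictions(xs, ys, fit_mean, predict_mean, num_folds=3))
```

Disabling the folds corresponds to fitting once on all samples and predicting on those same samples, which is only safe when the model is unlikely to overfit.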

`propensity_min_value`

A Real value between 0 and 1 that specifies the minimum propensity score that can be predicted for any treatment. Defaults to 0. This parameter can be used to clip small propensity estimates to increase stability.
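The stabilizing effect of clipping is easiest to see through the inverse propensity weight 1/p, which grows without bound as p approaches 0. The following Python sketch uses a hypothetical helper `clip_propensities`; whether the library renormalizes the propensities after clipping is not stated here, so this sketch does not.

```python
def clip_propensities(propensities, min_value=0.0):
    """Raise any propensity below min_value up to min_value."""
    if not 0.0 <= min_value <= 1.0:
        raise ValueError("min_value must lie between 0 and 1")
    return [max(p, min_value) for p in propensities]


raw = [0.001, 0.3, 0.699]
clipped = clip_propensities(raw, min_value=0.05)

# Compare the inverse propensity weights before and after clipping.
for p_raw, p_clipped in zip(raw, clipped):
    print(p_raw, 1.0 / p_raw, "->", p_clipped, 1.0 / p_clipped)
```

With the default of 0, no clipping takes place and a single sample with a very small estimated propensity can dominate the reward estimates.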

`outcome_estimator`

The learner to use for outcome estimation. The type of learner depends on the type of outcome in the problem:

if the outcomes $y_i$ are numeric, we estimate the outcome directly using a RegressionLearner such as RandomForestRegressor, XGBoostRegressor, or GLMNetCVRegressor
if the outcomes $y_i$ are binary, we estimate the probability of success using a ClassificationLearner such as RandomForestClassifier, or XGBoostClassifier
if the outcomes $y_i$ are survival outcomes, we estimate the survival outcome (the expected survival time or a survival probability, depending on evaluation_time) using a SurvivalLearner such as RandomForestSurvivalLearner or XGBoostSurvivalLearner

To conduct parameter validation during the outcome learner fitting, a GridSearch over the appropriate learner can also be used.

`outcome_insample_num_folds`

A positive Integer indicating the number of folds to use when estimating the outcomes in-sample during fit_predict!. Defaults to 5. Set to nothing to disable in-sample cross-validation (which should only be done if you are confident the model will not overfit).

`censoring_adjustment_method`

For problems with survival outcomes only

The method to use for adjusting rewards to account for censoring. The following options are available:

:complete_cases to use inverse probability of censoring weighting (IPCW) as outlined in Section 2.2 of Cui et al. (2020)
:increased_efficiency to use the increased efficiency estimation approach as outlined in Section 2.3 of Cui et al. (2020) and Section 10.4 of Tsiatis (2007), which is the default
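As a rough illustration of the complete-case idea, the generic IPCW recipe keeps only the uncensored samples and weights each by the inverse of the estimated probability of remaining uncensored up to its event time. The Python sketch below shows that generic recipe only; it is not taken from the library, the names `ipcw_weights` and `ipcw_mean` are hypothetical, and the censoring survival probabilities are supplied directly as an assumed input rather than estimated.

```python
def ipcw_weights(event_observed, censoring_survival):
    """Weight 1/G(T_i) for uncensored samples and 0 for censored ones.

    event_observed[i]      True if the event (not censoring) was observed
    censoring_survival[i]  assumed estimate of P(censoring time > T_i)
    """
    weights = []
    for observed, g in zip(event_observed, censoring_survival):
        if observed:
            if g <= 0.0:
                raise ValueError("censoring survival must be positive")
            weights.append(1.0 / g)
        else:
            weights.append(0.0)
    return weights


def ipcw_mean(values, weights):
    """IPCW estimate of the mean, normalized by the total sample count."""
    return sum(v * w for v, w in zip(values, weights)) / len(values)


times = [2.0, 5.0, 7.0, 9.0]
event_observed = [True, False, True, True]
censoring_survival = [1.0, 0.8, 0.8, 0.5]  # illustrative values only

weights = ipcw_weights(event_observed, censoring_survival)
print(weights, ipcw_mean(times, weights))
```

Censored samples contribute nothing directly here, which is the inefficiency that the :increased_efficiency option is intended to reduce.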

`evaluation_time`

For problems with survival outcomes only

Controls which survival target is used as the outcome during the estimation process. The following options are available:

nothing to use the expected survival time as the outcome, which is the default.
a Real to specify a time at which to evaluate the survival probability and use this as the outcome. For example, setting evaluation_time=5 will use the probability of surviving past time 5 as the outcome in the estimation process.
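The two choices correspond to two different summaries of a predicted survival curve. The Python sketch below assumes the curve is a right-continuous step function given as hypothetical `times` and `probs` lists, and computes the expected survival time as the area under the curve up to the last time point; how the library derives these quantities internally is not specified here.

```python
def survival_probability(times, probs, t):
    """Probability of surviving past time t on a step survival curve.

    probs[k] is the survival probability from times[k] onward;
    the curve equals 1 before times[0]. times must be increasing.
    """
    s = 1.0
    for time, prob in zip(times, probs):
        if time <= t:
            s = prob
        else:
            break
    return s


def expected_survival_time(times, probs):
    """Area under the step curve from 0 up to the last time point."""
    area = 0.0
    prev_time, prev_prob = 0.0, 1.0
    for time, prob in zip(times, probs):
        area += prev_prob * (time - prev_time)
        prev_time, prev_prob = time, prob
    return area


times = [2.0, 4.0, 6.0, 8.0]
probs = [0.75, 0.5, 0.25, 0.0]

print(expected_survival_time(times, probs))    # outcome when evaluation_time is nothing
print(survival_probability(times, probs, 5))   # outcome when evaluation_time=5
```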

Learners for Categorical Reward Estimation

The type of learner to use for conducting reward estimation with categorical treatments depends on the outcome type:

for numeric outcomes: CategoricalRegressionRewardEstimator
for binary outcomes: CategoricalClassificationRewardEstimator
for survival outcomes: CategoricalSurvivalRewardEstimator

There are no additional parameters beyond the shared parameters.

Learners for Numeric Reward Estimation

The type of learner to use for conducting reward estimation with numeric treatments depends on the outcome type:

for numeric outcomes: NumericRegressionRewardEstimator
for binary outcomes: NumericClassificationRewardEstimator
for survival outcomes: NumericSurvivalRewardEstimator

In addition to the shared learner parameters, the following parameters are used to control the reward estimation procedure.

`estimation_kernel`

A Symbol or String specifying the kernel function to use while estimating the propensity scores and outcomes. Defaults to :gaussian which uses the Gaussian kernel.

`estimation_kernel_bandwidth`

The bandwidth to use in conjunction with estimation_kernel. This parameter is a Real scaling factor that is applied to the rule-of-thumb bandwidth estimate. The default value is 1, implying the rule-of-thumb estimate is used as the bandwidth. We recommend tuning this parameter using the procedure outlined in the guide to bandwidth tuning.

`reward_kernel`

A Symbol or String specifying the kernel function to use while calculating the rewards as part of the propensity score adjustment. Defaults to :epanechnikov which uses the Epanechnikov kernel.

`reward_kernel_bandwidth`

The bandwidth to use in conjunction with reward_kernel. This parameter is a Real scaling factor that is applied to the rule-of-thumb bandwidth estimate. The default value is 1, implying the rule-of-thumb estimate is used as the bandwidth. We recommend tuning this parameter using the procedure outlined in the guide to bandwidth tuning.
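To show how a kernel and a bandwidth scaling factor interact, the Python sketch below defines the Gaussian and Epanechnikov kernels and applies a scaling factor to a rule-of-thumb bandwidth. The specific rule used here (a Silverman-style 1.06 · std · n^(-1/5)) is an assumption for illustration, since this page does not say which rule of thumb the library uses, and all function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u):
    """Epanechnikov kernel: nonzero only for |u| <= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def rule_of_thumb_bandwidth(values):
    """Assumed Silverman-style rule: 1.06 * sample std * n^(-1/5)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 1.06 * std * n ** (-0.2)


def kernel_weights(observed, target, kernel, bandwidth_scale=1.0):
    """Weight each observed treatment by its closeness to the target."""
    h = bandwidth_scale * rule_of_thumb_bandwidth(observed)
    return [kernel((t - target) / h) / h for t in observed]


observed_treatments = [1.0, 2.0, 3.0, 4.0, 5.0]
for scale in (0.5, 1.0, 2.0):
    w = kernel_weights(observed_treatments, 3.0, epanechnikov_kernel, scale)
    print(scale, sum(1 for x in w if x > 0), "samples with nonzero weight")
```

A larger scaling factor spreads weight over more samples (lower variance, more smoothing bias), while a smaller one concentrates weight on samples whose treatment is very close to the target value, which is the trade-off that bandwidth tuning addresses.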