Reward Estimation Learners
RewardEstimation provides a variety of learners for conducting reward estimation, which we describe on this page along with a guide to the available parameters.
The type of learner to use depends on the characteristics of the problem:
- for problems with categorical treatments, use one of the learners for categorical reward estimation
- for problems with numeric treatments, use one of the learners for numeric reward estimation
Shared Parameters
In addition to the standard shared learner parameters, the following parameters are shared by all flavors of reward estimation learner.
reward_estimator
The method to use for reward estimation. The following options are available:
- :doubly_robust to use the doubly robust estimator, which is the default
- :direct_method to use the direct method estimator
- :inverse_propensity_weighting to use the inverse propensity weighting estimator
Depending on the selected method, you will also have to specify the following parameters:
- propensity_estimator, for the inverse propensity weighting (IPW) and doubly robust (DR) methods
- outcome_estimator, for the direct method (DM) and doubly robust (DR) methods
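To make the difference between the three options concrete, the following sketch computes per-sample rewards for a categorical treatment using the textbook forms of the direct method, inverse propensity weighting, and doubly robust estimators. It is an illustration in plain Python, not the RewardEstimation API: the function reward_estimates, its arguments, and the dictionary layout are hypothetical, and the library's internal formulas may differ in detail.

```python
def reward_estimates(method, treatments, outcomes, propensity, outcome_pred, arms):
    """Return rewards[i][t], the estimated reward of giving arm t to sample i.

    propensity[i][t]   : estimated probability that sample i receives arm t
    outcome_pred[i][t] : predicted outcome for sample i under arm t
    (both are assumed to come from separately fitted models)
    """
    rewards = []
    for i, (t_obs, y) in enumerate(zip(treatments, outcomes)):
        row = {}
        for t in arms:
            observed = 1.0 if t == t_obs else 0.0
            if method == "direct_method":
                # Trust the outcome model alone.
                row[t] = outcome_pred[i][t]
            elif method == "inverse_propensity_weighting":
                # Reweight the observed outcome by the inverse propensity.
                row[t] = observed * y / propensity[i][t]
            elif method == "doubly_robust":
                # Outcome model plus a propensity-weighted residual correction.
                residual = y - outcome_pred[i][t]
                row[t] = outcome_pred[i][t] + observed * residual / propensity[i][t]
            else:
                raise ValueError(f"unknown method: {method}")
        rewards.append(row)
    return rewards


# One sample that received arm "A" and had outcome 10.
treatments = ["A"]
outcomes = [10.0]
propensity = [{"A": 0.5, "B": 0.5}]
outcome_pred = [{"A": 8.0, "B": 6.0}]
arms = ["A", "B"]

dm = reward_estimates("direct_method", treatments, outcomes, propensity, outcome_pred, arms)
ipw = reward_estimates("inverse_propensity_weighting", treatments, outcomes, propensity, outcome_pred, arms)
dr = reward_estimates("doubly_robust", treatments, outcomes, propensity, outcome_pred, arms)
```

In this sketch the doubly robust reward for the observed treatment is the outcome prediction corrected by the weighted residual, while for unobserved treatments it reduces to the outcome prediction. This is why the doubly robust option needs both a propensity_estimator and an outcome_estimator.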
propensity_estimator
The learner to use for propensity score estimation. The type of learner depends on the type of treatment in the problem:
- if the treatments are categorical, we estimate the propensities using a ClassificationLearner such as RandomForestClassifier or XGBoostClassifier. We can also use EqualPropensityEstimator to estimate equal probabilities for each treatment (for data from randomized experiments where treatments are randomly assigned)
- if the treatments are numeric, we estimate the propensities using a RegressionLearner such as RandomForestRegressor or XGBoostRegressor
To conduct parameter validation during the propensity score fitting, a GridSearch over the appropriate learner can also be used.
propensity_insample_num_folds
A positive Integer indicating the number of folds that fit_predict! uses when estimating the propensity scores in-sample. Defaults to 5. Set to nothing to disable in-sample cross-validation (which should only be done if you are confident the model will not overfit).
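The idea behind in-sample folds can be sketched as follows: each sample's score is predicted by a model fitted on the other folds, so no prediction comes from a model that saw that sample. This is a generic illustration, not the library's implementation. The helper out_of_fold_predictions is hypothetical, and a training-mean predictor stands in for a real learner.

```python
import random


def out_of_fold_predictions(ys, num_folds=5, seed=1):
    """Predict each value using a stand-in model fitted without that sample's fold."""
    n = len(ys)
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into num_folds roughly equal folds.
    folds = [indices[k::num_folds] for k in range(num_folds)]

    preds = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [ys[i] for i in range(n) if i not in held]
        # Stand-in "learner": predicts the mean of its training data.
        fitted_mean = sum(train) / len(train)
        for i in held_out:
            preds[i] = fitted_mean
    return preds


ys = list(range(10))
# With as many folds as samples this is leave-one-out prediction.
loo_preds = out_of_fold_predictions(ys, num_folds=10)
five_fold_preds = out_of_fold_predictions(ys, num_folds=5)
```

Disabling the folds corresponds to fitting once on all data and predicting on the same data, which is where the risk of overfit scores comes from.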
propensity_min_value
A Real value between 0 and 1 that specifies the minimum propensity score that can be predicted for any treatment. Defaults to 0. This parameter can be used to clip small propensity estimates, which stabilizes estimators that divide by the propensity score.
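The effect of a minimum propensity value can be seen in a few lines. The sketch below assumes clipping simply floors each estimate at the minimum. Whether the library renormalizes the propensities afterwards is not stated here, and clip_propensity is a hypothetical helper.

```python
def clip_propensity(p, min_value=0.0):
    """Floor a propensity estimate at min_value (assumed clipping behavior)."""
    return max(p, min_value)


raw = [0.001, 0.3, 0.699]
clipped = [clip_propensity(p, min_value=0.05) for p in raw]

# Inverse-propensity weights before and after clipping.
raw_weights = [1.0 / p for p in raw]
clipped_weights = [1.0 / p for p in clipped]
```

Without clipping, the first sample would carry a weight of about 1000. With a minimum of 0.05 its weight is capped at about 20, trading a small bias for lower variance.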
outcome_estimator
The learner to use for outcome estimation. The type of learner depends on the type of outcome in the problem:
- if the outcomes $y_i$ are numeric, we estimate the outcome directly using a RegressionLearner such as RandomForestRegressor, XGBoostRegressor, or GLMNetCVRegressor
- if the outcomes $y_i$ are binary, we estimate the probability of success using a ClassificationLearner such as RandomForestClassifier or XGBoostClassifier
- if the outcomes $y_i$ are survival outcomes, we estimate the survival target (the expected survival time or a survival probability, as controlled by evaluation_time) using a SurvivalLearner such as RandomForestSurvivalLearner or XGBoostSurvivalLearner
To conduct parameter validation during the outcome learner fitting, a GridSearch over the appropriate learner can also be used.
outcome_insample_num_folds
A positive Integer indicating the number of folds that fit_predict! uses when estimating the outcomes in-sample. Defaults to 5. Set to nothing to disable in-sample cross-validation (which should only be done if you are confident the model will not overfit).
censoring_adjustment_method
For problems with survival outcomes only
The method to use for adjusting rewards to account for censoring. The following options are available:
- :complete_cases to use inverse probability of censoring weighting (IPCW) as outlined in Section 2.2 of Cui et al. (2020)
- :increased_efficiency to use the increased efficiency estimation approach as outlined in Section 2.3 of Cui et al. (2020) and Section 10.4 of Tsiatis (2007), which is the default
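As a rough illustration of the complete-case idea, the sketch below computes generic IPCW weights: censored samples get weight zero, and each uncensored sample is up-weighted by the inverse of the estimated probability of remaining uncensored up to its event time. It makes several simplifying assumptions that are not taken from the references above: the censoring distribution is estimated with a marginal Kaplan-Meier curve (no covariates), all observed times are distinct, and the function names are hypothetical.

```python
def censoring_survival_before(times, events):
    """Kaplan-Meier estimate of P(censoring time >= t) just before each observed time.

    times  : observed times (assumed distinct)
    events : 1 if the event was observed, 0 if the sample was censored
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    g = 1.0
    n_at_risk = len(times)
    g_before = [0.0] * len(times)
    for i in order:
        g_before[i] = g  # value of the curve just before times[i]
        if events[i] == 0:
            # Censoring is the "event" for the censoring distribution.
            g *= 1.0 - 1.0 / n_at_risk
        n_at_risk -= 1
    return g_before


def ipcw_weights(times, events):
    """Weight 1 / G(t-) for uncensored samples and 0 for censored samples."""
    g_before = censoring_survival_before(times, events)
    return [1.0 / g if e == 1 else 0.0 for e, g in zip(events, g_before)]


times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]
weights = ipcw_weights(times, events)
```

In this example the sample censored at time 2 is dropped, and the two later uncensored samples are each weighted by 1.5 to compensate for it.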
evaluation_time
For problems with survival outcomes only
Controls which survival target is used as the outcome during the estimation process. The following options are available:
- nothing to use the expected survival time as the outcome, which is the default
- a Real to specify a time at which to evaluate the survival probability and use this as the outcome. For example, setting evaluation_time=5 will use the probability of surviving past time 5 as the outcome in the estimation process.
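The two survival targets can be illustrated on a single predicted survival curve. The sketch below assumes the curve is a right-continuous step function given as (time, survival probability) pairs, and computes the expected survival time as the area under the curve up to the last time point. Both helpers are hypothetical and only illustrate the two targets, not how the library computes them.

```python
def survival_probability(curve, t):
    """Probability of surviving past time t under a step survival curve."""
    prob = 1.0  # everyone is alive before the first step
    for time, s in curve:
        if time <= t:
            prob = s
        else:
            break
    return prob


def expected_survival_time(curve):
    """Area under the step survival curve, up to its last time point."""
    total, prev_time, prev_s = 0.0, 0.0, 1.0
    for time, s in curve:
        total += prev_s * (time - prev_time)
        prev_time, prev_s = time, s
    return total


# Survival drops to 0.8 at time 2, to 0.5 at time 5, and to 0 at time 10.
curve = [(2.0, 0.8), (5.0, 0.5), (10.0, 0.0)]

prob_past_5 = survival_probability(curve, 5.0)  # target when evaluation_time=5
mean_time = expected_survival_time(curve)       # target when evaluation_time=nothing
```

For this curve the probability of surviving past time 5 is 0.5, and the expected survival time is 2 + 0.8 * 3 + 0.5 * 5, which is about 6.9.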
Learners for Categorical Reward Estimation
The type of learner to use for conducting reward estimation with categorical treatments depends on the outcome type:
- for numeric outcomes: CategoricalRegressionRewardEstimator
- for binary outcomes: CategoricalClassificationRewardEstimator
- for survival outcomes: CategoricalSurvivalRewardEstimator
There are no additional parameters beyond the shared parameters.
Learners for Numeric Reward Estimation
The type of learner to use for conducting reward estimation with numeric treatments depends on the outcome type:
- for numeric outcomes: NumericRegressionRewardEstimator
- for binary outcomes: NumericClassificationRewardEstimator
- for survival outcomes: NumericSurvivalRewardEstimator
In addition to the shared learner parameters, the following parameters are used to control the reward estimation procedure.
estimation_kernel
A Symbol or String specifying the kernel function to use while estimating the propensity scores and outcomes. Defaults to :gaussian, which uses the Gaussian kernel.
estimation_kernel_bandwidth
The bandwidth to use in conjunction with estimation_kernel. This parameter is a Real scaling factor that is applied to the rule-of-thumb bandwidth estimate. The default value is 1, implying the rule-of-thumb estimate is used as the bandwidth. We recommend tuning this parameter using the procedure outlined in the guide to bandwidth tuning.
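The role of the scaling factor can be sketched as follows. Which rule of thumb the library uses is not stated on this page, so the example assumes the common normal-reference rule, 1.06 times the sample standard deviation times n^(-1/5), purely for illustration. Both function names are hypothetical.

```python
import statistics


def rule_of_thumb_bandwidth(values):
    """Normal-reference rule of thumb (an assumed choice, for illustration only)."""
    n = len(values)
    return 1.06 * statistics.stdev(values) * n ** (-1 / 5)


def scaled_bandwidth(values, scale=1.0):
    """Bandwidth actually used: the scaling factor times the rule-of-thumb estimate."""
    return scale * rule_of_thumb_bandwidth(values)


treatment_values = [1.0, 2.0, 3.0, 4.0, 5.0]
default_bw = scaled_bandwidth(treatment_values)           # scale of 1: rule of thumb as-is
wider_bw = scaled_bandwidth(treatment_values, scale=2.0)  # twice as much smoothing
```

A factor below 1 narrows the kernel and makes the estimates more local, while a factor above 1 smooths over a wider range of treatment values.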
reward_kernel
A Symbol or String specifying the kernel function to use while calculating the rewards as part of the propensity scores adjustment. Defaults to :epanechnikov, which uses the Epanechnikov kernel.
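For reference, the two kernels named on this page have the standard forms sketched below. The helper kernel_weight is hypothetical and shows one common way a kernel and bandwidth turn the distance between an observed and a candidate numeric treatment into a weight. The library's exact usage may differ.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel: positive everywhere, decaying smoothly."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u):
    """Epanechnikov kernel: parabolic on [-1, 1] and exactly zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_weight(kernel, observed, target, bandwidth):
    """Weight of an observed treatment value relative to a target value."""
    return kernel((observed - target) / bandwidth) / bandwidth


# A treatment observed at 3.0, evaluated against a target of 1.0 with bandwidth 1.0.
far_gaussian = kernel_weight(gaussian_kernel, 3.0, 1.0, 1.0)
far_epanechnikov = kernel_weight(epanechnikov_kernel, 3.0, 1.0, 1.0)
```

The practical difference is support: the Gaussian kernel gives every sample some weight, while the Epanechnikov kernel ignores samples more than one bandwidth away from the target.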
reward_kernel_bandwidth
The bandwidth to use in conjunction with reward_kernel. This parameter is a Real scaling factor that is applied to the rule-of-thumb bandwidth estimate. The default value is 1, implying the rule-of-thumb estimate is used as the bandwidth. We recommend tuning this parameter using the procedure outlined in the guide to bandwidth tuning.