Tips and Tricks
This page contains some tips and tricks for getting the best results out of OptImpute.
Correct training and testing setup
When using imputation for machine learning tasks with training and testing data, it is important to fit the imputation learner on the training set only, rather than on the entire dataset, so that information from the testing set does not leak into the training set.
In order to simulate a fair out-of-sample evaluation, you should train the imputation learner using fit_transform! on the training data. You can then use transform with the trained imputation learner on the test data to give complete data that can be used to evaluate the trained model.
The Hepatitis Mortality Prediction case study contains a complete example of this pipeline in action.
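The fit-on-train, transform-on-test pattern described above can be sketched in a few lines. The example below is only an illustration of the principle and does not use OptImpute: MeanImputer is a hypothetical stand-in that fills missing values with column means, and its fit_transform and transform methods are named to mirror fit_transform! and transform. The data values are made up for the example.

```python
from statistics import mean


class MeanImputer:
    """Hypothetical stand-in for an imputation learner (not the OptImpute API)."""

    def __init__(self):
        self.column_means = None

    def fit_transform(self, rows):
        # Learn one mean per column from the observed (non-None) values,
        # then fill the missing entries of the same data.
        columns = list(zip(*rows))
        self.column_means = [
            mean(v for v in col if v is not None) for col in columns
        ]
        return self.transform(rows)

    def transform(self, rows):
        # Fill missing entries using only the statistics learned during fitting.
        if self.column_means is None:
            raise RuntimeError("call fit_transform on the training data first")
        return [
            [self.column_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows
        ]


# Made-up data; None marks a missing value.
train_X = [[1.0, 10.0], [3.0, None], [None, 30.0]]
test_X = [[None, 100.0], [5.0, None]]

imputer = MeanImputer()
train_complete = imputer.fit_transform(train_X)  # fit on the training data only
test_complete = imputer.transform(test_X)        # reuse the fitted imputer

print(train_complete)
print(test_complete)
```

Note that the missing entries in test_X are filled with means computed from train_X alone, so the large test value 100.0 never influences the imputation. Fitting the imputer on the combined data would let that value shift the column mean, which is the leakage this section warns about.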