Tips and Tricks
This page contains some tips and tricks for getting the best results out of OptImpute.
Correct training and testing setup
When using imputation for machine learning tasks with training and testing data, it is important to fit the imputation learner only on the training set rather than on the entire data. Otherwise, information from the testing set leaks into the imputed training data and makes the out-of-sample evaluation overly optimistic.
In order to simulate a fair out-of-sample evaluation, you should train the imputation learner using fit! on the training data. You can then use transform with the trained imputation learner on the test data to give complete data that can be used to evaluate the trained model.
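The workflow above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the synthetic data, the column names, the 70/30 split, and the choice of method=:opt_knn are all assumptions made for the example, and it presumes the IAI module is already available in your Julia session.

```julia
using DataFrames, Random

# Assumption: the IAI module is already available in this session
# (how it is loaded depends on your installation).

Random.seed!(1)
n = 100

# Synthetic numeric data with roughly 10% of entries missing (illustrative only)
make_col() = [rand() < 0.1 ? missing : randn() for _ in 1:n]
X = DataFrame(a=make_col(), b=make_col(), c=make_col())

# Hypothetical split: first 70 rows for training, remaining 30 for testing
X_train = X[1:70, :]
X_test = X[71:end, :]

# Fit the imputation learner on the training data only
lnr = IAI.ImputationLearner(method=:opt_knn, random_seed=1)
IAI.fit!(lnr, X_train)

# Apply the trained learner to both sets; the test data plays no part in fitting
X_train_imputed = IAI.transform(lnr, X_train)
X_test_imputed = IAI.transform(lnr, X_test)
```

The imputed training data can then be used to train the downstream model, and the imputed test data to evaluate it.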
The Hepatitis Mortality Prediction case study contains a complete example of this pipeline in action.