This page contains some tips and tricks for getting the best results out of OptImpute.
When using imputation for machine learning tasks with separate training and testing data, it is important to fit the imputation on the training set only, rather than on the entire dataset, so that information from the test set does not leak into the training set.
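To make the leakage concrete, here is a small sketch that uses a simple column-mean fill as a hypothetical stand-in for OptImpute's far more sophisticated methods. When the fill statistic is computed on the entire dataset, an unusual test value changes how the training data is completed. The helper `fill_with_mean` is illustrative only and is not part of OptImpute.

```python
from statistics import mean

# Hypothetical stand-in for an imputation method: fill missing entries (None)
# with the mean of the observed entries in a reference column.
def fill_with_mean(values, reference):
    m = mean(v for v in reference if v is not None)
    return [m if v is None else v for v in values]

train = [1.0, None, 3.0]  # training column with one missing entry
test = [100.0, None]      # test column containing a very different value

# Leaky: the statistic is computed on train + test, so the test value 100.0
# pulls the imputed training entry up to about 34.67.
leaky_train = fill_with_mean(train, train + test)

# Correct: the statistic is computed on the training set only.
clean_train = fill_with_mean(train, train)

print(leaky_train)
print(clean_train)  # [1.0, 2.0, 3.0]
```

Any model trained on `leaky_train` has indirectly seen the test data, so its test-set score would be optimistic.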
To simulate a fair out-of-sample evaluation, train the imputation learner using `fit_transform!` on the training data. You can then use `transform` with the trained imputation learner on the test data to produce complete data that can be used to evaluate the trained model.
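The fit-on-train, transform-on-test pattern can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions. `MeanImputer` is a hypothetical stand-in that fills missing values with per-column training means. Its `fit_transform`/`transform` method names only mirror the `fit_transform!`/`transform` workflow described above and are not the OptImpute API.

```python
from statistics import mean


class MeanImputer:
    """Hypothetical stand-in for an imputation learner (not OptImpute).

    Missing values are represented as None and filled with column means
    learned from the training data only.
    """

    def __init__(self):
        self.means = None  # per-column means learned during fit_transform

    def fit_transform(self, rows):
        # Learn the column means from the training rows, then fill those rows.
        columns = list(zip(*rows))
        self.means = [mean(v for v in col if v is not None) for col in columns]
        return self.transform(rows)

    def transform(self, rows):
        # Fill new rows using only what was learned from the training data.
        if self.means is None:
            raise RuntimeError("call fit_transform on the training data first")
        return [
            [self.means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows
        ]


train_X = [[1.0, 10.0], [None, 20.0], [3.0, None]]
test_X = [[None, None], [50.0, 5.0]]

imputer = MeanImputer()
train_X_imputed = imputer.fit_transform(train_X)  # fit on training data only
test_X_imputed = imputer.transform(test_X)        # reuse the trained imputer

print(train_X_imputed)  # [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
print(test_X_imputed)   # [[2.0, 15.0], [50.0, 5.0]]
```

The test rows are filled with the training means (2.0 and 15.0), and the test value 50.0 has no influence on the learned imputer. The imputed training data would then be used to fit the model, and the imputed test data to evaluate it.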
The Hepatitis Mortality Prediction case study contains a complete example of this pipeline in action.