Case Studies and Examples

This is a collection of case studies that demonstrate the application of IAI modules to real-world problems and datasets:

  • Loan Default Risk - Interpretability

    We compare the interpretability of Optimal Trees with model explainability methods (LIME and SHAP) using the dataset from the FICO Explainable Machine Learning Challenge.

  • Hepatitis Mortality Prediction - Missing Values

    We use Optimal Trees to make mortality predictions for patients with hepatitis. The dataset contains a large number of missing values, so we examine the performance of the final predictive model under a variety of schemes for handling missing values.

  • Supreme Court Outcomes - Optimal Trees

    We revisit a case study from The Analytics Edge where CART is used to predict the outcomes of Supreme Court votes. We apply Optimal Trees to the same dataset to investigate the improvement over CART.

  • House Sale Prices - Optimal Regression Trees with Linear Predictions

    We use a house price dataset to show that regression trees with constant predictions can be inadequate when there is a strong linear relationship in the data. We then show how to build Optimal Regression Trees with linear predictions to improve both model performance and interpretability.

  • Mercedes-Benz Testing - Optimal Feature Selection

    We compare Optimal Feature Selection and elastic net regression as feature selection methods on the dataset from the Mercedes-Benz Greener Manufacturing Kaggle competition.

  • Turbofan Predictive Maintenance - Optimal Classification and Survival Trees

    We study a concrete case of predictive maintenance using Optimal Classification and Survival Trees. We compare our approach to classical models (e.g. XGBoost, CART) and illustrate how interpretability helps in understanding the underlying failure mechanisms.

  • Online Imputation for Production Pipelines - Optimal Imputation

    We use the breast-cancer dataset to investigate the potential impact of missing data in the online setting, and illustrate some potential remedies, such as Optimal Imputation.