Case Studies and Examples

This is a collection of case studies that demonstrate the application of IAI modules to real-world problems and datasets:

  • Loan Default Risk - Interpretability

    We compare and contrast the interpretability of Optimal Trees against methods for model explainability (LIME/SHAP) using the dataset from the FICO Explainable Machine Learning Challenge.

  • Hepatitis Mortality Prediction - Missing Values

    We use Optimal Trees to make mortality predictions for patients with hepatitis. The dataset contains a large number of missing values, so we examine the performance of the final predictive model under a variety of schemes for handling missing values.

  • Supreme Court Outcomes - Optimal Trees

    We revisit a case study from The Analytics Edge where CART is used to predict the outcomes of Supreme Court votes. We apply Optimal Trees to the same dataset to investigate the improvement over CART.

  • House Sale Prices - Optimal Regression Trees with Linear Predictions

    We use a house price dataset to show that regression trees with constant predictions can be inadequate when there is a strong linear relationship in the data. We show how to build Optimal Regression Trees with linear predictions to improve model performance and interpretability.

  • Mercedes-Benz Testing - Optimal Feature Selection

    We compare and contrast Optimal Feature Selection and elastic net regression as methods for feature selection, using data from the Mercedes-Benz Greener Manufacturing Kaggle competition.

  • Turbofan Predictive Maintenance - Optimal Classification and Survival Trees

    We study a concrete case of predictive maintenance using Optimal Classification and Survival Trees. We compare our approach to classical models (e.g. XGBoost, CART) and illustrate how interpretability helps to understand the underlying failure mechanisms.

  • Online Imputation for Production Pipelines - Optimal Imputation

    We use the breast cancer dataset to investigate the potential impact of missing data in the online setting, and illustrate some potential remedies such as Optimal Imputation.

  • Detecting Racial Bias in Jury Selection - Optimal Feature Selection and Optimal Trees

    We use interpretable methods to investigate the presence of racial biases in jury selection, using data released as part of the 2019 U.S. Supreme Court case "Flowers v. Mississippi".

  • Revenue Optimization for Grocery Pricing - Optimal Prescriptive Trees and Optimal Policy Trees

    We use two prescriptive methods to develop interpretable pricing strategies for grocery items based on demographic information, resulting in an estimated 60-70% lift in revenue.

  • Optimal Prescription for Diabetes Management - Optimal Prescriptive Trees and Optimal Policy Trees

    We learn an interpretable diabetes management policy from observational data with various treatment combinations and dosing options. We show that Optimal Policy Trees use the data very efficiently and lead to a clinically significant improvement in health outcomes.

  • Interpretable Clustering - Optimal Trees

    We study two ways of using Optimal Trees to identify meaningful clusters in a credit card usage behavior dataset. The first approach trains an Optimal Tree to predict the cluster assignments made by a traditional clustering method. The second approach treats clustering as a supervised learning problem, selecting a relevant feature as the target variable to guide the clustering process.
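
The first interpretable-clustering approach above can be sketched in a few lines: cluster the data with a traditional method, then fit a tree to predict the cluster labels so the assignments become explainable. This is only an illustration of the workflow, not the IAI implementation: scikit-learn's DecisionTreeClassifier stands in for Optimal Trees, and synthetic data stands in for the credit card dataset.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the credit card usage features
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

# Step 1: unsupervised clustering to produce cluster assignments
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: a shallow, interpretable tree trained to predict those
# assignments, yielding explicit splitting rules for each cluster
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)

# Agreement between the tree's rules and the original clustering
print(round(tree.score(X, labels), 2))
```

A high agreement score indicates the tree's splitting rules are a faithful, human-readable description of the clusters; inspecting the fitted tree then reveals which features drive each cluster.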