Case Studies and Examples

This is a collection of case studies that demonstrate the application of IAI modules to real-world problems and datasets:

  • Loan Default Risk  Optimal Classification Trees  

    We compare and contrast the interpretability of Optimal Trees against methods for model explainability (LIME/SHAP) using the dataset from the FICO Explainable Machine Learning Challenge.

  • Hepatitis Mortality Prediction  OptImpute    Optimal Classification Trees  

    We use Optimal Trees to make mortality predictions for patients with hepatitis. The dataset contains a large number of missing values, so we examine the performance of the final predictive model under a variety of schemes for handling missing values.

  • Supreme Court Outcomes  Optimal Classification Trees  

    We revisit a case study from The Analytics Edge where CART is used to predict the outcomes of Supreme Court votes. We apply Optimal Trees to the same dataset to investigate the improvement over CART.

  • House Sale Prices  Optimal Regression Trees  

    We use a house price dataset to show that regression trees with constant predictions can be inadequate when there is a strong linear relationship in the data. We show how to build Optimal Regression Trees with linear predictions to improve both model performance and interpretability.

  • Mercedes-Benz Testing  Optimal Feature Selection  

    We compare and contrast Optimal Feature Selection and elastic net regression as methods for conducting feature selection on the dataset from the Mercedes-Benz Greener Manufacturing Kaggle competition.

  • Turbofan Predictive Maintenance  Optimal Classification Trees    Optimal Survival Trees  

    We study a concrete case of predictive maintenance using Optimal Classification and Survival Trees. We compare our approach to classical models (e.g. XGBoost, CART) and we illustrate how interpretability helps to understand the underlying failure mechanisms.

  • Online Imputation for Production Pipelines  OptImpute    Optimal Classification Trees  

    We use the breast-cancer dataset to investigate the potential impact of missing data in the online setting, and illustrate some potential remedies such as using Optimal Imputation.

  • Detecting Racial Bias in Jury Selection  Optimal Feature Selection    Optimal Classification Trees  

    We use interpretable methods to investigate the presence of racial biases in jury selection, using data released as part of the 2019 U.S. Supreme Court case "Flowers v. Mississippi".

  • Revenue Optimization for Grocery Pricing  Optimal Prescriptive Trees    Optimal Policy Trees  

    We utilize two prescriptive methods to develop interpretable pricing strategies for grocery items based on demographic information, resulting in an estimated 60-70% lift in revenue.

  • Optimal Prescription for Diabetes Management  Optimal Prescriptive Trees    Optimal Policy Trees  

    We learn an interpretable diabetes management policy from observational data, with various treatment combinations and dosing options. We show that Optimal Policy Trees use the data very efficiently and lead to a clinically significant improvement in health outcomes.

  • Interpretable Clustering  Optimal Classification Trees    Optimal Policy Trees  

    We study two ways of using Optimal Trees to identify meaningful clusters from a credit card usage behavior dataset. The first approach consists of training an Optimal Tree to predict the cluster assignments made by a traditional clustering method. The second approach consists of considering the clustering problem as a supervised learning problem by selecting a feature as a relevant target variable to guide the clustering process.

  • Reducing Churn  Reward Estimation    Optimal Policy Trees  

    We demonstrate how to segment customers and prescribe optimal interventions to reduce churn over time. We use Reward Estimation with Survival Outcomes to estimate the counterfactual outcomes, and Optimal Policy Trees to construct cohorts of customers with similar responses to interventions and find the optimal pricing policy for each cohort.