Predicting Supreme Court Outcomes

In this example, we revisit the Supreme Court case study from The Analytics Edge, where CART is used to predict the outcomes of Supreme Court votes. We will apply Optimal Trees to the same problem and compare the two approaches.

The dataset is available in the supplementary materials accompanying the book:

using CSV, DataFrames
df = DataFrame(CSV.File("Stevens.csv", pool=true))

566×9 DataFrame. Omitted printing of 4 columns
│ Row │ Docket  │ Term  │ Circuit │ Issue             │ Petitioner │
│     │ String  │ Int64 │ String  │ String            │ String     │
├─────┼─────────┼───────┼─────────┼───────────────────┼────────────┤
│ 1   │ 93-1408 │ 1994  │ 2nd     │ EconomicActivity  │ BUSINESS   │
│ 2   │ 93-1577 │ 1994  │ 9th     │ EconomicActivity  │ BUSINESS   │
│ 3   │ 93-1612 │ 1994  │ 5th     │ EconomicActivity  │ BUSINESS   │
│ 4   │ 94-623  │ 1994  │ 1st     │ EconomicActivity  │ BUSINESS   │
│ 5   │ 94-1175 │ 1995  │ 7th     │ JudicialPower     │ BUSINESS   │
│ 6   │ 95-129  │ 1995  │ 9th     │ EconomicActivity  │ BUSINESS   │
│ 7   │ 95-728  │ 1996  │ FED     │ EconomicActivity  │ BUSINESS   │
⋮
│ 559 │ 97-1192 │ 1997  │ DC      │ JudicialPower     │ OTHER      │
│ 560 │ 97-1985 │ 1998  │ 11th    │ CriminalProcedure │ OTHER      │
│ 561 │ 98-1828 │ 1999  │ 2nd     │ CriminalProcedure │ OTHER      │
│ 562 │ 99-5153 │ 1999  │ 6th     │ CriminalProcedure │ OTHER      │
│ 563 │ 99-804  │ 2000  │ 5th     │ CriminalProcedure │ OTHER      │
│ 564 │ 99-8508 │ 2000  │ 9th     │ CriminalProcedure │ OTHER      │
│ 565 │ 97-29   │ 1997  │ DC      │ CivilRights       │ STATE      │
│ 566 │ 00-189  │ 2000  │ 9th     │ CivilRights       │ STATE      │

We separate the features from the target, then split the data into training and testing sets:

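# Features: every column except the first (Docket) and the last
X = df[:, 2:(end - 1)]
# Target: recode the 0/1 Reverse column as readable class labels
y = [val == 0 ? "Affirm" : "Reverse" for val in df.Reverse]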
seed = 1212
(train_X, train_y), (test_X, test_y) = IAI.split_data(:classification, X, y,
                                                      seed=seed)
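
To make the idea of a class-balanced train/test split concrete, here is a minimal sketch in plain Julia. It is not the implementation behind IAI.split_data: the helper stratified_split, its train_proportion keyword, and the toy labels are all hypothetical and chosen only for illustration.

```julia
using Random

# Hypothetical helper, NOT the IAI implementation: shuffle the indices of each
# class separately and send the first `train_proportion` share to training.
function stratified_split(y::AbstractVector; train_proportion=0.7, seed=1212)
    rng = MersenneTwister(seed)
    train_idx, test_idx = Int[], Int[]
    for label in unique(y)
        idx = shuffle(rng, findall(==(label), y))
        n_train = round(Int, train_proportion * length(idx))
        append!(train_idx, idx[1:n_train])
        append!(test_idx, idx[(n_train + 1):end])
    end
    return sort(train_idx), sort(test_idx)
end

# Toy labels standing in for the Affirm/Reverse target
labels = vcat(fill("Affirm", 40), fill("Reverse", 60))
train_idx, test_idx = stratified_split(labels)

# The share of "Reverse" cases is the same on both sides of the split
train_share = count(==("Reverse"), labels[train_idx]) / length(train_idx)
test_share = count(==("Reverse"), labels[test_idx]) / length(test_idx)
```

Splitting within each class keeps the class proportions of the training and testing sets close to those of the full data, which matters when one outcome is more common than the other.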

We start with the same approach as the book and train a CART model. Setting localsearch=false makes the learner build the tree greedily, as CART does. We use a GridSearch to validate over max_depth while cp is tuned automatically:

cart = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        criterion=:gini,
        localsearch=false,
        random_seed=seed,
    ),
    max_depth=1:10,
)
IAI.fit_cv!(cart, train_X, train_y, validation_criterion=:auc)
IAI.get_learner(cart)

[Optimal Trees Visualization of the fitted CART tree]

This tree uses the lower court decision, the type of issue being considered, the petitioner and respondent, and which circuit court heard the case to make predictions. We can test the tree's performance by evaluating accuracy (via the :misclassification criterion) and AUC, both in and out of sample:

results = DataFrame(
    method=:cart,
    ins_acc=IAI.score(cart, train_X, train_y, criterion=:misclassification),
    oos_acc=IAI.score(cart, test_X,  test_y,  criterion=:misclassification),
    ins_auc=IAI.score(cart, train_X, train_y, criterion=:auc),
    oos_auc=IAI.score(cart, test_X,  test_y,  criterion=:auc),
)

1×5 DataFrame
│ Row │ method │ ins_acc  │ oos_acc  │ ins_auc  │ oos_auc  │
│     │ Symbol │ Float64  │ Float64  │ Float64  │ Float64  │
├─────┼────────┼──────────┼──────────┼──────────┼──────────┤
│ 1   │ cart   │ 0.739899 │ 0.594118 │ 0.805298 │ 0.657869 │
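
As a rough illustration of what these two metrics measure, the sketch below computes accuracy and AUC by hand in plain Julia. The functions accuracy and auc, the positive keyword, and the toy labels and scores are hypothetical; this is not how IAI.score is implemented.

```julia
# Share of predictions that match the true label
accuracy(y_true, y_pred) = count(y_true .== y_pred) / length(y_true)

# AUC as the probability that a randomly chosen positive case receives a
# higher score than a randomly chosen negative case (ties count one half)
function auc(y_true, scores; positive="Reverse")
    pos = scores[y_true .== positive]
    neg = scores[y_true .!= positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos, n in neg)
    return wins / (length(pos) * length(neg))
end

y_true = ["Reverse", "Reverse", "Affirm", "Reverse", "Affirm"]
scores = [0.9, 0.6, 0.6, 0.4, 0.2]   # hypothetical predicted P(Reverse)
y_pred = [s >= 0.5 ? "Reverse" : "Affirm" for s in scores]

toy_acc = accuracy(y_true, y_pred)   # 3 of 5 correct: 0.6
toy_auc = auc(y_true, scores)        # 4.5 of 6 pairs ranked correctly: 0.75
```

Accuracy depends on the 0.5 threshold used to turn scores into labels, whereas AUC depends only on how the cases are ranked, which is why the two can disagree.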

Now we can try the same task using Optimal Classification Trees:

oct = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        criterion=:gini,
        random_seed=seed,
    ),
    max_depth=2:8,
)
IAI.fit_cv!(oct, train_X, train_y, validation_criterion=:auc)
IAI.get_learner(oct)

[Optimal Trees Visualization of the fitted OCT tree]

This tree is much smaller than the CART tree. It essentially says that the court tends to vote liberal: conservative decisions in the lower court are reversed and liberal decisions are affirmed. The follow-up splits only refine the predicted probabilities. We can evaluate the OCT solution in the same way to see whether this simpler model performs well:

append!(results, DataFrame(
    method=:oct,
    ins_acc=IAI.score(oct, train_X, train_y, criterion=:misclassification),
    oos_acc=IAI.score(oct, test_X,  test_y,  criterion=:misclassification),
    ins_auc=IAI.score(oct, train_X, train_y, criterion=:auc),
    oos_auc=IAI.score(oct, test_X,  test_y,  criterion=:auc)
))

2×5 DataFrame
│ Row │ method │ ins_acc  │ oos_acc  │ ins_auc  │ oos_auc  │
│     │ Symbol │ Float64  │ Float64  │ Float64  │ Float64  │
├─────┼────────┼──────────┼──────────┼──────────┼──────────┤
│ 1   │ cart   │ 0.739899 │ 0.594118 │ 0.805298 │ 0.657869 │
│ 2   │ oct    │ 0.674242 │ 0.652941 │ 0.74555  │ 0.672322 │

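Although CART fits a much larger tree, the OCT model performs better out of sample (accuracy 0.653 vs. 0.594, AUC 0.672 vs. 0.658). CART's accuracy drops from 0.740 in sample to 0.594 out of sample, while OCT's drops only from 0.674 to 0.653. This suggests that CART is overfitting rather than learning true patterns in the data.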
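
The overfitting argument can be checked with a little arithmetic on the results table above. The sketch below is plain Julia; the variable name reported is hypothetical and the numbers are copied from the table.

```julia
# Scores copied from the results table above
reported = (
    cart = (ins_acc=0.739899, oos_acc=0.594118, ins_auc=0.805298, oos_auc=0.657869),
    oct  = (ins_acc=0.674242, oos_acc=0.652941, ins_auc=0.74555,  oos_auc=0.672322),
)

# In-sample minus out-of-sample: a large gap suggests overfitting
for (method, s) in pairs(reported)
    acc_gap = round(s.ins_acc - s.oos_acc, digits=3)
    auc_gap = round(s.ins_auc - s.oos_auc, digits=3)
    println(method, ": accuracy gap = ", acc_gap, ", AUC gap = ", auc_gap)
end
```

CART loses about 0.146 in accuracy and 0.147 in AUC when moving to unseen data, compared with about 0.021 and 0.073 for OCT.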