Predicting Supreme Court Outcomes

In this example, we revisit the Supreme Court case study from The Analytics Edge, where CART is used to predict the outcomes of Supreme Court votes. We will apply Optimal Trees to the same problem and compare the two approaches.

The dataset is available in the supplementary materials accompanying the book:

using CSV, DataFrames
# pool=true stores the string columns as pooled (categorical) vectors
df = CSV.read("Stevens.csv", DataFrame, pool=true)
566×9 DataFrame
 Row │ Docket   Term   Circuit  Issue              Petitioner  Respondent  Low ⋯
     │ String   Int64  String   String             String      String      Str ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ 93-1408   1994  2nd      EconomicActivity   BUSINESS    BUSINESS    lib ⋯
   2 │ 93-1577   1994  9th      EconomicActivity   BUSINESS    BUSINESS    lib
   3 │ 93-1612   1994  5th      EconomicActivity   BUSINESS    BUSINESS    lib
   4 │ 94-623    1994  1st      EconomicActivity   BUSINESS    BUSINESS    con
   5 │ 94-1175   1995  7th      JudicialPower      BUSINESS    BUSINESS    con ⋯
   6 │ 95-129    1995  9th      EconomicActivity   BUSINESS    BUSINESS    con
   7 │ 95-728    1996  FED      EconomicActivity   BUSINESS    BUSINESS    con
   8 │ 96-1768   1997  9th      EconomicActivity   BUSINESS    BUSINESS    con
  ⋮  │    ⋮       ⋮       ⋮             ⋮              ⋮           ⋮           ⋱
 560 │ 97-1985   1998  11th     CriminalProcedure  OTHER       US          con ⋯
 561 │ 98-1828   1999  2nd      CriminalProcedure  OTHER       US          con
 562 │ 99-5153   1999  6th      CriminalProcedure  OTHER       US          con
 563 │ 99-804    2000  5th      CriminalProcedure  OTHER       US          lib
 564 │ 99-8508   2000  9th      CriminalProcedure  OTHER       US          con ⋯
 565 │ 97-29     1997  DC       CivilRights        STATE       US          con
 566 │ 00-189    2000  9th      CivilRights        STATE       US          lib
                                                  3 columns and 551 rows omitted
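
Before modeling, it can help to check the column types and the balance of the target. Here is a minimal sketch using standard DataFrames utilities (this inspection step is our addition, not part of the original example):

# Quick look at column types and cardinality of each feature
describe(df, :eltype, :nunique, :nmissing)

# Distribution of the target: how many cases were reversed?
combine(groupby(df, :Reverse), nrow)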

We separate the features and target, then split the data into training and testing sets:

X = df[:, 2:(end - 1)]  # drop the docket ID and the target column
y = [val == 0 ? "Affirm" : "Reverse" for val in df.Reverse]
seed = 234
(train_X, train_y), (test_X, test_y) = IAI.split_data(:classification, X, y,
                                                      seed=seed)
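
As a quick sanity check (our addition, not part of the original example), we can confirm the split sizes and that the class balance is preserved across the two sets:

# Sizes of the training and testing sets
size(train_X), size(test_X)

# Proportion of "Reverse" labels in each split; these should be
# similar if the classification split preserves class balance
count(==("Reverse"), train_y) / length(train_y)
count(==("Reverse"), test_y) / length(test_y)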

Using CART

We will start by using the same approach as in the book. We train a CART model (an OptimalTreeClassifier with localsearch=false, which disables the local search and produces a greedy CART-like tree), using a GridSearch to validate over max_depth while cp is tuned automatically:

cart = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        criterion=:gini,
        localsearch=false,  # disable local search to mimic greedy CART
        random_seed=seed,
    ),
    max_depth=1:10,
)
IAI.fit_cv!(cart, train_X, train_y, validation_criterion=:auc)
IAI.get_learner(cart)
[Optimal Trees visualization of the fitted CART tree]
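
If the interactive visualization is not available, one way to check which features the tree actually uses is the learner's variable importance. A minimal sketch, assuming the IAI.variable_importance accessor:

# Rank the features by their importance in the fitted CART tree
IAI.variable_importance(IAI.get_learner(cart))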

This tree uses the lower court decision, the type of issue being considered, the petitioner and respondent, and which circuit court heard the case to make its predictions. We can test the performance of the tree by evaluating the accuracy (via the misclassification criterion) and AUC, both in and out of sample:

results = DataFrame(
    method=:cart,
    ins_acc=IAI.score(cart, train_X, train_y, criterion=:misclassification),
    oos_acc=IAI.score(cart, test_X,  test_y,  criterion=:misclassification),
    ins_auc=IAI.score(cart, train_X, train_y, criterion=:auc),
    oos_auc=IAI.score(cart, test_X,  test_y,  criterion=:auc),
)
1×5 DataFrame
 Row │ method  ins_acc   oos_acc   ins_auc   oos_auc
     │ Symbol  Float64   Float64   Float64   Float64
─────┼────────────────────────────────────────────────
   1 │ cart    0.704545  0.594118  0.783513  0.633152
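
Since we are validating on AUC, it can also be useful to inspect the full ROC curve rather than the summary statistic alone. A minimal sketch, assuming the IAI.ROCCurve constructor with a positive_label keyword and that the resulting curve can be displayed with IAI.show_in_browser:

# Construct the out-of-sample ROC curve for the CART model, treating
# "Reverse" as the positive label, and display it in the browser
roc = IAI.ROCCurve(IAI.get_learner(cart), test_X, test_y,
                   positive_label="Reverse")
IAI.show_in_browser(roc)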

Using Optimal Classification Trees

Now we can try the same task using Optimal Classification Trees:

oct = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        criterion=:gini,
        random_seed=seed,
    ),
    max_depth=2:8,
)
IAI.fit_cv!(oct, train_X, train_y, validation_criterion=:auc)
IAI.get_learner(oct)
[Optimal Trees visualization of the fitted Optimal Classification Tree]
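
To make the size comparison below concrete, we can query both fitted trees directly. A short sketch, assuming the IAI.get_depth and IAI.get_num_nodes accessors:

# Compare the depth and node counts of the two fitted trees
cart_lnr = IAI.get_learner(cart)
oct_lnr = IAI.get_learner(oct)
(IAI.get_depth(cart_lnr), IAI.get_num_nodes(cart_lnr))
(IAI.get_depth(oct_lnr), IAI.get_num_nodes(oct_lnr))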

This tree is significantly smaller than the CART tree, and essentially says that the court tends to vote liberal: conservative decisions in the lower court are reversed, and liberal decisions are affirmed. The follow-up splits simply refine the predicted probabilities. We can then evaluate the OCT solution in the same way to see whether this simpler model performs well:

append!(results, DataFrame(
    method=:oct,
    ins_acc=IAI.score(oct, train_X, train_y, criterion=:misclassification),
    oos_acc=IAI.score(oct, test_X,  test_y,  criterion=:misclassification),
    ins_auc=IAI.score(oct, train_X, train_y, criterion=:auc),
    oos_auc=IAI.score(oct, test_X,  test_y,  criterion=:auc)
))
2×5 DataFrame
 Row │ method  ins_acc   oos_acc   ins_auc   oos_auc
     │ Symbol  Float64   Float64   Float64   Float64
─────┼────────────────────────────────────────────────
   1 │ cart    0.704545  0.594118  0.783513  0.633152
   2 │ oct     0.671717  0.658824  0.736214  0.699763

We see that despite the much larger tree fit by CART, the OCT model has superior out-of-sample accuracy and AUC, indicating that CART is overfitting rather than learning true patterns in the data.
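
As a final check on the earlier observation that the OCT's follow-up splits mainly refine probabilities, we can inspect the predicted probabilities directly. A sketch assuming IAI.predict_proba, which returns a DataFrame with one column per label:

# Predicted probability of each label for the test cases; since the
# tree assigns one probability per leaf, only a handful of distinct
# values should appear
probs = IAI.predict_proba(oct, test_X)
first(probs, 5)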