Predicting Supreme Court Outcomes
In this example, we revisit the Supreme Court case study from The Analytics Edge, in which CART is used to predict the outcomes of Supreme Court votes. We will apply Optimal Trees to the same problem and compare the two approaches.
The dataset is available in the supplementary materials accompanying the book:
using CSV, DataFrames
df = CSV.read("Stevens.csv", DataFrame, pool=true)
566×9 DataFrame
Row │ Docket Term Circuit Issue Petitioner Respondent Low ⋯
│ String Int64 String String String String Str ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 93-1408 1994 2nd EconomicActivity BUSINESS BUSINESS lib ⋯
2 │ 93-1577 1994 9th EconomicActivity BUSINESS BUSINESS lib
3 │ 93-1612 1994 5th EconomicActivity BUSINESS BUSINESS lib
4 │ 94-623 1994 1st EconomicActivity BUSINESS BUSINESS con
5 │ 94-1175 1995 7th JudicialPower BUSINESS BUSINESS con ⋯
6 │ 95-129 1995 9th EconomicActivity BUSINESS BUSINESS con
7 │ 95-728 1996 FED EconomicActivity BUSINESS BUSINESS con
8 │ 96-1768 1997 9th EconomicActivity BUSINESS BUSINESS con
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
560 │ 97-1985 1998 11th CriminalProcedure OTHER US con ⋯
561 │ 98-1828 1999 2nd CriminalProcedure OTHER US con
562 │ 99-5153 1999 6th CriminalProcedure OTHER US con
563 │ 99-804 2000 5th CriminalProcedure OTHER US lib
564 │ 99-8508 2000 9th CriminalProcedure OTHER US con ⋯
565 │ 97-29 1997 DC CivilRights STATE US con
566 │ 00-189 2000 9th CivilRights STATE US lib
3 columns and 551 rows omitted
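Before modeling, it is worth checking the balance of the target variable. A minimal sketch using standard DataFrames grouping (no IAI functionality required):
# Count how many cases were affirmed (Reverse = 0) vs. reversed (Reverse = 1)
combine(groupby(df, :Reverse), nrow => :count)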
We separate the features and target, then split the data into training and testing sets:
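# Features: every column except the docket identifier (first) and the target (last)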
X = df[:, 2:(end - 1)]
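# Target: label each vote as "Affirm" (0) or "Reverse" (1)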
y = [val == 0 ? "Affirm" : "Reverse" for val in df.Reverse]
seed = 234
(train_X, train_y), (test_X, test_y) = IAI.split_data(:classification, X, y,
seed=seed)
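IAI.split_data holds out a portion of the data for testing (we believe the default split is 70% training, 30% testing); we can confirm the sizes of the resulting sets:
# Number of observations in the training and testing sets
(nrow(train_X), nrow(test_X))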
Using CART
We will start by using the same approach as in the book. We train a CART model, using a GridSearch to validate over max_depth while tuning cp automatically:
cart = IAI.GridSearch(
IAI.OptimalTreeClassifier(
criterion=:gini,
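        # localsearch=false disables the local search, reducing to a greedy CART-like heuristic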
localsearch=false,
random_seed=seed,
),
max_depth=1:10,
)
IAI.fit_cv!(cart, train_X, train_y, validation_criterion=:auc)
IAI.get_learner(cart)
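We can also check which parameter combination the validation procedure selected. A minimal sketch, assuming the grid-search helper IAI.get_best_params from the IAI interface:
# Display the best parameter values found during validation
IAI.get_best_params(cart)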
The resulting tree uses the lower court decision, the type of issue being considered, the petitioner and respondent, and which circuit court heard the case to make its predictions. We can test the performance of the tree by evaluating both accuracy (via the misclassification criterion) and AUC, in and out of sample:
results = DataFrame(
method=:cart,
ins_acc=IAI.score(cart, train_X, train_y, criterion=:misclassification),
oos_acc=IAI.score(cart, test_X, test_y, criterion=:misclassification),
ins_auc=IAI.score(cart, train_X, train_y, criterion=:auc),
oos_auc=IAI.score(cart, test_X, test_y, criterion=:auc),
)
1×5 DataFrame
Row │ method ins_acc oos_acc ins_auc oos_auc
│ Symbol Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────
1 │ cart 0.704545 0.594118 0.783513 0.633152
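With the fitted grid we can also generate predictions for individual cases. A short sketch using IAI's predict and predict_proba, shown here on the first five test cases:
# Predicted labels for the first five test cases
IAI.predict(cart, test_X[1:5, :])
# Predicted probability of each class for the same cases
IAI.predict_proba(cart, test_X[1:5, :])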
Using Optimal Classification Trees
Now we can try the same task using Optimal Classification Trees:
oct = IAI.GridSearch(
IAI.OptimalTreeClassifier(
criterion=:gini,
random_seed=seed,
),
max_depth=2:8,
)
IAI.fit_cv!(oct, train_X, train_y, validation_criterion=:auc)
IAI.get_learner(oct)
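To inspect or share the fitted tree outside of an interactive session, we can export a standalone visualization. A sketch assuming IAI's write_html utility; the filename oct_tree.html is an arbitrary choice:
# Save an interactive HTML visualization of the fitted tree
IAI.write_html("oct_tree.html", IAI.get_learner(oct))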
This tree is significantly smaller than the CART tree, and simply captures that the court tends to vote liberal: conservative decisions in the lower court are reversed, while liberal decisions are affirmed. The subsequent splits serve only to refine the predicted probabilities. We can evaluate the OCT solution in the same way to see whether this simpler model performs well:
append!(results, DataFrame(
method=:oct,
ins_acc=IAI.score(oct, train_X, train_y, criterion=:misclassification),
oos_acc=IAI.score(oct, test_X, test_y, criterion=:misclassification),
ins_auc=IAI.score(oct, train_X, train_y, criterion=:auc),
oos_auc=IAI.score(oct, test_X, test_y, criterion=:auc)
))
2×5 DataFrame
Row │ method ins_acc oos_acc ins_auc oos_auc
│ Symbol Float64 Float64 Float64 Float64
─────┼────────────────────────────────────────────────
1 │ cart 0.704545 0.594118 0.783513 0.633152
2 │ oct 0.671717 0.658824 0.736214 0.699763
We see that despite the much larger tree fit by CART, the OCT model has superior out-of-sample performance, indicating that the CART tree is overfitting to the training data rather than learning true patterns.
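Since this comparison rests on AUC, a natural follow-up is to examine the full out-of-sample ROC curve. A minimal sketch assuming IAI's ROCCurve constructor, with "Reverse" treated as the positive label:
# Construct the out-of-sample ROC curve for the OCT model
roc = IAI.ROCCurve(IAI.get_learner(oct), test_X, test_y,
                   positive_label="Reverse")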