Racial Bias in Jury Selection

In this example, we use interpretable methods to investigate the presence of human biases in decision making. In particular, we consider the role of race in jury selection. In 1986, the U.S. Supreme Court ruled in Batson v. Kentucky that striking potential jurors on the basis of race is unconstitutional. Despite this ruling, a large disparity in juror strike rates across races appears to remain.

This disparity was the focus of the 2019 U.S. Supreme Court case "Flowers v. Mississippi", in which the Court ruled that District Attorney Doug Evans of the Fifth Circuit Court District in Mississippi had discriminated based on race during jury selection in the six trials of Curtis Flowers.

To support the case, APM Reports collected and published court records of jury strikes in the Fifth Circuit Court District and conducted an analysis to assess whether there was a systematic racial bias in jury selection in this district. The data include information on each trial and juror, along with the jurors' voir dire answers, between 1992 and 2017. As part of their analysis, they used a logistic regression model and concluded that there was a significant racial disparity in jury strike rates by the State, even after accounting for other factors in the dataset.

We will use our methods to investigate:

  1. Whether we reach the same conclusion that there is a significant racial disparity in strike rates.
  2. Whether the racial disparity is uniform across the board, or whether there are specific subgroups where it is most pronounced.

Data Preparation

We follow the same data preparation as the methodology in the report to ensure consistency. First, we prepare the data so that each row corresponds to a juror at a particular trial:

using CSV, DataFrames

jurors = CSV.read("jury-data-master/jurors.csv", DataFrame)
trials = CSV.read("jury-data-master/trials.csv", DataFrame)
answers = CSV.read("jury-data-master/voir_dire_answers.csv", DataFrame)
select!(answers, Not([:id, :notes]))

data = leftjoin(jurors, trials, on=(:trial__id => :id))
data = innerjoin(data, answers, matchmissing=:equal,
                 on=[(:id => :juror_id), (:trial__id => :juror_id__trial__id)])
3545×112 DataFrame
  Row │ id     trial                             trial__id  race    gender   r ⋯
      │ Int64  String                            Int64      String  String   S ⋯
    1 │   107  2004-0257--Sparky Watson                  3  White   Male     J ⋯
    2 │   108  2004-0257--Sparky Watson                  3  Black   Female   J
    3 │   109  2004-0257--Sparky Watson                  3  Black   Female   J
    4 │   110  2004-0257--Sparky Watson                  3  Black   Female   J
    5 │   111  2004-0257--Sparky Watson                  3  White   Male     J ⋯
    6 │   112  2004-0257--Sparky Watson                  3  Black   Female   J
    7 │   113  2004-0257--Sparky Watson                  3  Black   Male     J
    8 │   114  2004-0257--Sparky Watson                  3  White   Male     J
  ⋮   │   ⋮                   ⋮                      ⋮        ⋮        ⋮       ⋱
 3539 │   262  1994-9942--Suzanne Ilene Tavares          6  White   Female   J ⋯
 3540 │  1094  2002-0067--Deondray Johnson              22  White   Female   J
 3541 │  3478  2010-0012--Jerome Patterson              70  White   Female   J
 3542 │  3485  2010-0012--Jerome Patterson              70  White   Female   J
 3543 │  3487  2010-0012--Jerome Patterson              70  Black   Female   J ⋯
 3544 │  2980  1995-2258--Robert Bingham                60  Black   Female   J
 3545 │  2386  2001-0003--Lawrence Branch               47  White   Male     J
                                               107 columns and 3530 rows omitted

We are interested in understanding what leads to a juror being struck by the State. For this purpose, we subset to only jurors eligible to be struck by the State.

data = data[(data.strike_eligibility .== "State") .|
            (data.strike_eligibility .== "Both State and Defense"), :]

Next, we assemble the features, which include the juror's gender, race, and the defendant's race. In addition, we have the voir dire answers to 65 questions.

data.is_black = data.race .== "Black"
data.same_race = data.race .== data.defendant_race

categorical_vars = [["is_black", "gender", "defendant_race", "same_race"];
                    names(answers, Not(["juror_id", "juror_id__trial__id"]))]

using CategoricalArrays
X = select(data, categorical_vars .=> categorical, renamecols=false)

The target is whether the juror was struck by the State:

y = [v == "Struck by the state" ? "Strike" : "No strike"
     for v in data.struck_by]

Finally, we can split into training and testing:

seed = 1
(X_train, y_train), (X_test, y_test) = IAI.split_data(:classification, X, y,
                                                      seed=seed)

Optimal Feature Selection

The first model we apply is Optimal Feature Selection. This is similar to the backward stepwise logistic regression model used in the original study, except that instead of iteratively removing insignificant variables, Optimal Feature Selection picks the optimal set of variables in a single step.
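
For reference, here is a minimal sketch of the backward stepwise approach we are comparing against, using GLM.jl on a hypothetical data frame df with a binary target column and a list of candidate predictors. This is an illustration of the classical procedure, not the IAI implementation, and it assumes each predictor contributes a single coefficient (e.g. binary indicators):

using DataFrames, GLM, StatsModels

# Backward stepwise logistic regression (sketch): repeatedly refit and
# drop the least significant predictor until all remaining predictors
# are significant at the given threshold.
function backward_stepwise(df, target, predictors; threshold=0.05)
    preds = copy(predictors)
    while true
        model = glm(term(target) ~ foldl(+, term.(preds)), df,
                    Binomial(), LogitLink())
        pvals = coeftable(model).cols[4][2:end]  # p-values, skipping intercept
        worst = argmax(pvals)
        (pvals[worst] <= threshold || length(preds) == 1) && return model
        deleteat!(preds, worst)  # remove the least significant predictor
    end
end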

We run the Optimal Feature Selection, considering all possible combinations of up to 15 features, and selecting the best combination based on the cross-validated AUC:

ofs_grid = IAI.GridSearch(
    IAI.OptimalFeatureSelectionClassifier(random_seed=seed),
    sparsity=1:15,
)
IAI.fit_cv!(ofs_grid, X_train, y_train, validation_criterion=:auc)
Fitted OptimalFeatureSelectionClassifier:
  Constant: -1.97038
    accused==true:           2.60471
    death_hesitation==true:  1.81951
    fam_accused==true:       1.44206
    is_black==true:          1.62918
    know_def==true:          1.25579
    medical==true:           2.40397
    no_death==true:          4.39454
  (Higher score indicates stronger prediction for class `Strike`)

We see that is_black is among the 7 features selected in the best model, alongside variables such as know_def (the juror has prior familiarity with the defendant through personal or professional channels) and fam_accused (the juror has friends or family who have been accused of involvement in criminal activity). This reaffirms the finding of the original analysis that is_black is a useful feature in a logistic regression model for predicting the probability of a strike.
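
Because the Optimal Feature Selection classifier fits a linear model under a logistic loss, the coefficients can be read on the log-odds scale, so exponentiating gives approximate odds ratios. For example, a quick illustrative calculation for the is_black coefficient:

# All else being equal, the odds of a Black juror being struck are
# roughly five times those of a non-Black juror
exp(1.62918)  # ≈ 5.10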

IAI.score(ofs_grid, X_test, y_test, criterion=:auc)

The model also has strong predictive performance, with an out-of-sample AUC of 0.826.
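
As a complementary check, we could also inspect the ROC curve behind this AUC. A sketch, assuming the fitted grid search can be passed directly to IAI.ROCCurve:

# Plot the out-of-sample ROC curve for the selected model
IAI.ROCCurve(ofs_grid, X_test, y_test, positive_label="Strike")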

To augment these findings, we can visualize the variable importance across all sparsity levels. The importance is normalized so that the most important variable at each sparsity level has a value of 1, and variables are added from the bottom of the plot as they are selected at higher sparsity levels.

using Plots
plot(ofs_grid, type=:importance, size=(600, 600))

We can see that is_black is rated as the most important variable at every level of sparsity, showing that it remains highly predictive regardless of which other features are included. This is further evidence that it captures signal not present in the other variables.

To further confirm that the race of the juror being black is important in explaining the strike decision, we can build a model without the race variables and compare the performance:

ofs_grid_no_race = IAI.GridSearch(
    IAI.OptimalFeatureSelectionClassifier(random_seed=seed),
    sparsity=1:15,
)
IAI.fit_cv!(ofs_grid_no_race, select(X_train, Not([:is_black, :same_race])),
            y_train, validation_criterion=:auc)

IAI.score(ofs_grid_no_race, X_test, y_test, criterion=:auc)

We see that the AUC falls by around 12% when we remove race from the model, a strong indication that the juror's race is highly explanatory of the strike decision and that this signal cannot be proxied by the other features in the dataset.

As a side note, if the two models had performed similarly, this would not have been sufficient evidence to conclude that race has no impact on the decision, since other variables in the dataset might still be proxying for race. In our case, however, the large decrease in performance upon removing race is strong evidence that race has unique predictive power that cannot be replicated by the other variables.

Identifying Subgroups with Disparity

We have strong evidence that the race of the juror plays a significant role in predicting the probability of being struck by the State. Next, we would like to investigate whether there are specific subpopulations where this effect is more or less pronounced. To do this, we move away from linear models and use Optimal Classification Trees as a tool to identify subpopulations with statistically significant differences in strike rate by race.

We first train an Optimal Classification Tree that is not allowed to split on the race variables:

grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        random_seed=seed,
        split_features=Not([:is_black, :same_race]),
    ),
    max_depth=1:5,
)
IAI.fit!(grid, X_train, y_train, validation_criterion=:auc)
lnr = IAI.get_learner(grid)
IAI.set_display_label!(lnr, "Strike")
Optimal Trees Visualization

The resulting tree identifies six subgroups of jurors with similar probabilities of being struck by the State. Importantly, these subgroups are defined without considering the race of the juror. For example, node 2 contains jurors who have previously been accused of a crime, who understandably have a very high strike rate of 93%.
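
To work with these subgroups programmatically, we can recover the leaf containing each juror. A short sketch using IAI.apply, which returns the index of the leaf that each point falls into:

# Empirical strike rate within each leaf of the tree
leaf = IAI.apply(lnr, X)
for node in sort(unique(leaf))
    idx = leaf .== node
    rate = round(100 * count(y[idx] .== "Strike") / count(idx), digits=1)
    println("Node $node: $(count(idx)) jurors, strike rate $rate%")
end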

Next, we test if there is a significant difference in strike rate between black and non-black jurors in each subgroup using Fisher's Exact Test.

group = [x == true ? "Black" : "Non-black" for x in X.is_black]
outputs = IAI.compare_group_outcomes(lnr, X, y, group, positive_label="Strike")
pvalues = [o.p_value["vs-rest"]["Black"] for o in outputs]
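
For intuition, each of these tests amounts to Fisher's exact test on the 2×2 contingency table of (Black / non-Black) × (struck / not struck) within a subgroup. A minimal sketch for a single node using HypothesisTests.jl, with the node index chosen purely for illustration:

using HypothesisTests

idx = IAI.apply(lnr, X) .== 2  # jurors falling into node 2
black = group[idx] .== "Black"
struck = y[idx] .== "Strike"

# 2x2 table: rows are Black/non-Black, columns are struck/not struck
test = FisherExactTest(count(black .& struck), count(black .& .!struck),
                       count(.!black .& struck), count(.!black .& .!struck))
pvalue(test)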

Because we are simultaneously conducting several hypothesis tests, we use the Holm-Bonferroni method to adjust the p-values to avoid false positives.

using MultipleTesting
pvalues = adjust(pvalues, Holm())
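
For intuition, the Holm step-down procedure sorts the m raw p-values, scales the i-th smallest by a factor of (m - i + 1), and enforces monotonicity with a running maximum. A quick sketch of the adjustment that can be checked against the package implementation:

function holm_manual(p)
    m = length(p)
    adjusted = similar(p)
    running = 0.0
    for (i, j) in enumerate(sortperm(p))
        # scale the i-th smallest p-value, capped at 1, keeping the running max
        running = max(running, min(1.0, (m - i + 1) * p[j]))
        adjusted[j] = running
    end
    adjusted
end

holm_manual([0.01, 0.04, 0.03, 0.005]) ≈ adjust([0.01, 0.04, 0.03, 0.005], Holm())  # true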

We display the strike rate for each group in the node, and the p-value from the test. If a node shows a statistically significant difference between the two groups, it is colored red or green depending on the sign of the difference.

extras = map(1:length(outputs)) do i
  summary = outputs[i].summary
  p_value = pvalues[i]
  # Color the node only when the difference is statistically significant:
  # red if the strike rate is higher for Black jurors, green if lower
  node_color = if p_value > 0.05
    "#FFFFFF"
  elseif summary.prob[1] > summary.prob[2]
    "red"
  else
    "green"
  end

  node_summary = "Strike rate for Black: $(round(Int, summary.prob[1] * 100))%; " *
                 "For non-Black: $(round(Int, summary.prob[2] * 100))% " *
                 "(p=$(round(p_value, digits=3)))"

  node_details = IAI.make_html_table(summary)

  Dict(:node_summary_include_default => false,
       :node_details_include_default => false,
       :node_summary_extra => node_summary,
       :node_details_extra => node_details,
       :node_color => node_color)
end

IAI.TreePlot(lnr, extra_content=extras)
Optimal Trees Visualization