Racial Bias in Jury Selection

In this example, we use interpretable methods to investigate the presence of human biases in decision making. In particular, we consider the role of race in jury selection. In 1986, the U.S. Supreme Court ruled that using race as a reason to remove potential jurors is unconsititutional. Despite this ruling, a large disparity in juror strike rates across races appears to remain.

This disparity was the focus of the 2019 U.S. Supreme Court case "Flowers v. Mississippi", where it was ruled that the District Attorney Doug Evans from the Fifth Circuit Court District in Mississippi had discriminated based on race during jury selection in the six trials of Curtis Flowers.

To support the case, APM Reports collected and published court records of jury strikes in the Fifth Circuit Court District and conducted analysis to assess if there was a systematic racial bias in jury selection in this district. The data included information on each trial, juror, and the voir dire answers by the jurors between 1992 and 2017. As part of their analyses, they used a logistic regression model and concluded that there was significant racial disparity in jury strike rates by the State, even after accounting for other factors in the dataset.

We will use our methods to investigate:

  1. Whether we reach the same conclusion that there is significant racial disparity in strike rates
  2. Whether the racial disparity is the same across the board, or there are specific subgroups where the disparity is most prominent.

Data Preparation

We follow the same data preparation as the methodology in the report to ensure consistency. First, we prepare the data so that each row corresponds to a juror at a particular trial:

using CSV, DataFrames

jurors = CSV.read("jury-data-master/jurors.csv", DataFrame)
trials = CSV.read("jury-data-master/trials.csv", DataFrame)
answers = CSV.read("jury-data-master/voir_dire_answers.csv", DataFrame)
select!(answers, Not([:id, :notes]))

data = leftjoin(jurors, trials, on=(:trial__id => :id))
data = innerjoin(data, answers, matchmissing=:equal,
                 on=[(:id => :juror_id), (:trial__id => :juror_id__trial__id)])
┌ Warning: thread = 1 warning: only found 69 / 70 columns around data row: 3547. Filling remaining columns with `missing`
└ @ CSV ~/.julia/packages/CSV/lIuxi/src/file.jl:603
3545×112 DataFrame
  Row │ id     trial                             trial__id  race    gender   r ⋯
      │ Int64  String                            Int64      String  String   S ⋯
──────┼─────────────────────────────────────────────────────────────────────────
    1 │   107  2004-0257--Sparky Watson                  3  White   Male     J ⋯
    2 │   108  2004-0257--Sparky Watson                  3  Black   Female   J
    3 │   109  2004-0257--Sparky Watson                  3  Black   Female   J
    4 │   110  2004-0257--Sparky Watson                  3  Black   Female   J
    5 │   111  2004-0257--Sparky Watson                  3  White   Male     J ⋯
    6 │   112  2004-0257--Sparky Watson                  3  Black   Female   J
    7 │   113  2004-0257--Sparky Watson                  3  Black   Male     J
    8 │   114  2004-0257--Sparky Watson                  3  White   Male     J
  ⋮   │   ⋮                   ⋮                      ⋮        ⋮        ⋮       ⋱
 3539 │   262  1994-9942--Suzanne Ilene Tavares          6  White   Female   J ⋯
 3540 │  1094  2002-0067--Deondray Johnson              22  White   Female   J
 3541 │  3478  2010-0012--Jerome Patterson              70  White   Female   J
 3542 │  3485  2010-0012--Jerome Patterson              70  White   Female   J
 3543 │  3487  2010-0012--Jerome Patterson              70  Black   Female   J ⋯
 3544 │  2980  1995-2258--Robert Bingham                60  Black   Female   J
 3545 │  2386  2001-0003--Lawrence Branch               47  White   Male     J
                                               107 columns and 3530 rows omitted

We are interested in understanding what leads to a juror being struck by the State. For this purpose, we subset to only jurors eligible to be struck by the State.

data = data[(data.strike_eligibility .== "State") .+
            (data.strike_eligibility .== "Both State and Defense") .> 0, :]

Next, we assemble the features, which include the juror's gender, race, and the defendant's race. In addition, we have the voir dire answers to 65 questions.

data.is_black = data.race .== "Black"
data.same_race = data.race .== data.defendant_race

categorical_vars = [["is_black", "gender", "defendant_race", "same_race"];
                    names(answers, Not(["juror_id", "juror_id__trial__id"]))]

using CategoricalArrays
X = select(data, categorical_vars .=> categorical, renamecols=false)

The target is whether the juror was struck by the State:

y = [v == "Struck by the state" ? "Strike" : "No strike"
     for v in data.struck_by]

Finally, we can split into training and testing:

seed = 1
(X_train, y_train), (X_test, y_test) = IAI.split_data(:classification, X, y,
                                                      seed=seed)

Optimal Feature Selection

The first model we apply is Optimal Feature Selection. This is similar to the backward stepwise logistic regression model used in the original study, except that instead of iteratively selecting and removing variables that are insignificant, the Optimal Feature Selection will pick the optimal set of variables in a single step.

We run the Optimal Feature Selection, considering all possible combinations of up to 15 features, and selecting the best combination based on the AUC on a hold-out validation set:

ofs_grid = IAI.GridSearch(
    IAI.OptimalFeatureSelectionClassifier(
      random_seed=seed,
    ),
    sparsity=1:15,
)
IAI.fit_cv!(ofs_grid, X_train, y_train, validation_criterion=:auc)
IAI.get_learner(ofs_grid)
Fitted OptimalFeatureSelectionClassifier:
  Constant: -1.99072
  Weights:
    accused=true:              2.54637
    death_hesitation=true:     1.86823
    defendant_race=Unknown:    0.589933
    fam_accused=true:          1.47964
    fam_crime_victim=true:     0.534126
    fam_law_enforcement=true: -0.365057
    is_black=true:             1.41219
    know_def=true:             1.273
    law_enforcement=true:     -0.686454
    leans_defense=true:        1.76438
    married=married:          -0.605019
    medical=true:              2.2713
    no_death=true:             4.29094
    same_race=true:            0.374411
  (Higher score indicates stronger prediction for class `Strike`)

We see that is_black is among the 14 features selected in the best model, in addition to other variables such as know_def (juror has prior familiarity with defendant through either personal or professional channels) and fam_accused (the juror has friends or family accused of being involved in criminal activity). This reaffirms the finding of the previous analysis that is_black is a useful feature in the logistic regression model for predicting the probability of strike.

IAI.score(ofs_grid, X_test, y_test, criterion=:auc)
0.8253396158517383

The model also has strong predictive performance, with an out-of-sample AUC of 0.826.

To augment these findings, we can visualize the variable importance across all sparsity levels. The importance is normalized so the most important variable has a value of 1 at each sparsity. The variables are incrementally included from the bottom as they become selected under higher sparsity.

using Plots
plot(ofs_grid, type=:importance, size=(600, 600))

We can see that is_black is evaluated as the most important variable for every level of sparsity, which demonstrates that it has roughly the same predictive power regardless of what other features are added. This gives evidence that it is capturing signal not present in the other variables.

To further confirm that the race of the juror being black is important in explaining the strike decision, we can build a model without the race variables and compare the performance:

ofs_grid_no_race = IAI.GridSearch(
    IAI.OptimalFeatureSelectionClassifier(
      random_seed=seed,
    ),
    sparsity=1:15,
)
IAI.fit_cv!(ofs_grid_no_race, select(X_train, Not([:is_black, :same_race])),
            y_train, validation_criterion=:auc)

IAI.score(ofs_grid_no_race, X_test, y_test, criterion=:auc)
0.6900949125095581

We see that AUC falls by around 12% when we remove race from the model, a very strong indication that the race being black is highly explanatory to being struck, and that we cannot proxy this signal using other features in the dataset.

As a small side note, if the two models had similar performance this would not be sufficient evidence to conclude that race has no impact on the decision, as in that case other variables in the dataset may still be proxying for the race. However, in our case the large decrease in performance that we see upon removing race is strong evidence that race has unique predictive power in explaining the outcome that cannot be proxied with other variables.

Identifying Subgroups with Disparity

We have strong evidence that the race of the juror plays a strong role in predicting the probability of being struck by the State. Next, we would like to investigate if there are specific subpopulations where this effect is more or less pronounced. To do this, we will move away from linear models and use Optimal Classification Trees as a tool to identify subpopulations with statistically significant differences in strike rate based on race.

To do this, we first train an Optimal Classification Tree without the race variable:

grid = IAI.GridSearch(
    IAI.OptimalTreeClassifier(
        criterion=:gini,
        random_seed=seed,
        missingdatamode=:separate_class,
        split_features=Not([:is_black, :same_race]),
    ),
    max_depth=2:5,
)
IAI.fit!(grid, X_train, y_train, validation_criterion=:auc)
lnr = IAI.get_learner(grid)
IAI.set_display_label!(lnr, "Strike")
Optimal Trees Visualization

The resulting tree has identified six subgroups of jurors that have similar probabilities of being struck by the State. Importantly, these subgroups are defined without considering the race of the juror. For example, node 2 contains jurors that have been accused of a crime in the past, who understandably have a very high strike rate of 93%.

Next, we test if there is a significant difference in strike rate between black and non-black jurors in each subgroup using Fisher's Exact Test.

group = [x == true ? "Black" : "Non-black" for x in X.is_black]
outputs = IAI.compare_group_outcomes(lnr, X, y, group, positive_label="Strike")
pvalues = [o.p_value["vs-rest"]["Black"] for o in outputs]

Because we are simultaneously conducting several hypothesis tests, we use the Holm-Bonferroni method to adjust the p-values to avoid false positives.

using MultipleTesting
pvalues = adjust(pvalues, Holm())

We display the strike rate for each group in the node, and the p-value from the test. If a node shows a statistically significant difference between the two groups, it is colored red or green depending on the sign of the difference.

extras = map(1:length(outputs)) do i
  summary = outputs[i].summary
  p_value = pvalues[i]
  node_color = if p_value > 0.05
    "#FFFFFF"
  elseif summary.prob[1] > summary.prob[2]
    "#FFADB4"
  else
    "#92D86F"
  end

  node_summary = "Strike rate for Black: $(round(Int, summary.prob[1] * 100))%; " *
                 "For non-Black: $(round(Int, summary.prob[2] * 100))% " *
                 "(p=$(round(p_value, digits=3)))"

  node_details = IAI.make_html_table(summary)

  Dict(:node_summary_include_default => false,
       :node_details_include_default => false,
       :node_summary_extra => node_summary,
       :node_details_extra => node_details,
       :node_color => node_color)
end
IAI.TreePlot(lnr, extra_content=extras)
Optimal Trees Visualization

There are two cohorts where there is no statistically significant difference in strike rate between black and non-black jurors:

  • Node 2: potential jurors that have been accused of a crime
  • Node 10: potential jurors with reservations about the death penalty

However, there is statistically significant distinction in strike rate between black and non-black potential jurors in nodes 4, 6, and 9:

  • Node 4: someone who has never been accused of a crime, but has family in law enforcement. If this person is black, the strike rate is 47% compared to 8% if this person is white. It appears that this reason for striking jurors is disproportionally applied to black jurors at a rate more than five times higher than non-black jurors.
  • Node 6: someone who has never been accused of a crime, does not have family in law enforcement, but has family or friends that have been accused of crime. If this person is black, the strike rate is 90% compared to 35% if this person is white. It makes sense that having friends or family that have been accused of a crime is a reason to strike potential jurors, yet it seems that this reason for exclusion is applied to non-black jurors at a much lower rate.
  • Node 8: someone who has never been accused of a crime, does not have family in law enforcement, does not have family or friends accused of crime, but does know the defendant. In this group, the strike rate is 86% for black and 24% for non-black jurors. Again, at an intutitive level, the behavior of the State striking a juror that knows the defendant makes sense, but we see this line of reasoning is being used to strike black jurors at a far higher rate.
  • Node 11: someone has never been accused of a crime, does not have family in law enforcement, has no family or friends accused of crime, does not know the defendant, and has no reservation about death penalty. This pool of candidates do not exhibit any of the more obvious reasons for striking and the model is unable to explain further, and so the reasons for striking are likely to be more nuanced and specific to the individual. There is still a higher strike rate of 40% for black jurors compared to 9% for non-black, but the model does not offer further evidence that might explain this.

In summary, the Optimal Classification Tree finds five clear lines of reasoning that are used to exclude jurors from the pool. Of these five groups, two (nodes 2 and 10) are applied regardless of race, but the others (nodes 4, 6, and 8) are applied to black jurors significantly more than non-black jurors, which is strongly suggestive of racial biases factoring into the strike decisions of the State.

Conclusions

The original study used a backward stepwise logistic regression to conclude that there is significant racial disparity in juror strike outcome. This method is only an approximation to variable selection and has the potential to lead to a misleading characterization of variables as important or not.

With Optimal Feature Selection, we still find that race is consistently selected as the most predictive variable of juror strike outcome at all levels of sparsity. This strengthens the conclusion in the original study, as we knowing that the variable selection is exact. Furthermore, we observe a significant decrease in model performance when the race variable is removed, indicating this feature is providing unique signal in the data.

In addition, we used Optimal Classification Trees to systematically identify subgroups of the population with similar chances of being struck, and found that in three of these groups there was a significant disparity in strike rate between black and non-black jurors. These subgroups suggest systemic patterns of racial bias in the strike process, and provide direct characterizations of the situations in which black jurors are likely to have experienced discrimination.