Small Town Policing Accountability
In fall 2021, I joined the Institute for the Quantitative Study of Inclusion, Diversity, and Equity (QSIDE) research lab on Small Town Policing Accountability (SToPA). The goal of SToPA is to develop a toolkit for procuring, structuring, and analyzing policing data in small towns that lack resources and systems to make their own data public. The lab consists primarily of mathematicians and data scientists, but the nature of the project means we also work closely with lawyers, sociologists, and activists. SToPA currently serves several towns, and is working to expand accountability in small towns across the United States.
My contribution to SToPA has mainly involved police call logs from a specific town. These logs contain an abundance of information about every occurrence of police action in this town in 2020 and 2021, such as times, locations, officer involvement, outcomes, and police-written narratives. With the help of local undergraduates and a large group of volunteers during QSIDE's Datathon4Justice, we at the SToPA lab were able to take the hard-copy documents provided by law enforcement and use Tesseract, an optical character recognition engine which "reads" PDFs to extract text, to create the DataFrame from which we now work.
A bipartite graph representing call logs (on the right) and the words they consist of (on the left), clustered by common theme.
SToPA has a lot of exciting concurrent projects, but my focus is on identifying and extracting global information from the police-written narratives. My primary goal is to sort the tens of thousands of logs into meaningful subsets based on topical structure. This is mainly done by following the methodology of Gerlach, Peixoto, and Altmann in "A network approach to topic models": first, from the text corpus we construct a bipartite graph whose vertices are words and logs, adding an edge between a word and a log if the word appears in that log. Using a stochastic block model with nonparametric priors, an established community-detection method for graphs, we cluster both words and documents. Essentially, this means words are grouped if they frequently occur together, and logs are simultaneously grouped depending on the relative distribution of word-groups they use. The image to the left is an example of what such a clustering looks like: vertices on the left represent call logs, with vertices on the right representing the words within those logs.
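The graph-construction step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the lab's pipeline: the function names, the simple regular-expression tokenizer, and the toy narratives are all invented for the example, and storing a count on each edge (rather than a plain yes/no edge) is an assumption. Fitting the stochastic block model itself is not shown; it would be handled by a dedicated graph library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_bipartite_edges(logs):
    """Map each (log_id, word) pair to the number of times the word occurs in that log.

    Every key is one edge of the word-log bipartite graph. A word and a log
    are connected exactly when the pair is present in the returned Counter.
    """
    edges = Counter()
    for log_id, narrative in logs.items():
        for word in tokenize(narrative):
            edges[(log_id, word)] += 1
    return edges


# Invented toy narratives for illustration only; not real call-log data.
toy_logs = {
    "log_1": "Noise complaint about a loud party near the college",
    "log_2": "Student reported a loud party",
    "log_3": "Vehicle stop on Main Street",
}

edges = build_bipartite_edges(toy_logs)

# The two vertex sets of the bipartite graph.
log_vertices = sorted({log_id for log_id, _ in edges})
word_vertices = sorted({word for _, word in edges})

for (log_id, word), count in sorted(edges.items()):
    print(log_id, word, count)
```

In this toy example, "loud" and "party" each connect to both log_1 and log_2 but not to log_3, which is the kind of shared structure a community-detection method can pick up on.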
For example, one subset of logs might consist of those which use both a lot of "student/college" words and a lot of "party/noise" words, versus another subset which uses a lot of "student/college" words but not many "party/noise" words. The grouping thus distinguishes college party noise complaint calls from other types of calls to campus. This preliminary work still requires pipeline verification, comparing human-annotated logs to the clustering results, before it can be generalized to other small towns, which is the goal. The results of such clusterings can then be very carefully and consciously analyzed for patterns, leading to insight about policing in small towns.
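One simple way to compare human-annotated logs with cluster assignments is cluster purity: the fraction of logs whose human label agrees with the most common label in their cluster. The sketch below is an assumption about what such a check could look like, not the verification procedure the lab uses; the function name, the cluster ids, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, human_labels):
    """Return the fraction of items whose human label matches their cluster's majority label."""
    if len(cluster_ids) != len(human_labels):
        raise ValueError("cluster_ids and human_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one labeled log is required")

    # Count how often each human label appears inside each cluster.
    labels_by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, human_labels):
        labels_by_cluster[cluster][label] += 1

    # Sum the size of the majority label in every cluster.
    majority_total = sum(max(counts.values()) for counts in labels_by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical cluster assignments and human annotations for six logs.
clusters = [0, 0, 0, 1, 1, 2]
labels = ["party noise", "party noise", "campus other",
          "traffic", "traffic", "campus other"]

purity = cluster_purity(clusters, labels)
print(f"purity = {purity:.3f}")
```

Purity is easy to read but rises automatically as the number of clusters grows, so in practice it would likely be paired with a chance-corrected measure before drawing conclusions.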
No background is needed for students who wish to help the SToPA lab. Facility with RStudio or Python is of course preferred for students wishing to assist in the data analysis, but any student interested in social justice is welcome to sift through the data and check where our optical character recognition is failing to detect the correct letters, assist with pipeline verification by labeling call logs, and contribute to the discussions of what patterns we see or should look for in the data.
QSIDE also has many conferences, datathons, and other events which are accessible to students wanting to learn more or get directly involved. Their website is an excellent place to start investigating the intersection of mathematics with social justice.