High-frequency location data show that race affects citations and fines for speeding

Data files

Mar 20, 2025 version files 11.76 MB

README.md
4.50 KB
speedingcites_accidents_no_pii_3orlessindirect.zip
11.75 MB

Abstract

Prior research finds that in encounters with law enforcement minorities are punished more severely than white civilians. Less is known about the effect of race on encounters and its implications for research on racial profiling. Using high-frequency location data of rideshare Lyft drivers in Florida (N=222,838), we estimate the effect of driver race on citations and fines for speeding across 19,356,683 location pings. Compared to a white driver traveling the same speed, we find that racial/ethnic minority drivers are 24 to 33 percent more likely to be cited for speeding and pay 23 to 34 percent more money in fines. We find no evidence that accident and re-offense rates can explain these estimates, suggesting that underlying our results is an animus against minorities.

https://doi.org/10.5061/dryad.4f4qrfjnk

Description of Data

08_estimate_stop_time_algorithm.R

Estimate true stop time based on when the driver starts slowing down around the police-reported stop time
Select as the slowdown segment the closest slowdown period before the police-reported stop time that was not a passenger pickup/drop-off
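
The selection rule above can be sketched as follows. This is a minimal Python illustration, not the R script itself. The Ping fields, the passenger_event flag, and the min_drop threshold are all hypothetical. A "slowdown" is assumed here to be a run of strictly decreasing speeds with a total drop of at least min_drop.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Ping:
    t: float                       # seconds since an arbitrary reference time
    speed: float                   # speed in mph
    passenger_event: bool = False  # hypothetical flag: ping belongs to a pickup/drop-off


def slowdown_segments(pings: List[Ping], min_drop: float = 10.0) -> List[Tuple[int, int]]:
    """Return (start, end) indices of maximal runs of strictly decreasing speed
    whose total drop is at least min_drop (an assumed threshold)."""
    segments = []
    i, n = 0, len(pings)
    while i < n - 1:
        j = i
        while j + 1 < n and pings[j + 1].speed < pings[j].speed:
            j += 1
        if j > i and pings[i].speed - pings[j].speed >= min_drop:
            segments.append((i, j))
        i = max(j, i + 1)
    return segments


def estimate_stop_time(pings: List[Ping], reported_stop_time: float,
                       min_drop: float = 10.0) -> Optional[float]:
    """Start time of the latest slowdown that begins at or before the reported
    stop time and does not overlap a passenger pickup/drop-off."""
    candidates = [
        (s, e) for s, e in slowdown_segments(pings, min_drop)
        if pings[s].t <= reported_stop_time
        and not any(p.passenger_event for p in pings[s:e + 1])
    ]
    if not candidates:
        return None
    start, _ = max(candidates, key=lambda se: pings[se[0]].t)
    return pings[start].t


# Toy trace: the first slowdown is a passenger pickup, the second is the stop.
trace = [
    Ping(0, 40), Ping(10, 30, True), Ping(20, 5, True), Ping(30, 20),
    Ping(40, 45), Ping(50, 50), Ping(60, 35), Ping(70, 10),
    Ping(80, 0), Ping(90, 0),
]
estimated = estimate_stop_time(trace, reported_stop_time=100)
```

In the toy trace, the earlier slowdown is skipped because it is flagged as a passenger event, so the estimate falls on the start of the later one.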

09_merge_stops_with_estimated_times_with_periods.ipynb

This notebook matches stops to a Lyft driving period (a Lyft 'period' is defined as when the driver is online and either waiting to be matched to a ride, driving to pick up a passenger, or dropping off a passenger)
Section 3 contains code to match on driver's license number and the timestamp of the stop
We first try a strict match, where the stop timestamp must fall between the start and end times of the Lyft period. If that fails, we try a weak match, where the stop time must fall between 5 minutes before the period start time and 5 minutes after the period end time
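
A minimal Python sketch of the two-stage match described above. The function name, the periods_by_license structure, and the sample values are assumptions for illustration and are not taken from the notebook.

```python
from datetime import datetime, timedelta


def match_stop_to_period(license_no, stop_time, periods_by_license,
                         tolerance=timedelta(minutes=5)):
    """Return (period_id, match_type) for a stop, or None if nothing matches.

    periods_by_license maps a driver's license number to a list of
    (period_id, start, end) tuples (a hypothetical layout).
    """
    periods = periods_by_license.get(license_no, [])
    # Strict match: the stop falls inside the period itself.
    for period_id, start, end in periods:
        if start <= stop_time <= end:
            return period_id, "strict"
    # Weak match: allow 5 minutes of slack on either side of the period.
    # If several periods qualify, this sketch simply returns the first.
    for period_id, start, end in periods:
        if start - tolerance <= stop_time <= end + tolerance:
            return period_id, "weak"
    return None


# Made-up example data for one driver.
periods_by_license = {
    "D123": [
        ("p1", datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 30)),
        ("p2", datetime(2020, 1, 1, 11, 0), datetime(2020, 1, 1, 11, 20)),
    ]
}
strict = match_stop_to_period("D123", datetime(2020, 1, 1, 10, 15), periods_by_license)
weak = match_stop_to_period("D123", datetime(2020, 1, 1, 10, 33), periods_by_license)
unmatched = match_stop_to_period("D123", datetime(2020, 1, 1, 10, 45), periods_by_license)
```

Running the strict pass over all periods before the weak pass means a stop that lies inside one period is never assigned to a neighboring period by the tolerance window.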

pings_data.ipynb

Create a GPS-ping-level dataset for all Lyft drivers (cited and uncited)
Section 1 outlines the functions used to create the dataset:
The insert_one_ds_just_pings_table function pulls GPS pings data for all drivers and adds on driver and car variables
The insert_one_ds_pings_w_spatial_table function adds road feature data to each ping using spatial matching
The insert_one_ds_final_table_w_lags function calculates the driving speed at each observation by measuring the distance traveled between consecutive pings 10 seconds apart
Sections 2-4 execute the code in batches
Section 5 identifies a single ping for each citation as the ping responsible for the citation
Consider all pings that occur in X minutes leading up to the stop time
Pick the 'cited ping' by ranking pings on the following criteria, in order of priority:

first, pings where the road speed limit matches the speed limit recorded by the police
then, pings with the highest speed (descending order of speed)
finally, pings that occurred nearest to the stop time
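
The ranking can be expressed as a single lexicographic sort key. The sketch below is a Python illustration with hypothetical field names (t, road_speed_limit, speed). The window length is left as a required argument because the README does not state the value of X.

```python
def pick_cited_ping(pings, stop_time, police_speed_limit, window_minutes):
    """Pick one ping per citation from the window leading up to the stop.

    pings: list of dicts with keys "t" (seconds), "road_speed_limit", "speed".
    window_minutes: length of the look-back window (X in the README).
    """
    window_start = stop_time - window_minutes * 60
    candidates = [p for p in pings if window_start <= p["t"] <= stop_time]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: (
            p["road_speed_limit"] != police_speed_limit,  # matching limit first
            -p["speed"],                                  # then fastest
            stop_time - p["t"],                           # then nearest to the stop
        ),
    )


# Made-up pings. The 5-minute window is an arbitrary value for this example.
pings = [
    {"id": "a", "t": 600, "road_speed_limit": 55, "speed": 90},
    {"id": "b", "t": 800, "road_speed_limit": 55, "speed": 80},
    {"id": "c", "t": 900, "road_speed_limit": 55, "speed": 80},
    {"id": "d", "t": 950, "road_speed_limit": 45, "speed": 95},
    {"id": "e", "t": 990, "road_speed_limit": 55, "speed": 70},
]
cited = pick_cited_ping(pings, stop_time=1000, police_speed_limit=55, window_minutes=5)
```

Here ping "a" falls outside the window, "d" is faster but on a road whose limit differs from the police record, and "b" and "c" tie on speed, so the later of the two is chosen.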

match_voter_data.R

Match FL voter records to Lyft drivers using first name, last name, DOB, and gender
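
A minimal Python sketch of a record match on these four fields. It assumes an exact match after light normalization and keeps only unambiguous matches. The R script may use different rules (for example fuzzy name matching), and all field names here are hypothetical.

```python
def normalize_key(record):
    """Build a match key from first name, last name, DOB, and gender."""
    return (
        record["first_name"].strip().lower(),
        record["last_name"].strip().lower(),
        record["dob"],
        record["gender"].strip().upper(),
    )


def match_voters(drivers, voters):
    """Map driver_id to voter_id where exactly one voter shares the key."""
    index = {}
    for voter in voters:
        index.setdefault(normalize_key(voter), []).append(voter)
    matches = {}
    for driver in drivers:
        hits = index.get(normalize_key(driver), [])
        if len(hits) == 1:  # skip drivers with zero or multiple candidate voters
            matches[driver["driver_id"]] = hits[0]["voter_id"]
    return matches


# Made-up records for illustration only.
drivers = [
    {"driver_id": 1, "first_name": " Ana ", "last_name": "Lopez", "dob": "1990-01-02", "gender": "f"},
    {"driver_id": 2, "first_name": "Sam", "last_name": "Lee", "dob": "1985-05-05", "gender": "M"},
]
voters = [
    {"voter_id": "V1", "first_name": "ana", "last_name": "LOPEZ", "dob": "1990-01-02", "gender": "F"},
    {"voter_id": "V2", "first_name": "Sam", "last_name": "Lee", "dob": "1985-05-05", "gender": "M"},
    {"voter_id": "V3", "first_name": "Sam", "last_name": "Lee", "dob": "1985-05-05", "gender": "M"},
]
matched = match_voters(drivers, voters)
```

Driver 2 is left unmatched in this example because two voter records share the same key.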

estimate_fe_models.R

Regressions for the main FE models for citations and fines

Generate Data + DML Run.ipynb

Run the DML model and conduct SHAP analysis

plot_figures.R

code for main results plots

summary_stats.R

code to pull summary stats in Appendix

robustness_checks_fe_model.R

code for all robustness checks related to alternate specifications/data inputs for the FE model

recidivism_analysis.R

code for creating recidivism dataset, running the regression, and plotting results

race-prediction-code: All code related to training and applying the race-prediction model

model-fitting: training the race prediction model
01-load-data-users-vggface.ipynb: fine-tunes a convolutional neural network (CNN), based on a pre-trained VGGFace model, for race prediction
02-ethnicolr-retrained.ipynb: trains a long short-term memory (LSTM) model to predict race from first and last name
03-ethnicolr-applied.ipynb: fine-tunes the LSTM model on Lyft user data
04-bisg-final.ipynb: applies Bayesian improved surname geocoding (BISG) to first name, last name, and ZIP code/Census block group data to predict race; trains a “stacked”/“ensembled” XGBoost model that takes predictions from the CNN, LSTM, and BISG models, along with user device information, and outputs a final race prediction
prediction-template: applying the race prediction model to a sample of users
1-process-drivers.ipynb: Loads in profile pictures, home location, and device data of user sample
2-crop-driver-images.ipynb: Crops profile pictures to faces
3-predict-driver-images.ipynb: Applies CNN prediction model to cropped face pictures
4-predict-driver-names.ipynb: Applies BISG model to user name data and location and LSTM model to user name data; feeds CNN, LSTM, and BISG predictions as well as device data into XGBoost model to generate final race prediction
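
For readers unfamiliar with BISG, the core step is a Bayes update that combines a surname-based prior with geography. The Python sketch below shows only the standard surname-plus-geography form with made-up probabilities. It does not reproduce the notebooks, which also use first names and feed the result into the XGBoost ensemble.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Standard BISG update: P(race | surname, geo) is proportional to
    P(race | surname) * P(geo | race), normalized over race groups."""
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


# Illustrative numbers only, not real Census figures.
prior_from_surname = {"group_a": 0.5, "group_b": 0.5}
geo_likelihood = {"group_a": 0.002, "group_b": 0.006}
posterior = bisg_posterior(prior_from_surname, geo_likelihood)
```

With an uninformative surname prior, the posterior is driven entirely by how much more common the block group is among one group than the other.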

speedingcites_accidents_no_pii_3orlessindirect.zip

Heavily redacted data on citations for speeding and motor vehicle accidents in Florida were used to conduct our analyses. Please contact the corresponding author (Alec Brandon) to inquire about accessing unredacted versions of these data.
