High-frequency location data show that race affects citations and fines for speeding
Data files
Mar 20, 2025 version files (11.76 MB total)
- README.md (4.50 KB)
- speedingcites_accidents_no_pii_3orlessindirect.zip (11.75 MB)
Abstract
Prior research finds that in encounters with law enforcement minorities are punished more severely than white civilians. Less is known about the effect of race on encounters and its implications for research on racial profiling. Using high-frequency location data of rideshare Lyft drivers in Florida (N=222,838), we estimate the effect of driver race on citations and fines for speeding across 19,356,683 location pings. Compared to a white driver traveling the same speed, we find that racial/ethnic minority drivers are 24 to 33 percent more likely to be cited for speeding and pay 23 to 34 percent more money in fines. We find no evidence that accident and re-offense rates can explain these estimates, suggesting that underlying our results is an animus against minorities.
https://doi.org/10.5061/dryad.4f4qrfjnk
Description of Data
08_estimate_stop_time_algorithm.R
- Estimate true stop time based on when the driver starts slowing down around the police-reported stop time
- Pick the slowdown time segment as the closest slowdown period before the police-reported stop time that was not a passenger pickup/drop-off
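The selection rule above can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the `Ping` type, the definition of a slowdown (a run of non-increasing speed with a total drop of at least `min_drop` mph), and the `tol` window used to exclude pickups/drop-offs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ping:
    t: float      # seconds since an arbitrary reference time
    speed: float  # driving speed in mph


def find_slowdowns(pings, min_drop=15.0):
    """Return (start_t, end_t) for each maximal run of non-increasing speed
    whose total drop is at least `min_drop` mph (hypothetical threshold)."""
    segments = []
    start = 0
    for i in range(1, len(pings) + 1):
        # A run ends at the last ping or when speed starts rising again.
        run_ended = i == len(pings) or pings[i].speed > pings[i - 1].speed
        if run_ended:
            if pings[start].speed - pings[i - 1].speed >= min_drop:
                segments.append((pings[start].t, pings[i - 1].t))
            start = i
    return segments


def estimate_stop_time(pings, reported_stop_t, ride_events, tol=30.0):
    """Return the start of the closest slowdown that begins at or before the
    police-reported stop time and does not end within `tol` seconds of a
    passenger pickup/drop-off (assumed exclusion rule). None if no candidate."""
    candidates = [
        (s, e)
        for s, e in find_slowdowns(pings)
        if s <= reported_stop_t and all(abs(e - ev) > tol for ev in ride_events)
    ]
    if not candidates:
        return None
    # "Closest before" = the candidate with the latest start time.
    return max(candidates, key=lambda seg: seg[0])[0]
```

Here `ride_events` is a list of pickup/drop-off times for the same driver; how the original script identifies those events is not described in this README.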
09_merge_stops_with_estimated_times_with_periods.ipynb
- This notebook matches stops to a Lyft driving period (a Lyft ‘period’ is defined as when the driver is online and either waiting to be matched to a ride, driving to pick up a passenger, or dropping off a passenger)
- Section 3 contains code to match on driver’s license number and the timestamp of the stop
- We first try a strict match, where the stop timestamp must fall between the start and end times of the Lyft period. If that fails, we try a weak match, where the stop timestamp must fall between 5 minutes before the period start time and 5 minutes after the period end time
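The two-stage match can be sketched as follows. This is an illustrative sketch rather than the notebook's code: it assumes `periods` has already been restricted to one driver (matched on driver's license number), and the function name, tuple layout, and first-hit tie-breaking are hypothetical.

```python
from datetime import datetime, timedelta


def match_stop_to_period(stop_time, periods, slack=timedelta(minutes=5)):
    """Match one stop to a Lyft period for the same driver.

    periods: list of (period_id, start, end) tuples of datetimes.
    Returns (period_id, "strict" | "weak"), or (None, None) if nothing matches.
    """
    # Strict match: stop time lies inside the period itself.
    for period_id, start, end in periods:
        if start <= stop_time <= end:
            return period_id, "strict"
    # Weak match: allow 5 minutes of slack on either side of the period.
    for period_id, start, end in periods:
        if start - slack <= stop_time <= end + slack:
            return period_id, "weak"
    return None, None
```

Running the strict pass over all periods before the weak pass ensures that a stop falling inside one period is never assigned to a neighboring period through the 5-minute slack.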
pings_data.ipynb
- Create a GPS-ping-level dataset for all Lyft drivers (cited and uncited)
- Section 1 outlines the functions used to create the dataset:
  - The insert_one_ds_just_pings_table function pulls GPS ping data for all drivers and adds driver and car variables
  - The insert_one_ds_pings_w_spatial_table function adds road feature data to each ping using spatial matching
  - The insert_one_ds_final_table_w_lags function calculates the driving speed at each observation by measuring the distance traveled between consecutive pings 10 seconds apart
- Sections 2-4 execute the code in batches
- Section 5 identifies a single ping for each citation as the ping responsible for the citation
  - Consider all pings that occur in the X minutes leading up to the stop time
  - Pick the ‘cited ping’ by ranking pings on the following criteria, in order of priority:
    - first, pings where the road speed limit is the same as the speed limit marked by the police
    - then, pings in descending order of speed
    - then, pings that occurred nearest to the stop time
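The speed calculation and the cited-ping ranking described for pings_data.ipynb can be sketched in a few lines of Python. This is a hedged illustration, not the notebook's code: the dictionary keys (`lat`, `lon`, `t`, `speed`, `road_limit`), the use of the haversine formula for distance, and the function names are assumptions. The look-back window is left as a required argument because the README does not state its length.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
MPS_TO_MPH = 2.236936       # meters per second -> miles per hour


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_mph(prev, curr):
    """Speed implied by the distance between two consecutive pings."""
    dist = haversine_m(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    return dist / (curr["t"] - prev["t"]) * MPS_TO_MPH


def pick_cited_ping(pings, stop_t, cited_limit, window_s):
    """Pick one ping from the `window_s` seconds leading up to the stop time.

    Ranking, in order of priority:
      1. road speed limit equals the limit on the citation,
      2. higher speed,
      3. closer in time to the stop.
    """
    window = [p for p in pings if 0 <= stop_t - p["t"] <= window_s]
    if not window:
        return None
    return min(
        window,
        key=lambda p: (p["road_limit"] != cited_limit, -p["speed"], stop_t - p["t"]),
    )
```

Sorting on a tuple key applies the three criteria lexicographically, so later criteria only break ties left by earlier ones.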
match_voter_data.R
- Match FL voter records to Lyft drivers using first name, last name, DOB, and gender
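An exact match on these four fields can be sketched in Python as below. This is only an illustration of the idea, not a translation of match_voter_data.R: the field names, the whitespace and case normalization, and the choice to keep only unambiguous (single-hit) matches are assumptions.

```python
def normalize(value):
    """Trim whitespace and lowercase so trivial formatting differences do not block a match."""
    return value.strip().lower()


def match_key(record):
    """Build the match key: first name, last name, date of birth, gender."""
    return (
        normalize(record["first_name"]),
        normalize(record["last_name"]),
        record["dob"],
        normalize(record["gender"]),
    )


def match_voters(drivers, voters):
    """Return {driver_id: voter_record} for drivers with exactly one matching voter."""
    index = {}
    for voter in voters:
        index.setdefault(match_key(voter), []).append(voter)
    matches = {}
    for driver in drivers:
        hits = index.get(match_key(driver), [])
        if len(hits) == 1:  # assumption: drop ambiguous matches
            matches[driver["driver_id"]] = hits[0]
    return matches
```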
estimate_fe_models.R
- Regressions for the main FE models for citations and fines
Generate Data + DML Run.ipynb
- Run the DML model and conduct SHAP analysis
plot_figures.R
- Code for main results plots
summary_stats.R
- Code to pull summary stats in the Appendix
robustness_checks_fe_model.R
- Code for all robustness checks related to alternate specifications/data inputs for the FE model
recidivism_analysis.R
- Code for creating the recidivism dataset, running the regression, and plotting results
race-prediction-code: All code related to training and applying the race-prediction model
- model-fitting: training the race prediction model
  - 01-load-data-users-vggface.ipynb: fine-tunes a convolutional neural network (CNN), based on a pre-trained VGGFace model, for race prediction
  - 02-ethnicolr-retrained.ipynb: trains a long short-term memory (LSTM) model to predict race from first and last name
  - 03-ethnicolr-applied.ipynb: fine-tunes the LSTM model on Lyft user data
  - 04-bisg-final.ipynb: applies Bayesian improved surname geocoding (BISG) to first name, last name, and ZIP code/Census block group data to predict race; then trains a “stacked”/“ensembled” XGBoost model that takes predictions from the CNN, LSTM, and BISG models, along with user device information, and outputs a final race prediction
- prediction-template: applying the race prediction model to a sample of users
  - 1-process-drivers.ipynb: loads profile pictures, home location, and device data of the user sample
  - 2-crop-driver-images.ipynb: crops profile pictures to faces
  - 3-predict-driver-images.ipynb: applies the CNN prediction model to cropped face pictures
  - 4-predict-driver-names.ipynb: applies the BISG model to user name and location data and the LSTM model to user name data; feeds CNN, LSTM, and BISG predictions, as well as device data, into the XGBoost model to generate the final race prediction
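For readers unfamiliar with BISG, the core of the method is a Bayes update that combines a surname-based probability with a geography-based likelihood: P(race | surname, location) is proportional to P(race | surname) × P(location | race), assuming surname and location are independent given race. The sketch below shows only that standard update in Python. The function and argument names are hypothetical, and how the notebooks incorporate first names or build the input tables is not described in this README.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Standard BISG update.

    p_race_given_surname: {race: P(race | surname)}
    p_geo_given_race:     {race: P(location | race)} for the person's
                          ZIP code or Census block group
    Returns {race: P(race | surname, location)}, normalized to sum to 1.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}
```

In a stacked setup like the one described above, the resulting probabilities would be one set of input features to the final model, alongside the CNN and LSTM predictions.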
speedingcites_accidents_no_pii_3orlessindirect.zip
Heavily redacted data on citations for speeding and motor vehicle accidents in Florida were used to conduct our analyses. Please contact the corresponding author (Alec Brandon) to inquire about accessing unredacted versions of these data.