GPS data from cellphones for Mechanical Analog


Makris, Nicos; Moghimi, Reza; Godat, Eric; Vu, Tue (2023), GPS data from cellphones for Mechanical Analog, Dryad, Dataset


Motivated by the increasing need to develop a quantitative, science-based, predictive understanding of the dynamics and response of cities when subjected to hazards, in this paper we apply concepts from statistical mechanics and microrheology to develop mechanical analogs for cities with predictive capabilities. We envision a city to be a matrix where people (cell-phone users) are driven by the city’s economy and other associated incentives while using the collection of its infrastructure networks in a similar way that thermally driven Brownian probe particles are moving within a complex viscoelastic material. Mean-square displacements (ensemble averages) of thousands of cell-phone users are computed from GPS location data to establish the creep compliance and the resulting impulse response function of a city. The derivation of these time-response functions allows the synthesis of simple mechanical analogs that model satisfactorily the city’s behavior under normal conditions. Our study concentrates on predicting the response of cities to acute shocks (natural hazards that stress the entire urban area) that are approximated with a rectangular pulse with finite duration, and we show that the solid-like mechanical analogs for cities that we derived predict that cities revert immediately to their pre-event response suggesting that they are inherently resilient. Our findings are in remarkably good agreement with the recorded response of the Dallas metroplex following the February 2021 North American winter storm, which happened at a time for which we have dependable GPS location data.
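To illustrate the mean-square displacement mentioned above, the following minimal sketch computes an ensemble-averaged value from longitude/latitude tracks. It is not the authors' code. It assumes equally spaced samples, a flat-Earth (equirectangular) conversion from degrees to meters, and averaging over all start times as well as over all IDs. The names `to_xy` and `ensemble_msd` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def to_xy(lon_deg, lat_deg, ref_lat_deg):
    """Project lon/lat (degrees) to local x, y in meters (equirectangular)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg) * math.cos(math.radians(ref_lat_deg))
    y = EARTH_RADIUS_M * math.radians(lat_deg)
    return x, y


def ensemble_msd(tracks, lag, ref_lat_deg):
    """Mean-square displacement (m^2) at a lag given in number of time steps.

    tracks: dict mapping device ID -> list of (lon, lat) at equally spaced times.
    Averages |r(t + lag) - r(t)|^2 over all IDs and all start times t.
    """
    total, count = 0.0, 0
    for positions in tracks.values():
        xy = [to_xy(lon, lat, ref_lat_deg) for lon, lat in positions]
        for i in range(len(xy) - lag):
            dx = xy[i + lag][0] - xy[i][0]
            dy = xy[i + lag][1] - xy[i][1]
            total += dx * dx + dy * dy
            count += 1
    if count == 0:
        raise ValueError("lag is too long for the supplied tracks")
    return total / count
```

Evaluating `ensemble_msd` over a range of lags gives the displacement-versus-time curve from which time-response functions such as the creep compliance would be derived.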


Data Acquisition

For this preliminary study, we limit our data search to two metropolitan areas: the Dallas-Fort Worth (DFW) metroplex and the San Francisco (SF) Bay Area.

To acquire the raw data, we purchased individual cell-phone location data from the IRYS and NEAR organizations for a two-month study period, from February 1 to March 31, 2021. This time frame covers the winter storm ("snowmageddon") that struck DFW for a week in February 2021 and resulted in more than 500K claims and $10B in total losses.

The raw data were received as MATLAB data files and in TSV (tab-separated values) format, with a timestamp in seconds for every ping, for the areas bounded by (97.98W to 95.94W, 32.17N to 33.58N) for DFW and (122.56W to 121.74W, 37.21N to 37.92N) for the San Francisco Bay Area, both inland and on water. Each cell phone is assigned an ID that is unique throughout the study period.
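As a small illustration of the study areas, the sketch below checks whether a ping falls inside one of the two bounding boxes quoted above. It assumes coordinates stored as signed decimal degrees (west longitudes negative). The region labels and the function name `in_study_area` are hypothetical.

```python
# Bounding boxes from the text, as (west, east, south, north) in signed degrees.
# West longitudes are negative; the dictionary keys are hypothetical labels.
BOUNDS = {
    "DFW": (-97.98, -95.94, 32.17, 33.58),
    "SF": (-122.56, -121.74, 37.21, 37.92),
}


def in_study_area(lon, lat, region):
    """Return True if (lon, lat) lies inside the bounding box of `region`."""
    west, east, south, north = BOUNDS[region]
    return west <= lon <= east and south <= lat <= north
```

For example, a point in downtown Dallas, `in_study_area(-96.80, 32.78, "DFW")`, lies inside the DFW box.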

Data Cleaning

The raw data were provided in tabular form, with the columns Device ID, Unix time, Long, and Lat for each recorded visit.

We use the Dask distributed computing library to process the data on the ManeFrame II HPC cluster, so that multiple processors can be used to accelerate the computation.

First, we convert the Unix time to local date and local time.
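The conversion can be sketched with Python's standard library, as below. This is an illustration, not the authors' pipeline. It assumes the IANA zones `America/Chicago` for DFW and `America/Los_Angeles` for SF, and the name `unix_to_local` is hypothetical. A zone-aware conversion matters here because US daylight saving time began on March 14, 2021, inside the study period, so a fixed UTC offset would be wrong for the later rows.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Assumed time zones for the two study areas (not stated in the source).
LOCAL_ZONES = {
    "DFW": ZoneInfo("America/Chicago"),
    "SF": ZoneInfo("America/Los_Angeles"),
}


def unix_to_local(unix_seconds, region):
    """Convert a Unix timestamp (seconds, UTC) to a zone-aware local datetime."""
    utc = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return utc.astimezone(LOCAL_ZONES[region])
```

The local date and local time are then available as `unix_to_local(t, region).date()` and `.time()`.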

For each of the metroplexes examined in this work, we processed longitude and latitude data (in degrees from the Greenwich meridian and from the equator) at various times for thousands of anonymous cellphone users (IDs).

We retain only the IDs that have a location record at the top of the hour (for DFW) or at the top of every two-hour interval (for SF) throughout the study period.

Finally, we shortlisted 13k IDs from DFW and ~4k IDs from the San Francisco area for February and March 2021. We verified that the final processed dataset submitted to Dryad contains no missing data.
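The retention step can be sketched as follows. This is a simplified illustration, not the authors' Dask code. It assumes that a qualifying ping carries a timestamp exactly on the expected time step, and the function name `ids_with_full_coverage` is hypothetical.

```python
def ids_with_full_coverage(pings, start, end, step_seconds):
    """Return the IDs that have a ping at every expected time step in [start, end].

    pings: iterable of (device_id, unix_time) pairs.
    step_seconds: 3600 for hourly steps, 7200 for two-hour steps.
    """
    expected = set(range(start, end + 1, step_seconds))
    seen = {}
    for device_id, unix_time in pings:
        if unix_time in expected:
            seen.setdefault(device_id, set()).add(unix_time)
    # Keep an ID only if it covers every expected time step (no missing data).
    return sorted(d for d, times in seen.items() if times == expected)
```

With hourly steps an ID that misses a single hour is dropped, whereas the same ID may survive under two-hour steps. This is consistent with the coarser interval being used for the SF data, although the source does not state the reason for that choice.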

Usage notes

The data are in tabular CSV format.

Columns are the anonymous, unique IDs.

Rows are the time steps: the top of each hour (DFW) or of each 2-hour interval (SF) during the study period.

Each value is the longitude/latitude of an ID at that time step.

The files can be opened with Microsoft Excel, R, Python, or any other program that reads CSV.
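As one way to load such a table in Python, the sketch below reads a wide CSV into per-ID lists using only the standard library. The exact file layout is an assumption. It supposes a header row of IDs, a first column holding the time-step label, and one numeric coordinate per cell (for example, a file of longitudes). The sample text and the name `read_wide_table` are hypothetical, so check them against the actual files before use.

```python
import csv
import io


def read_wide_table(text_stream):
    """Return ({device_id: [values...]}, [row labels]) from a wide CSV."""
    reader = csv.reader(text_stream)
    header = next(reader)
    ids = header[1:]  # assumed: first header cell labels the time column
    labels = []
    columns = {device_id: [] for device_id in ids}
    for row in reader:
        labels.append(row[0])
        for device_id, cell in zip(ids, row[1:]):
            columns[device_id].append(float(cell))
    return columns, labels


# Made-up sample data in the assumed layout (not taken from the dataset).
sample = io.StringIO(
    "time,id_a,id_b\n"
    "2021-02-01 00:00,-96.80,-97.10\n"
    "2021-02-01 01:00,-96.81,-97.10\n"
)
columns, labels = read_wide_table(sample)
```

For a file on disk, pass `open(path, newline="")` in place of the `io.StringIO` object.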


Hunt Institute for Engineering and Humanity, Award: Using New Data Sources

Lyle interdisciplinary seed funding initiative