Data from: Data collection of Odometer images via WhatsApp to measure vehicle miles traveled report
Data files (Nov 14, 2025 version, 57.48 KB in total)
- posts_pii_removed.csv (18.01 KB)
- README.md (7.42 KB)
- survey_pii_removed.csv (23.50 KB)
- users_pii_removed.csv (8.55 KB)
Abstract
This dataset contains information used in a study of collecting odometer images via WhatsApp to measure vehicle miles traveled (VMT) in California. In recent years, policymakers have increasingly proposed sustainable transportation interventions that directly or indirectly aim to reduce VMT. Transportation researchers have collected VMT data in various ways, each with pros and cons. Self-reported VMT from travel surveys is traditionally used to understand factors affecting VMT or to estimate policy effects, yet the numbers respondents report, from which researchers derive VMT, suffer from recall error, rounding, and heaping biases. Surveys using newer technologies, such as dedicated GPS devices, can provide more accurate VMT but can intrude on participants’ privacy and are too expensive to scale. Passively collected data, such as traffic counts or states’ odometer reading records, rarely capture travelers’ characteristics, which limits their usefulness for travel behavior analysis. In this study, we developed a new VMT data collection method using WhatsApp through three tasks: 1) a comprehensive literature review of data collection methods for VMT-related research, assessing their strengths and shortcomings; 2) the development of an automated system to collect odometer images from participants via WhatsApp; and 3) a pilot deployment in a real-world study and an evaluation of the results, with a discussion of the circumstances in which this method could offer benefits over existing approaches and of its limitations and challenges. The results will provide guidelines to help planners, policymakers, and researchers choose and implement VMT data collection strategies, including the one presented here, according to their analytical and practical needs.
Dataset DOI: 10.5061/dryad.g79cnp62m
Description of the data and file structure
This dataset contains relevant information used in the study "Data collection of Odometer images via WhatsApp to measure vehicle miles traveled", an NCST-supported project conducted by the Institute of Transportation Studies, UC Davis.
This repository contains three files:
- posts_pii_removed.csv
- survey_pii_removed.csv
- users_pii_removed.csv
These files contain, respectively, the records of the odometer images submitted during the study, the prescreening survey responses, and the user statistics, with all personally identifiable information (PII) removed.
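A minimal sketch for loading the three files in Python, assuming they sit together in one local directory and are UTF-8 encoded. It uses only the standard library `csv` module, so every value is read as a string. `load_dataset` and `read_rows` are hypothetical helper names, not part of this dataset.

```python
import csv
from pathlib import Path

# File names as listed in this repository; the local directory is an assumption.
FILES = ("posts_pii_removed.csv", "survey_pii_removed.csv", "users_pii_removed.csv")


def read_rows(lines):
    """Parse CSV text (any iterable of lines) into a list of dicts keyed by column name."""
    return list(csv.DictReader(lines))


def load_dataset(data_dir="."):
    """Load the three CSV files into a dict keyed by "posts", "survey", and "users"."""
    tables = {}
    for name in FILES:
        # newline="" is how the csv module expects files to be opened;
        # utf-8-sig also tolerates a byte-order mark if one is present.
        with open(Path(data_dir) / name, newline="", encoding="utf-8-sig") as f:
            tables[name.split("_")[0]] = read_rows(f)
    return tables
```

For example, `load_dataset("path/to/files")["posts"]` returns the post rows as dicts. Numeric and Boolean columns still need explicit conversion before analysis.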
Files and variables
File: posts_pii_removed.csv
Description: This data contains information about the posts (odometer images submitted during the study).
Variables
- id: ID. String.
- image_create_date: The datetime when the image was submitted. String in the "YYYY-MM-DDTHH:mm:ss.SSS[Z]" format in UTC.
- vmt: The value of the vehicle miles traveled. Integer.
- status: The status of the post. String.
- is_deleted: Indicator of whether the image binary has been deleted by the user. Boolean.
- posted_by_id: The id of the user who posted this image. String.
- notes: Any notes added by the admin. String.
- image_size: The image size in bytes. Integer.
- visible: Indicator of whether the image was visible on the frontend UI. Boolean.
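Because each post carries a UTC timestamp and the ID of the user who posted it, a natural first step is to order each participant's submissions in time. The sketch below assumes rows are dicts as produced by `csv.DictReader` and that `posted_by_id` corresponds to the `id` column of users_pii_removed.csv; `parse_post_time` and `posts_by_user` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Documented format for image_create_date: "YYYY-MM-DDTHH:mm:ss.SSS[Z]" in UTC,
# i.e. ISO 8601 with milliseconds and a literal trailing "Z".
POST_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_post_time(value):
    """Parse an image_create_date string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, POST_TIME_FORMAT).replace(tzinfo=timezone.utc)


def posts_by_user(posts):
    """Group post rows by posted_by_id, each group sorted from oldest to newest."""
    groups = defaultdict(list)
    for post in posts:
        groups[post["posted_by_id"]].append(post)
    for user_posts in groups.values():
        user_posts.sort(key=lambda p: parse_post_time(p["image_create_date"]))
    return dict(groups)
```

Depending on the analysis, you may want to drop rows first based on `status`, `is_deleted`, or `visible`. The sketch does not filter, because the possible `status` values and the Boolean encoding are not documented here.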
File: survey_pii_removed.csv
Description: This data contains information about the prescreening survey.
Variables
- StartDate: The date when the participant started the survey. String in the "YYYY-MM-DD" format in US Mountain Time.
- EndDate: The date when the participant ended the survey. String in the "YYYY-MM-DD" format in US Mountain Time.
- Status: The status of the survey response. Integer.
- Progress: The progress of the survey response as a percentage, where 100 means completed. Integer.
- Duration__in_seconds_: Time taken to complete the survey. Integer in seconds.
- Finished: Indicator of whether the survey response is completed or not. Integer.
- RecordedDate: The datetime when the survey response is recorded. String in the format of "YYYY-MM-DD HH:mm:ss" in US Mountain Time.
- ResponseId: ID. String.
- ExternalReference: Referential ID. String.
- DistributionChannel: The channel through which the survey was distributed for this response. String.
- UserLanguage: The language used in the survey. String.
- Q2: The gender identity of the participant. 1="Woman", 2="Man", 3="Prefer to self-describe".
- Q2_3_TEXT: The gender identity details if the participant selected 3 in Q2. String.
- Q3: The household income level of the participant, in USD. 1="Less than $50,000", 2="$50,000 to $99,999", 3="$100,000 to $149,999", 6="$150,000 or more", 5="Prefer not to answer".
- Q4: The educational attainment of the participant. 1="Some grade/high school", 2="Completed high school or GED", 3="Some college/technical school", 4="Bachelor’s degree(s)", 5="Graduate degree(s) (e.g., MS, PhD, MBA)", 6="Professional degree(s) (e.g., JD, MD, DDS)", 7="Prefer not to answer".
- Q5: Whether the participant holds a driver's license. 1="Yes", 2="No".
- Q6_1: The number of vehicles in the participant's household. Integer.
- Q8_1: The number of people who drive the vehicle for which the participant would report VMT. Integer.
- Q9_6: An estimate of the VMT that the vehicle made in the last seven days. Integer.
- Q10_1: The confidence in the estimate made in Q9_6. 1="It’s very hard to recall it", 2="I struggled a bit to estimate it", 3="I think I remembered it okay", 4="I’m pretty sure I got it right", 5="Yes, I remember it exactly".
- Q11_1: An estimate of the VMT that the vehicle would make today, counting from the date when the participant took the survey. Integer.
- Q11_2: An estimate of the VMT that the vehicle would make one day later, counting from the date when the participant took the survey. Integer.
- Q11_3: An estimate of the VMT that the vehicle would make two days later, counting from the date when the participant took the survey. Integer.
- Q11_4: An estimate of the VMT that the vehicle would make three days later, counting from the date when the participant took the survey. Integer.
- Q11_5: An estimate of the VMT that the vehicle would make four days later, counting from the date when the participant took the survey. Integer.
- Q11_6: An estimate of the VMT that the vehicle would make five days later, counting from the date when the participant took the survey. Integer.
- Q11_7: An estimate of the VMT that the vehicle would make six days later, counting from the date when the participant took the survey. Integer.
- Q12_1: The confidence in the estimate made in Q11_1 to Q11_7. 1="Not confident at all", 2="Slightly confident", 3="Moderately confident", 4="Very confident", 5="Completely confident".
- age: Categorized age (for deidentification purposes) that falls into one of "Under 25", "25-39", or "40 or more".
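The coded survey answers need decoding before analysis. The sketch below assumes rows are dicts of strings as returned by `csv.DictReader`. It maps Q3 codes to labels by lookup (the documented codes skip 4 and assign 6 to the top bracket, so position-based decoding would be wrong) and sums Q11_1 to Q11_7 into a projected seven-day VMT that can be set against the recalled Q9_6 value. The function names are hypothetical, and cells that are empty or not plain non-negative integers are treated as missing.

```python
# Q3 answer codes as documented. They are not consecutive (no 4; the top
# bracket is 6), so decode by dictionary lookup rather than by position.
Q3_INCOME = {
    1: "Less than $50,000",
    2: "$50,000 to $99,999",
    3: "$100,000 to $149,999",
    6: "$150,000 or more",
    5: "Prefer not to answer",
}

# Daily VMT estimates: Q11_1 is today, Q11_7 is six days later.
Q11_COLUMNS = [f"Q11_{i}" for i in range(1, 8)]


def _as_int(cell):
    """Return a non-negative integer cell as int; anything else is treated as missing."""
    text = (cell or "").strip()
    return int(text) if text.isdecimal() else None


def decode_income(row):
    """Return the Q3 income label, or None for an empty or undocumented code."""
    return Q3_INCOME.get(_as_int(row.get("Q3")))


def projected_weekly_vmt(row):
    """Sum the seven Q11 daily estimates, or None if any day is missing."""
    values = [_as_int(row.get(col)) for col in Q11_COLUMNS]
    if any(v is None for v in values):
        return None
    return sum(values)
```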
File: users_pii_removed.csv
Description: This data contains information about the users of the study.
Variables
- id: ID. String in the UUID (v4) format.
- createdAt: The date when the user record was created in the database. String in the "YYYY-MM-DD" format in UTC.
- updatedAt: The date when the user record was last updated in the database. String in the "YYYY-MM-DD" format in UTC.
- activatedAt: The date when the user registered for the study using the access code. String in the "YYYY-MM-DD" format in UTC.
- deletedAt: The date when the user deleted their study account. String in the "YYYY-MM-DD" format in UTC.
- accessCode: The unique access code used for registration and deletion of the account. String.
- userStatusId: The status of the user. String.
- invitations: Not used (a referential variable for another database table).
- posts: Not used (a referential variable for another database table).
- lastInvitationSentAt: The datetime when the last instruction email was sent to the user. Note that some rows have null values because of issues with a security update affecting the university email address. String in the "YYYY-MM-DD HH:mm:ss.SSS" format in UTC.
- invitationCount: The number of instruction emails sent to the user. Integer.
- postCount: The number of posts that the user submitted. Integer.
- submittedPostCount: The number of posts that the user submitted and that were not subsequently processed by the admin or the user. Integer.
- readPostCount: The number of posts that the user submitted for which the admin entered the odometer reading but which had not been approved. Integer.
- approvedPostCount: The number of posts that the user submitted and the admin approved. Integer.
- rejectedPostCount: The number of posts that the user submitted and the admin rejected. Integer.
- deletedPostCount: The number of posts that the user submitted but later deleted. Integer.
- visible: Indicator of whether the image was visible on the frontend UI. Boolean.
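Two details of the users file affect parsing: `lastInvitationSentAt` is null for some rows, and the per-status post counts can be related to `postCount`. The sketch below assumes nulls appear as empty cells or as the text "null"/"none" (the exact encoding is not documented) and uses hypothetical function names; `approval_rate` simply divides `approvedPostCount` by `postCount`.

```python
from datetime import datetime, timezone

# Documented format for lastInvitationSentAt: "YYYY-MM-DD HH:mm:ss.SSS" in UTC.
INVITATION_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Assumed spellings of a null cell; adjust if the file uses another marker.
NULL_MARKERS = {"", "null", "none"}


def parse_last_invitation(value):
    """Parse lastInvitationSentAt into a UTC datetime, or None for a null cell."""
    text = (value or "").strip()
    if text.lower() in NULL_MARKERS:
        return None
    return datetime.strptime(text, INVITATION_TIME_FORMAT).replace(tzinfo=timezone.utc)


def approval_rate(user):
    """Share of a user's posts that the admin approved, or None if the user has no posts."""
    total = int(user["postCount"])
    if total == 0:
        return None
    return int(user["approvedPostCount"]) / total
```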
Code/software
The files can be opened with any text viewer or with Microsoft Excel.
Human subjects data
For the "survey" subdataset, we grouped the ages of the study participants into categories (see the age variable).
For the "users" subdataset, we removed the exact time of the day.
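To make the two deidentification steps concrete, the sketch below reproduces their effect on hypothetical inputs. It is an illustration inferred from the documented age labels and date format, not the authors' processing script; `age_group` and `drop_time_of_day` are hypothetical names, ages are assumed to be whole years, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timezone


def age_group(age):
    """Map an age in whole years to the labels used in the survey's age column."""
    if age < 25:
        return "Under 25"
    if age <= 39:
        return "25-39"
    return "40 or more"


def drop_time_of_day(timestamp):
    """Keep only the UTC calendar date (YYYY-MM-DD) of an aware datetime."""
    return timestamp.astimezone(timezone.utc).date().isoformat()


if __name__ == "__main__":
    # 6:30 pm on March 5, 2024 in California (PST, UTC-8) is 02:30 on March 6 in UTC.
    evening_in_california = datetime(2024, 3, 6, 2, 30, tzinfo=timezone.utc)
    print(age_group(31), drop_time_of_day(evening_in_california))
```

Because the users dates are UTC dates, an event in the evening, California time, can appear on the following calendar date in these columns.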
