
Caltrans PEMS highway sensor average flows by occupancy

Cite this dataset

Fitzgerald, Clark; Zhang, Michael (2018). Caltrans PEMS highway sensor average flows by occupancy [Dataset]. Dryad.


This data summarizes average vehicle flow as a function of occupancy for traffic sensor data available from the Caltrans Performance Measurement System (PEMS).

It is useful because it shows how traffic behaves in congested regimes without requiring users to preprocess several hundred GB of raw data.

Open the pdf files to see what this data looks like.



Traffic engineers model the flow of traffic (vehicles per hour) as a function of traffic density (vehicles per
mile). This relationship, which describes how traffic flows on a given stretch of road, is known as the
fundamental diagram (Daganzo, 1997).

Flow is the number of vehicles that pass over the detector in a 30-second period, and occupancy is the
fraction of that period during which a vehicle is over the detector.
We downloaded 10 months of 30-second loop detector data from 2016 from the Caltrans Performance Measurement
System (PEMS) website. We chose Caltrans District 4, the San Francisco Bay Area, because this area contains
many observations of high traffic activity and is large enough to motivate scalable computational techniques.
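
The detector quantities above relate to the units of the fundamental diagram through two standard conversions, shown here for orientation rather than as a step described by the dataset authors. A 30-second count n scales to an hourly flow q, and occupancy o approximates density k once divided by an effective vehicle length L (vehicle length plus detector length), which is an assumed parameter:

```latex
q = \frac{3600\,\mathrm{s/h}}{30\,\mathrm{s}}\, n = 120\, n \quad \text{(vehicles per hour)},
\qquad
k \approx \frac{o}{L}
```

For example, n = 10 vehicles in 30 seconds gives q = 1200 vehicles per hour, and an assumed L = 20 ft with o = 0.1 gives k ≈ 0.005 vehicles per foot, or about 26.4 vehicles per mile. Because k is roughly proportional to o for a fixed L, mean flow plotted against occupancy keeps the shape of the fundamental diagram.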

We used a nonparametric method that dynamically bins the data by occupancy and then computes the mean
flow in each bin. We started with a fixed minimum bin width of w = 0.01, so there are at most
1/w = 100 bins in total. We chose 0.01 because it provides sufficient resolution for the fundamental diagram
in the low-density regime. Furthermore, we required each bin to contain at least k observations. Some
experimentation with a few different stations showed that choosing k = 200 produced a visually smooth
fundamental diagram.
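
The binning procedure can be sketched in R, the language of the loading command in the usage notes. This is a minimal sketch under stated assumptions, not the authors' code: the function name `bin_station` is hypothetical, bins are assumed to grow left to right from occupancy 0 in steps of w and to close as soon as they hold k observations, and trailing observations that never reach k are dropped. The output columns mirror those of fd_shape.tsv.

```r
# Hypothetical sketch of dynamic occupancy binning for one station.
# Assumptions (not confirmed by the dataset authors): bins grow left to right
# from occupancy 0 in steps of width w, a bin closes as soon as it holds at
# least k observations, and a trailing bin with fewer than k is dropped.
bin_station <- function(occ, flow, station, w = 0.01, k = 200) {
  stopifnot(length(occ) == length(flow), all(occ >= 0 & occ <= 1))
  n_slots <- round(1 / w)                 # at most 1/w = 100 bins
  # width-w slot holding each observation; occupancy 0 goes to slot 1
  slot <- pmin(pmax(ceiling(occ / w), 1), n_slots)
  counts <- tabulate(slot, nbins = n_slots)

  # walk the slots, merging them until the current bin has >= k observations
  bin_of_slot <- integer(n_slots)
  right_slots <- integer(0)
  current <- 1L
  running <- 0L
  for (s in seq_len(n_slots)) {
    bin_of_slot[s] <- current
    running <- running + counts[s]
    if (running >= k) {
      right_slots <- c(right_slots, s)
      current <- current + 1L
      running <- 0L
    }
  }
  n_bins <- length(right_slots)
  if (n_bins == 0L) stop("fewer than k observations in total")

  # summarise flow in each complete bin; the incomplete trailing bin is dropped
  bin <- bin_of_slot[slot]
  keep <- bin <= n_bins
  b <- factor(bin[keep], levels = seq_len(n_bins))
  f <- flow[keep]
  data.frame(
    station = rep(as.integer(station), n_bins),
    right_end_occ = right_slots * w,
    mean_flow = as.numeric(tapply(f, b, mean)),
    sd_flow = as.numeric(tapply(f, b, sd)),
    number_observed = tabulate(as.integer(b), nbins = n_bins)
  )
}
```

Growing bins in whole multiples of w keeps every right edge on the 0.01 grid: bins stay narrow where observations are plentiful and widen only where they are sparse.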



Usage notes

First open the pdf files to see what this data looks like.

The following R command will load the data:

    # Load the binned flow summaries; the string NULL marks missing values
    fd_shape = read.table("fd_shape.tsv"
        , col.names = c("station", "right_end_occ", "mean_flow", "sd_flow", "number_observed")
        , colClasses = c("integer", "numeric", "numeric", "numeric", "integer")
        , na.strings = "NULL"
        )

The columns are as follows:

  1. station: station ID from PEMS
  2. right_end_occ: right end of the occupancy bin over which the flow statistics are computed; ranges from 0 to 1
  3. mean_flow: mean vehicle flow in the bin
  4. sd_flow: standard deviation of vehicle flow in the bin
  5. number_observed: number of observations (30-second samples) in the bin
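
As a usage sketch, the columns above are enough to trace one station's curve and locate its peak. The helper name `station_fd` is hypothetical, and treating the bin with the highest mean flow as capacity (with higher-occupancy bins forming the congested branch) is a standard reading of the fundamental diagram rather than something the dataset itself provides.

```r
# Hypothetical helper: extract one station's binned fundamental diagram from
# a data frame with the documented columns and locate its peak.
station_fd <- function(fd_shape, station_id) {
  curve <- fd_shape[fd_shape$station == station_id, ]
  curve <- curve[order(curve$right_end_occ), ]
  # the bin with the largest mean flow approximates capacity; bins to its
  # right correspond to the congested regime (which.max skips NA values)
  peak <- which.max(curve$mean_flow)
  list(curve = curve,
       critical_occ = curve$right_end_occ[peak],
       capacity = curve$mean_flow[peak])
}
```

After loading fd_shape as above, `fd <- station_fd(fd_shape, fd_shape$station[1])` followed by `plot(mean_flow ~ right_end_occ, data = fd$curve, type = "b")` should draw a curve for that station comparable to those in the PDF files.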


National Science Foundation, Award: 1650042


SW 37.081476, -123.250122
NE 38.453589, -121.272583