Input data for short-term water level forecasting at 3 stations near HWY 37, Sonoma/Marin County, California
Cite this dataset
Munger, Sophie; Largier, John (2022). Input data for short-term water level forecasting at 3 stations near HWY 37, Sonoma/Marin County, California [Dataset]. Dryad. https://doi.org/10.25338/B8WS8H
Abstract
Low-lying coastal highways are susceptible to flooding as sea level rises. Flooding events already impact some highways, such as Highway 37, which runs across the lowlands at the northern end of San Francisco Bay and is crossed by several creeks and rivers. Short-term operational forecasts are required to enable planning for traffic disruption, evacuation, and protection of property and infrastructure. Traditional physically based numerical models have great predictive capability, but they require extensive datasets and are computationally expensive, which limits their usefulness for short-term forecasting. Here we develop a data-driven, site-specific method that can be implemented at multiple vulnerable sites throughout San Francisco Bay and other low-lying coastal areas across the State of California. This method is based on direct observations of the water level at the site and is independent of large computer simulations. For this study, we use a relatively simple statistical model (multiple linear regression) combined with a forecast error correction inspired by the autoregressive moving average (ARMA) method commonly used in time-series forecasting. The model is then used to produce a 4-day water level forecast at 3 stations near HWY 37, Sonoma/Marin County, California.
Methods
The input files for the model are grouped into three different datasets: a training dataset, a water level observations dataset, and a weather forecast dataset. All data within those files are sourced from public data servers.
Training Dataset
Description: This dataset contains the time series of the four parameters that are used to train the model. It consists of hourly observed meteorological and hydrological data for the period 2019-01-01 to 2022-09-27, in 4 fields: Ocean Wind, Local Wind, Atmospheric Pressure, and River Flow. The raw data were collected from publicly available sources, downloaded, and resampled to hourly time intervals. Small data gaps were filled by linear interpolation. The wind data were transformed from a polar coordinate system (wind speed and direction) to principal component x-y vectors. The principal components were oriented so that the alongshore axis (y-component) is at 60 degrees from north for the wind at Gnoss Field and 100 degrees from north for the wind at the NDBC buoy. The onshore wind columns list the shore-normal (x) component at each of the 2 locations.
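The wind transformation described above can be sketched as a fixed rotation of the wind vector onto alongshore and shore-normal axes. This is only an illustration under stated assumptions, not the routine used to build the dataset: the function name is hypothetical, the wind direction is assumed to follow the meteorological convention (direction the wind blows from), the alongshore angle is assumed to be a compass bearing measured clockwise from north, and the positive shore-normal axis is assumed to lie 90 degrees clockwise from the alongshore axis. The sign convention in the published columns may differ.

```python
import numpy as np


def wind_to_shore_components(speed, direction_from_deg, alongshore_bearing_deg):
    """Rotate wind speed/direction into shore-normal (x) and alongshore (y) parts.

    Assumptions (not confirmed by the dataset description):
    - direction_from_deg is the compass direction the wind blows FROM
    - alongshore_bearing_deg is measured clockwise from north
    - the positive x axis is 90 degrees clockwise from the alongshore axis
    """
    speed = np.asarray(speed, dtype=float)
    d = np.deg2rad(np.asarray(direction_from_deg, dtype=float))

    # Eastward (u) and northward (v) components of the air motion.
    u = -speed * np.sin(d)
    v = -speed * np.cos(d)

    theta = np.deg2rad(alongshore_bearing_deg)
    # Project (u, v) onto the unit vectors of the rotated axes.
    alongshore = u * np.sin(theta) + v * np.cos(theta)
    shore_normal = u * np.cos(theta) - v * np.sin(theta)
    return shore_normal, alongshore


# Example: a 10 m/s wind from the north, using the 60 degree axis quoted for Gnoss Field.
x, y = wind_to_shore_components([10.0], [0.0], 60.0)
```

The rotation preserves wind speed, so `np.hypot(x, y)` returns the original speed regardless of the chosen axis angle.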
Source:
Column Name | Location | Data Type, Unit | Agency Source | Web link to raw data |
AtmPres | Buoy 46026 | Atmospheric Pressure, mBar | NOAA NDBC | |
Gnoss_onshorewind | Gnoss Field Airport | Shore-normal component of the wind, m/s | Sonoma County | https://sonoma.onerain.com/site/?site_id=155&site=b4e33d63-e909-4ecd-bb2b-1ee2c587bb00 |
napa_flow_cfs | Napa River | River flow, cfs | USGS NWIS | |
ocean_onshorewind | Buoy 46026 | Shore-normal component of the wind, m/s | NOAA NDBC | |
Water Level Datasets
This dataset consists of three individual files, one per station, each with 3 fields. The stage_m field is the raw data collected from the water level gauge station, the predicted_m field is the predicted tide calculated as described below, and the residual_m field is the difference between the two.
Description: The raw water level data were collected from 3 stage stations for the period of 2019-01-01 to 2022-09-27 when available.
Field stage_m: The data was downloaded, detrended by removing the mean value, and resampled to hourly time intervals. Small data gaps were filled by linear interpolation.
Field predicted_m: The predicted tide was calculated using a publicly available Python routine based on a well-documented MATLAB routine called UTide (http://www.po.gso.uri.edu/~codiga/utide/utide.htm).
Field residual_m: The residual is the stage minus the predicted tide. It represents the variation of the water level due to non-tidal forcing.
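The three fields can be related with a short pandas sketch. It is an illustration under stated assumptions rather than the processing script behind the files: the function names are hypothetical, the 3-hour limit for a "small" gap is an assumed value, the input series are synthetic, and the tidal prediction is replaced by a placeholder sinusoid instead of a UTide fit.

```python
import numpy as np
import pandas as pd


def preprocess_stage(raw, max_gap_hours=3):
    """Resample to hourly means, fill short gaps linearly, remove the mean.

    max_gap_hours is an assumed threshold. Note that pandas also fills the
    first max_gap_hours values of any longer gap.
    """
    hourly = raw.resample("1h").mean()
    hourly = hourly.interpolate(
        method="linear", limit=max_gap_hours, limit_area="inside"
    )
    return hourly - hourly.mean()


def compute_residual(stage_m, predicted_m):
    """Non-tidal residual: observed stage minus predicted tide."""
    return stage_m - predicted_m


# Synthetic 15-minute stage record (2 days) with a 2-hour gap.
idx = pd.date_range("2019-01-01", periods=192, freq="15min")
hours = np.arange(192) / 4.0
raw = pd.Series(1.5 + 0.5 * np.sin(2 * np.pi * hours / 12.42), index=idx)
raw.iloc[40:48] = np.nan

stage_m = preprocess_stage(raw)

# Placeholder for the tidal prediction (hypothetical, not a UTide result).
t_hourly = np.arange(len(stage_m), dtype=float)
predicted_m = pd.Series(
    0.5 * np.sin(2 * np.pi * t_hourly / 12.42), index=stage_m.index
)

residual_m = compute_residual(stage_m, predicted_m)
```

Subtracting the two series aligns them on their timestamps, so both must share the same hourly index for the residual to be defined at every step.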
Source:
The stage data was downloaded from the following sources:
File Name | Location | Data Type, Unit | Agency Source | Web link to raw data |
novato_wl_1hr_up.csv | Mouth of Novato Creek | Stage, m | Marin Co | https://marin.onerain.com/site/?site_id=16808&site=a88e57c5-06b1-4855-a65c-92ef0063e6bb |
rowland_wl_1hr.csv | Novato Creek at Rowland Bridge | Stage, m | Marin Co | https://marin.onerain.com/site/?site_id=16809&site=82b05ca8-3c86-49cc-9660-63ca3abd3e35 |
petaluma_wl_1hr.csv | Petaluma River at Horse Ranch | Stage, m | UC Davis, BML | |
Weather Forecast Datasets
This dataset is the weather forecast for the 4 parameters used by the model.
Description: This dataset contains forecasted meteorological data obtained from NOAA data servers, except for the atmospheric pressure forecast, which was obtained from OpenWeatherMap, an open-source weather forecast app.
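Because the forecast file carries the same four columns as the training dataset, a regression fitted on the training data can be applied directly to the forecast values. The sketch below shows one plausible way to do that with an ordinary least-squares fit. It is not the authors' implementation: the function names are hypothetical, the data are synthetic stand-ins generated in the code, and the ARMA-inspired error correction mentioned in the abstract is omitted.

```python
import numpy as np

# Column names shared by the training and forecast datasets.
PREDICTORS = ["AtmPres", "Gnoss_onshorewind", "napa_flow_cfs", "ocean_onshorewind"]


def fit_linear_model(X, y):
    """Least-squares fit of y = b0 + X @ b; returns [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply fitted coefficients to a new predictor matrix."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta


# Synthetic stand-in data (NOT values from the dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, len(PREDICTORS)))
true_beta = np.array([0.1, -0.02, 0.05, 0.001, 0.03])
y_train = true_beta[0] + X_train @ true_beta[1:]  # stands in for residual_m

beta = fit_linear_model(X_train, y_train)

# 4 days of hourly forecast predictors gives 96 rows.
X_forecast = rng.normal(size=(96, len(PREDICTORS)))
residual_forecast = predict(beta, X_forecast)
```

In this sketch the regression target is the non-tidal residual, so a water level forecast would be the predicted tide plus `residual_forecast`. That pairing is an assumption made for the example.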
Source:
Column Name | Location | Data Type, Unit | Agency Source | Web link to raw data |
AtmPres | Buoy 46026 | Atmospheric Pressure, mBar | - | |
Gnoss_onshorewind | Gnoss Field Airport | Shore-normal component of the wind, m/s | NOAA NWS | |
napa_flow_cfs | Napa River | River flow, cfs | NOAA AHPS | https://water.weather.gov/ahps2/hydrograph.php?gage=apcc1&wfo=mtr |
ocean_onshorewind | Buoy 46026 | Shore-normal component of the wind, m/s | NOAA NWS | |
Usage notes
All data can be opened with a simple text editor.
Funding
United States Department of Transportation
California Department of Transportation
Institute of Transportation Studies