Input data for short-term water level forecasting at 3 stations near HWY 37, Sonoma/Marin County, California

Citation

Munger, Sophie; Largier, John (2022), Input data for short-term water level forecasting at 3 stations near HWY 37, Sonoma/Marin County, California, Dryad, Dataset, https://doi.org/10.25338/B8WS8H

Abstract

Low-lying coastal highways are susceptible to flooding as sea level rises. Flooding events already affect some highways, such as Highway 37, which runs across the lowlands at the northern end of San Francisco Bay and is crossed by several creeks and rivers. Short-term operational forecasts are required to enable planning for traffic disruption, evacuation, and protection of property and infrastructure. Traditional physically based numerical models have great predictive capability but require extensive datasets and are computationally expensive, which limits their use for short-term forecasting. Here we develop a data-driven, site-specific method that can be implemented at multiple vulnerable sites throughout San Francisco Bay and other low-lying coastal areas across the State of California. This method is based on direct observations of the water level at the site and is independent of large computer simulations. For this study, we use a relatively simple statistical model (multiple linear regression) combined with a forecast error correction inspired by the autoregressive moving average (ARMA) method commonly used in time-series forecasting. The model is then used to produce a 4-day water level forecast at 3 stations near HWY 37, Sonoma/Marin County, California.

Methods

The input files for the model are grouped into three different datasets: a training dataset, a water level observations dataset, and a weather forecast dataset. All data within those files are sourced from public data servers.   

Training Dataset

Description: This dataset contains the time series of the four parameters used to train the model: hourly observed wind, atmospheric pressure, and river flow for the period 2019-01-01 to 2022-09-27. The dataset consists of 4 fields: Ocean Wind, Local Wind, Atmospheric Pressure, and River Flow. The raw data were collected from publicly available sources, downloaded, and resampled to hourly intervals. Small data gaps were filled by linear interpolation. The wind data were transformed from polar coordinates (wind speed and direction) to principal-component x-y vectors. The principal components were oriented so that the alongshore (y) component lies at 60 degrees from north for the wind at Gnoss Field and 100 degrees from north for the wind at the NDBC buoy. The listed onshore wind is the shore-normal (x) component at each of the 2 locations.
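The rotation from wind speed and direction to shore-normal and alongshore components can be sketched as follows. This is an illustration under stated assumptions, not the authors' routine: the function name `wind_components` is hypothetical, the direction is assumed to follow the meteorological convention (the direction the wind blows from, in degrees clockwise from north), the alongshore angle is assumed to be a compass bearing, and the positive shore-normal axis is assumed to point 90 degrees clockwise from the alongshore axis. The sketch uses the stated axis angles directly rather than deriving them from a principal component analysis.

```python
import math


def wind_components(speed, direction_deg, alongshore_deg):
    """Rotate one wind observation into (shore-normal x, alongshore y).

    Assumptions (not stated in the dataset description):
      - direction_deg is the direction the wind blows FROM,
        in degrees clockwise from north;
      - alongshore_deg is the compass bearing of the positive y axis;
      - positive x points 90 degrees clockwise from positive y.
    """
    # Bearing the air is moving TOWARD.
    toward = math.radians(direction_deg + 180.0)
    east = speed * math.sin(toward)
    north = speed * math.cos(toward)

    axis = math.radians(alongshore_deg)
    # Project the east/north vector onto the alongshore axis and onto
    # the axis 90 degrees clockwise from it.
    along = east * math.sin(axis) + north * math.cos(axis)
    normal = east * math.cos(axis) - north * math.sin(axis)
    return normal, along


# A 5 m/s wind from 240 degrees moves toward 60 degrees, so with a
# 60-degree alongshore axis it is almost entirely alongshore.
normal, along = wind_components(5.0, 240.0, 60.0)
```

If the sign or angle convention in the files differs from these assumptions, only the two projection lines need to change.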

Source:

Column Name | Location | Data Type, Unit | Agency Source | Web link to raw data
AtmPres | Buoy 46026 | Atmospheric Pressure, mBar | NOAA NDBC | https://www.ndbc.noaa.gov/station_page.php?station=46026
Gnoss_onshorewind | Gnoss Field Airport | Shore-normal component of the wind, m/s | Sonoma County | https://sonoma.onerain.com/site/?site_id=155&site=b4e33d63-e909-4ecd-bb2b-1ee2c587bb00
napa_flow_cfs | Napa River | River flow, cfs | USGS NWIS | https://waterdata.usgs.gov/ca/nwis/uv?site_no=11458000
ocean_onshorewind | Buoy 46026 | Shore-normal component of the wind, m/s | NOAA NDBC | https://www.ndbc.noaa.gov/station_page.php?station=46026

 

Water Level Datasets

This dataset consists of three individual files, each with 3 fields. The stage_m field is the observed water level from the gauge station, the predicted_m field is the predicted tide calculated as described below, and the residual_m field is the difference between the two.

Description: The raw water level data were collected from 3 stage stations for the period of 2019-01-01 to 2022-09-27 when available. 

Field stage_m: The data was downloaded, detrended by removing the mean value, and resampled to hourly time intervals. Small data gaps were filled by linear interpolation. 
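The mean removal and gap filling described for stage_m can be sketched in plain Python. This is a minimal illustration, assuming the series is already on an hourly grid with missing samples marked as `None`. The function names and the `max_gap` threshold are hypothetical, since the description only says that "small" gaps were filled.

```python
def remove_mean(values):
    """Subtract the mean of the valid samples; None entries stay None."""
    valid = [v for v in values if v is not None]
    mean = sum(valid) / len(valid)
    return [None if v is None else v - mean for v in values]


def fill_small_gaps(values, max_gap=3):
    """Linearly interpolate runs of None no longer than max_gap samples.

    max_gap is a hypothetical threshold. Longer gaps, and gaps at
    either end of the record, are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid sample after the gap, or n
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            for k in range(start, end):
                frac = (k - start + 1) / (gap + 1)
                filled[k] = left + frac * (right - left)
    return filled


# Illustrative values only, not taken from the dataset.
stage = [1.0, None, None, 4.0]
hourly = fill_small_gaps(remove_mean(stage))
```

Leaving long gaps untouched avoids drawing a straight line through a period in which several tidal cycles may have been missed.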

Field predicted_m: The predicted tide was calculated using a publicly available Python routine based on UTide, a well-documented MATLAB routine (http://www.po.gso.uri.edu/~codiga/utide/utide.htm).

Field residual_m: The residual is the stage minus the predicted tide (stage_m - predicted_m). It represents the variation of the water level due to non-tidal forcing.
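The relationship between the three fields can be checked with a few lines of standard-library Python. This is a sketch only: the `time` column name and the sample values are made up for illustration, and `add_residual` is a hypothetical helper. Only the stage_m, predicted_m, and residual_m field names come from the description above.

```python
import csv
import io


def add_residual(rows):
    """Return the rows with residual_m = stage_m - predicted_m added."""
    out = []
    for row in rows:
        stage = float(row["stage_m"])
        predicted = float(row["predicted_m"])
        out.append({**row, "residual_m": stage - predicted})
    return out


# Made-up sample in the assumed layout; real files may use other
# column names for the timestamp.
sample = io.StringIO(
    "time,stage_m,predicted_m\n"
    "2019-01-01 00:00,0.52,0.40\n"
    "2019-01-01 01:00,0.31,0.35\n"
)
rows = add_residual(csv.DictReader(sample))
```

A positive residual means the observed water level is higher than the astronomical tide alone would give.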

Source:

The stage data was downloaded from the following sources: 

File Name | Location | Data Type, Unit | Agency Source | Web link to raw data
novato_wl_1hr_up.csv | Mouth of Novato Creek | Stage, m | Marin Co | https://marin.onerain.com/site/?site_id=16808&site=a88e57c5-06b1-4855-a65c-92ef0063e6bb
rowland_wl_1hr.csv | Novato Creek at Rowland Bridge | Stage, m | Marin Co | https://marin.onerain.com/site/?site_id=16809&site=82b05ca8-3c86-49cc-9660-63ca3abd3e35
petaluma_wl_1hr.csv | Petaluma River at Horse Ranch | Stage, m | UC Davis, BML | https://coastalocean.ucdavis.edu/ocean-observing/hwy37

 

Weather Forecast Datasets

This dataset contains the weather forecast for the 4 parameters used by the model.

Description: This dataset contains forecasted meteorological data. The wind and river flow forecasts were obtained from NOAA data servers; the atmospheric pressure forecast was obtained from OpenWeatherMap, a weather forecast service.

 Source:

Column Name | Location | Data Type, Unit | Agency Source | Web link to raw data
AtmPres | Buoy 46026 | Atmospheric Pressure, mBar | - | https://openweathermap.org/
Gnoss_onshorewind | Gnoss Field Airport | Shore-normal component of the wind, m/s | NOAA NWS | https://www.weather.gov/documentation/services-web-api
napa_flow_cfs | Napa River | River flow, cfs | NOAA AHPS | https://water.weather.gov/ahps2/hydrograph.php?gage=apcc1&wfo=mtr
ocean_onshorewind | Buoy 46026 | Shore-normal component of the wind, m/s | NOAA NWS | https://www.weather.gov/documentation/services-web-api
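How the three datasets could fit together in a multiple linear regression is sketched below with NumPy. This is not the authors' implementation. It assumes the regression target is the non-tidal residual, that the forecast is hourly (so 4 days is 96 steps), and it omits the ARMA-inspired forecast error correction entirely. The data are synthetic stand-ins, and `fit_linear_model` and `predict` are hypothetical helper names. Only the four column names come from the tables above.

```python
import numpy as np

# Column names as listed in the training and forecast tables.
PREDICTORS = ["AtmPres", "Gnoss_onshorewind", "napa_flow_cfs", "ocean_onshorewind"]


def fit_linear_model(X, y):
    """Ordinary least squares with an intercept (intercept is the last entry)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to a new predictor matrix."""
    A = np.column_stack([X, np.ones(len(X))])
    return A @ coef


# Synthetic stand-in for the training dataset and the observed residual.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, len(PREDICTORS)))
true_coef = np.array([-0.01, 0.02, 0.001, 0.03, 0.1])
y_train = np.column_stack([X_train, np.ones(200)]) @ true_coef

coef = fit_linear_model(X_train, y_train)

# Synthetic stand-in for the weather forecast dataset: 4 days, hourly.
X_forecast = rng.normal(size=(96, len(PREDICTORS)))
residual_forecast = predict(coef, X_forecast)

# Placeholder tide; in practice this would be the predicted tide.
predicted_tide = np.zeros(96)
water_level_forecast = predicted_tide + residual_forecast
```

Because the forecast files use the same column names as the training file, the fitted coefficients can be applied to the forecast predictors without any remapping.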

 

Usage Notes

All data can be opened with a simple text editor.

Funding

U.S. Department of Transportation

California Department of Transportation

Institute of Transportation Studies