
Channel types predictions for the Sacramento River basin

Cite this dataset

Guillon, Hervé et al. (2020). Channel types predictions for the Sacramento River basin [Dataset]. Dryad. https://doi.org/10.25338/B8031W

Abstract

Hydrologic and geomorphic classifications have gained traction in response to the increasing need for basin-wide water resources management. Regardless of the selected classification scheme, an open scientific challenge is how to extend information from limited field sites to classify tens of thousands to millions of channel reaches across a basin. To address this spatial scaling challenge, we leveraged machine learning to predict reach-scale geomorphic channel types using publicly available geospatial data.

Methods

A bottom-up machine learning approach selects the most accurate and stable model among ~96,000 models and derives the relationship between 147 predictors and labels corresponding to regional channel types. It follows a three-tiered framework that (i) defines a tractable problem, then assesses model performance (ii) in statistical learning and (iii) in prediction. In the present application to the Sacramento River basin (California, USA), the framework selects a Random Forest model to predict 10 channel types, previously determined from 290 field surveys, over 108,943 200-m stream intervals.

Performance in statistical learning is high, with a 65% median cross-validation accuracy and a 0.91 mean multiclass Area Under Curve value. Furthermore, the predictions coherently capture the expected geomorphic organization of the landscape. As the main metric of uncertainty, we include for each stream segment the entropy calculated from the posterior probabilities output by the machine learning algorithm. For completeness, evenness and richness are also reported.
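
The three uncertainty metrics can be illustrated with a minimal sketch. The exact definitions used to produce the dataset are not stated here, so the following are assumptions: Shannon's entropy uses the natural logarithm, richness counts the channel types with non-zero probability, and evenness is entropy divided by the logarithm of richness (Pielou's evenness). The function name `uncertainty_metrics` is hypothetical.

```python
import math

def uncertainty_metrics(probs):
    """Return (shannon, richness, evenness) for one vector of posterior probabilities.

    Assumed definitions (not confirmed by the dataset description):
      shannon  = -sum(p * ln p) over classes with p > 0
      richness = number of classes with p > 0
      evenness = shannon / ln(richness), set to 0 when richness <= 1
    """
    positive = [p for p in probs if p > 0]
    shannon = sum(-p * math.log(p) for p in positive)
    richness = len(positive)
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return shannon, richness, evenness

# One channel type is certain: no uncertainty.
confident = uncertainty_metrics([1.0] + [0.0] * 9)

# All ten channel types are equally likely: maximum uncertainty.
ambiguous = uncertainty_metrics([0.1] * 10)
```

Under these assumptions, a segment with a single certain channel type has zero entropy, while a segment with ten equally likely types has entropy ln(10) and evenness 1.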

The predictions included in this dataset correspond to an aggregated version of the output from the machine learning framework. The initial 200-m stream intervals were aggregated by COMID, the National Hydrography Dataset (NHD) identifier most commonly used by stakeholders. For each resulting NHD stream line, the 200-m scale probabilities associated with each channel type are summed, then normalized so that they sum to one.
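
The aggregation step can be sketched as follows. This is an illustration of the sum-then-normalize rule described above, not the code used to build the dataset; the function name `aggregate_by_comid`, the COMID values, and the three-class probabilities are hypothetical (the dataset itself has ten classes).

```python
def aggregate_by_comid(intervals):
    """Sum per-class probabilities over the intervals sharing a COMID,
    then rescale each summed vector so that it sums to one.

    intervals: iterable of (comid, probs) pairs, where probs is a list of
    posterior probabilities for one 200-m interval.
    """
    sums = {}
    for comid, probs in intervals:
        if comid not in sums:
            sums[comid] = [0.0] * len(probs)
        sums[comid] = [s + p for s, p in zip(sums[comid], probs)]

    aggregated = {}
    for comid, totals in sums.items():
        norm = sum(totals)  # positive as long as each input vector sums to one
        aggregated[comid] = [t / norm for t in totals]
    return aggregated

# Hypothetical example: two intervals on stream line 101, one on stream line 202.
intervals = [
    (101, [0.6, 0.3, 0.1]),
    (101, [0.2, 0.7, 0.1]),
    (202, [0.1, 0.1, 0.8]),
]
aggregated = aggregate_by_comid(intervals)
```

Because every interval contributes a vector that sums to one, this normalization is equivalent to averaging the interval probabilities along each stream line.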

Usage notes

  • Format: shapefile (.shp)
  • Spatial reference:
    • Projection: California Albers
    • Datum: NAD83
    • Units: meters
  • Attributes:
    • COMID: Common identifier of the NHD feature     
    • FDATE: Feature Currency Date     
    • RESOLUTION: Always "Medium"
    • GNIS_ID: Geographic Names Information System ID for the value in GNIS_NAME
    • GNIS_NAME: Feature Name from the Geographic Names Information System
    • LENGTHKM: Feature length in kilometers
    • REACHCODE: Reach Code assigned to feature
    • FLOWDIR: Flow direction is “WithDigitized” or “Uninitialized”
    • WBAREACOMI: COMID of an NHD polygonal water feature through which an NHD “Artificial Path” flowline flows
    • FTYPE: NHD Feature Type    
    • FCODE: Numeric codes for various feature attributes in the NHDFCode lookup table    
    • SHAPE_LENG: Feature length in decimal degrees
    • ENABLED: Always "True"   
    • GNIS_NBR: Internal field for data processing
    • group: most probable channel type
    • shannon: Shannon's entropy calculated from the posterior probabilities  
    • richness: Richness calculated from posterior probabilities
    • evenness: Evenness calculated from posterior probabilities
    • SAC01: posterior probability for channel type SAC01 (i.e. membership)     
    • SAC02: posterior probability for channel type SAC02 (i.e. membership)     
    • SAC03: posterior probability for channel type SAC03 (i.e. membership)     
    • SAC04: posterior probability for channel type SAC04 (i.e. membership)     
    • SAC05: posterior probability for channel type SAC05 (i.e. membership)     
    • SAC06: posterior probability for channel type SAC06 (i.e. membership)     
    • SAC07: posterior probability for channel type SAC07 (i.e. membership)  
    • SAC08: posterior probability for channel type SAC08 (i.e. membership)  
    • SAC09: posterior probability for channel type SAC09 (i.e. membership)  
    • SAC10: posterior probability for channel type SAC10 (i.e. membership) 

The ten channel types are:

  • SAC01: Unconfined, boulder-bedrock, bed-undulating
  • SAC02: Confined, boulder, high-gradient, step-pool/cascade
  • SAC03: Confined, boulder-bedrock, uniform
  • SAC04: Confined, boulder-bedrock, low-gradient step-pool
  • SAC05: Confined, gravel-cobble, uniform
  • SAC06: Partly-confined, low width-to-depth, gravel-cobble, riffle-pool
  • SAC07: Partly-confined, cobble-boulder, uniform
  • SAC08: Partly-confined, high width-to-depth, gravel-cobble, riffle-pool
  • SAC09: Unconfined, low width-to-depth, gravel
  • SAC10: Unconfined, gravel-cobble, riffle-pool
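
As a usage illustration, the most probable channel type of a stream line can be recovered from the SAC01 to SAC10 attributes and matched to its description. This sketch assumes that the `group` attribute is the channel type with the highest posterior probability; the function name `most_probable_type` and the probability values in `record` are hypothetical.

```python
# Channel type descriptions, as listed above.
CHANNEL_TYPES = {
    "SAC01": "Unconfined, boulder-bedrock, bed-undulating",
    "SAC02": "Confined, boulder, high-gradient, step-pool/cascade",
    "SAC03": "Confined, boulder-bedrock, uniform",
    "SAC04": "Confined, boulder-bedrock, low-gradient step-pool",
    "SAC05": "Confined, gravel-cobble, uniform",
    "SAC06": "Partly-confined, low width-to-depth, gravel-cobble, riffle-pool",
    "SAC07": "Partly-confined, cobble-boulder, uniform",
    "SAC08": "Partly-confined, high width-to-depth, gravel-cobble, riffle-pool",
    "SAC09": "Unconfined, low width-to-depth, gravel",
    "SAC10": "Unconfined, gravel-cobble, riffle-pool",
}

def most_probable_type(record):
    """Return (code, description) of the channel type with the highest
    posterior probability in one attribute record (a dict keyed by field name).
    Ties resolve to the first channel type in SAC01..SAC10 order."""
    code = max(CHANNEL_TYPES, key=lambda c: record[c])
    return code, CHANNEL_TYPES[code]

# Hypothetical attribute record for one NHD stream line.
record = {
    "SAC01": 0.02, "SAC02": 0.01, "SAC03": 0.03, "SAC04": 0.02, "SAC05": 0.05,
    "SAC06": 0.55, "SAC07": 0.10, "SAC08": 0.15, "SAC09": 0.04, "SAC10": 0.03,
}
code, description = most_probable_type(record)
```

Reading the full probability vector alongside `group` is useful when the top two channel types have similar probabilities, a situation that the `shannon` attribute also flags.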

 

Funding

California Environmental Protection Agency, Award: 16-062-300

United States Department of Agriculture, Award: CA‐D‐LAW‐7034‐H

United States Department of Agriculture, Award: CA‐D‐LAW‐2243‐H