Geospatial datasets for HAR-based modeling of riparian vegetation and restoration potential
Abstract
This dataset contains geospatial and tabular data used to model riparian vegetation distribution and assess reach-scale geomorphological degradation and restoration potential using height-above-river (HAR) metrics and a random forest classifier. The data include raw topographic and imagery inputs, river-mile markers, training samples, HAR surfaces, a probabilistic vegetation model output, derived elevation zones, reach-scale metrics, and supporting tables used in figures and analyses.
Description
This dataset contains geospatial and tabular data used to model riparian vegetation distribution and assess reach-scale restoration potential using height-above-river (HAR) metrics and a random forest classifier. The dataset includes HAR surfaces, probabilistic vegetation model outputs, derived elevation zones, reach-scale delineations, and summary metrics used in analyses and figures.
All data were generated as part of the workflow described in McConnell et al. (in press).
Contents ("DATA.zip")
GIS files ("GIS_data.gdb")
- DEM_2005
The digital elevation model (DEM) raster created from the 2005 LiDAR data for Lower Putah Creek. The data included here has been clipped to remove the portion around Monticello Dam due to safety concerns from the Bureau of Reclamation (this area was not relevant to the study). The original DEM is available by request from Solano County Water Agency (2005) (gpoore@scwa2.com).
Resolution: 10 ft (3.3 m)
- baseflow_stream_raster_2005
The baseflow water surface raster created from a stream boundary generated from the 2005 LiDAR data for Lower Putah Creek. The data are included here but are also available by request from Solano County Water Agency (2005).
- imagery_NAIP_2020
A mosaic of 2020 aerial imagery tiles from the National Agriculture Imagery Program (NAIP) (Figure 3). Source NAIP tiles, which are public domain, were downloaded from USGS EarthExplorer (2023).
- HAR_raster
Height above river (HAR) surface (Figure 5) generated from the 2005 DEM and baseflow stream raster, using the riparian toolbox from Dilts (2015).
Resolution: 10 feet (3.3 meters)
- study_boundary
Study boundary used to clip the HAR surface, drawn to represent the extent relevant for river management.
- training_samples
Training samples drawn for land cover types on the 2020 NAIP imagery (In McConnell et al. (in press), these samples were used to produce Figure 3, a map of a site with a few polygons of each class on the NAIP imagery).
- HAR_zones
Classification of elevation zones derived from relationships between HAR and vegetation (In McConnell et al. (in press), these zones are mapped at a large scale in Figure 6).
Values:
1: aquatic
2: core riparian
3: marginal riparian
4: transition
5: valley oak
6: out-of-channel
- reaches
Polygon feature class representing geomorphologically uniform river reaches (In McConnell et al. (in press), these reaches are mapped and labeled in Figure 8).
- RF_generated_probabilistic_land_cover_surface
Probabilistic land cover surface generated by the random forest (RF) model using training samples and the HAR surface (In McConnell et al. (in press), the probabilistic land cover surface is mapped at a large scale in Figure 6).
Resolution: 6.6 ft (2 m)
Values:
0: barren
1: herbaceous
2: riparian forest
3: shrub
4: valley oak
5: water
- river_mile_markers_Putah_Creek
River-mile markers used by the Solano County Water Agency in resource management and planning. The data are included here but are also available by request from Solano County Water Agency (2005).
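The integer cell values listed above for HAR_zones and RF_generated_probabilistic_land_cover_surface can be kept in small lookup tables when the rasters are read outside ArcGIS. The following is a minimal sketch in Python; the function name `count_by_label` and the sample cell values are hypothetical and are not part of the dataset.

```python
from collections import Counter

# Integer codes copied from the "Values" lists for the two classified rasters.
HAR_ZONES = {
    1: "aquatic",
    2: "core riparian",
    3: "marginal riparian",
    4: "transition",
    5: "valley oak",
    6: "out-of-channel",
}
RF_LAND_COVER = {
    0: "barren",
    1: "herbaceous",
    2: "riparian forest",
    3: "shrub",
    4: "valley oak",
    5: "water",
}


def count_by_label(cell_values, lookup):
    """Count cells per class label; codes missing from the lookup are grouped as 'unknown'."""
    return Counter(lookup.get(value, "unknown") for value in cell_values)


# Hypothetical cell values, e.g. a flattened window exported from the RF surface.
sample_cells = [2, 2, 5, 1, 0, 2, 4]
counts = count_by_label(sample_cells, RF_LAND_COVER)
```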
Tables ("Tables")
- RF_confusion_matrix.csv
Validation data used to generate the random forest model confusion matrix (In McConnell et al. (in press), this confusion matrix is presented as a table in Figure SM2).
- water: Water land-cover class.
- barren: Barren land-cover class.
- herbaceous: Herbaceous vegetation class.
- shrub: Shrub vegetation class.
- rip. forest: Riparian forest vegetation class.
- valley oak: Valley oak vegetation class.
- total: Total number of validation observations.
- user's accuracy: User's accuracy statistic for each class.
- producer's accuracy: Producer's accuracy statistic for each class.
- Kappa: Cohen's Kappa statistic.
- reach_rankings.csv
Reach-scale calculations for degradation, restoration potential, and other metrics (In McConnell et al. (in press), these data are presented in the following formatted tables: Tables 2b and SM3, SM4, SM5, SM6, and SM7).
- Reach number: Unique stream reach identifier.
- length (miles): Length of stream reach (miles).
- mean baseflow width (feet): Mean width of the baseflow channel (feet).
- aquatic (%): Percent of reach classified as aquatic zone.
- core riparian (%): Percent of reach classified as core riparian zone.
- marginal riparian (%): Percent of reach classified as marginal riparian zone.
- transition (%): Percent of reach classified as transition zone.
- combined riparian (%): Combined percentage of core and marginal riparian zone.
- combined transition + aquatic (%): Combined percentage of transition and aquatic zone.
- riparian ranking: Rank based on combined riparian percentage.
- aquatic + transition ranking: Rank based on combined aquatic and transition percentage.
- sum ranking (degradation): Combined degradation ranking score.
- degree of degradation: Categorical degradation class.
- downstream river-mile: River-mile location of the downstream reach boundary.
- upstream river-mile: River-mile location of the upstream reach boundary.
- aquatic (acres): Area classified as aquatic zone (acres).
- core riparian (acres): Area classified as core riparian zone (acres).
- marginal riparian (acres): Area classified as marginal riparian zone (acres).
- transition (acres): Area classified as transition zone (acres).
- valley oak (acres): Area classified as valley oak zone (acres).
- out-of-channel (acres): Area outside the active channel and riparian corridor (acres).
- in-channel area (acres): Total area within the channel and riparian corridor (acres).
- stream centerline length (feet): Length of the stream centerline (feet).
- reduction in baseflow width required if restored (feet): Channel narrowing required to achieve the restoration target width (feet).
- terrestrial riparian area created by narrowing channel (acres): Additional riparian area created under the restoration scenario (acres).
- potential core riparian area added if restored: Estimated increase in core riparian area under the restoration scenario (acres).
- total core riparian area if restored (acres): Existing plus potential restored core riparian area (acres).
- restoration potential ranking: Rank based on restoration potential.
- ranking total core riparian area after restoration: Rank based on total core riparian area after restoration.
- sum restored potential ranking: Combined restoration ranking score.
- notes: Notes regarding reach conditions, management considerations, or restoration opportunities.
- place names: Geographic place names associated with the reach.
- reach_geo_stats.csv
Reach-scale calculations of geomorphological statistics, as calculated in McConnell (2023) (In McConnell et al. (in press), these statistics are presented in a formatted table, Table SM1).
- Reach number: Unique stream reach identifier.
- river-mile marker at downstream end: River-mile location of the downstream reach boundary.
- river-mile marker at upstream end: River-mile location of the upstream reach boundary.
- elevation above mean sea level at upstream end: Elevation at the upstream end of the reach (feet above mean sea level).
- elevation above mean sea level at downstream end: Elevation at the downstream end of the reach (feet above mean sea level).
- elevation difference between upstream end of thalweg and downstream end of thalweg: Elevation loss along the reach thalweg (feet).
- length of stream centerline: Length of the stream centerline within the reach (feet).
- percent slope of channel: Average channel slope (%).
- mean width of baseflow: Mean width of the baseflow channel (feet).
- mean bank slope in degrees slope: Mean bank slope (degrees).
- standard deviation of bank slope in degrees slope: Standard deviation of bank slope (degrees).
- mean bank slope in percent slope: Mean bank slope (%).
- standard deviation of bank slope in percent slope: Standard deviation of bank slope (%).
- land_cover_HAR_histogram.csv
Table containing sums of land cover in each zone predicted by the probabilistic RF model. Land cover distribution is calculated across the HAR surface, where HAR is binned in one-foot increments (In McConnell et al. (in press), these zonal statistics are plotted against HAR in Figure 7).
- OBJECTID: Unique identifier for the HAR zone.
- height above river (feet): Height-above-river zone (feet).
- height above river (meters): Height-above-river zone (meters).
- barren (m2): Area classified as barren land (square meters).
- herbaceous (m2): Area classified as herbaceous vegetation (square meters).
- shrub (m2): Area classified as shrub vegetation (square meters).
- riparian_forest (m2): Area classified as riparian forest vegetation (square meters).
- valley_oak (m2): Area classified as valley oak vegetation (square meters).
- barren (acres): Area classified as barren land (acres).
- herbaceous (acres): Area classified as herbaceous vegetation (acres).
- shrub (acres): Area classified as shrub vegetation (acres).
- riparian forest (acres): Area classified as riparian forest vegetation (acres).
- valley oak (acres): Area classified as valley oak vegetation (acres).
- total area (acres): Total area within the HAR zone, including barren land.
- total relative vegetated area (acres): Total vegetated area within the HAR zone, excluding barren land.
- herbaceous (percent relative cover): Relative cover of herbaceous vegetation calculated from vegetated area only.
- shrub (percent relative cover): Relative cover of shrub vegetation calculated from vegetated area only.
- riparian forest (percent relative cover): Relative cover of riparian forest vegetation calculated from vegetated area only.
- valley oak (percent relative cover): Relative cover of valley oak vegetation calculated from vegetated area only.
- RF_convergence_plot.csv
Table containing mean square error (MSE) for each number of trees tested to determine stability for the random forest model (In McConnell et al. (in press), these data are plotted in Figure 4).
- number of trees: Number of trees included in the Random Forest model.
- overall mean square error (MSE): Overall model mean square error.
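For readers who want to check the accuracy statistics listed for RF_confusion_matrix.csv (user's accuracy, producer's accuracy, and Kappa), the sketch below shows the standard definitions in Python. It assumes rows hold the mapped (predicted) class and columns hold the reference class; that orientation is an assumption and should be confirmed against the table itself. The two-class matrix is made up for illustration.

```python
def accuracy_stats(matrix):
    """Return (user's accuracies, producer's accuracies, Cohen's kappa).

    matrix[i][j] is the number of validation samples mapped as class i
    whose reference class is j (assumed orientation).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    correct = [matrix[i][i] for i in range(n)]

    # User's accuracy: correct cells divided by everything mapped as that class.
    users = [correct[i] / row_totals[i] if row_totals[i] else float("nan") for i in range(n)]
    # Producer's accuracy: correct cells divided by all reference samples of that class.
    producers = [correct[j] / col_totals[j] if col_totals[j] else float("nan") for j in range(n)]

    observed = sum(correct) / total
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return users, producers, kappa


# Hypothetical two-class matrix, not taken from the dataset.
users, producers, kappa = accuracy_stats([[40, 10], [5, 45]])
```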
Spatial Reference
- Coordinate system: NAD 1983 StatePlane California II FIPS 0402 (US Feet)
- Units: US Feet and US Acres (converted to meters and hectares in McConnell et al. (in press))
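The conversion from the dataset units to metric units can be sketched as follows. This assumes the US survey foot (1200/3937 m), which is the foot used by State Plane "US Feet" coordinate systems; whether the manuscript used this or the international foot is not stated here, and the difference is negligible at these scales. Function names are illustrative.

```python
US_SURVEY_FOOT_M = 1200 / 3937  # meters per US survey foot (exact by definition)
SQUARE_FEET_PER_ACRE = 43560
SQUARE_METERS_PER_HECTARE = 10000


def feet_to_meters(feet):
    """Convert US survey feet to meters."""
    return feet * US_SURVEY_FOOT_M


def acres_to_hectares(acres):
    """Convert US acres (based on the US survey foot) to hectares."""
    square_meters = acres * SQUARE_FEET_PER_ACRE * US_SURVEY_FOOT_M**2
    return square_meters / SQUARE_METERS_PER_HECTARE


rf_cell_size_m = feet_to_meters(6.6)   # close to the 2 m stated for the RF surface
one_acre_ha = acres_to_hectares(1.0)
```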
Methods
Data were generated using LiDAR-derived elevation models and aerial imagery to calculate height above river (HAR). A random forest classifier was trained to model probabilistic relationships between HAR and vegetation distribution. These relationships were used to define elevation-based zones and to assess geomorphic condition and restoration potential across delineated river reaches.
Full methodological details are provided in McConnell et al. (in press).
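The relationship between HAR and land cover that is summarized in land_cover_HAR_histogram.csv can be illustrated with a short Python sketch. It is not the authors' workflow, which used ArcGIS Pro geoprocessing tools. It assumes that each cell contributes a fixed area (4 m2 for a 2 m cell), that one-foot bins are formed by rounding HAR down, and that relative cover is computed over the four vegetated classes only. The sample cells are made up.

```python
import math
from collections import defaultdict

CELL_AREA_M2 = 4.0  # 2 m x 2 m cell, matching the stated RF surface resolution
VEGETATED = ("herbaceous", "shrub", "riparian forest", "valley oak")


def land_cover_by_har_bin(cells):
    """Sum area per land-cover label within one-foot HAR bins.

    cells: iterable of (har_feet, land_cover_label) pairs.
    Returns {bin_floor_feet: {label: area_m2}}.
    """
    table = defaultdict(lambda: defaultdict(float))
    for har_feet, label in cells:
        table[math.floor(har_feet)][label] += CELL_AREA_M2
    return table


def relative_cover(areas):
    """Percent cover per vegetated class, using vegetated area only as the denominator."""
    vegetated_total = sum(areas.get(name, 0.0) for name in VEGETATED)
    if vegetated_total == 0:
        return {name: 0.0 for name in VEGETATED}
    return {name: 100.0 * areas.get(name, 0.0) / vegetated_total for name in VEGETATED}


# Hypothetical (HAR in feet, land cover) pairs.
cells = [
    (0.4, "herbaceous"),
    (0.9, "riparian forest"),
    (0.2, "riparian forest"),
    (0.7, "barren"),
    (1.3, "shrub"),
]
table = land_cover_by_har_bin(cells)
cover_bin0 = relative_cover(table[0])
```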
File Formats
- Spatial data are provided as feature classes and/or GRID rasters in Esri file geodatabases (.gdb).
- Tabular data are provided as comma-delimited CSVs.
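The comma-delimited tables can be read with any CSV parser. The sketch below uses only the Python standard library and an in-memory stand-in for RF_convergence_plot.csv; the header strings are assumed to match the column names listed in this document, and the numeric values are invented for illustration.

```python
import csv
import io

# Hypothetical stand-in for the file contents; the values are made up.
sample = io.StringIO(
    "number of trees,overall mean square error (MSE)\n"
    "10,0.30\n"
    "50,0.22\n"
    "100,0.21\n"
)


def read_numeric_rows(handle):
    """Parse a comma-delimited table into dicts, converting fields to float where possible."""
    rows = []
    for record in csv.DictReader(handle):
        parsed = {}
        for key, value in record.items():
            try:
                parsed[key] = float(value)
            except (TypeError, ValueError):
                parsed[key] = value  # keep text fields such as notes or place names
        rows.append(parsed)
    return rows


rows = read_numeric_rows(sample)
lowest_mse_row = min(rows, key=lambda row: row["overall mean square error (MSE)"])
```

To read a file from disk, replace `sample` with a handle such as `open(path, newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding is a precaution in case the file was saved by Excel with a byte-order mark.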
Notes
- All spatial data were processed using ArcGIS Pro version 3.2.0, including the Spatial Analyst Extension. No custom code or scripts were used in this program; only existing geoprocessing tools were applied, as described in McConnell et al. (in press).
- All tables and plots were produced in MS Excel version 2016.
- All files are named in a manner consistent with the data described in the manuscript.
Usage
These data can be used to:
- reproduce analyses presented in McConnell et al. (in press)
- explore relationships between elevation and vegetation distribution
- evaluate reach-scale geomorphic condition and restoration potential
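As one way to explore the reach-scale rankings, the sketch below ranks hypothetical reaches on two percentages and sums the ranks, in the spirit of the "riparian ranking", "aquatic + transition ranking", and "sum ranking (degradation)" columns of reach_rankings.csv. The ranking direction and the handling of ties are assumptions made for illustration; consult McConnell et al. (in press) for the procedure actually used.

```python
def rank(values, descending=True):
    """Return 1-based ranks; tied values share the lowest rank number."""
    ordered = sorted(values, reverse=descending)
    return [ordered.index(value) + 1 for value in values]


# Hypothetical reaches: (reach number, combined riparian %, combined transition + aquatic %)
reaches = [(1, 40.0, 35.0), (2, 25.0, 50.0), (3, 55.0, 20.0)]

# Assumption: the reach with the least riparian cover ranks first.
riparian_rank = rank([reach[1] for reach in reaches], descending=False)
# Assumption: the reach with the most aquatic + transition cover ranks first.
channel_rank = rank([reach[2] for reach in reaches], descending=True)

# Summed rank per reach, in the same order as the input list.
sum_rank = [a + b for a, b in zip(riparian_rank, channel_rank)]
```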
Citation
In addition to the DOI for this dataset, please cite the associated manuscript:
McConnell, C.R., Thorne, J.H., Greco, S.E., in press. Using height-above-river metrics and machine learning to model riparian vegetation distribution and reach-scale restoration potential. Ecological Informatics.
References
References apply only to raw inputs cited in this dataset. A complete set of references for the methods can be found in McConnell et al. (in press).
- McConnell, C.R., 2023. Modeling Riparian Geomorphology and Vegetation on a Regulated River to Assess Change, Prioritize Ecological Restoration Areas, and Inform Restoration Design (Ph.D.). University of California, Davis, United States -- California.
- Dilts, T.E., 2015. Topography Tools for ArcGIS 10.3 and earlier - Overview [WWW Document]. URL https://www.arcgis.com/home/item.html?id=b13b3b40fa3c43d4a23a1a09c5fe96b9 (accessed 7.19.23).
- ESRI, 2023. ArcGIS Pro.
- Microsoft Corporation, 2016. Excel.
- Solano County Water Agency, 2005. Putah Creek LiDAR flight 2005.
- USDA, 2023. National Agriculture Imagery Program [WWW Document]. URL https://naip-usdaonline.hub.arcgis.com/ (accessed 11.12.23).
- USGS, 2023. EarthExplorer [WWW Document]. URL https://earthexplorer.usgs.gov/ (accessed 11.12.23).
