Data for: Channel mobility and floodplain reworking across river planform morphologies
Data files
Apr 16, 2024 version files 469.06 MB
- Bermejo.zip
- ChannelBelt.zip
- FigureCodes.zip
- README.md
- SingleRivers.zip
Abstract
Source-to-sink transfer of sediment and organic carbon (OC) is regulated by river mobility. Quantifying trends in river mobility is, however, challenging due to diverse planform morphologies (e.g., meandering, braided) and measurement methods. Here, we utilize a state-of-the-art remote-sensing method applicable to all planform morphologies to quantify the mobility timescales of 80 rivers worldwide. Results show that, across the continuum from meandering to braided rivers, there is a systematic reduction in timescale of channel mobility and—to a lesser extent—the timescale of floodplain reworking. This leads to an overall decrease in the efficiency at which braided river channels rework old floodplain material compared to their meandering counterparts. Reduced reworking efficiency of braided channels stems from their relatively smaller channel-belt areas relative to their channel area. Results suggest that river-mobility timescales can help us characterize sediment and OC storage and transit times from remote sensing.
README: General information:
This README file was generated on 2024-04-15
Dataset title
Data for "Planform Morphology Control on River Mobility and Floodplain Reworking"
Overview
This supplementary dataset includes all data underlying the associated manuscript. The bulk of the included files are the binary channel masks used to derive mobility measurements. We include all the files needed to generate your own raster data and reproduce the analyses used in the article. We also include all the tabular mobility calculations for the 80 rivers.
Data Formats
- Tabular Files:
- .csv: Broadly used to record mobility calculations at the individual river level as well as aggregated data across all rivers. Naming conventions are descriptive, and include the river name and the type of data the file contains.
- Text Files:
- .txt: Used to describe directory-specific metadata, primarily column descriptions for tabular .csv files.
- Raster Files:
- .tif: GeoTIFF files. This is a raster format that represents n-dimensional array-like data with an associated geotransform that fixes the array in spatial coordinates. There are three types of GeoTIFFs included in this repository:
- Landsat-derived Collection 1 multi-spectral reflectance. Refer to: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C01_T1_ANNUAL_RAW for reference. These are 8-band images with the following band mapping:
- 1 - Blue1
- 2 - Blue2
- 3 - Green
- 4 - Red
- 5 - Swir1
- 6 - NIR
- 7 - Band Quality
- 8 - Swir2
- Binary channel masks. These are single band GeoTIFFs with a 1 where there is water presence and a 0 where there is no water.
- Digital Elevation Models (DEMs). These are single band GeoTIFFs with topographic information saved at each pixel.
- Vector Files:
- .gpkg: Geopackage files. This file format is used to represent geocoded information. At its most basic, this is a dataframe with a spatial geometry associated with each row. In other words, this is a frame of polygons, lines, or points with geographic vertexes. You can then tie metadata to each vector shape. We use this format to track the study areas for each of the analyzed rivers.
- Script Files:
- .py: Python files. These are data-specific Python scripts that did not fit cleanly with the other Github repositories. Examples include scripts to generate figures.
- Animation Files:
- .gif: Looped Gif files. We include these for each of the 80 rivers. We used them to visually spot-check what the river dynamics are doing.
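As a minimal sketch of how a binary channel mask can be turned into an area, the snippet below counts wet (1) pixels and multiplies by the pixel area. The function names `load_mask` and `wetted_area_km2` are illustrative and are not part of the repository's scripts. The 30 m pixel size in the toy example is an assumption based on Landsat resolution; for real files the pixel size should be read from the GeoTIFF itself, here through Rasterio's `read` and `res`.

```python
def load_mask(path):
    """Read band 1 of a single-band GeoTIFF mask and its pixel size.

    Requires Rasterio; it is imported lazily so the rest of this sketch
    runs without it.
    """
    import rasterio

    with rasterio.open(path) as src:
        band = src.read(1)      # 2-D array of 0/1 values
        xres, yres = src.res    # pixel size in CRS units (meters for UTM)
    return band, xres, yres


def wetted_area_km2(mask, xres_m, yres_m):
    """Area covered by wet (value 1) pixels, in square kilometers."""
    wet_pixels = sum(1 for row in mask for value in row if value == 1)
    return wet_pixels * xres_m * yres_m / 1e6


# Toy 3x4 mask standing in for a real channel mask (assumed 30 m pixels).
toy_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
toy_area = wetted_area_km2(toy_mask, 30.0, 30.0)
print(toy_area)
```

The same `wetted_area_km2` call works on the array returned by `load_mask`, provided the mask is in a projected (UTM) coordinate system so that pixel sizes are in meters.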
Recommended Software for Data Analysis:
Scripting
I use a Python-based geospatial stack. To work with the Python files themselves, you can use any text/code editor (e.g. Visual Studio Code, Atom, Vim) or scripting environment (e.g. PyCharm, Spyder, Visual Studio). I developed the files in Vim side-by-side with an IPython terminal. The code design reflects this: the scripts are meant to be run interactively in an IPython terminal rather than as command-line programs.
Some important packages used for the analysis are:
- Rasterio
- Shapely
- Geopandas
- Pyproj
- Scipy
- Pandas
- Numpy
Geographic Data (GIS)
Raster, vector and tabular data can all be visualized using GIS software. I use QGIS, an open-source GIS program, but ArcGIS can be equally used.
All .tif, .gpkg, and .csv files are spatially located. GeoTIFF files include an EPSG code (coordinate system). CSV and .gpkg files are either in geographic coordinates, i.e. Latitude and Longitude (EPSG: 4326 WGS 1984), or in projected coordinates (UTM), which match the .tif files for the associated river.
Content:
Overall description of the data
We have separated the data into four .zip files, which each contain a distinct portion of the research data. The river data is separated by river planform where:
B - Braided Rivers\
LSW - Low-Sinuosity Wandering Rivers\
HSW - High-Sinuosity Wandering Rivers\
Me - Meandering Rivers\
A brief overview of the .zip files. More detailed descriptions can be found below:
- Bermejo: All files relevant to the downstream analysis of the Rio Bermejo. This includes:
- Aggregated mobility data
- Yearly GeoTIFFS
- .gpkg files of reach region
- .py scripts to pull the downstream mobility
- ChannelBelt: All files relevant to measuring the channel-belt areas of the 80 rivers included in the study. Data is broken out by planform. This includes:
- .gpkg files with the mapped channel-belt extents
- .csv files with the measured data from the .gpkg files
- .py script to generate data from the .gpkg files
- FigureCodes: Generalized scripts to generate figures
- SingleRivers: All files relevant to measuring the mobility of the 80 rivers included in the study. This includes:
- Mobility data for each river
- Aggregated mobility data for all rivers
- Yearly binary GeoTIFFS of channel masks
- .gpkg files of reach regions
- .py script to aggregate the data
Detailed description of the data
├── Bermejo:
This includes all of the files and codes necessary to do the downstream analysis of the Rio Bermejo mobility.
Overview directory structure:
├── Bermejo\
├── BermejoChannelBelt\ \
├── Masks\ \
├── MeasureMobility\ \
├── shapes\ \
├── TabularData\ \
Details:
├── BermejoChannelBelt
├── channel_analysis.py
- channel_analysis.py: A Python script that uses the Centerlines, Channel-belt boundaries, and channel counts to record measurements of centerline length and channel-belt areas.
├── Measurements\ \
├── Centerlines\ \
├── ChannelBeltBoundaries\ \
├── ChannelCountSampleLocations\ \
- *Centerlines\*: The directory includes 4 .gpkg files (one for each of the four reaches) with LineString vector representations of the river centerlines. These are generated from the binary channel masks.
- *ChannelBeltBoundaries\*: The directory includes 4 .gpkg files (one for each of the four reaches) with LineString vector representations of the mapped channel-belt boundaries. These are manually mapped from the images and the DEM data.
- *ChannelCountSampleLocations\*: The directory includes 4 .gpkg files (one for each of the four reaches) with Point vector representations of the sample locations for the channel count measurements. At each point, we count the number of laterally active channel threads.
├── RasterData\ \
├── DEMs\ \
├── Images\ \
├── Masks\ \
- *DEMs\*: The directory includes 4 GeoTiff (.tif) files (one for each of the four reaches) with Digital Elevation Model (DEM) data covering the respective reach.
- *Images\*: The directory includes 4 GeoTiff (.tif) files (one for each of the four reaches) with multi-spectral Landsat-derived reflectance data. Each image is generated from an annual composite of the year 2000 (to align with the collection of NASADEM data).
- *Masks\*: The directory includes 4 GeoTiff (.tif) files (one for each of the four reaches) with a single binary channel mask generated for the year 2000.
├── Masks
├── Reach1\ \
├── mask\ \
├── Reach2\ \
├── mask\ \
├── Reach3\ \
├── mask\ \
├── Reach4\ \
├── mask\ \
- mask\: Each mask directory contains 37 GeoTiff files that contain annual binary channel masks for the respective reach. For example, Reach1\mask\ contains masks of the Reach 1 river position every year from 1985 to 2021.
├── MeasureMobility
├── reach1_downstream_analysis.py\
├── reach2_downstream_analysis.py\
├── reach3_downstream_analysis.py\
├── reach4_downstream_analysis.py
- These are all Python scripts that calculate the mobility of the Bermejo reach at downstream intervals. The scripts use the data in the Masks\ directory. The four files are largely identical, albeit with different input parameters.
├── shapes
├── Reach1.gpkg\
├── Reach2.gpkg\
├── Reach3.gpkg\
├── Reach4.gpkg
- These are the Polygon vectors for each of the Reach study areas. For example, Reach1.gpkg includes the Polygon covering the area included in the mobility analysis for Reach 1.
├── TabularData
├── Bermejo_ChannelBelt_data.csv\
├── BermejoCombinedReach.csv\
├── BermejoTransitTime.csv\
├── ChannelCounts.csv\
├── Column_Descriptions.txt
- Bermejo_ChannelBelt_data.csv\: Aggregated channel-belt data for each of the Bermejo reaches. Reach-averages for the mobility variables are also included. This is created by the "channel_analysis.py" script.
- BermejoCombinedReach.csv\: Aggregated mobility data for each of the Bermejo reaches. Lat-Lon and UTM coordinates are provided for each data point.
- BermejoTransitTime.csv\: Geochemically-derived transit times for each of the four Bermejo reaches. This data is sourced from Repasch et al. (2020).
- ChannelCounts.csv\: Manual counts of the laterally active river threads for each of the Bermejo reaches. The sample numbers correspond to the point ID in the ChannelCount_Sample.gpkg files.
- Column_Descriptions.txt\: Descriptions of all the tabular data columns in each of the four files.
├── ChannelBelt:
This includes all of the .gpkg files and resulting .csv files to measure the channel-belt areas for the rivers.
Overview directory structure:
├── ChannelBelt\
├── get_channel_belt_width.py\
├── Shapes\
├── CSVs\ \
Details:
├── get_channel_belt_width.py
- get_channel_belt_width.py: A Python script that uses the .gpkg files in the Shapes directory to produce measurements of the channel-belt areas.
├── Shapes
├── B_Shapes\\
├── HSW_Shapes\\
├── LSW_Shapes\\
├── Me_Shapes\
- B_Shapes/: The directory contains 23 .gpkg files that contain two LineString vectors for the opposing mapped channel-belt boundaries. These are generated by hand. The rivers within this directory have a Braided (B) planform.
- HSW_Shapes/: The directory contains 9 .gpkg files that contain two LineString vectors for the opposing mapped channel-belt boundaries. The rivers within this directory have a High-sinuosity wandering (HSW) planform.
- LSW_Shapes/: The directory contains 23 .gpkg files that contain two LineString vectors for the opposing mapped channel-belt boundaries. The rivers within this directory have a Low-sinuosity wandering (LSW) planform.
- Me_Shapes/: The directory contains 22 .gpkg files that contain two LineString vectors for the opposing mapped channel-belt boundaries. The rivers within this directory have a Meandering (Me) planform.
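To illustrate how two opposing boundary lines can yield a channel-belt area, here is a minimal sketch that closes the two lines into a ring and applies the shoelace formula. This is not the method in get_channel_belt_width.py, which may differ. The function names and the toy coordinates are hypothetical, and the sketch assumes projected (UTM) coordinates in meters and that both lines are digitized in the same direction.

```python
def polygon_area(vertices):
    """Unsigned area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x0, y0 = vertices[k]
        x1, y1 = vertices[(k + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def belt_area_m2(boundary_a, boundary_b):
    """Area enclosed between two boundary lines given as (x, y) lists in meters.

    Assumes both lines run in the same direction, so the second one is
    reversed to close the ring.
    """
    ring = list(boundary_a) + list(reversed(boundary_b))
    return polygon_area(ring)


# Toy boundaries: a straight belt 1000 m long and 200 m wide.
left_bank = [(0.0, 0.0), (500.0, 0.0), (1000.0, 0.0)]
right_bank = [(0.0, 200.0), (500.0, 200.0), (1000.0, 200.0)]
example_area = belt_area_m2(left_bank, right_bank)
print(example_area)
```

With real data, the vertex lists would come from the two LineString geometries stored in each .gpkg file, for example via Geopandas and Shapely.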
├── CSVs
├── B_CSV\\
├── HSW_CSV\\
├── LSW_CSV\\
├── Me_CSV\\
├── csv_column_desc.txt
- B_CSV/: The directory contains 21 .csv files that contain the channel-belt measurements for each Braided (B) river.
- HSW_CSV/: The directory contains 20 .csv files that contain the channel-belt measurements for each High-sinuosity wandering (HSW) river.
- LSW_CSV/: The directory contains 21 .csv files that contain the channel-belt measurements for each Low-sinuosity wandering (LSW) river.
- Me_CSV/: The directory contains 21 .csv files that contain the channel-belt measurements for each Meandering (Me) river.
├── FigureCodes:
This includes 3 Python scripts that can be used to generate manuscript figures.
Overview directory structure:
├── FigureCodes\
├── bermejo_figure.py\
├── figure.py\
├── supplement.py\
- bermejo_figure.py: The script takes the Bermejo data from the Bermejo archive and generates plots between TR (mobility) and sediment transit time.
- figure.py: The script takes the aggregated mobility data for the individual rivers and makes plots between channel planform, mobility measurements, and channel-belt size.
- supplement.py: The script takes the aggregated mobility data for the individual rivers and makes a plot that is used in the supplement (between slope and mobility).
├── SingleRivers:
This includes all of the yearly binary masks, .gif animations of river movement, and measured mobility for the rivers included in the manuscript. We also include the aggregated data.
Overview directory structure:
├── SingleRivers\
├── FullData\\
├── RiverData\
Details:
├── FullData\\
├── data_aggregate.py\
├── FullData_25_113023.csv\
├── FullData_50_113023.csv\
├── FullData_75_113023.csv\
├── FullData_Column_Desc.txt
- data_aggregate.py: The script takes the mobility fits from each of the individual river folders (RiverData) and aggregates it into three dataframes for the 25th, 50th, and 75th quantile estimates of river mobility.
- FullData_25_113023.csv: Aggregated measurements for the 25th percentile mobility. This means that the mobility estimate is fit to the 25th percentile measurements of A_R,i and A_M,i respectively.
- FullData_50_113023.csv: Aggregated measurements for the 50th percentile mobility. This means that the mobility estimate is fit to the 50th percentile measurements of A_R,i and A_M,i respectively.
- FullData_75_113023.csv: Aggregated measurements for the 75th percentile mobility. This means that the mobility estimate is fit to the 75th percentile measurements of A_R,i and A_M,i respectively.
- FullData_Column_Desc.txt: Column descriptions for the three .csv files included in this directory.
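A small sketch of how the three aggregated files can be selected by quantile and read with the standard library. The helper names are illustrative, and the directory path `SingleRivers/FullData` assumes the archive was unzipped into the working directory. Column names are not assumed here; see FullData_Column_Desc.txt for those.

```python
import csv
from pathlib import Path

QUANTILES = (25, 50, 75)


def full_data_path(full_data_dir, quantile):
    """Path to the aggregated CSV for one quantile (25, 50, or 75)."""
    if quantile not in QUANTILES:
        raise ValueError(f"quantile must be one of {QUANTILES}")
    return Path(full_data_dir) / f"FullData_{quantile}_113023.csv"


def read_rows(csv_path):
    """Read a CSV into a list of dicts keyed by the header row."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


# Assumed location of the unzipped FullData directory.
median_path = full_data_path("SingleRivers/FullData", 50)
print(median_path.name)
```

Calling `read_rows(median_path)` then returns one dictionary per row of the 50th percentile file. `pandas.read_csv` is an equally valid choice if Pandas is installed.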
├── RiverData\\
├── Column_Desc.txt\
├── B\\
├── HSW\\
├── LSW\\
├── Me\\
- Column_Desc.txt: File containing column descriptions for the three types of .csv files included in this directory (_mobility_metrics.csv, _pixel_values.csv, _yearly_mobility.csv)
- B\: Directory containing all mobility data for the 20 Braided (B) reaches included in the study.
- HSW\: Directory containing all mobility data for the 20 High-Sinuosity Wandering (HSW) reaches included in the study.
- LSW\: Directory containing all mobility data for the 21 Low-Sinuosity Wandering (LSW) reaches included in the study.
- Me\: Directory containing all mobility data for the 21 Meandering (Me) reaches included in the study.
Each river directory contains 6 types of files. Below are generic descriptions of what each of these files types are:
- * cumulative.gif: These are .gif animations of the history of river movement. Paired with this are the mobility plots (growth in A_R,i and decay in A_M,i)
- * mobility_metrics.csv: Derived mobility metrics (e.g. M, T_M, R). These are the 25th, 50th and 75th percentile measurements made from the parameters of the exponential fits to A_R,i and A_M,i.
- * pixel_values.csv: Fit mobility parameters (e.g. PR, CR, PM, CM). These are the 25th, 50th, 75th percentile measurements made from fitting the exponential curve to A_R,i and A_M,i.
- * yearly_mobility.csv: These contain the database of A_R,i and A_M,i measurements at each yearly timestep.
- * .gpkg: These are the Polygon vectors that define the study area of each reach. Masks are downloaded and clipped to these areas.
- *mask\*: These directories contain 37 binary channel masks that track the yearly river water presence. For each river, there are corresponding masks for every year between 1985 and 2021.
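Since each mask directory should hold one mask per year from 1985 to 2021 (37 files), a quick completeness check can be sketched as below. It assumes each file name contains its four-digit year somewhere; the `River_<year>_mask.tif` names in the example are hypothetical and the real naming pattern may differ.

```python
import re

FIRST_YEAR, LAST_YEAR = 1985, 2021


def missing_years(filenames):
    """Years in 1985-2021 for which no file name contains that year.

    Assumes each mask file name includes its four-digit year.
    """
    found = set()
    for name in filenames:
        # Match standalone four-digit numbers only.
        for match in re.findall(r"(?<!\d)(\d{4})(?!\d)", name):
            year = int(match)
            if FIRST_YEAR <= year <= LAST_YEAR:
                found.add(year)
    return [y for y in range(FIRST_YEAR, LAST_YEAR + 1) if y not in found]


# Hypothetical file names with 1990 deliberately left out.
names = [f"River_{year}_mask.tif" for year in range(1985, 2022) if year != 1990]
gaps = missing_years(names)
print(gaps)
```

For a real directory, the file names could be collected with `pathlib.Path(mask_dir).glob("*.tif")` and passed in as `p.name` for each path.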
Description of variables
Here is a general index of variables used throughout the .csv files in the data repository. We provide the variable names as shown in the files in bold. We then provide the corresponding manuscript variables in parentheses. This information is also included within column description files found in .csv directories.
General:
- i: timestep from the baseline year. e.g. a baseline year of 1985 and the measurement is for 1991, i = 6. Also called "image timestep"
- year: Year of the measurement
- dt: Datetime object for the mask year
- range: The baseline-set the measurement is a part of.
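The timestep definition above amounts to a year difference, which can be sketched as follows. The function names are illustrative. Pairing each baseline year with every later year through 2021, and including the baseline itself as i = 0, are assumptions made for the example rather than statements about how the `range` sets were built.

```python
def image_timestep(year, baseline_year):
    """Years elapsed between the baseline mask and the measured mask."""
    if year < baseline_year:
        raise ValueError("measurement year precedes the baseline year")
    return year - baseline_year


def timesteps_for_baseline(baseline_year, last_year=2021):
    """All (year, i) pairs for one baseline year (assumed pairing scheme)."""
    return [
        (y, image_timestep(y, baseline_year))
        for y in range(baseline_year, last_year + 1)
    ]


example_i = image_timestep(1991, 1985)
pairs_1985 = timesteps_for_baseline(1985)
print(example_i, pairs_1985[-1])
```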
Direct measurements of image mobility areas:
- w_b (Aw, Aw): Wetted channel area, km2. Overbar indicates ensemble average.
- d_b (Ad)*: The "dry" area in the year mask.
- fR (AR,i): Reworked floodplain area, km2. Overbar indicates ensemble average.
- fR_wick (AR, wickert): The similar reworked floodplain area from Wickert et al. (2013).
- O_wd (AM, w->d): The wet -> dry overlapping channel area.
- O_dw (AM, d->w): The dry -> wet overlapping channel area.
- O_avg (AM): The average overlapping channel area.
- O_wick (AM, wickert): The overlapping channel area from Wickert et al. (2013).
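For intuition about where such areas come from, the sketch below compares a baseline mask with a later mask and tallies pixel transitions. It is a generic illustration only: how these transitions map onto fR, O_wd, O_dw, and the Wickert et al. (2013) variants is defined by the column description files and the CalculateMobility package, not by this example. The function name, dictionary keys, and the 900 m2 pixel area (30 m pixels) are assumptions.

```python
def transition_areas(baseline, later, pixel_area_m2):
    """Areas (m^2) of wet->wet, wet->dry, and dry->wet pixels.

    Both masks must be equally sized 2-D grids of 0/1 values.
    """
    counts = {"wet_wet": 0, "wet_dry": 0, "dry_wet": 0}
    for row_a, row_b in zip(baseline, later):
        for a, b in zip(row_a, row_b):
            if a == 1 and b == 1:
                counts["wet_wet"] += 1
            elif a == 1 and b == 0:
                counts["wet_dry"] += 1
            elif a == 0 and b == 1:
                counts["dry_wet"] += 1
    return {key: n * pixel_area_m2 for key, n in counts.items()}


# Toy masks: a two-pixel-wide channel that shifts one pixel to the right.
baseline_mask = [[0, 1, 1, 0],
                 [0, 1, 1, 0]]
later_mask = [[0, 0, 1, 1],
              [0, 0, 1, 1]]
areas = transition_areas(baseline_mask, later_mask, 900.0)
print(areas)
```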
Derived metrics of mobility:
- Quantile: Percentile of the measurement.
- CM (CM): Exponential overlap decay rate (1/years).
- PM (PM): Long-term channel overlap (square meters).
- CMwd (CM,w->d): Wet -> Dry exponential overlap decay rate (1/years).
- PMwd (PM,w->d): Wet -> Dry long-term channel overlap (square meters).
- CMdw (CM,d->w): Dry -> Wet exponential overlap decay rate (1/years).
- PMdw (PM,d->w): Dry -> Wet long-term channel overlap (square meters)
- CR (CR): Exponential floodplain reworking growth rate (1/years)
- PR (PR): Active floodplain area (square meters).
- M: Overlap decay rate (1/years).
- T_M (TM): Overlap decay timescale (years).
- Mwd (Mw->d): Wet -> dry overlap decay rate.
- T_Mwd (TM,w->d): Wet -> dry overlap decay timescale.
- Mdw (Md->w): Dry -> wet overlap decay rate.
- T_Mdw (TM,d->w): Dry -> wet overlap decay timescale.
- R: Floodplain reworking rate (1/years).
- T_R (TR): Floodplain reworking timescale (years).
- TM/TR (TM/TR): Floodplain reworking efficiency (-)
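The sketch below shows one way the fitted parameters and derived metrics could relate. Only the efficiency ratio T_M / T_R is taken directly from the list above. The exponential forms (overlap decaying toward PM at rate CM, reworked area growing toward PR at rate CR) and the reciprocal relation between a rate and its timescale are assumptions chosen to match the variable descriptions and units; the manuscript and the CalculateMobility package define the actual expressions. All parameter values are made up.

```python
import math


def overlap_area(i, initial_overlap, PM, CM):
    """Assumed decay form: overlap relaxes from its initial value toward PM."""
    return (initial_overlap - PM) * math.exp(-CM * i) + PM


def reworked_area(i, PR, CR):
    """Assumed growth form: reworked area saturates at PR."""
    return PR * (1.0 - math.exp(-CR * i))


def timescale(rate_per_year):
    """Assumed reciprocal relation between a rate (1/years) and a timescale."""
    return 1.0 / rate_per_year


def reworking_efficiency(T_M, T_R):
    """Ratio T_M / T_R, the quantity stored in the TM/TR column."""
    return T_M / T_R


# Made-up parameter values for illustration only (areas in square meters).
a0 = overlap_area(0, initial_overlap=5.0e6, PM=1.0e6, CM=0.2)
a_late = reworked_area(1000, PR=8.0e6, CR=0.1)
eff = reworking_efficiency(timescale(0.25), timescale(0.05))
print(a0, a_late, eff)
```

Under these assumed forms, the overlap starts at its initial value at i = 0 and the reworked area approaches PR at large i, which matches the "long-term channel overlap" and "active floodplain area" descriptions.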
Measurements of channel-belt properties:
- cb_area_m2 (ACB): Channel-belt area, km2.
- Width (m): Reported thread width (meters)
- Catchment Area (km2): Reported catchment area (square kilometers).
- Degree of Anabranching (-): Galleazzi et al. (2020) degree of anabranching.
- Channel Count Index (-): Reported channel count.
- Sinuosity (-): Reported sinuosity, the ratio of along-stream distance to straight-line distance.
- Channel_form_index: Calculated channel form index. Sinuosity/Channel count.
- T_CB (TCB): Channel-belt turnover timescale, yr.
- mean_width_m: Measured mean channel-belt width (meters).
- median_width_m: Measured median channel-belt width (meters).
- std_width_m: Standard deviation channel-belt width (meters).
- sem_width_m: Standard error channel-belt width (meters).
- cb_area_m2 (ACB): Channel-belt area (square meters).
- cb_length_m: Channel-belt length (meters).
- area_width_m: Channel-belt width calculated as cb_area_m2 / cb_length_m
- A_CB/(A_w) (CB/Aw, ACB/Aw): Normalized channel-belt area, km2/ km2.
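The ratio-type columns in this list can be reproduced with simple arithmetic, sketched below. The function names follow the column names where possible; the input values are made up. One assumption to note: the normalized channel-belt area requires both areas in the same units, so a channel-belt area in square meters must be converted before dividing by a wetted area in square kilometers.

```python
def area_width_m(cb_area_m2, cb_length_m):
    """Channel-belt width as area divided by length (meters)."""
    return cb_area_m2 / cb_length_m


def channel_form_index(sinuosity, channel_count):
    """Sinuosity divided by channel count (dimensionless)."""
    return sinuosity / channel_count


def normalized_belt_area(cb_area_m2, wetted_area_km2):
    """A_CB / A_w, converting the channel-belt area from m^2 to km^2 first."""
    return (cb_area_m2 / 1e6) / wetted_area_km2


# Made-up values for illustration only.
width = area_width_m(cb_area_m2=2.0e8, cb_length_m=50000.0)
form_index = channel_form_index(sinuosity=1.8, channel_count=3)
norm_area = normalized_belt_area(cb_area_m2=2.0e8, wetted_area_km2=25.0)
print(width, form_index, norm_area)
```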
Sharing/Access information
We leverage Google Earth Engine (GEE) Landsat data for the natural data. Links to the relevant datasets are:
https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2
https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LE07_C02_T1_L2
https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LT05_C02_T1_L2
We use a Python package to download and generate all channel masks:
https://github.com/evan-greenbrg/GEE_watermasks
We use a Python package to measure the mobility:
https://github.com/evan-greenbrg/CalculateMobility
Methods
This dataset was collected using a novel framework to quantify river mobility from remotely sensed data. We use Google Earth Engine hosted Landsat 5, 7, and 8 multi-spectral images to generate channel mask time series for 80 rivers across the globe.