Data from: Marsh width and elevation govern lateral dynamics in coastal wetlands
Data files
May 26, 2026 version files 129.07 MB
1938AerialImageryTIFFs.zip (17.13 MB)
ColiformEnv.tif (90.48 MB)
ErosionTransectsData.xlsx (188.86 KB)
FinalForestEdge1938.zip (1.33 MB)
FinalForestEdge2022.zip (1.59 MB)
FinalShoreline1938.zip (940.08 KB)
FinalShoreline2022.zip (1.14 MB)
ForestBaselineDSAS.zip (73.38 KB)
GeorefMisalignPoints.zip (22.60 KB)
HistoricalEdge25mBuffer.zip (1.03 MB)
HistoricalEdgeError.zip (22.33 KB)
HistoricalShoreline25mBuffer.zip (819.86 KB)
HistoricalShorelineError.zip (21.12 KB)
MigrationTransectsData.xls (269.31 KB)
PresentEdge25mBuffer.zip (948.46 KB)
PresentEdgeError.zip (22.37 KB)
PresentShoreline25mBuffer.zip (909.07 KB)
PresentShorelineError.zip (21.10 KB)
RandomErosionTransects.zip (38.92 KB)
RandomMigrationTransects.shp.zip (39.76 KB)
README.md (20.83 KB)
SalinityEnv.tif (789.56 KB)
ShorelineBaselineDSAS.zip (126.11 KB)
ShorelineMarshWidth.zip (4.48 MB)
SoilDrainageEnv.zip (3.99 MB)
TidalRangeEnv.tif (2.62 MB)
Abstract
The persistence of coastal wetlands is controlled by coupled interactions among vegetation, geomorphology, and hydrodynamics that regulate elevation gain and spatial extent under sea level rise (SLR). While vertical processes such as sediment accretion and marsh elevation gains have been widely studied, less is known about controls on lateral dynamics, including edge erosion and marsh upslope migration, which together determine long-term marsh persistence. Understanding how landscape structure mediates these processes is critical for predicting responses to environmental change. We quantified salt marsh lateral erosion and upslope migration over 80+ years across 75 km of Long Island’s south shore and evaluated eight potential environmental drivers: tidal range, marsh width, elevation, slope, topographic position index, salinity, soil type, and fecal coliform concentration, a proxy for wastewater input. We found that marshes were migrating landward at rates of 0.44 ± 0.014 m yr⁻¹ and eroding at rates of 0.20 ± 0.007 m yr⁻¹. Rates of marsh migration and erosion were strong functions of environmental conditions (out-of-bag R² = 0.71 for migration; 0.72 for erosion), with particularly strong relationships observed for marsh width and elevation. Wider marshes and those at higher elevations showed slower migration rates, suggesting that these landscape configurations buffer coastlines not only from storm surges, but also from the forest dieback that facilitates wetland expansion. Higher-elevation marshes experienced lower erosion rates, consistent with the role of elevation capital in enhancing stability. Slope was not a significant predictor of marsh migration despite its intuitive importance and prominence in numerical models.
Description of the data and file structure
This dataset includes 1938 aerial photographs and the forest edges, shorelines, and baselines used in the Digital Shoreline Analysis System (DSAS) to produce transects and measure erosion and migration rates. This also includes spatial data of environmental variables including slope, topographic position index (TPI), marsh width, soil drainability, fecal coliform concentrations, salinity, and tidal range. Datasets connecting each of the erosion and migration transects to each of the environmental factors in this study are included. The random points and transects used in the error analysis are also included. These were used to quantify uncertainty in boundary position and error arising from georeferencing misalignment. All shapefiles and TIFs are in the NAD 1983 UTM Zone 18N projected coordinate system. All .zip files include CPG, DBF, PRJ, SBN, SBX, SHP, XML, and SHX files.
Files and variables
File: 1938AerialImageryTIFFs.zip
Description: Historical photos (June & July, 1938) of the Long Island South Shore which were collated and georeferenced in ArcGIS Pro (version 3.5.3, ESRI, Redlands, CA, USA) using a first-order polynomial transformation, with a minimum of six control points per image tile located using persistent features such as road intersections and historic structures.
File: FinalForestEdge1938.zip
Description: Marsh-forest edge shapefiles created using a semi-automated digitization of habitat boundaries utilizing historical (1938) aerial photography. Boundaries were hand-digitized using visual inspection at a 1:2000 scale with dense tree stands being used to define forests. After converting image-objects to polylines, we selected those within 20 m of the hand-digitized marsh–forest edge, generated a 10 meter buffer around the selected segments to derive a midpoint centerline, and then connected these centerlines to produce a continuous forest-edge line.
Variables
FID: Feature ID (unique identifier for each section of marsh-forest edge)
Shape: Each of these forest edges is a polyline
Shape_Length: Length of each polyline in meters
Date: Year of the marsh-forest edge (1938)
File: FinalForestEdge2022.zip
Description: Marsh-forest edge shapefiles created using a semi-automated digitization of habitat boundaries utilizing contemporary (2022) NAIP aerial photography. Boundaries were hand-digitized using visual inspection at a 1:2000 scale with dense tree stands being used to define forests. After converting image-objects to polylines, we selected those within 20 m of the hand-digitized marsh–forest edge, generated a 10 m buffer around the selected segments to derive a midpoint centerline, and then connected these centerlines to produce a continuous forest-edge line.
Variables
FID: Feature ID (unique identifier for each section of marsh-forest edge)
Shape: Each of these forest edges is a polyline
Shape_Length: Length of each polyline in meters
Date: Year of the marsh-forest edge (2022)
File: FinalShoreline1938.zip
Description: Marsh-shore boundary shapefiles created using a semi-automated digitization of habitat boundaries utilizing historical (1938) aerial photography. Boundaries were hand-digitized using visual inspection at a 1:2000 scale with marshes being defined by difference in color (lighter gray in the 1938 imagery) and with darker bodies of water and whiter sandy shorelines being easily distinguishable. After converting image-objects to polylines, we selected those within 20 m of the hand-digitized marsh–shore boundary, generated a 10 m buffer around the selected segments to derive a midpoint centerline, and then connected these centerlines to produce a continuous marsh-shore line.
Variables
OBJECTID: Object ID (unique identifier for each section of marsh-shore boundary)
Shape: Each of these marsh-shore boundaries is a polyline
Shape_Length: Length of each polyline in meters
Date: Year of the marsh-shore boundary (1938)
File: FinalShoreline2022.zip
Description: Marsh-shore boundary shapefiles created using a semi-automated digitization of habitat boundaries utilizing contemporary (2022) aerial photography. Boundaries were hand-digitized using visual inspection at a 1:2000 scale with marshes being defined by difference in color (green/brown area in 2022 imagery) and with darker bodies of water and whiter sandy shorelines being easily distinguishable. After converting image-objects to polylines, we selected those within 20 m of the hand-digitized marsh–shore boundary, generated a 10 m buffer around the selected segments to derive a midpoint centerline, and then connected these centerlines to produce a continuous marsh-shore line.
Variables
OBJECTID: Object ID (unique identifier for each section of marsh-shore boundary)
Shape: Each of these marsh-shore boundaries is a polyline
Shape_Length: Length of each polyline in meters
Date: Year of the marsh-shore boundary (2022)
File: ForestBaselineDSAS.zip
Description: For both 1938 and 2022, a 100-meter buffer was created around the forest–marsh edge. The outer edge of each buffer, on the water-facing side, was then traced to define the onshore DSAS baseline for the respective year.
Variables
OBJECTID: Object ID (one continuous baseline for marsh-forest edge)
Shape: One continuous polyline
Shape_Length: Length of the polyline in meters
ID: Additional identifier
File: ShorelineBaselineDSAS.zip
Description: For both 1938 and 2022, a 100-meter buffer was created around the marsh-shore boundary. The outer edge of each buffer, on the water-facing side, was then traced to define the onshore DSAS baseline for the respective year.
Variables
OBJECTID: Object ID (one continuous baseline for marsh-shore boundary)
Shape: One continuous polyline
Shape_Length: Length of the polyline in meters
ID: Additional identifier
File: ShorelineMarshWidth.zip
Description: The shoreline was defined as the boundary where the ground elevation is at the same level as sea level across the study site. Marsh width was defined as the distance from each point to the 0-m NAVD88 contour.
Variables
FID: Feature ID (unique identifier for shoreline across study site)
Shape: Polyline ZM (3D polyline geometry that includes both elevation and measurement values at each vertex)
Shape Length: Length of polyline ZM in meters
File: SoilDrainageEnv.zip
Description: Shapefile of SSURGO soil drainability classes across the Long Island South Shore (excessively drained, moderately well drained, poorly drained, very poorly drained, well drained) and an additional water class (USDA 2025).
Variables
FID: Feature ID (unique identifier for polygons across study site representing different drainage classes)
Shape: Polygon ZM (polygon feature type that stores four-dimensional data for each vertex: longitude, latitude, elevation/height, and a value for soil drainage)
Drainage_Class: Soil drainability class for each polygon, one of six classes (excessively drained, moderately well drained, poorly drained, very poorly drained, well drained, water)
Shape_Leng: Perimeter of individual polygons in meters
Shape_Area: Area of individual polygons in meters squared
File: ColiformEnv.tif
Description: Fecal coliform data (Suffolk County Department of Health Services 2025) came from sampling stations located throughout the Great South Bay, Moriches Bay, and Shinnecock Bay. These data were interpolated across the study site using empirical Bayesian kriging after confirming no significant directional trends.
Variables
Value: Fecal coliform concentration for each pixel in MPN (Most Probable Number)/100 mL
File: SalinityEnv.tif
Description: Salinity data (Suffolk County Department of Health Services 2025) came from sampling stations located throughout the Great South Bay, Moriches Bay, and Shinnecock Bay. These data were interpolated across the study site using empirical Bayesian kriging after confirming no significant directional trends.
Variables
Value: Water salinity levels for each pixel in ppt (parts per thousand)
File: TidalRangeEnv.tif
Description: Tidal range was calculated as the difference between mean higher high water and mean lower low water values derived from NOAA VDatum polygons across the Great South Bay (NOAA 2024).
Variables
Value: Tidal range level for each pixel in meters
File: ErosionTransectsData.xlsx
Description: Data were compiled for each erosion transect, including EPR (end point rate; average marsh erosion rate) and NSM (net shoreline movement; total erosion distance), along with environmental variables at each transect location: elevation, slope, TPI, coliform levels, salinity, tidal range, soil type, and marsh width.
Variables
OBJECTID: Unique identifier for each transect that measured marsh erosion
EPR: End point rate; calculated by dividing the distance of marsh-shore boundary movement by the time elapsed between the oldest (1938) and most recent (2022) marsh-shore boundary dates (in meters per year); positive values indicate landward erosion while negative values indicate seaward expansion of the marsh
NSM: Net Shoreline Movement; total change in distance of marsh-shore boundary movement between 1938 and 2022 (in meters)
elevation: Meters above sea level relative to NAVD88 (North American Vertical Datum of 1988)
slope: Slope (% rise)
TPI: Topographic Position Index (relative elevation; negative values indicate that elevation for one cell is lower than surrounding cells)
coliform: Fecal coliform concentration (MPN/100 mL)
salinity: Water salinity (ppt)
tidal_range: Tidal range (m)
soil_type: Soil drainability classes (1: Excessively drained, 2: Well drained, 3: Moderately well drained, 4: Poorly drained, 5: Very poorly drained, 6: Water)
marsh_width: Nearest distance from transect to forest edge in meters
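As a worked illustration of how EPR relates to NSM, the sketch below divides a net movement by the time elapsed between the two boundary dates. The function name `end_point_rate` and the 16.8 m input are hypothetical, chosen only for illustration, and the use of whole calendar years is a simplifying assumption rather than a description of how DSAS handles dates.

```python
def end_point_rate(nsm_m, start_year=1938, end_year=2022):
    """Return an end point rate (m/yr) from a net movement in meters.

    Assumption: elapsed time is taken as whole calendar years.
    """
    elapsed_years = end_year - start_year
    if elapsed_years <= 0:
        raise ValueError("end_year must be later than start_year")
    return nsm_m / elapsed_years


# Hypothetical transect: the boundary moved 16.8 m over 84 years.
rate = end_point_rate(16.8)
print(f"{rate:.2f} m/yr")
```

Under this convention the sign of EPR follows the sign of NSM, so a negative NSM yields a negative rate.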
File: MigrationTransectsData.xlsx
Description: Data were compiled for each migration transect, including EPR (end point rate; average marsh migration rate) and NSM (net shoreline movement; total migration distance), along with environmental variables at each transect location: elevation, slope, TPI, coliform levels, salinity, tidal range, soil type, and marsh width.
Variables
OBJECTID: Unique identifier for each transect that measured marsh migration
EPR: End point rate; calculated by dividing the distance of marsh-forest edge movement by the time elapsed between the oldest (1938) and most recent (2022) marsh-forest edge dates (in meters per year); positive values indicate upland marsh migration while negative values indicate forest expansion into the marsh
NSM: Net Shoreline Movement; total change in distance of marsh-forest edge movement between 1938 and 2022 (in meters)
elevation: Meters above sea level relative to NAVD88 (North American Vertical Datum of 1988)
slope: Slope (% rise)
TPI: Topographic Position Index (relative elevation; negative values indicate that elevation for one cell is lower than surrounding cells)
coliform: Fecal coliform concentration (MPN/100 mL)
salinity: Water salinity (ppt)
tidal_range: Tidal range (m)
soil_type: Soil drainability classes (1: Excessively drained, 2: Well drained, 3: Moderately well drained, 4: Poorly drained, 5: Very poorly drained, 6: Water)
marsh_width: Nearest distance from shoreline to transect in meters
File: GeorefMisalignPoints.zip
Description: 500 points were generated across the study site, each placed on a permanent or near-permanent feature such as a road intersection or a distinct curve in a road. At each point, the difference in location between the georeferenced 1938 imagery and the 2022 imagery was measured.
Variables
Shape: Multipoint
OBJECTID: Object ID (unique identifier for each multipoint)
Distance: How far the permanent or near-permanent feature is misaligned between the 1938 and 2022 imagery (in meters)
Angle: Direction of the misalignment (in degrees)
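The Distance and Angle fields can be summarized to separate a systematic georeferencing offset from random scatter. The sketch below is one way to do that, not the authors' procedure. It assumes Angle is a compass bearing measured clockwise from north, which the dataset does not state, so the convention should be checked before use. The function name and sample values are hypothetical.

```python
import math


def summarize_misalignment(distances_m, angles_deg):
    """Summarize misalignment magnitude and mean directional offset."""
    n = len(distances_m)
    if n == 0 or n != len(angles_deg):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_dist = sum(distances_m) / n
    rmse = math.sqrt(sum(d * d for d in distances_m) / n)
    # Assumption: angle is a bearing, clockwise from north (0 = north, 90 = east).
    east = sum(d * math.sin(math.radians(a))
               for d, a in zip(distances_m, angles_deg)) / n
    north = sum(d * math.cos(math.radians(a))
                for d, a in zip(distances_m, angles_deg)) / n
    return {"mean_m": mean_dist, "rmse_m": rmse,
            "mean_east_m": east, "mean_north_m": north}


# Hypothetical values for two control points.
print(summarize_misalignment([3.0, 4.0], [90.0, 0.0]))
```

A mean east/north offset near zero alongside a nonzero RMSE would suggest scatter without a consistent directional shift.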
File: RandomErosionTransects.zip
Description: 500 of the 1,587 erosion transects were randomly chosen and the angle was measured.
Variables
OBJECTID: Unique identifier for each of the random transects (1-500)
Shape: Each transect is a polyline
ObjectID_1: Unique identifier of the transects based on their original number (1-1,587)
EPR: End point rate; calculated by dividing the distance of marsh-shore boundary movement by the time elapsed between the oldest (1938) and most recent (2022) marsh-shore boundary dates (in meters per year); positive values indicate landward erosion while negative values indicate seaward expansion of the marsh
NSM: Net Shoreline Movement; total change in distance of marsh-shore boundary movement between 1938 and 2022 (in meters)
Angle: Direction of the transect (in degrees)
Shape_Length: Length of individual transects (in meters)
File: RandomMigrationTransects.shp.zip
Description: 500 of the 1,261 migration transects were randomly chosen and the angle was measured.
Variables
OBJECTID: Unique identifier for each of the random transects (1-500)
Shape: Each transect is a polyline
ObjectID_1: Unique identifier of the transects based on their original number (1-1,261)
EPR: End point rate; calculated by dividing the distance of marsh-forest edge movement by the time elapsed between the oldest (1938) and most recent (2022) marsh-forest edge dates (in meters per year); positive values indicate upland marsh migration while negative values indicate forest expansion into the marsh
NSM: Net Shoreline Movement; total change in distance of marsh-forest edge movement between 1938 and 2022 (in meters)
Angle: Direction of the transect (in degrees)
Shape_Length: Length of individual transects (in meters)
File: HistoricalEdge25mBuffer.zip
Description: A 25 meter buffer was created around the 1938 marsh-forest edge so that random points could be generated within it.
Variables
OBJECTID: Identifier for individual buffer
Shape: Polygon
Shape_Length: Length of buffer (in meters)
Shape_Area: Area of buffer (in meters squared)
File: HistoricalShoreline25mBuffer.zip
Description: A 25 meter buffer was created around the 1938 marsh-shore boundary so that random points could be generated within it.
Variables
OBJECTID: Identifier for individual buffer
Shape: Polygon
Shape_Length: Length of buffer (in meters)
Shape_Area: Area of buffer (in meters squared)
File: PresentEdge25mBuffer.zip
Description: A 25 meter buffer was created around the 2022 marsh-forest edge so that random points could be generated within it.
Variables
OBJECTID: Identifier for individual buffer
Shape: Polygon
Shape_Length: Length of buffer (in meters)
Shape_Area: Area of buffer (in meters squared)
File: PresentShoreline25mBuffer.zip
Description: A 25 meter buffer was created around the 2022 marsh-shore boundary so that random points could be generated within it.
Variables
OBJECTID: Identifier for individual buffer
Shape: Polygon
Shape_Length: Length of buffer (in meters)
Shape_Area: Area of buffer (in meters squared)
File: HistoricalEdgeError.zip
Description: 500 points were randomly generated in the historical marsh-forest edge 25 meter buffer. For each point, we recorded its distance from the edge and how it was classified by the semi-automated boundary delineation method and by an independent observer.
Variables
OID: Original ID (unique identifier for each point)
Shape: Point
CID: Constraint ID (ID of the constraining feature within which the random points were generated)
IO_Class: Point classified by independent observer (0 = marsh; 1 = forest)
SAB_Class: Point classified by semi-automated boundary delineation method (0 = marsh; 1 = forest)
Distance_to_Edge: Distance of point from marsh-forest edge (in meters)
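The IO_Class and SAB_Class fields in the four error files can be compared point by point. The sketch below tabulates agreement between the two classifications; it is an illustration only, and the function name and sample values are hypothetical rather than taken from the dataset or the authors' error analysis.

```python
def classification_agreement(io_class, sab_class):
    """Return (agreement fraction, counts keyed by (IO_Class, SAB_Class))."""
    if len(io_class) == 0 or len(io_class) != len(sab_class):
        raise ValueError("inputs must be non-empty and of equal length")
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for io, sab in zip(io_class, sab_class):
        counts[(io, sab)] += 1
    agreed = counts[(0, 0)] + counts[(1, 1)]
    return agreed / len(io_class), counts


# Hypothetical classifications for five points (0 = marsh; 1 = forest or water).
fraction, table = classification_agreement([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(fraction, table)
```

The off-diagonal counts, (0, 1) and (1, 0), show in which direction the two classifications disagree.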
File: HistoricalShorelineError.zip
Description: 500 points were randomly generated in the historical marsh-shore boundary 25 meter buffer. For each point, we recorded its distance from the boundary and how it was classified by the semi-automated boundary delineation method and by an independent observer.
Variables
OID: Original ID (unique identifier for each point)
Shape: Point
CID: Constraint ID (ID of the constraining feature within which the random points were generated)
IO_Class: Point classified by independent observer (0 = marsh; 1 = water)
SAB_Class: Point classified by semi-automated boundary delineation method (0 = marsh; 1 = water)
Distance_to_Edge: Distance of point from marsh-shore boundary (in meters)
File: PresentEdgeError.zip
Description: 500 points were randomly generated in the present marsh-forest edge 25 meter buffer. For each point, we recorded its distance from the edge and how it was classified by the semi-automated boundary delineation method and by an independent observer.
Variables
OID: Original ID (unique identifier for each point)
Shape: Point
CID: Constraint ID (ID of the constraining feature within which the random points were generated)
IO_Class: Point classified by independent observer (0 = marsh; 1 = forest)
SAB_Class: Point classified by semi-automated boundary delineation method (0 = marsh; 1 = forest)
Distance_to_Edge: Distance of point from marsh-forest edge (in meters)
File: PresentShorelineError.zip
Description: 500 points were randomly generated in the present marsh-shore boundary 25 meter buffer. For each point, we recorded its distance from the boundary and how it was classified by the semi-automated boundary delineation method and by an independent observer.
Variables
OID: Original ID (unique identifier for each point)
Shape: Point
CID: Constraint ID (ID of the constraining feature within which the random points were generated)
IO_Class: Point classified by independent observer (0 = marsh; 1 = water)
SAB_Class: Point classified by semi-automated boundary delineation method (0 = marsh; 1 = water)
Distance_to_Edge: Distance of point from marsh-shore boundary (in meters)
Code/Software
Statistical analyses were conducted using the open-source statistical computing software R (R Core Team 2025).
Use: R was used to generate graphs in figures 4, 5, and 6 in the manuscript. R was also used for the random forest modelling to relate rates of marsh migration and erosion to environmental predictors.
Input data: “MigrationTransectsData.xlsx” and “ErosionTransectsData.xlsx”, which combine the migration and erosion rates measured in this study with publicly available environmental data.
Packages: “tidyverse” (Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H, 2019); “dplyr” (Kucheryavskiy S, 2020); “ggplot2” (H. Wickham, 2016); “readxl” (Wickham H, Bryan J, Kalicinski M, Valery K, Leitienne C, Colbert B, Hoerl D, Miller E, Bryan MJ, 2019); “randomForest” (RColorBrewer S, Liaw MA, 2018); “rfPermute” (Archer E, Archer ME, 2016); “pdp” (Greenwell BM, 2017); “tidyr” (Wickham H, Wickham MH, 2017); “stringr” (Wickham H, Wickham MH, 2019); “cowplot” (Wilke CO, Wickham H, Wilke MC, 2019); “patchwork” (Pedersen TL, 2019); “svglite” (Wickham H, Henry L, Pedersen TL, Luciani TJ, Decorde M, Lise V, Plate T, Gohel D, Qiu Y, Malmedal H, 2023).
Access Information
The zipped folder for the 1938 aerial imagery .tif files can be opened and used in GIS software (e.g., QGIS) as well as in R and Python. The .zip archive compresses and organizes the raster datasets for storage and sharing, but the individual .tif raster files can be accessed directly once they are uncompressed.
Shapefiles can be opened and used in any GIS software (e.g., QGIS) and in R or Python. A shapefile consists of multiple file types beyond the .shp (specifically, .cpg, .dbf, .prj, .sbn, .sbx, and .shx). The user only interacts directly with the .shp file, but the other files need to be in the same directory.
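Because a shapefile is only readable when its companion files sit beside the .shp, a quick check after unzipping can save a confusing load error. The sketch below tests for the three components every shapefile requires (.shp, .shx, .dbf). The function name is hypothetical, the example path is an assumed layout of the unzipped archive, and the check is case-sensitive, so adjust it if the extracted files use uppercase extensions.

```python
from pathlib import Path

# The three components required for any shapefile to be readable.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    shp = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not shp.with_suffix(ext).exists()]


# Hypothetical path; the actual folder layout inside the .zip may differ.
missing = missing_shapefile_parts("FinalForestEdge2022/FinalForestEdge2022.shp")
print("Missing components:", missing)
```

An empty list means the required components are present; optional files such as .prj and .cpg are not checked here.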
Data were derived from the following sources:
Additional publicly available datasets used in the paper include 2022 NAIP imagery, which can be accessed from the National Agriculture Imagery Program (NAIP) Geohub site at https://naip-usdaonline.hub.arcgis.com/
Elevation data were derived from the USGS 1-m DEM (USGS 2024), which can be accessed from the USGS GIS Data Download site at https://www.usgs.gov/the-national-map-data-delivery/gis-data-download
Marsh migration and erosion rates were quantified within the low elevation zone (<3.0 m NAVD88) of the Long Island South Shore Estuary Reserve (LISSER). The study area spans over 110 km and features 19,000 acres of vegetated tidal wetlands and salt marshes. The study site was divided into four subregions: Western Great South Bay (WGSB), Eastern Great South Bay (EGSB), Western Moriches Bay (WMB), and Eastern Moriches Bay (EMB). After Hurricane Sandy, a new inlet (~400 square meters) formed in the EGSB region. The salt marsh communities on the LISSER are defined by halophytic plants such as Sporobolus alterniflorus in the low marsh and Sporobolus pumilus, Distichlis spicata, Juncus gerardi, and Phragmites australis in the intertidal and high marsh. Acer rubrum, Quercus spp., Prunus serotina, Carya glabra, Nyssa sylvatica, and Juniperus virginiana are found in the upland forests. The relative SLR rate in New York City (the nearest tide gauge) is 5.5 ± 1.3 mm yr⁻¹ and the region is considered a SLR hotspot. Additional stressors that contribute to marsh and coastal forest loss in the LISSER include its susceptibility to extreme storm surge impacts, widespread exposure to wastewater discharge which may accelerate loss of low-elevation and intertidal marshes, and high concentrations of urban areas limiting the space for potential landward migration of marshes (e.g., “coastal squeeze”). Due to SLR and other factors, the LISSER is experiencing the emergence of ghost forests and undergoing substantial and active salt marsh loss (13% from 1974 to 2008).
Marsh migration and erosion rates were quantified between 1938 and 2022 using a semi-automated digitization of habitat boundaries utilizing historical and contemporary aerial photography. Historical photos were collated and georeferenced in ArcGIS Pro (version 3.5.3, ESRI, Redlands, CA, USA) using a first-order polynomial transformation, with a minimum of six control points per image tile located using persistent features such as road intersections and historic structures. National Agriculture Imagery Program (NAIP) imagery was used to digitize modern (2022) habitat boundaries. Aerial imagery was processed in Google Earth Engine using a Simple Non-Iterative Clustering (SNIC) segmentation algorithm to group adjacent pixels with similar color intensities into image-objects (compactness=1; connectivity=8; neighborhood size=20; vectorization=5). The marsh-forest and marsh-shore boundaries were hand-digitized using visual inspection at a 1:2000 scale with dense tree stands being used to define forests. Patchy and sparse tree areas by the edge were assumed to be dying areas and therefore were not considered as forest for this study. Marshes were defined by difference in color (lighter gray in the 1938 imagery; green/brown area in 2022 imagery) with darker bodies of water and whiter sandy shorelines being easily distinguishable. After converting image-objects to polylines, we selected those within 20 m of the hand-digitized marsh–forest edge, generated a 10 m buffer around the selected segments to derive a midpoint centerline, and then connected these centerlines to produce a continuous forest-edge line. The same workflow was used to map the marsh–shore boundary.
The Digital Shoreline Analysis System (DSAS v6.0.170; USGS 2024) was used to measure change over time in habitat boundaries. A 100-meter buffer around the marsh edges was created to serve as the onshore DSAS baseline. Transects 200 meters in length were generated along the study area at 25-meter spacing with a 200-meter smoothing distance. Each transect was clipped to the extent between historic and contemporary boundaries, and the positional change and rate of change were calculated. Due to complex forest-edge or shoreline shapes, as well as sharp changes in baseline direction, some transects produced by DSAS were skewed and those were removed after thorough inspection.
After measuring marsh migration and erosion rates, we used environmental data to identify key drivers of migration and erosion. These included slope, elevation, TPI (topographic position index; a measure of elevation relative to surrounding cells), tidal range, marsh width, soil drainability, and water quality variables including average fecal coliform (FC) concentration and salinity (1976-present).
Transects generated in the Digital Shoreline Analysis System (DSAS) were exported to ArcGIS Pro and assigned one point every 10 meters to account for any environmental variability within each of the transects. Each of those points was assigned a value for each environmental variable analyzed in this study. Slope, elevation, and TPI were derived from the USGS 1-m DEM. Tidal range was calculated as the difference between mean higher high water and mean lower low water values derived from NOAA VDatum polygons across the Great South Bay. Marsh width was defined as the distance from each point to the 0-m NAVD88 contour, representing the boundary where the ground elevation is at the same level as sea level. SSURGO soil drainability classes were converted to continuous values (1-5, with 1 = excessively drained and 5 = very poorly drained), with an additional class for open water. Water quality data came from sampling stations located throughout the Great South Bay, Moriches Bay, and Shinnecock Bay, and these data were interpolated across the LISSER using empirical Bayesian kriging after confirming no significant directional trends.
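The 10-meter point spacing along each transect can be sketched with planar geometry, since the data are in a projected coordinate system with units of meters (NAD 1983 UTM Zone 18N). The function below is a hypothetical illustration of the spacing rule only, not the ArcGIS Pro tool used in the study, and the coordinates are made up.

```python
import math


def points_along_transect(start, end, spacing_m=10.0):
    """Return (x, y) points every spacing_m meters from start toward end."""
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return [start]
    n_steps = int(length // spacing_m)  # whole spacings that fit on the transect
    points = []
    for i in range(n_steps + 1):
        fraction = (i * spacing_m) / length
        points.append((x0 + (x1 - x0) * fraction, y0 + (y1 - y0) * fraction))
    return points


# Hypothetical 45 m transect running due east.
for point in points_along_transect((0.0, 0.0), (45.0, 0.0)):
    print(point)
```

Each generated point would then be used to sample the environmental layers. How the point values were aggregated per transect is not stated here, so that step is left out.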
Random forest (RF) models were used to relate rates of marsh migration and erosion (in m yr⁻¹) to environmental predictors, including slope, elevation, TPI, tidal range, marsh width, soil drainability, and open water salinity and fecal coliform. Statistical analyses were conducted using the open-source statistical computing software R. Using the randomForest package, two-thirds of the data were used to train the decision trees while the other third were the out-of-bag (OOB) samples used for validation. We used the tuneRF function to find the optimal number of predictor variables by finding the number that minimizes OOB error. OOB error diagnostics were used to select the optimal number of trees by identifying the point where additional trees no longer improved accuracy. The optimal number of predictors was four for the marsh migration model and two for the marsh erosion model and the optimal number of trees was 200 for both models. Model performance was assessed using R² and goodness-of-fit metrics including root mean square error (RMSE) and mean absolute error (MAE) calculated on the testing data. Predictor importance was evaluated using permutation-based importance metrics.
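The three performance metrics named above have standard definitions, which the sketch below implements in plain Python for a set of observed and predicted rates. This is a language-neutral illustration rather than the R code used in the study. The function name and the sample values are hypothetical, and R² is computed here as 1 minus the ratio of residual to total sum of squares.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, and R-squared for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical rates in m/yr, for illustration only.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

RMSE and MAE are in the same units as the rates (m yr⁻¹), while R² is unitless.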
