Skip to main content

Data from: Hindcast-validated species distribution models reveal future vulnerabilities of mangroves and salt marsh species


Hodel, Richard; Soltis, Douglas; Soltis, Pamela (2022), Data from: Hindcast-validated species distribution models reveal future vulnerabilities of mangroves and salt marsh species, Dryad, Dataset,


Rapid climate change threatens biodiversity via habitat loss, range shifts, increases in invasive species, novel species interactions, and other unforeseen changes. Coastal and estuarine species are especially vulnerable to the impacts of climate change due to sea level rise and may be severely impacted in the next several decades. Species distribution modeling can project the potential future distributions of species under scenarios of climate change using bioclimatic data and georeferenced occurrence data. However, models projecting suitable habitat into the future are impossible to ground truth. One solution is to develop species distribution models for the present and project them to periods in the recent past where distributions are known to test model performance before making projections into the future. Here, we develop models using abiotic environmental variables to quantify the current suitable habitat available to eight Neotropical coastal species: four mangrove species and four salt marsh species. Using a novel model validation approach that leverages newly available monthly climatic data from 1960-2018, we project these niche models into two time periods in the recent past (i.e., within the past half-century) when either mangrove or salt marsh dominance was documented via other data sources. Models were hindcast-validated and then used to project the suitable habitat of all species at four time periods in the future under a model of climate change. For all future time periods, the projected suitable habitat of mangrove species decreased, and suitable habitat declined more severely in salt marsh species.


Data acquisition

            We obtained specimen-based occurrence data for each species from iDigBio (Integrated Digitized Biocollections; and GBIF (Global Biodiversity Information Facility; and supplemented these data with locality data from personal collections for three mangrove species (Avicennia germinans, Laguncularia racemosa, Rhizophora mangle). Four of the species included in the analysis are mangroves (Avicennia germinans, black mangrove; Laguncularia racemosa, white mangrove; and Rhizophora mangle, red mangrove) or mangrove-associated species (Conocarpus erectus, buttonwood). For simplicity, these four species will hereafter be collectively referred to as ‘mangroves,’ even though Conocarpus erectus is not considered a true mangrove (Tomlinson, 2016). We also selected four salt marsh species (Batis maritima, turtleweed; Sesuvium portulacastrum, sea purslane; Spartina alterniflora, smooth cordgrass; and Sporobolus virginicus, seashore dropseed) for analyses. These four species were selected because they occur in close proximity to one another—indicating the presence of salt marsh habitat—and because of their broad and exclusively coastal distributions in the Neotropics. We used SDM to investigate changes in suitable habitat for all eight species. The raw data were cleaned using standard approaches and R scripts (e.g., Marchant et al., 2017); duplicates and incorrect data (e.g., latitude and longitude of 0) were removed from the data set (all scripts used in this paper were deposited in GitHub (, and all cleaned occurrence data, layers, and models were deposited in Dryad. We included species that had exclusively coastal or estuarine distributions, and only species with at least 50 occurrence points (after cleaning) were used in the analyses. Given the complexities of the modeling approach, we focused on the Neotropics as opposed to a global analysis; only mangrove and salt marsh species with native ranges in the Americas were used (i.e., cosmopolitan species were excluded). Certain species that inhabit salt marshes, but that have extensive inland distributions, including freshwater wetlands, were excluded (e.g., Distichlis spicata).

            We acquired bioclimatic environmental layers from Worldclim 2.1 (; Fick & Hijmans, 2017) for multiple time periods. The bioclimatic layers, which contain temperature and precipitation data for every continent except Antarctica, have been used extensively and successfully in SDM studies (Booth, 2018). In Worldclim 2.1, annual precipitation, maximum temperature, and minimum temperature data are available for every month from 1960-2018 at 2.5 arc minute resolution; these three variables can be used to calculate values of all 19 bioclimatic variables (Harris et al., 2014; Fick & Hijmans, 2017; Hijmans et al., 2017). We considered the present to be 2013-2018, the 1980s salt marsh dominance period to be 1984-1989, and the early 2000s mangrove dominance to be 2001-2006. These time periods were selected to capture the optimal amount of either mangrove or salt marsh dominance during each documented oscillation (Cavanaugh et al., 2019), and we selected these windows of time so that the present and past time periods were all six years. Although many of the study species may be longer-lived than each of the time periods (i.e., six years), we prioritized using time periods that captured either mangrove or salt marsh dominance. Due to the oscillations of mangrove versus salt marsh dominance, many individual plants were likely exterminated on short time scales. We used all occurrence data to construct an SDM for each species for our defined present time (2013-2018) regardless of when the specimens were collected. It would be ideal to use separate occurrence specimens from each time period to assess SDM performance, but this was not possible with the temporal distribution of georeferenced data points. For each six-year time period, we averaged the annual precipitation, maximum temperature, and minimum temperature for each month (e.g., average values of these three variables were calculated across the six January months, six February months, etc. in each time period) and used the resulting 12 monthly averages to calculate the standard 19 bioclimatic variable values using the ‘biovars’ function in the ‘dismo’ R package for each six-year time period (Hijmans et al., 2017). The standard 19 bioclimatic variables are not available on a monthly basis because some of them incorporate seasonality and require data for at least one year. By using monthly data for annual precipitation, maximum temperature, and minimum temperature variables, all of the 19 bioclimatic variables can be calculated (Hijmans et al., 2017).

All layers were then trimmed so that the extent of the study area was between -120 and -32 degrees longitude, and -36 and 36 degrees latitude using custom scripts and the R package ‘raster’ (Hijmans et al., 2015) and exported in ASCII format (Fig. 1). This study area was selected because it included subtropical and tropical regions of both the Northern and Southern Hemispheres, captured the ecotone between mangrove and salt marsh species in both Hemispheres, and allowed for an expansion zone as some species may expand their ranges in the future as the climate changes. Regions such as Hawaii, where some Neotropical mangrove species have been introduced, were not included in the study. We used an R script and the R package ‘raster’ (Hijmans et al., 2015) to measure the pairwise correlation of the 19 bioclimatic variables. When variables were correlated with one another (r > 0.7), only one of the layers was retained for subsequent analyses (Dormann et al., 2013). After removing correlated layers, we had a data set of six bioclimatic variables (BIO2, mean diurnal temperature range; BIO5, maximum temperature of the warmest month; BIO6, minimum temperature of the coldest month; BIO12, annual precipitation; BIO15, precipitation seasonality; BIO18, precipitation of warmest quarter). BIO6 and BIO1 were highly correlated (r = 0.956), and BIO1 (mean annual temperature) was excluded even though it is frequently included in SDM analyses because BIO6 has been identified as an important variable shaping range limits of coastal species (Tomlinson, 2016). All layers were clipped using the ‘mask’ function in the ‘raster’ R package (Hijmans et al., 2015) such that all cells with an elevation greater than 10m were considered ‘no data’ cells. This was done to ensure that the SDM analyses were not trained on inland regions representing areas where these coastal species do not occur.


Species Distribution Modeling

The occurrence data obtained from digitized herbaria records and the six environmental layers were used as input for the SDM analyses. SDM uses the occurrence data for each species in the present to identify pixels that have suitable habitat for the species of interest based on environmental data. We used the maximum entropy algorithm implemented in MAXENT v3.4.1 (Phillips et al., 2006; Phillips et al., 2017) to conduct SDM analyses. The maximum entropy algorithm uses presence data and random background sampling to develop the model, and it has been shown to perform well with presence-only data (Elith et al., 2006; Wisz et al., 2008). Optimal settings for MAXENT model fit were determined using the ‘ENMevaluate’ function in the ENMeval R package (Muscarella et al., 2014). We investigated regularization multipliers from 0.5 to 4 at intervals of 0.5 and the following features/combinations of features: linear, linear/quadratic, linear/quadratic/hinge, linear/quadratic/hinge/product, linear/quadratic/product/threshold, and linear/quadratic/hinge/product/threshold. The ‘ENMevaluate’ function was run for each species, using the same 10,000 background points, occurrence data for the species of interest, and the ‘maxnet’ algorithm with the ‘checkerboard2’ method. The DAICc scores for all models tested for each species were compared to determine the optimal model to be inputted into MAXENT. Other non-default settings used include five-fold cross-validation, a minimum training presence threshold, and fading by clamping. Cloglog output was used because it produces an estimate for each pixel between 0 and 1 that represents the probability of presence (Phillips et al., 2017).

We assessed each model’s prediction ability by using partial receiver operating characteristic (pROC), which measures the ratio of the area under the receiver operating characteristic curve (AUC). AUC ranges from zero to one and measures the model’s ability to predict suitable habitat, with one indicating perfect discrimination between suitable and unsuitable habitat. The pROC is the ratio of the partial AUC divided by random expectation, and it can range from 0 to 2, with 1 representing random model performance (Escobar et al., 2018). For independent occurrence points, this metric measures the relationship of omission error and proportion of suitable area under conditions of low omission errors (Peterson et al., 2008). Jackknife tests of regularized training gain were used to measure the relative contribution of each bioclimatic variable to the model. Average habitat suitability values for each pixel were modeled for the present, past, and future of each species, and these values were used in downstream analyses. For a given region, the sum of habitat suitability scores was considered the total suitable habitat.


SDM validation and projection into future

We measured each species’ suitable habitat and how it was projected to change from the present to the past. First, we defined an area representing the northeastern Florida ecotone used in previous studies to use for hindcast validations: between -82 and -80 degrees longitude, and 28 and 31 degrees latitude (Cavanaugh et al., 2019). For convenience, we hereafter refer to this region as ‘NE Florida.’ We considered the SDM to be properly fit when it accurately inferred the anticipated relative change in suitable habitat between the average mangrove species and the average salt marsh species for all past time periods in the NE Florida validation region. We also used a larger geographic region (between -87 and -79 degrees longitude, and 24 and 31 degrees latitude) to test if the hindcast validations were consistent when a larger region was used; we hereafter refer to this region as ‘Florida’ for simplicity.

Once the SDMs were hindcast-validated, the same approach was used to infer the projected change in suitable habitat for the three time periods in the future. To project future values of environmental variables, we used a widely used and well-validated climate model—the CNRM-CM6-1 model, which is a fully-coupled atmosphere-ocean general circulation model developed by Centre National de Recherches Météorologiques (CNRM) for the sixth generation of the IPCC Coupled Model Intercomparison Project 6 (CMIP6), and with the shared socioeconomic pathway 245 (Eyring et al., 2016). This climate model was selected because it is one of 49 used in the most recent IPCC CMIP6, is compatible with the WorldClim 2.1 data used for hindcast analyses (, and the shared socioeconomic pathway was selected because it represents a central part of the range of plausible future pathways. SDMs for the full geographic study region were projected into both the past and future time periods. Future time periods were determined by the availability of WorldClim 2.1 data.

Usage notes

The programx MAXENT v3.4.1, which was used for SDM analyses, is freely available. ASC layers can be viewed using the open-source program QGIS. R and RStudio (both freely available) and associated open-source packages were used to process and analyze data.


National Science Foundation, Award: EF-1115210

National Science Foundation, Award: DBI-1547229

National Science Foundation, Award: DBI-2027654

National Science Foundation, Award: DEB-1501600