Skip to main content
Dryad

Data from: Combining geostatistical and biotic interactions modelling to predict amphibian refuges under crayfish invasion across dendritic stream networks

Cite this dataset

Mota-Ferreira, Mário; Beja, Pedro (2021). Data from: Combining geostatistical and biotic interactions modelling to predict amphibian refuges under crayfish invasion across dendritic stream networks [Dataset]. Dryad. https://doi.org/10.5061/dryad.mw6m905tb

Abstract

Aim: Biological invasions are pervasive in freshwater ecosystems, often causing native species to contract into areas that remain largely free from invasive species impacts. Predicting the location of such ecological refuges is challenging, because they are shaped by the habitat requirements of native and invasive species, their biotic interactions, and the spatial and temporal invasion patterns. Here we investigated the spatial distribution and environmental drivers of refuges from invasion in river systems, by considering biotic interactions in geostatistical models accounting for stream network topology. We focused on Mediterranean amphibians negatively impacted by the invasive crayfishes Procambarus clarkii and Pacifastacus leniusculus.

Location: River Sabor, NE Portugal

Methods: We surveyed amphibians at 168 200-m stream stretches in 2015. Geostatistical models were used to relate the probabilities of occurrence of each species to environmental and biotic variables, while controlling for linear (Euclidean) and hydrologic spatial dependencies. Biotic interactions were specified using crayfish probabilities of occurrence extracted from previously developed geostatistical models. Models were used to map the distribution of potential refuges for the most common amphibian species, under current conditions and future scenarios of crayfish expansion.

Results: Geostatistical models were produced for eight out of 10 species detected, of which five species were associated with lower stream orders and only one species with higher stream orders. Six species showed negative responses to one or both crayfish species, even after accounting for environmental effects and spatial dependencies. Most amphibian species were found to retain large expanses of potential habitat in stream headwaters, but current refuges will likely contract under plausible scenarios of crayfish expansion.

Main conclusions: Incorporating biotic interactions in geostatistical modelling provides a practical and relatively simple approach to predict present and future distributions of refuges from biological invasion in stream networks. Using this approach, our study shows that stream headwaters are key amphibian refuges under invasion by alien crayfish.

Methods

Workflow detailing the methodological procedures

 

# Pre-processing of geographical river basin data:

  1. The catchment of the river Sabor and its river segments were extracted from the Catchment Characterisation and Modelling (CCM) river network database (Vogt et al., 2007), available from the web page: http://ccm.jrc.ec.europa.eu/.
  2. In ArcGIS, we clipped the Sabor river basin using the CATCHMENTS shapefile:
    1. We selected the Sabor river catchment polygons using the tool Select by Atributes using the expression - "WSO6_ID" =  442389, and then we exported the selected elements to a new shape file;
    2. We used the new shape file to select the river segments using the tool Select by Location, and then we exported the selected elements to a new shape file.
  3. To generate the points for model predictions, we split the stream network in segments with a maximum length of 1000 meters:
    1. We used the Generate Points Along Lines tool, which is available in the Data management toolbox under "sampling". We then set to 1000 meters the specific distance between points along the lines;
    2. We used the points from the previous step to split the stream network using the Split Line at Point tool, which is in the Data Management toolbox under "Features";
    3. We computed the segments centroids of the split segments using the Feature to Point tool, under the same toolbox.

 

# Environmental variables

  1. We characterized each point sampled for amphibians and each point used to predict amphibian distributions using four environmental variables: elevation [Alt], total annual precipitation [Prec], Strahler’s stream order [SO] and the probability of water presence during the dry season [Water]. We also used two variables describing the potential for biotic interactions between amphibians and either Procambarus clarkii [Pclar] or Pacifastacus leniusculus [Plen].
  2. We estimated elevation [alt] from a DEM with a 10-meter resolution built from digitalized 1:25,000 topographic maps.
  3. We extracted total annual precipitation [Prec] from from WordClim 2 data set  with a 30’ (≈ 1km) resolution (Fick & Hijmans, 2017) available at http://worldclim.org/version2;
  4. We took Strahler’s stream order [SO] from the CCM  2.1 data set.
  5. We computed the probability of each stream segment holding water in the summer [Water] based in 189 site observations in 2012 in previous study (Ferreira et al., 2016).
  6. As biotic variables we used the probability of P. clarkii [Pclar] or P. leniusculus [Plen] occurring in stream segments, estimated from the geostatistical distribution models computed by Filipe et al. (2017), which were based on electrofishing data collected on 167 sites in 2012.
  7. To extract variables values for each point described in [4]:
    1. For variables in raster formats (Alt and Prec), we used the Extract Multi Values to Points, in the Spatial Analyst Tools, under “Extraction”;
    2. For variables in vector format (SO, Water, Pclar and Plen), we used the Spatial Join tool, in the Analysis Tools toolbox under “Overlay”.
  8. Before analysis, the values of all variables were standardised to zero mean and unit standard deviation, by subtracting each value from the overall mean of the observed sites and divide by their standard deviation.

 

# Simulation of crayfish expansion

  1. We used the geostatistical models of crayfish distribution provided by Filipe et al. (2017) to simulate the potential expansion of Procambarus clarkii and Pacifastacus leniusculus across the Sabor watershed. Simulations were based on the idea that expansion of a species will occur towards stream reaches where its probability of occurrence is estimated to be higher by the model. In practice, this was implemented by raising the probability of occurrence estimated at a given site assuming increases in the relative risk (Rr) of the species occurring at that site. The relative risk was defined as the odds ratio of the probabilities of crayfish occurrence under future and current conditions, where odds are the ratio of the probability of occurrence and the probability of absence. Given a relative risk, Rr, the new probability of occurrence under an expansion scenario is computed by applying the expression:

where pi and p'i are the probabilities of crayfish occurrence at present and in the future at site i.

  1. Using the expression in [12], we simulated for each species three expansion scenarios, corresponding to two, three and five-fold increases of the relative risk (Rr). We also considered the isolate and joint effects of expansion of just one or both species, respectively, corresponding to the 16 scenarios provided in Table A.

 

Table A: Summary of crayfish expansion scenarios analysed in this study

Relative Risk Scenarios

Pacifastacus leniusculus

1x

2x

3x

5x

Procambarus clarkii

1x

C1L1

C1L2

C1L3

C1L5

2x

C2L1

C2L2

C2L3

C2L5

3x

C3L1

C3L2

C3L3

C3L5

5x

C5L1

C5L2

C5L3

C5L5

 

 

# STARS processing

  1. To produce the files to analyse in R with SSN, we used STARS, a toolbox for ArcGIS package available at: https://www.fs.fed.us/rm/boise/AWAE/projects/SSN_STARS/software_data.html.
  2. We followed the tutorial provided by the authors of the STAR toolbox.  In each step, we will refer the number the section in the tutorial between brackets. The tutorial is available at: https://www.fs.fed.us/rm/boise/AWAE/projects/SSN_STARS/downloads/STARS/STARS_tutorial_2.0.7.pdf.
  3. We imported the Sabor river shapefile produced in step [2.b] into a Landscape Network geodatabase (LSN file) using the Polyline to Landscape Network tool under “Pre-processing” (section 7). Because the CCM 2.1 dataset is a topologically correct river network, we did not need to check topological errors (section 8).
  4. We used watershed areas to weight the relative influence of the branching upstream segments. We accumulated the attribute CATCHEMENT of CCM2.1 in the stream network using the Accumulate Values Downstream tool under “calculate” (section 12).
  5. To incorporate the observations and all the predictions point sets, we used the Snap Points to Landscape Network tool. Because the locations of the observed sites were taken on the field, we had to set a search radius (100 meters) and manually check if the sites were snapped in the right location on the stream network. This was not needed for predictions as these points were generated on the stream network (section 14).
  6. We assigned the accumulated watershed area computed in step [17] to observed and predictions points, using Watershed Attributes tool (section 15).
  7. We computed the distance between the outlet and:
    1. each stream segment (Upstream Distance – Edges tool);
    2. each point (observed sites and prediction points; Upstream Distance – Sites tool).

(section 16)

  1. To compute the spatial weights needed to fit the spatial model, we:
    1. Calculated the segment proportional influence related to accumulated watershed area (Segment PI tool) (section 17).
    2. Calculated the additive function in:
      1. Stream segments (Additive Function – Edges);
      2. Sites (observed and predictions; Additive Function – Sites).

(section 18)

  1. Because the set of observed sites is different for each species, we repeated the previous process and exported a SSN file for each species. In each SSN file there is a representation the topology of the Sabor stream network, spatial data to build the geostatistical models (observations and variables), and spatial data to make predictions (16 crayfish expansion scenarios). All SSN files are together with the R scrits and associated files (see Tables B and C) are available in Dryad (https://doi.org/10.5061/dryad.mw6m905tb), and they can be used to repeat all analysis in R using the SSN package (section 19).

Table B: List of R scripts and SSN files available in Dryad (https://doi.org/10.5061/dryad.mw6m905tb) and that can be used to repeat all analysis in R using the SSN package.

Name

Description

Script.R

R script with instructions for model selection and prediction.

start.rdata

R data file with objects to facilitate model selection and prediction (table 3).

lsnAc.ssn

SSN file with site observations of Alytes cisternasii in the Sabor river basin, variables for model building and selection, and 16 crayfish expansion scenarios.

lsnAo.ssn

SSN file with site observations of Alytes obstetricans in the Sabor river basin, variables for model building and selection, and 16 crayfish expansion scenarios.

lsnBs.ssn

SSN file with site observations of Bufo spinosus in the Sabor river basin, variables for model building and selection, and 16 crayfish expansion scenarios.

lsnLb.ssn

SSN file with site observations of Lissontriton boscai in the Sabor river basin, variables for model building and selection, and 16 crayfish expansion scenarios.

lsnPp.ssn

SSN file with site observations of Pelophylax perezi in the Sabor river basin, variables for model building and selection, and 16 crayfish expansion scenarios.

lsnRi.ssn

SSN file with site observations of Rana iberica in the Sabor river basin, variables for model building and selection, and 16 crayfish expansion scenarios.

lsnSs.ssn

SSN file with site observations of Salamandra salamandra in the Sabor river basin, variables for model building and selection, and 16 crayfish expansion scenarios.

lsnTm.ssn

SSN file with site observations of Triturus marmoratus in the Sabor river basin, variables for model building and selection, and 16 crayfish expansion scenarios.

 

Table 3: List of R objects available through Dryad (https://doi.org/10.5061/dryad.mw6m905tb) and that can be used together with SSN files and R scripts (Table B) to repeat all analysis in R using the SSN package.

Name

Description

CorMdls

A list with the 180 possible spatial structures that can be fitted in function glmssn of the package SSN.

Sps

A character vector with species abbreviations for use in loops.

Sps.sh

A shorter character vector with species abbreviations for use in loops.

 

# Model building in R using the SSN package

  1. To carry out the statistical analysis we produced an R script that is provided in Amphibians&Crayfish.zip available in Dryad (https://doi.org/10.5061/dryad.mw6m905tb). This script can run properly, without any editing, if the .zip file is extracted to a drive named “C” in Windows. We inserted in the script a reference to each step of this document.
  2. We fitted the geostatiscal models using the SSN package (Hoef et al., 2014). For model selection we used the MuMIn package (Barton, 2018). To handle the spatial data in R we used the rgdal package (Bivand et al., 2019). We computed model evaluation metrics using the modEvA package (Barbosa et al., 2016).
  3. We imported the SSN file and computed the distance matrix. We also extracted the table with species observations and variables to a spatial data frame object.
  4. We built 3 sets of saturated logistic regression models (full models) with all available variables:
    1. Environmental variables (i.e., Alt, Prec, SO and Water);
    2. Biotic variables (i.e., Pclar and Plen);
    3. Environmental + Bioti
  5. We fitted all possible combination of variables with the dredge function (Barton, 2018) and selected the configuration with the lowest AICc for the 3 sets of models (best models).
  6. For the best Environmental + Biotic model, we extracted the residuals and computed the empirical semivariogram (Torgegram).
  7. Using the glmssn function (Hoef et al., 2014), we fitted all possible spatial structures to the best Environmental + Biotic model residuals. We retained the spatial structure that minimized the AICc. This step can take about one hour for each species.
  8. Using the glmssn function, we built the Environmental, Biotic, and Environmental + Biotic model using the variable configurations selected in step [27]. We computed model residuals using leave-one-out cross-validation procedure (Hoef et al., 2014). With the AUC and threshMeasures functions (Barbosa et al., 2016) we computed the AUC, Koen’s Kappa, and true skill statistics (TSS) using the prevalence as a threshold.
  9. We built the Environmental + Biotic + Spatial model by combining the variable configuration of the Environmental + Biotic model selected in step [27] with spatial structure that best fitted the residuals selected in step [29]. We then computed model residuals using leave-one-out cross-validation procedure (Hoef et al., 2014). With the AUC and threshMeasures function (Barbosa et al., 2016) we computed the AUC, Koen’s Kappa, and true skill statistics (TSS) using the prevalence as a threshold.

# Predicting species occurrences under current and simulated crayfish expansion scenarios

  1. We used the models produced in the previous steps, together with the 16 crayfish expansion scenarios (Table S1), to predict the distributions of each amphibian species at present and in the future. For each species and expansion scenario, probabilities of species occurrence produced by the models were transformed into estimated presences/absences using current prevalence as the threshold for species presence.
  2. A first set of estimated distributions was produced using the Environmental + Biotic models built in step [30], thereby excluding the spatial (geostatistical) component (i.e., step [29]). This involved the following procedure:
    1. We imported each of the prediction points produced in step [11] using the function importPredpts and created downstream hydrologic distances matrices between points using the createDistMat function.
    2. We used the predict.glmssn function to predict for each of the set of predictions points.
    3. We transformed the predictions values from the logit scale to the probabilistic scale.
    4. We assigned presence to the stream segment under the current scenario if the predicted value is higher than the observed prevalence calculated on step [33].
  3. Predictions of species distributions were estimated using exactly the same procedure as in [33] using the Environmental + Biotic + Spatial models built in step [31].
  4. We saved an .rdata file for each species. For each species, we manually exported to a point shapefile using the writeOGR function the predicted occurrences for each of the 16 expansion scenario for Spatial and Non-Spatial models (8 species x 2 models x 16 scenarios = 256 shapefiles).

# Mapping predicted species distributions under current and simulated crayfish expansion scenarios

  1. We combined the predictions in the point shapefiles with the splitted stream network we had produced in step [3] using the Spatial Join tool, in the Analysis Tools toolbox under “Overlay” on ArcGis.
  2. For the species where the models achieved a good discrimination ability (AUC >= 0.80), we mapped the distribution of amphibians refuges by combining the scenario with no increase in the relative risk of crayfish occurrence (the base scenario, “C1L1”) with the most extreme scenario of crayfish invasion (“C5L5”):
    1. We classified stream segments with predicted absences in both scenarios as unsuitable habitat;
    2. We classified stream segments with predicted presences in both scenarios as suitable habitat held under the most extreme scenario of crayfish invasion;
    3. We classified stream segments with predicted presences in the base scenario (“C1L1”) and predicted absences in the extreme scenario as suitable habitat that will possible be lost under crayfish invasion;

# References cited in the workflow

Barbosa A.M., Brown J.A., Jiménez-Valverde A., & Real R. (2016) modEvA: Model Evaluation and Analysis. .

Barton K. (2018) MuMIn: Multi-Model Inference. R package version 1.42.1. https://CRAN.R-project.org/package=MuMIn. .

Bivand R., Keitt T., & Rowlingson B. (2019) rgdal: Bindings for the “Geospatial” Data Abstraction Library.

Ferreira M., Filipe A.F., Bardos D.C., Magalhães M.F., & Beja P. (2016) Modeling stream fish distributions using interval-censored detection times. Ecology and Evolution, 6, 5530–5541.

Fick S.E. & Hijmans R.J. (2017) WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37, 4302–4315.

Filipe A.F., Quaglietta L., Ferreira M., Magalhães M.F., & Beja P. (2017) Geostatistical distribution modelling of two invasive crayfish across dendritic stream networks. Biological Invasions, 19, 1–14.

Hoef J.M. Ver, Peterson E.E., Clifford D., & Shah R. (2014) SSN: An R Package for Spatial Statistical Modeling on Stream Networks. Journal of Statistical Software, 56, 1–45.

Vogt J., Soille P., De Jager A., Rimavičiūtė E., Mehl W., Foisneau S., Bódis K., Dusart J., Paracchini M.L., Haastrup P., & Bamps C. (2007) A pan-European River and Catchment Database. 

Funding

Fundação para a Ciência e Tecnologia, Award: LTER/BIA-BEC/0004/2009,SFRH/BD/95202/2013,PTDC/AAG-MAA/2261/2014

EDP - Energias de Portugal