Data and code from: The roles of abiotic and biotic factors in driving range shifts: An invasive Pomacea snail facilitates Rostrhamus sociabilis (Snail Kite) northward range expansion
Data files
May 22, 2026 version files 152.41 KB
-
Data.zip
93.55 KB
-
README.md
20.55 KB
-
Results.zip
38.32 KB
Abstract
Rostrhamus sociabilis (Snail Kite) has recently expanded its range in Florida, tracking the invasion of a Pomacea snail (P. maculata), and exhibiting considerable changes in bill size and feeding niche. This range expansion is not aligned with changes in climatic conditions or the distribution of their historic prey (P. paludosa). The Eltonian Noise Hypothesis (ENH), which posits that interactive (biotic) factors have stronger effects on species’ distributions at local scales, predicts that noninteractive (abiotic) factors are generally more relevant at geographic extents. However, in this study, we explore the R. sociabilis range shift as a potential counterexample of the ENH. Under the biotic-abiotic-mobility framework (BAM), we explore the role of biotic and abiotic factors in the northward range expansion of this endangered species. Over the past 15 years, R. sociabilis has begun consuming the more-abundant invasive snails more often, while increasing in bill size, expanding ~175 km northward from previous range limits in the Kissimmee River Valley. We developed ecological niche models using 3 algorithms (Maxent, generalized linear model, and ellipsoids) and found stability in climatic suitability between past and present models. Moreover, although native snails occur in northern Florida, R. sociabilis have had a historically patchy northern distribution due in part to the availability of appropriate wetland conditions. We found a strong latitudinal cline, with bill length increasing with latitude at least through 2020, suggesting that this morphological change broadened the species’ biotic suitable area and distributional potential. The interplay between changes in phenotype and biotic interactions has been poorly documented in distributional ecology, given a lack of rich occurrence datasets. Here, we highlight a case in which a biological invasion and subsequent changes in morphology and diet have facilitated the expansion of a specialized predator into areas that were unsuitable until recently.
https://doi.org/10.1093/ornithology/ukae022
Name: Fernando Machado-Stredel
Institution (at the time of submission): University of Kansas
Institution: University of New Mexico
Email: f.machado.stredel@gmail.com
Alternate email: fmachadostredel@unm.edu
Dataset Overview
The dataset contains data and code to replicate analyses in Machado-Stredel et al. (2024), showcasing the range expansion of Snail Kites (Rostrhamus sociabilis) into northern Florida, facilitated by the biological invasion of a new prey (Pomacea maculata) and subsequent changes in morphology and diet of these specialized raptors. Data cover clean occurrences to perform ecological niche models (ENM) in two periods, using average climatic variables derived from the CHELSA database (Karger et al. 2017, Karger and Zimmermann 2018), to assess differences in climatic suitability before and during the range shift of Snail Kites in Florida. Data from 4,014 Snail Kite fledglings at 2,057 nests between 2002 and 2020 were used to perform morphological change analyses of bill length. Additional eBird occurrences were used to perform logistic regressions to model the occurrence probability of these raptors in four latitudinal bands of Florida. Logistic models suggested that although the occurrence probability has increased in all bands across decades, the rate of increase seems to be higher in the northernmost band, which corroborates field-based observations that the species has expanded in Florida. The calibration area to train ENM algorithms was generated in the R package grinnell (Machado-Stredel et al. 2021). We used 117 clean occurrences drawn from the Global Biodiversity Information Facility (GBIF) to run models in the past (1920-1980) and project the best model to more recent climates (2010-2016) using three algorithms: ellipsoids that take the environmental background into account (Jiménez and Soberón 2022), generalized linear models (Guisan et al. 2002), and Maxent in the R package kuenm (Phillips et al. 2017, Cobos et al. 2019). The best models were selected based on partial ROC (Peterson et al. 2008), predictive ability (omission rates at an E = 5%; Anderson et al. 2003), and the Akaike Information Criterion corrected for small sample sizes (AICc; Warren and Seifert 2011). Via ecological niche modeling, we found stability in climatic suitability between past and present models, highlighting that the range expansion has likely been caused by other factors. Further, we found a strong latitudinal cline, with bill length increasing with latitude at least through 2020, suggesting that this morphological change broadened the species’ biotic suitable area, allowing for a shift in diet (i.e., a novel biotic interaction). In that sense, changes in morphology and diet have facilitated the expansion of these specialized predators into northern Florida.
Dates of Data Collection
Collected by the authors:
A. Morphometric measurements: 2002-2020
From repositories:
A. Snail Kite GBIF occurrences: 1920–1980
B. Climatic variables (CHELSA): 1920-1980 & 2010-2016
C. eBird occurrences: 1970-2020
Data Spatial Scope
Snail Kite GBIF occurrences (GBIF 2021) were obtained for its whole geographic distribution, from the United States to Argentina, over the period 1920–1980. Occurrences were spatially thinned with a 50 km filter, resulting in 117 occurrences (snail_kite_clean_occurrences.csv, see below), which were used along with bioclimatic variables to perform ENMs. Global CHELSA bioclimatic variables of temperature and precipitation (30 arcsec resolution) were processed to generate average layers at a broader spatial resolution (10 arcmin), for two periods (1920-1980 [past] and 2010-2016 [present]), and to a narrower extent representative of the Americas (extent: lon: -33.0001, -177.0001; lat: -45.0001, 44.9999). We summarized these variables through a principal component analysis, and used the first three principal components to create four sets of ENM predictors (all combinations of two and three components). The calibration area used to obtain background environments in PC space (M_Rsociabilis_03Jun21.shp) has the following extent: lon: -38.3335, -105.8335; lat: -34.6668, 38.9999. Additional Snail Kite occurrences in Florida were drawn from the eBird database using the auk package (Strimas-Mackey et al. 2023), along with eBird checklists from the state of Florida (for any species of bird) to model occurrence probability. Morphometric data include fledglings measured in Florida breeding areas between 2002 and 2020 (extent: lon: -80.1527, -82.2537; lat: 25.3666, 29.5858).
Funding
Funding was provided to Robert J. Fletcher Jr. by the US Army Corps of Engineers (W912HZ-20-2-0033), South Florida Water Management District (9500008729), Florida Fish and Wildlife Conservation Commission (13416), and the St John’s River Water Management District (34941). Luis Osorio-Olvera acknowledges PAPIIT-IA202824 and CONAHCyT-Ciencia de Frontera Project CF-2023-I-115 for providing financial support for developing the time-specific bioclimatic layers.
Ethics Approval
Research on fledglings was conducted in conformance with applicable laws, including IACUC number 202008334.
Sharing/Access
This work is licensed under a CC0 1.0 Universal (CC0 1.0) Public Domain Dedication license. Certain datasets are under different licenses, and we provide links for their access (see Data Sources).
Data Sources
Occurrences used for ENMs were obtained from the Global Biodiversity Information Facility (GBIF 2021). Climatic variables were downloaded from CHELSA (see 00_download_data_chelsa.R, envidatS3paths.txt; visit https://www.chelsa-climate.org/). Processed bioclimatic variables (10 arcmin resolution; extent: lon: -33.0001, -177.0001; lat: -45.0001, 44.9999; see M_SnailKite.R) are available upon request to the corresponding author. Checklists from eBird, until May 2020, were obtained from the eBird Basic Dataset (EBD; ebd_relMay-2020) via the auk R package. Access to the EBD needs to be requested, and after approval, a text file can be downloaded and read in R (see get_auk_occ.R). See details on requesting access and downloading in: https://ropensci.org/blog/2018/08/07/auk/
Recommended Citation
Machado-Stredel, F., Atauchi, P. J., Núñez-Penichet, C., Cobos, M. E., Osorio-Olvera, L., Khalighifar, A., Peterson, A. T., & Fletcher, R. J. (2024). The roles of abiotic and biotic factors in driving range shifts: An invasive Pomacea snail facilitates Rostrhamus sociabilis (Snail Kite) northward range expansion. Ornithology, 141(3). https://doi.org/10.1093/ornithology/ukae022
References (in this ReadMe)
Anderson, R. P., D. Lew, and A. T. Peterson (2003). Evaluating predictive models of species’ distributions: Criteria for selecting optimal models. Ecological Modelling 162:211–232.
Cobos, M. E., A. T. Peterson, N. Barve, and L. Osorio-Olvera (2019). kuenm: an R package for detailed development of ecological niche models using Maxent. PeerJ 7:e6281.
GBIF (2021). Global Biodiversity Information Facility (GBIF) Occurrence Download. https://doi.org/10.15468/dl.hscfwp
Guisan, A., T. C. Edwards, and T. Hastie (2002). Generalized linear and generalized additive models in studies of species distributions: Setting the scene. Ecological Modelling 157:89–100.
Jiménez, L., and J. Soberón (2022). Estimating the fundamental niche: Accounting for the uneven availability of existing climates in the calibration area. Ecological Modelling 464:109823.
Karger, D. N., and N. E. Zimmermann (2018). CHELSAcruts – High resolution temperature and precipitation timeseries for the 20th century and beyond. EnviDat. https://doi.org/10.16904/envidat.159
Karger, D. N., O. Conrad, J. Böhner, T. Kawohl, H. Kreft, R. W. Soria-Auza, N. E. Zimmermann, H. P. Linder, and M. Kessler (2017). Climatologies at high resolution for the Earth’s land surface areas. Scientific Data 4:170122.
Machado-Stredel, F., M. E. Cobos, and A. T. Peterson (2021). A simulation-based method for selecting calibration areas for ecological niche models and species distribution models. Frontiers of Biogeography 13:4.
Peterson, A. T., M. Papeş, and J. Soberón (2008). Rethinking receiver operating characteristic analysis applications in ecological niche modeling. Ecological Modelling 213:63–72.
Phillips, S. J., R. P. Anderson, M. Dudík, R. E. Schapire, and M. E. Blair (2017). Opening the black box: An open-source release of Maxent. Ecography 40:887–893.
R Core Team (2023). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Strimas-Mackey, M., E. Miller, and W. Hochachka (2023). auk: eBird Data Extraction and Processing with AWK. R package version 0.7.0. https://cornelllabofornithology.github.io/auk/
Warren, D. L., and S. N. Seifert (2011). Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335–342.
Files and Folders
Description and structure of files
Files in this repository are located in two (compressed) directories: "Data.zip" or "Results.zip". Most files consist of R scripts to obtain and process variables from the CHELSA repository (CC BY license; see Data/Raw), as well as to generate the niche modeling calibration area and run three different ENM algorithms (see Results/ENM). Clean GBIF occurrences used in ENM are found in Data/Processed/snail_kite_clean_occurrences.csv (raw occurrences can be downloaded from GBIF 2021). Additional eBird occurrences can be obtained using the R script in Data/Raw/eBird/get_auk_occ.R. R scripts for other analyses can be found in the Results directory (i.e., logistic regression, morphometrics), which use input files found in Data/Processed.
File Details
Spatial shapefiles are collections of .cpg, .dbf, .prj, .qmd, .shp, and .shx files.
Data in tables is presented as .csv files.
Raster files of climatic variables can be downloaded as *.tif files and transformed to .asc files in subsequent ENM steps.
Scripts are written in the R Environment for Statistical Computing (v 4.1.0).
Data
After decompressing "Data.zip": Files in the "Raw" subdirectory allow obtaining the data, and files in the "Processed" subdirectory are used as input files by scripts in the subdirectories of "Results.zip".
Data/Raw/eBird/get_auk_occ.R
R script to read the eBird Basic Dataset with the auk R package and obtain eBird checklists for Snail Kites and all bird species in Florida.
R dependencies:
- auk
Data/Raw/scripts_get_BioclimaticVars/envidatS3paths.txt
A list of EnviCloud paths that was used to download CHELSA bioclimatic variables via the 00_download_data_chelsa.R script in the scripts_create_bios directory.
Data/Raw/scripts_get_BioclimaticVars/scripts_create_bios/00_download_data_chelsa.R
R script used to download monthly time series of temperature (min and max) and precipitation from CHELSA (.tif files), during the period 1901-2016. Layers are stored in new directories (prec, tmin, and tmax).
R dependencies:
- raster
- stringr
- furrr
Data/Raw/scripts_get_BioclimaticVars/scripts_create_bios/01_nona_ids.R, nona_ids_for_prc_and_tmp.R
R scripts that identify pixels with values (non_NAs) across .tif files of temperature and precipitation.
R dependencies:
- raster
- magrittr
- furrr
- purrr
- rio
Data/Raw/scripts_get_BioclimaticVars/01_raw_vars_per_year.RAn
R script that organizes climatic data in directories per year.
R dependencies:
- raster
- stringr
- furrr
- matrixStats
- purrr
Data/Raw/scripts_get_BioclimaticVars/02_create_prec_avg_1920_1980.R, 02_create_prec_avg_2010_2016.R
R scripts to create precipitation averages for the periods 1920-1980 and 2010-2016.
R dependencies:
- raster
- stringr
- furrr
- matrixStats
- purrr
- rio
Data/Raw/scripts_get_BioclimaticVars/02_create_tmax_avg_1920_1980.R, 02_create_tmax_avg_2010_2016.R
R scripts to create maximum temperature averages for the periods 1920-1980 and 2010-2016.
R dependencies:
- raster
- stringr
- furrr
- matrixStats
- purrr
- rio
Data/Raw/scripts_get_BioclimaticVars/02_create_tmin_layers_1920_1980.R, 02_create_tmin_avg_2010_2016.R
R scripts to create minimum temperature averages for the periods 1920-1980 and 2010-2016.
R dependencies:
- raster
- stringr
- furrr
- matrixStats
- purrr
- rio
Data/Raw/scripts_get_BioclimaticVars/02_bioclimaticvars_dirs.R, 03_bioclimaticvars_prec.R
R scripts to organize and generate bioclimatic variables of precipitation (bio12-bio17; see variable details in Machado-Stredel et al. 2024 Table S1).
R dependencies:
- raster
- stringr
- furrr
- magrittr
- matrixStats
- purrr
- rio
- terra
- Rfast
Data/Raw/scripts_get_BioclimaticVars/03_biovars_prec_1920_1980.R, 03_biovars_prec_2010_2016.R
R scripts to generate average bioclimatic variables of precipitation (bio12-bio17; see variable details in Machado-Stredel et al. 2024 Table S1) for the periods 1920-1980 and 2010-2016.
R dependencies:
- raster
- furrr
- magrittr
- matrixStats
- purrr
- rio
- Rfast
Data/Raw/scripts_get_BioclimaticVars/03_bioclimaticvars_temp.R
R script to organize and generate bioclimatic variables of temperature (bio1-bio7, bio10, bio11; see variable details in Machado-Stredel et al. 2024 Table S1).
R dependencies:
- raster
- stringr
- furrr
- magrittr
- matrixStats
- purrr
- rio
- terra
- Rfast
Data/Raw/scripts_get_BioclimaticVars/03_biovars_temp_1920_1980.R, 03_biovars_temp_2010_2016.R
R scripts to generate average bioclimatic variables of temperature (bio1-bio7, bio10, bio11; see variable details in Machado-Stredel et al. 2024 Table S1) for the periods 1920-1980 and 2010-2016.
R dependencies:
- raster
- furrr
- magrittr
- matrixStats
- purrr
- rio
- Rfast
Data/Raw/scripts_get_BioclimaticVars/04_biovars_cuartos.R, 04_biovars_cuartos_1920_1980.R, 04_biovars_cuartos_2010_2016.R
R scripts to generate average bioclimatic variables of temperature (bio8, bio9) and precipitation (bio18, bio19) for the periods 1920-1980 and 2010-2016. These variables are: mean daily mean air temperatures of the wettest quarter (bio8), mean daily mean air temperatures of the driest quarter (bio9), mean monthly precipitation amount of the warmest quarter (bio18), and mean monthly precipitation amount of the coldest quarter (bio19).
R dependencies:
- raster
- furrr
- magrittr
- matrixStats
- purrr
- rio
- stringr
- terra
- dplyr
Data/Raw/scripts_get_BioclimaticVars/05_bioclim_todo_america.R (optional)
R script to crop the average .tif files for the periods 1920-1980 and 2010-2016 to the Western Hemisphere.
R dependencies:
- raster
- furrr
- magrittr
- terra
- stringr
- purrr
Data/Processed/snail_kite_clean_occurrences.csv
This file has 117 clean occurrences across Snail Kites geographic distribution (geographic coordinates in decimal lat/lon) used in ENMs.
Number of variables: 3
Number of rows: 117
Variable list:
- Species: species common name
- Longitude: x coordinates in decimal format
- Latitude: y coordinates in decimal format
Data type: character, numeric. No missing data
Data/Processed/logistic_regression/SK_data.csv
This file is needed to perform logistic regressions to model Snail Kite occurrence probability. It has several hexagons (10 km spatial resolution) in Florida that contain eBird checklists reporting any bird species ("ebird") and only Snail Kites ("sk"), sorted in 4 latitudinal bands and 5 decades.
Number of variables: 4
Number of rows: 20
Variable list:
- band: latitudinal bands (1-4 from north to south; see Fig. 2)
- decade: values every ten years from 1970 to 2010
- ebird: number of hexagons having eBird checklists with any bird species in Florida
- sk: number of hexagons having eBird checklists with Snail Kites in Florida
Data type: character, numeric. No missing data
Data/Processed/morphometrics/Machado_Stredel_Ornithology_Data.csv
This file contains the bill length measurements (mm) of 4,014 Snail Kite fledglings along with age (days), nest ID, coordinates (17N UTM), and year of birth. It is used to test for bill length changes over time and latitude.
Number of variables: 8
Number of rows: 4,014
Variable list:
- ID: fledgling identification number
- Nest_ID: nest identification number
- YOB: year of birth
- X: longitude coordinates in UTM
- Y: latitude coordinates in UTM
- age_days: fledgling age
- bill: length in mm
- weight: mass in grams
Data type: character, numeric. No missing data
Results
After decompressing "Results.zip", one can find files in three subdirectories named by analysis: "logistic_regression", "morphometrics", and "ENM".
Results/logistic_regression/logit_reg_transformed_dat_SK_18Dec2023.R, SK_LogReg_summaries.txt
R script to perform four logistic regressions (one per latitudinal band) to model Snail Kite occurrence probability. It uses the input Data/Processed/logistic_regression/SK_data.csv, and generates a model summary file (SK_LogReg_summaries.txt).
R dependencies:
- dplyr
Results/morphometrics/kite_bill_ornithology.R
R script to run a linear mixed model for year and latitude effects on Snail Kite bill size. This script uses the input file Data/Processed/morphometrics/Machado_Stredel_Ornithology_Data.csv
R dependencies:
- MuMIn
- glmmTMB
- ggplot2
- tidyverse
Results/ENM/calibration_area/M_SnailKite.R, M_Rsociabilis_03Jun21.shp
R script to simulate the accessible area (M) that is used to calibrate ecological niche models. From this script, six directories can be generated, containing the shapefile of each parameterization of M (see Machado-Stredel et al. 2024 Table S2), as well as associated files (see grinnell documentation in https://github.com/fmachados/grinnell
). The selected shapefile was renamed as "M_Rsociabilis_03Jun21.*" (i.e., a set of .cpg, .dbf, .prj, .qpj, .shp, and .shx files). Using this script, one can mask bioclimatic rasters with the selected calibration area.
R dependencies:
- devtools
- raster
- rgdal
- rgeos
- scales
- grinnell
Note: the function M_simulation of the grinnell package used to run in Python (but see the function M_simulationR for an updated option using only R).
Results/ENM/PCA_SK.R
A script to perform a principal component analysis (PCA) on the bioclimatic layers processed in M_SnailKite.R using the package ntbox.
R dependencies:
- ntbox
- sf
- raster
Results/ENM/scripts_maxent/kuenm.R
R script to calibrate and evaluate Maxent models using the kuenm R package, which splits the occurrence data into a training set (70% of points) and a testing set (30% of points), and tests 4 different sets of principal components as predictive variables (PC1 & PC2, PC1 & PC3, PC2 & PC3, PC1-PC3).
It is recommended to run this script before other ENM algorithms (i.e., ellipsoids, GLMs), in order to split the occurrences in training and testing sets.
R dependencies:
- devtools
- kuenm
- raster
- terra
Results/ENM/scripts_to_fit_ellipsoids/final_model_PC01_PC02_19August2021.R, functions_lau.R
R scripts to model niche ellipsoids via maximum likelihood, using a weighted distribution of environmental conditions relative to the available conditions within the calibration area (functions are sourced from the functions_lau.R script).
R dependencies:
- dplyr
- devtools
- overllip
- raster
- sf
- rgdal
- rgeos
- tools
- matrixcalc
- future
- ntbox
- overllip
- devtools
- ellipse
- furrr
Results/ENM/scripts_glm/Snail_kite_glm_enm.R, 00_Functions.R
R scripts to model ecological niches based on generalized linear models (functions are sourced from the 00_Functions.R script).
R dependencies:
- kuenm
- scales
- raster
- viridis
- biosurvey
GBIF occurrences were downloaded, cleaned, and spatially filtered (50 km threshold) across the Western Hemisphere over the period 1920-1980 to implement three ecological niche modeling (ENM) algorithms (117 occurrences). CHELSA bioclimatic variables were generated with a broader spatial resolution (10’) for periods 1920-1980 and 2010-2016 separately, and summarized through principal component analysis. We used the first three principal components to create four sets of predictors. We generated a calibration area in the R package grinnell using clean occurrences and the 1920-1980 bioclimatic variables via simulations of accessible areas (Machado-Stredel et al. 2021). We calibrated our models in the past (1920-1980) and projected the best model to current climates (2010-2016) using three algorithms: ellipsoids that take the environmental background into account (Jiménez and Soberón 2022), generalized linear models (Guisan et al. 2002), and Maxent in kuenm (Phillips et al. 2017, Cobos et al. 2019). The best models were selected based on partial ROC (Peterson et al. 2008), predictive ability (omission rates at an E = 5%; Anderson et al. 2003), and the Akaike Information Criterion corrected for small sample sizes (AICc; Warren and Seifert 2011). We executed all modeling steps on the R platform (R Core Team 2023).
eBird occurrences of Snail Kites in the US were downloaded with the R package auk, cleaned, and standardized to count the number of hexagons (10 km spatial resolution) that were occupied by the species in 4 latitudinal bands in Florida. We did logistic regressions per latitudinal band to model occurrence probability in R.
Morphological change analyses were made in R using data from 4,014 fledglings at 2,057 nests between 2002 and 2020 with the package glmmTMB (https://github.com/glmmTMB/glmmTMB).
