Temporal changes in taxon abundances are positively correlated but poorly predicted at the global scale

Lertzman-Lepofsky, Gavia1; Dolezal, Aleksandra2; Waters, Mia3; Fuster-Calvo, Alexandre4; Black, Emily3; Flaman, Stephanie5; Straus, Samantha6; Langendorf, Ryan7; Eckert, Isaac8; Fan, Sophia3; Branch, Haley9; Chardon, Nathalie3; Collins, Courtney G. G.3

Published Nov 08, 2024 on Dryad. https://doi.org/10.5061/dryad.63xsj3vbc

Data files

Nov 08, 2024 version files (237.27 MB total):

- Data_and_code_(3).zip (237.26 MB)
- README.md (5.44 KB)

Abstract

Linking changes in taxon abundance to biotic and abiotic drivers over space and time is critical for understanding biodiversity responses to global change. Furthermore, deciphering temporal trends in relationships among taxa, including correlated abundance changes (e.g. synchrony), can facilitate predictions of future shifts. However, the drivers of these correlated changes over large scales are complex and understudied, impeding our ability to predict shifts in ecological communities. We use two global datasets containing abundance time-series (BioTIME) and biotic interactions (GloBI) to quantify correlations among yearly changes in the abundance of pairs of geographically proximal taxa (genus pairs). We use a hierarchical linear model and cross-validation to test the overall magnitude, direction, and predictive accuracy of correlated abundance changes among genera at the global scale. We then test how correlated abundance changes are influenced by latitude, biotic interactions, disturbance, and time-series length while accounting for differences among studies and taxonomic categories. We find that abundance changes between genus pairs are, on average, positively correlated over time, suggesting synchrony at the global scale. Furthermore, we find that abundance changes are more positively correlated with longer time-series, with known biotic interactions, and in disturbed habitats. However, the effects of these ecological drivers alone are relatively weak, with model predictive accuracy increasing approximately two-fold with the inclusion of study identity and taxonomic category. This suggests that while patterns in abundance correlations are shaped by ecological drivers at the global scale, these drivers have limited utility in forecasting changes in abundances among unknown taxa or in the context of future global change.
Our study indicates that including taxonomy and known ecological drivers can improve predictions of biodiversity loss over large spatial and temporal scales, but also that idiosyncrasies of different studies continue to weaken our ability to make global predictions.


Description of the data and file structure

This directory stores the scripts and data used in the analyses for "Temporal changes in taxon abundances are positively correlated but poorly predicted at the global scale". There are three subdirectories, each with nested folders.

- output/ This folder has subdirectories that organize the output of the scripts--intermediate data steps, figures, and model objects. There are 3 subfolders: figures, models, and prep_data.

- raw biotime data/ This folder is where the raw data downloaded from BioTIME for this analysis belong; it is distributed empty for licensing reasons.

- scripts/ This folder has subdirectories that organize all the scripts used in this analysis: data curation, cleaning, modelling, and figure creation. There are 3 subfolders: figures, models, and prep data.

Files and variables

File: File_structure.zip

Description:

This directory stores the final scripts used in the data processing and analysis for the working group after manuscript revision and review. There are three subdirectories, each with its own readme with detailed information about what files are used for and what scripts are created.

output/ This is a folder with sub-directories that organizes the output of scripts--intermediate data steps and model objects

models contains the saved model outputs from running the `lmer_models.R` analysis:
- `lmer_model_final.Rdata` - this is the primary model output
- `lmer_model_kfold.Rdata` - cross-validation output with folds not accounting for studyID or taxonomic category
- `lmer_model_kfoldstudy.Rdata` - cross-validation output with folds for studyID
- `lmer_model_kfoldtax.Rdata` - cross-validation output with folds for taxonomic category
prep_data contains the intermediate data that arise from the data cleaning scripts and then get fed into the modelling script
- `model_data_final.Rdata` - final and cleaned data used to fit the model
- `results.abundance.csv` - output from `results.abundance.data.setup.R`, called in `setup_model data.R`. Contains log proportional pairwise changes in abundance across genera
- `within.study.updated.interactions.020724ENB.csv` - created in `pair_interactions_taxize_all.R` and called in the same script; intermediate output
- `results_abundance_interactions_taxa_032024ENB.RDS` - created in `pair_interactions_taxize_all.R`, called in `setup_model data.R`. Contains genus pair log proportional changes and whether or not each genus in a pair has interaction data from GloBI. Final output of `pair_interactions_taxize_all.R`.
- `worldclim.csv` - output from the `worldclim summaries.R` script, called in `setup_model data.R`. Contains mean annual temperature and precipitation values for each longitude and latitude coordinate in our cleaned BioTIME data
- `disturbance cleaning.csv` - manually cleaned BioTIME disturbance data, down to the plot level; called in `setup_model data.R`. NAs in this file indicate absence of information for those cells.
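
Because the names of the objects stored inside the `.Rdata` files are not listed here, one cautious way to inspect a file is to load it into a separate environment rather than the global workspace. The sketch below is illustrative only: `inspect_rdata` is a hypothetical helper (not part of the repository scripts), and the demonstration uses a temporary file standing in for a real one.

```r
# Hypothetical helper (not part of the repository): load an .Rdata file into
# its own environment and report which objects it contains.
inspect_rdata <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names of the restored objects
  classes <- vapply(
    loaded,
    function(nm) class(get(nm, envir = env))[1],
    character(1)
  )
  data.frame(object = loaded, class = unname(classes), stringsAsFactors = FALSE)
}

# Self-contained demonstration with a temporary file (simulated contents).
tmp <- tempfile(fileext = ".Rdata")
example_df <- data.frame(x = 1:3)
example_vec <- c(0.1, 0.2)
save(example_df, example_vec, file = tmp)

contents <- inspect_rdata(tmp)
print(contents)
```

For the archive itself, a call such as `inspect_rdata("output/models/lmer_model_final.Rdata")` would list the stored objects, assuming that relative path matches the unzipped folder layout.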

raw biotime data/ This folder is left empty to follow the licensing requirements of BioTIME. Please download the BioTIME data directly from the BioTIME website: https://biotime.st-andrews.ac.uk/

scripts/ This folder has subdirectories that organize all the scripts used in this analysis: data curation, cleaning, modelling, and figure creation

Code/software

The scripts folder contains the files used in the data processing and analysis for the manuscript. There are three subdirectories:

models contains 2 scripts used to run the statistical analyses
- `setup_model data` - This file collates all the intermediate data files from the data pre-processing and converts them into the form needed to fit the model (adds metadata, calculates Pearson correlations and z-scores, etc.)
- `lmer_models` - This script runs our primary model, calculates the cross-validation, and runs the null model to verify our model results.
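
The difference between cross-validation folds that ignore study identity and folds built around it can be illustrated with a small base-R sketch. This is not the code in `lmer_models`. It shows one common reading of "folds for studyID", in which all rows from a study are assigned to the same fold so that each held-out fold contains only studies the model has not seen. The function name `assign_group_folds`, the column name `study_id`, and the choice of 5 folds are assumptions made for illustration.

```r
# Hypothetical helper: assign every row of a group (e.g. a study) to one fold.
assign_group_folds <- function(groups, k = 5, seed = 1) {
  set.seed(seed)
  ids <- unique(groups)
  shuffled <- ids[sample.int(length(ids))]  # random order of group IDs
  # Deal the shuffled groups out to folds 1..k in turn.
  fold_of_id <- setNames(rep_len(seq_len(k), length(ids)), as.character(shuffled))
  unname(fold_of_id[as.character(groups)])
}

# Toy data: 12 studies with 4 genus pairs each (simulated, not BioTIME data).
toy <- data.frame(study_id = rep(paste0("study_", 1:12), each = 4))
toy$fold <- assign_group_folds(toy$study_id, k = 5)

# Count how many distinct folds each study falls into (should be one).
folds_per_study <- tapply(toy$fold, toy$study_id, function(f) length(unique(f)))
print(table(toy$fold))
```

Passing a taxonomic category column instead of `study_id` would give the analogous grouping by taxon.
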
prep data/ This folder contains the scripts that take the raw data from BioTIME and clean them, calculate the log proportional changes, and add the GloBI interactions
- results.abundance.data.setup.R - filters and cleans the data, calculates log proportional changes and richness for genus pairs
- pair_interactions_taxize_all.R - adds the GloBI interactions and taxonomic class, with manual corrections applied.
- worldclim summaries.R - pulls the worldclim summaries for each plot in each study. Note that we did not end up using this data in analyses.
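
The quantities named in this section (log proportional changes for genus pairs, and the Pearson correlations and z-scores computed in `setup_model data`) can be sketched on simulated data. This is a minimal illustration, not the repository's implementation. It assumes the log proportional change is log(N[t+1] / N[t]) between consecutive years, and that the z-score is Fisher's z-transformation of the correlation (`atanh`); neither definition is confirmed by this README. All object names and abundance values are hypothetical.

```r
# Simulated yearly abundances for two genera at the same site (hypothetical values).
abund_a <- c(10, 12, 15, 11, 14, 18)
abund_b <- c(30, 33, 40, 31, 35, 44)

# Assumed definition: log proportional change between consecutive years,
# i.e. log(n[t + 1] / n[t]).
log_prop_change <- function(n) diff(log(n))

change_a <- log_prop_change(abund_a)
change_b <- log_prop_change(abund_b)

# Pearson correlation between the two series of yearly changes.
r <- cor(change_a, change_b, method = "pearson")

# Assumed z-score: Fisher's z-transformation of the correlation.
z <- atanh(r)

print(c(pearson_r = r, fisher_z = z))
```

A zero abundance would make the log undefined; how the scripts handle such cases is not stated here.
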
figures/
- `Manuscript_figures.R` which contains all the code for creating the main figures used in the manuscript, the inserts for figure 2, the supplementary figures, and for calculating the results.
- `generate_table_attributions.RMD` which tabulates and organizes each of the studies that we used for citation purposes. This is a supplementary table in our manuscript.

Access information

Publicly accessible locations of the data:

https://biotime.st-andrews.ac.uk/

Data was derived from the following sources:

https://www.globalbioticinteractions.org/