Life on the edge: A new toolbox for population-level climate change vulnerability assessments
Data files
Jun 28, 2023 version files (226.47 KB total)
- Life_on_the_edge_pipeline.zip (220.16 KB)
- README.md (6.30 KB)
Sep 07, 2024 version files (6.58 MB total)
- Life_on_the_edge_pipeline.zip (6.57 MB)
- README.md (7 KB)
Dec 23, 2024 version files (6.58 MB total)
- Life_on_the_edge_pipeline.zip (6.57 MB)
- README.md (6.17 KB)
Abstract
Global change is impacting biodiversity across all habitats on earth. New selection pressures from changing climatic conditions and other anthropogenic activities are creating heterogeneous ecological and evolutionary responses across many species’ geographic ranges. Yet we currently lack standardised and reproducible tools to effectively predict the resulting patterns in species vulnerability to declines or range changes.
We developed an informatic toolbox that integrates ecological, environmental and genomic data and analyses (environmental dissimilarity, species distribution models, landscape connectivity, neutral and adaptive genetic diversity, genotype-environment associations and genomic offset) to estimate population vulnerability. In our toolbox, functions and data structures are coded in a standardised way so that it can be applied to any species or geographic region where appropriate data are available, for example individual or population sampling and genomic datasets (e.g. RAD-seq, ddRAD-seq, whole genome sequencing data) representing environmental variation across the species' geographic range.
To demonstrate multi-species applicability, we apply our toolbox to three georeferenced genomic datasets for co-occurring East African spiny reed frogs (Afrixalus fornasini, A. delicatus and A. sylvaticus) to predict their population vulnerability, and we demonstrate that range loss projections based on adaptive variation from a previous study of two European bat species (Myotis escalerai and M. crypticus) can be accurately reproduced.
Our framework sets the stage for large scale, multi-species genomic datasets to be leveraged in a novel climate change vulnerability framework to quantify intraspecific differences in genetic diversity, local adaptation, range shifts and population vulnerability based on exposure, sensitivity, and landscape barriers.
The dataset contains the input files needed to run Life on the edge for an example dataset (Afrixalus fornasini).
You can run data for your focal species by following the structure and content of the example files provided.
First, download the following and place them in the correct directories to ensure the toolbox functions properly:
- Environmental predictor data (e.g. Worldclim2/CHELSA, land cover, see below)
- A working plink and maxent version (see below)
- Country border data (e.g. Natural Earth data, see below)
Tutorials for initial setup and running the toolbox
Full setup and how to run the LotE toolbox - https://cd-barratt.github.io/Life_on_the_edge.github.io/Vignette
Description of the data and file structure
- Params.tsv is a tab-separated file that contains all parameters for running each species dataset. The parameters are already set to default values for the example dataset, so that the results in the manuscript can be replicated.
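Params.tsv can also be inspected outside R before a run. The sketch below is a minimal Python reader that assumes a header row of parameter names followed by one row per species dataset; that layout is an assumption, so check it against the example Params.tsv. Only parameter names mentioned in this README (e.g. 'maxent_executable') are real; the 'species' column used in the example is hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def read_params(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read a tab-separated parameter table into one dict per row.

    Assumes (hypothetically) a header row of parameter names and one row per
    species dataset. `lines` can be an open file or any iterable of strings.
    """
    rows = []
    for row in csv.DictReader(lines, delimiter="\t"):
        # Drop surplus unnamed fields (key None) and strip stray whitespace
        rows.append({k.strip(): (v or "").strip()
                     for k, v in row.items() if k is not None})
    return rows


# Example usage (path is illustrative):
# with open("Params.tsv", newline="") as handle:
#     for species_params in read_params(handle):
#         print(species_params.get("maxent_executable"))
```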
-data- directory:
- Please download a working maxent.jar executable (https://biodiversityinformatics.amnh.org/open_source/maxent/) as well as a working plink (https://www.cog-genomics.org/plink/) executable.
- These can be placed anywhere (we recommend within -data-), and the toolbox locates these with the ‘maxent_executable’ and ‘plink_executable’ parameters in Params.tsv
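Before a long HPC run it may be worth confirming that the executable paths set in Params.tsv actually resolve. A minimal sketch, assuming the parameters have already been read into a Python dict keyed by parameter name and that relative paths are resolved from the directory the check is run in. Note that maxent.jar is a Java archive, so Java must also be available wherever Maxent is run.

```python
import os
from typing import Dict, List


def check_executables(params: Dict[str, str]) -> List[str]:
    """Return human-readable problems with the maxent/plink paths in params."""
    problems = []
    maxent = params.get("maxent_executable", "")
    plink = params.get("plink_executable", "")

    # maxent.jar is a Java archive: it must exist but need not be executable
    if not os.path.isfile(maxent):
        problems.append(f"maxent_executable not found: {maxent!r}")
    elif not maxent.lower().endswith(".jar"):
        problems.append(f"maxent_executable is not a .jar file: {maxent!r}")

    # plink is a native binary, so it also needs the execute permission bit
    if not os.path.isfile(plink):
        problems.append(f"plink_executable not found: {plink!r}")
    elif not os.access(plink, os.X_OK):
        problems.append(f"plink_executable is not executable: {plink!r}")

    return problems
```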
/genomic_data/
- Input genomic files are pre-processed and stored in the species directory in ‘./-data-/genomic_data’ (Plink formatted .map and .ped files)
- denovo_test_parameter_ranges.csv is used to optimise Stacks parameters for new datasets (not necessary for the example, as it is already processed)
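The PLINK text format is strict: a standard .map file has four columns per variant (chromosome, variant ID, genetic distance, base-pair position), and each .ped row has six sample columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per variant. The Python sketch below checks that the two files agree, which can catch truncated or mismatched pre-processed files. It assumes the four-column .map layout and that each allele is a separate whitespace-separated field; file names in the usage comment are illustrative.

```python
from typing import Iterable, List


def count_map_variants(map_lines: Iterable[str]) -> int:
    """Count variants in a PLINK .map file (4 whitespace-separated columns)."""
    n_variants = 0
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns in .map line, got {len(fields)}: {line!r}")
        n_variants += 1
    return n_variants


def mismatched_ped_samples(ped_lines: Iterable[str], n_variants: int) -> List[str]:
    """Return IDs of .ped rows that do not have 6 + 2 * n_variants columns."""
    expected = 6 + 2 * n_variants
    bad = []
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != expected:
            # Column 2 holds the individual ID; fall back to column 1 if absent
            bad.append(fields[1] if len(fields) > 1 else fields[0])
    return bad


# Example usage (file names are illustrative):
# with open("species.map") as m, open("species.ped") as p:
#     print(mismatched_ped_samples(p, count_map_variants(m)))
```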
/spatial_data/
- Input georeferenced coordinates for each sample are in a .csv file in the species directory in ‘./-data-/spatial_data’
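A quick sanity check of the sample coordinates can catch missing or mistyped values before they reach the SDM and connectivity steps. In this Python sketch the column names 'latitude' and 'longitude' are hypothetical defaults; adjust them to match the example .csv. A range check only catches swapped latitude and longitude columns when a latitude would exceed 90 degrees in absolute value.

```python
import csv
from typing import Iterable, List


def invalid_coordinate_rows(lines: Iterable[str],
                            lat_col: str = "latitude",
                            lon_col: str = "longitude") -> List[int]:
    """Return 1-based data-row numbers with missing or out-of-range coordinates."""
    bad = []
    for i, row in enumerate(csv.DictReader(lines), start=1):
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            # Missing column, short row (value None) or non-numeric text
            bad.append(i)
            continue
        # NaN fails every comparison, so it is flagged as out of range too
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            bad.append(i)
    return bad
```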
/environmental_data/
- Within a folder named after the spatial resolution of the data (e.g. ’30s’), place the environmental layers used for SDMs, GEAs etc. in a ‘current’ and a ‘future’ folder
- These folders can be specified exactly in the params.tsv file (‘current_climate_data_path’, ‘future_climate_data_path’) for the current and future conditions
- We generally use climate projections for all 19 bioclim variables (Worldclim2, https://www.worldclim.org/data/index.html) as well as landcover (Globio4) and slope (calculated from a DEM available with Worldclim2 data)
- You must select your future environmental projection conditions (e.g. time period: 2061-2080, Global circulation model: HadGEM3-GC31-LL) when downloading future data
- The data will later be clipped to the study region for your own analyses (extents can be controlled per species using the ‘geographic_extent’ parameter in params.tsv)
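Projecting an SDM to future conditions generally requires the same set of predictors under both scenarios, so checking that the ‘current’ and ‘future’ folders contain matching layer names can catch download mistakes early. This Python sketch assumes layers are paired by file name (without extension), which may not be how the toolbox itself pairs them; the raster extensions listed are common choices, not a list taken from the toolbox.

```python
import os
from pathlib import PurePath
from typing import Dict, Iterable, Set

# Common raster extensions; extend if your layers use another format
RASTER_SUFFIXES = {".tif", ".tiff", ".asc", ".grd", ".bil"}


def raster_stems(filenames: Iterable[str]) -> Set[str]:
    """Return layer names (file name without extension) for raster files."""
    return {PurePath(f).stem for f in filenames
            if PurePath(f).suffix.lower() in RASTER_SUFFIXES}


def compare_layers(current_files: Iterable[str],
                   future_files: Iterable[str]) -> Dict[str, Set[str]]:
    """Report layers present under only one of the two scenarios."""
    cur, fut = raster_stems(current_files), raster_stems(future_files)
    return {"only_current": cur - fut, "only_future": fut - cur}


def compare_layer_dirs(current_dir: str, future_dir: str) -> Dict[str, Set[str]]:
    """Directory-level wrapper, e.g. for the current/future climate data paths."""
    return compare_layers(os.listdir(current_dir), os.listdir(future_dir))
```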
/map_data/
- Please download a world shapefile and unzip it to this directory (e.g. https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip)
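A shapefile is really a set of files sharing one base name, and the .shp, .shx and .dbf components are all required; a partial unzip is a common cause of mapping errors. A minimal Python sketch for checking the unzipped folder, for example by passing it the result of os.listdir on the /map_data/ directory:

```python
from pathlib import PurePath
from typing import Dict, Iterable, List, Set

# The three components every shapefile needs; .prj and .cpg are optional
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def check_shapefiles(filenames: Iterable[str]) -> List[str]:
    """Return problems found among the files of an unzipped shapefile."""
    suffixes_by_stem: Dict[str, Set[str]] = {}
    for name in filenames:
        path = PurePath(name)
        suffixes_by_stem.setdefault(path.stem, set()).add(path.suffix.lower())

    shp_stems = sorted(s for s, sfx in suffixes_by_stem.items() if ".shp" in sfx)
    if not shp_stems:
        return ["no .shp file found"]

    problems = []
    for stem in shp_stems:
        lacking = [s for s in REQUIRED_SUFFIXES if s not in suffixes_by_stem[stem]]
        if lacking:
            problems.append(f"{stem}: missing {', '.join(lacking)}")
    return problems
```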
-outputs- directory:
- An empty folder that will be populated with results when running Life on the edge. The log files will also be stored here.
Sharing/Access information
Links to other publicly accessible locations of the data:
Data was derived from the following sources:
- Raw sequence data is available at the European Nucleotide Archive (ENA): Myotis escalerai and M. crypticus (PRJEB29086), and the NCBI Short Read Archive (SRA): Afrixalus fornasini (SRP150605).
- Spatial occurrence data was derived from the respective publications https://doi.org/10.1111/mec.14862 and https://doi.org/10.1073/pnas.1820663116
Code/Software
HPC submission scripts are located in Life_on_the_edge_submit_scripts.zip:
- 00_setup_life_on_the_edge.sh - initial setup script for Life on the edge
- 01_run_life_on_the_edge.sh - a script that calls all relevant scripts and functions to run the toolbox in its entirety
- -run_life_on_the_edge-.sh - a wrapper script whereby multi-species or different parameter sets for analyses may be conducted as separate jobs via HPC
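The wrapper's idea of running each species (or parameter set) as its own job can be sketched as follows. This assumes a SLURM scheduler, which this README does not name, and passes the species through a hypothetical SPECIES environment variable; the real -run_life_on_the_edge-.sh may pass parameters differently. The sketch is a dry run that only builds the command strings.

```python
import shlex
from typing import Iterable, List


def build_submit_commands(species: Iterable[str],
                          script: str = "01_run_life_on_the_edge.sh") -> List[str]:
    """Build one SLURM sbatch command per species (dry run: nothing is submitted).

    The SPECIES variable is a hypothetical interface for illustration only.
    """
    commands = []
    for name in species:
        args = ["sbatch",
                f"--job-name=LotE_{name}",
                # ALL keeps the submitting shell's environment; SPECIES is added
                f"--export=ALL,SPECIES={name}",
                script]
        commands.append(shlex.join(args))  # shell-safe quoting (Python 3.8+)
    return commands
```

Each string can be printed for review, or passed to subprocess.run(shlex.split(cmd), check=True) once the actual interface of the wrapper has been confirmed.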
Life on the edge scripts and R functions are located in Life_on_the_edge_pipeline_scripts_functions.zip
R_functions directory:
- life_on_the_edge_functions.R - contains all functions that make up the Life on the edge toolbox
-scripts- directory:
- -LFMM-.R - an R script to run LFMM analyses separately (recommended, as the gea_lfmm() function in the life_on_the_edge_functions.R file can crash on some systems)
- run_LOE_exposure.R - runs all functions to prepare data and quantify exposure for each population
- run_LOE_population_vulnerability.R - runs all functions to quantify population vulnerability for each population and create summary PDFs
- run_LOE_range_shift_potential.R - runs all functions to quantify range shift potential for each population
- run_LOE_sensitivity.R - runs all functions to quantify sensitivity for each population
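One way to picture how the four module scripts fit together is as an ordered sequence of Rscript calls inside the Singularity container. In this Python sketch, the order (exposure, sensitivity and range shift potential first, then population vulnerability, which produces the summaries) and the trailing species argument are assumptions, not the documented interface of 01_run_life_on_the_edge.sh; only `singularity exec` and `Rscript` themselves are standard commands.

```python
import shlex
from typing import List

# Assumed order: the three components first, then the vulnerability summary
MODULE_SCRIPTS = [
    "run_LOE_exposure.R",
    "run_LOE_sensitivity.R",
    "run_LOE_range_shift_potential.R",
    "run_LOE_population_vulnerability.R",
]


def module_commands(container: str, scripts_dir: str, species: str) -> List[str]:
    """Build (but do not run) one Rscript call per module inside the container.

    Passing the species name as a trailing argument is hypothetical.
    """
    commands = []
    for script in MODULE_SCRIPTS:
        path = f"{scripts_dir.rstrip('/')}/{script}"
        # Defensively prefix "./" so "-scripts-/..." cannot be read as an option
        if path.startswith("-"):
            path = "./" + path
        args = ["singularity", "exec", container, "Rscript", path, species]
        commands.append(shlex.join(args))
    return commands
```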
processing_environmental_data directory:
- 00_process_environmental_data - a script to extract individual layers from multi-band environmental files (i.e. Worldclim2 data) and store them for use with Life on the edge
Version changes
23-Dec-2024: Changed the way R packages are installed into the Singularity container (based on a list of specific versions held in ‘all_R_packages_list.csv’) to avoid incorrect versions being used. Made small clean-ups to the local adaptation simulation routine and fixed minor errors found using new datasets.
5-Sept-2024: Added functionality to the R functions to draw latitude and longitude graticules on all maps produced, and updated the params file to match the published article.
Input data (processed genomic data and spatial-environmental data prior to running the toolbox) are available as part of this repository.
Methods: see the methods text of the manuscript and the tutorials for setting up and running the LotE toolbox - https://cd-barratt.github.io/Life_on_the_edge.github.io/Vignette
This software is intended for HPC use. Please make sure the software below is installed and functional in your HPC environment before proceeding:
- Life on the edge data and scripts (also available here: https://github.com/cd-barratt/Life_on_the_edge)
- Singularity (3.5) and bioconductor container with correct R version: https://cloud.sylabs.io/library/sinwood/bioconductor/bioconductor_3.14
- R (4.1.3). The toolbox dependencies are installed into this R version within the Singularity container during setup (you specify your R libraries in the script where annotated)
- Julia (1.7.2)
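A short preflight check can confirm that Singularity and Julia are reachable and recent enough before submitting jobs. On many HPC systems these are provided as environment modules, so run the check after loading them. The version parsing is a heuristic: `julia --version` prints a line such as "julia version 1.7.2", and Singularity prints a similar version line, but the wording varies between releases. Whether versions newer than those listed above work is not stated here, so the minimum-version comparison is an assumption.

```python
import re
import shutil
from typing import Iterable, List, Optional, Tuple


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first 'X.Y' or 'X.Y.Z' version number from text."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups() if g is not None)


def at_least(found: Tuple[int, ...], required: Tuple[int, ...]) -> bool:
    """Compare versions after zero-padding, so (3, 5) counts as (3, 5, 0)."""
    width = max(len(found), len(required))

    def pad(version: Tuple[int, ...]) -> Tuple[int, ...]:
        return version + (0,) * (width - len(version))

    return pad(found) >= pad(required)
```

The version text itself can be captured with, for example, subprocess.run(["julia", "--version"], capture_output=True, text=True).stdout.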
Additionally, you need to download the following and place them in the correct directories to ensure the toolbox functions properly:
* Environmental predictor data - please download the environmental layers used for SDMs, GEAs etc. and place them in separate folders for current and future environmental conditions. These folders can be named/specified exactly in the params.tsv file ('current_climate_data_path', 'future_climate_data_path'). We generally use climate projections for all 19 bioclim variables [Worldclim2](https://www.worldclim.org/data/index.html) as well as landcover [Globio4](https://www.globio.info/globio-data-downloads) and slope (calculated from a digital elevation model available with Worldclim2 data). For future conditions you must select a GCM and time period (e.g. time period: 2061-2080, Global circulation model: HadGEM3-GC31-LL). Each time you run the toolbox for a given species, the data will be clipped to the study region for your analyses (extents can be controlled per species using the 'geographic_extent' parameter in params.tsv)
* Plink and Maxent executables - please download a working executable for Maxent, [maxent.jar](https://biodiversityinformatics.amnh.org/open_source/maxent/) as well as a working [plink](https://www.cog-genomics.org/plink/) executable. These can be placed anywhere (we recommend within -data-), and the toolbox locates these with the 'maxent_executable' and 'plink_executable' parameters in Params.tsv
* Country border data - please download a world shapefile and unzip it to the /map_data/ directory (e.g. [Natural Earth country borders](https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/cultural/ne_50m_admin_0_countries.zip))
To use the toolbox, make a working directory in your HPC environment (e.g. ~/work/Life_on_the_edge_pipeline/) and move or copy the Singularity container there. Download the scripts and R functions from the attached Zenodo repository and place them in your Life_on_the_edge working directory, keeping the same file structure.
Your working directory should then contain the following:
- -data-
- -scripts-
- -outputs-
- bioconductor_3.14.sif
- Params.tsv
- R_functions
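A quick check that the working directory matches this layout can save a failed job. A minimal Python sketch that only tests whether each listed entry exists, not whether its contents are correct:

```python
from pathlib import Path
from typing import List

# Entries listed in this README for the working directory
EXPECTED_ENTRIES = ["-data-", "-scripts-", "-outputs-",
                    "bioconductor_3.14.sif", "Params.tsv", "R_functions"]


def missing_entries(workdir: str) -> List[str]:
    """Return expected entries that do not exist in the working directory."""
    root = Path(workdir).expanduser()
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Example usage:
# print(missing_entries("~/work/Life_on_the_edge_pipeline"))
```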
To make things as clean as possible, you may wish to make a submit scripts directory (e.g. ~/submit_scripts/Life_on_the_edge_pipeline), and a data directory (e.g. ~/data/Life_on_the_edge_pipeline). I would suggest you unzip and move the contents of Life_on_the_edge_submit_scripts.zip to your submit scripts directory and use your data directory for storing raw (unprocessed) genomic data.
Before you begin, some setup is needed. All scripts need to be modified to point to your working directory ($YOUR_WORKING_DIR), submit scripts directory ($YOUR_SUBMIT_SCRIPTS_DIR) and data directory ($YOUR_DATA_DIR), and your email address must be added for job notifications ($YOUR_EMAIL). Paths to your own local R libraries, Singularity and Julia will differ from the example depending on your HPC setup, so edit these to match your own structure.
The same applies to the Params.tsv file, which controls the analyses: its paths also need to be modified ($YOUR_WORKING_DIR).
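Editing these placeholders by hand across several scripts is error-prone. Assuming the placeholders appear literally as $YOUR_WORKING_DIR, $YOUR_SUBMIT_SCRIPTS_DIR, $YOUR_DATA_DIR and $YOUR_EMAIL in the files, a small Python sketch can substitute them and report any that remain. It uses a targeted regular expression rather than string.Template, because Template would also rewrite shell constructs such as $$.

```python
import re
from typing import Dict, Set

# Matches $YOUR_WORKING_DIR, $YOUR_EMAIL, ... and captures the name without "$"
PLACEHOLDER = re.compile(r"\$(YOUR_[A-Z_]+)")


def fill_placeholders(text: str, values: Dict[str, str]) -> str:
    """Replace known $YOUR_* placeholders, leaving unknown ones untouched."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


def unresolved(text: str) -> Set[str]:
    """Return the names of $YOUR_* placeholders still present in text."""
    return set(PLACEHOLDER.findall(text))


# Example usage (values are illustrative):
# settings = {"YOUR_WORKING_DIR": "/home/user/work/Life_on_the_edge_pipeline",
#             "YOUR_EMAIL": "user@example.org"}
# with open("01_run_life_on_the_edge.sh") as fh:
#     script = fill_placeholders(fh.read(), settings)
# print(unresolved(script))
```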