Data from: Different strokes for different croaks: Using an African reed frog species complex as a model to understand idiosyncratic population requirements for conservation management

Data files

Nov 10, 2025 version files (2.73 MB total)

data.zip (2.72 MB)

README.md (9.64 KB)

Abstract

Biodiversity is under increasing pressure from environmental change, although the scope and severity of these impacts remain incompletely understood. For many species, a lack of information about population‐specific responses to future environmental change hinders the development of effective conservation strategies. Here, we use an East African reed frog species complex as a model to explore spatial variation in vulnerability to future environmental changes. Our sampling across two threatened biodiversity hotspots spans the entire geographic range of H. mitchelli and H. rubrovermiculatus in Kenya, Tanzania, and Malawi. Using genome‐wide (ddRAD‐seq) data, we evaluate levels of neutral genetic diversity and local adaptations across sampling localities. We then integrate spatial approaches (genomic offset, modeled dispersal barriers, and Species Distribution Models) to predict how populations may respond differently to future environmental changes, such as climate warming and predicted land use changes. Based on our analyses, we characterize population structure and identify region‐specific management needs that reflect genetic variation among populations and the uneven impacts of predicted change across the landscape. Peripheral populations are most vulnerable to future environmental changes due to (i) low levels of neutral genetic diversity (Malawi and Pare mountains in Tanzania), (ii) putative signals of local adaptation to wetter conditions with predicted disruptions to genotype–environment associations (i.e., high genomic offset, Kenya and Northern Tanzania), and (iii) the projected contraction of suitable habitat, which is a pervasive threat to the species complex in general. Populations in Northern, Central, and Southern Tanzania show the lowest vulnerability to environmental change and may serve as important reservoirs of genetic diversity for potential future genetic rescue initiatives. 
Our study highlights how populations across different parts of species ranges may be unevenly affected by future global changes and provides a framework to predict which conservation actions may help mitigate these effects.

Dataset DOI: 10.5061/dryad.h1893200t

Description of the data and file structure

This DRYAD repository contains the raw genomic, environmental, and spatial data needed to recreate all analyses in the manuscript.

Files and variables

File: data.zip

Description: When unzipped, data.zip contains three subdirectories: environmental_spatial_data (containing all shapefiles, raster files, and per-sample information in csv files), genomic_data (containing our H. mitchelli complex SNP datasets in various formats for downstream analyses), and pop_maps (text files assigning each individual to a locality or population cluster).

Details:

environmental_spatial_data contains:

Hyperolius_mitchelli.cpg, Hyperolius_mitchelli.dbf, Hyperolius_mitchelli.prj, Hyperolius_mitchelli.qmd, Hyperolius_mitchelli.shx, Hyperolius_mitchelli.shp are needed to load the species' distributional range in a GIS (or R)
ne_50m_admin_0_countries_lakes.cpg, ne_50m_admin_0_countries_lakes.dbf, ne_50m_admin_0_countries_lakes.prj, ne_50m_admin_0_countries_lakes.shx, ne_50m_admin_0_countries_lakes.shp are needed to plot country boundaries in a GIS (or R)
predictor_sample_information.csv and predictor_sample_information_GEA.csv contain all predictor values extracted at each sample's geographic coordinates. Detailed sample locations do not pose a risk to the species. Columns are as follows: Sample - name of sample, Cluster - genetic cluster to which the sample belongs, Pop or Site_name - sampled locality name, LAT - decimal latitude, LONG - decimal longitude, bio_1-bio_19 - the 19 bioclim variables from WorldClim 2 (bio_1, bio_2 and bio_5-bio_11 are in degrees Celsius; bio_3 and bio_4 are derived indices scaled by 100; bio_12-bio_19 measure precipitation in millimetres, except bio_15, which is a coefficient of variation):

BIO1 = Annual Mean Temperature

BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))

BIO3 = Isothermality (BIO2/BIO7) (×100)

BIO4 = Temperature Seasonality (standard deviation ×100)

BIO5 = Max Temperature of Warmest Month

BIO6 = Min Temperature of Coldest Month

BIO7 = Temperature Annual Range (BIO5-BIO6)

BIO8 = Mean Temperature of Wettest Quarter

BIO9 = Mean Temperature of Driest Quarter

BIO10 = Mean Temperature of Warmest Quarter

BIO11 = Mean Temperature of Coldest Quarter

BIO12 = Annual Precipitation

BIO13 = Precipitation of Wettest Month

BIO14 = Precipitation of Driest Month

BIO15 = Precipitation Seasonality (Coefficient of Variation)

BIO16 = Precipitation of Wettest Quarter

BIO17 = Precipitation of Driest Quarter

BIO18 = Precipitation of Warmest Quarter

BIO19 = Precipitation of Coldest Quarter
Samples_ddRAD-seq.csv contains sample names, populations and their decimal geographic coordinates. Columns are as follows: Sample - name of sample, Pop - sampled locality name, LAT - decimal latitude, LONG - decimal longitude
A directory named 'clipped' with subdirectories 'current' and 'future' - these should hold the 19 bioclim variables (tif raster format) used for plotting and for extracting environmental data. We cannot include the rasters here because of the Creative Commons CC0 licensing of this repository, but they can be downloaded from WorldClim.
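
As a minimal sketch of how the predictor tables might be summarised, the Python snippet below averages one bioclim column per genetic cluster using only the standard library. The column names (Cluster, bio_1) follow the description above. The helper name mean_by_cluster and the inline rows, including the sample and site names, are hypothetical placeholders rather than values from the dataset.

```python
import csv
import io
from collections import defaultdict


def mean_by_cluster(handle, column):
    """Return {cluster: mean of `column`} from a predictor CSV file handle."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(handle):
        value = row[column].strip()
        if not value:  # skip samples with no extracted value
            continue
        totals[row["Cluster"]] += float(value)
        counts[row["Cluster"]] += 1
    return {cluster: totals[cluster] / counts[cluster] for cluster in totals}


# Hypothetical rows for illustration only; the real values are in
# predictor_sample_information.csv.
EXAMPLE = """Sample,Cluster,Pop,LAT,LONG,bio_1
s1,Kenya,SiteA,-4.2,39.4,25.0
s2,Kenya,SiteA,-4.2,39.4,26.0
s3,Malawi,SiteB,-15.9,35.5,22.5
"""

cluster_means = mean_by_cluster(io.StringIO(EXAMPLE), "bio_1")
print(cluster_means)
```

With the unzipped archive, the same function could be called on an opened file, for example open("environmental_spatial_data/predictor_sample_information.csv", newline=""), assuming that relative path matches the local layout.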

genomic_data contains:

Hyperolius_mitchelli.bed is a PLINK binary genotype file covering all SNPs and individuals
Hyperolius_mitchelli.bim is the PLINK variant information file that accompanies the .bed file (one row per SNP)
Hyperolius_mitchelli.fam is the PLINK sample information file that accompanies the .bed file (one row per individual)
Hyperolius_mitchelli.nosex is a PLINK report listing samples with ambiguous sex codes
Hyperolius_mitchelli.log is a PLINK run log file
Hyperolius_mitchelli.map is a PLINK text-format variant map file (one row per SNP)
Hyperolius_mitchelli.ped is a PLINK text-format pedigree and genotype file covering all SNPs and individuals
Hyperolius_mitchelli.raw is a PLINK numeric (allele count) genotype file covering all SNPs and individuals
Hyperolius_mitchelli.vcf is a variant call format file with all genotypes and individuals
H_mitchelli.gen.txt is a genepop format file with all genotypes and individuals
H_mitchelli.gen_pops.txt is a genepop format file with all genotypes and individuals (separated by population)
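
The numeric .raw file is the easiest of these formats to inspect without specialist software. The sketch below, in plain Python, computes the fraction of missing genotypes per individual. It assumes the standard PLINK additive-recode layout (six leading columns FID, IID, PAT, MAT, SEX, PHENOTYPE, then one whitespace-separated column per SNP, with missing calls written as NA). The function name and the example rows are hypothetical, and the real file should be checked against this layout before use.

```python
import io

META_COLUMNS = 6  # FID, IID, PAT, MAT, SEX, PHENOTYPE precede the SNP columns


def missingness_per_individual(handle):
    """Return {IID: fraction of genotypes coded 'NA'} from a PLINK .raw handle."""
    header = handle.readline().split()
    n_variants = len(header) - META_COLUMNS
    if n_variants <= 0:
        raise ValueError("no genotype columns found in header")
    result = {}
    for line in handle:
        fields = line.split()
        if not fields:  # tolerate blank lines
            continue
        genotypes = fields[META_COLUMNS:]
        missing = sum(1 for g in genotypes if g == "NA")
        result[fields[1]] = missing / n_variants
    return result


# Hypothetical two-individual, four-SNP example (not real data).
EXAMPLE_RAW = """FID IID PAT MAT SEX PHENOTYPE snp1_A snp2_G snp3_T snp4_C
pop1 ind1 0 0 0 -9 0 1 NA 2
pop1 ind2 0 0 0 -9 NA NA 2 0
"""

missing_fraction = missingness_per_individual(io.StringIO(EXAMPLE_RAW))
print(missing_fraction)
```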

popmaps contains:

popmap_Central_Tanzania, popmap_Hondo_hondo, popmap_Kenya, popmap_Kibasila, popmap_Kihansi, popmap_Kimboza, popmap_Kivumoni, popmap_Kiwengoma, popmap_Mabayani, popmap_Makangaga, popmap_Makangala, popmap_Malawi, popmap_Mngeta, popmap_Mukurmudzi, popmap_Namatimbili, popmap_Nguru, popmap_Nguu, popmap_Northern_Tanzania, popmap_Noto, popmap_Pare, popmap_Segoma, popmap_Sheldricks_falls, popmap_Shimba_lodge, popmap_Southern_Tanzania, popmap_Tanzania, Hyperolius_mitchelli_pops_file.txt, Hyperolius_mitchelli_pops_tz_clusters_file.txt, Hyperolius_mitchelli_subpops_1_file.txt, Hyperolius_mitchelli_subpops_2_file.txt, Hyperolius_mitchelli_subpops_3_file.txt - all popmap format files used by Stacks2 to generate population-level information for effective population size calculations. All popmaps have two columns: the name of the relevant sample followed by the population name used to group the samples
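
To illustrate the two-column popmap layout described above, here is a minimal Python sketch that groups sample names by population. It assumes the columns are separated by whitespace (Stacks popmaps are normally tab-delimited) and that neither sample nor population names contain spaces. The function name read_popmap and the example entries are hypothetical.

```python
import io
from collections import defaultdict


def read_popmap(handle):
    """Return {population: [samples]} from a two-column popmap file handle."""
    groups = defaultdict(list)
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        sample, population = line.split()[:2]
        groups[population].append(sample)
    return dict(groups)


# Hypothetical popmap content for illustration (not taken from the dataset).
EXAMPLE_POPMAP = "ind1\tKenya\nind2\tKenya\nind3\tMalawi\n"

samples_by_pop = read_popmap(io.StringIO(EXAMPLE_POPMAP))
print(samples_by_pop)
```

Counting the entries in each list gives the per-population sample size, which is a quick sanity check before passing a popmap to downstream tools.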

File: scripts.zip

Description: Scripts (bash and R) for running each separate analysis based on the input files. Each script is located within the relevant folder (Admixture, AMOVA, EEMS, fastStructure, FST, Genetic diversity, Genomic offset, Ne, PCA, SDMs, sNMF and RDA, Stacks parameter optimisation)

Details:

Admixture contains: plot_admixture.R to plot results of admixture analysis in R
AMOVA contains: run_AMOVA.R to run AMOVA analyses in R
EEMS contains: 1_RUNEEMS.sh to run Estimated Effective Migration Surfaces, and 2_PLOT_EEMS.R to plot the resulting outputs in R
fastStructure contains: fastStructure.sh to generate results of fastStructure analysis
FST contains: FST_diveRsity.R to calculate Weir and Cockerham's FST in the R package diveRsity
Genetic_diversity contains: plot_genetic_diversity.R to plot the results of the Stacks2 outputs for genetic diversity
Genomic_offset contains: local_offset_gradientForest.R to calculate genomic offsets, plot_genomic_offsets.R to plot genomic offsets, and plot_dotplots.R to plot the dotplots of genomic offsets across various populations and scenarios
Ne contains: 01_stacks_populations.sh to generate the output files needed to calculate Ne, 02_easySFS.sh to generate site frequency spectra per population group, 03_momi2.sh to generate Ne estimates
PCA contains: PCA.sh to generate eigenvector and eigenvalues files based on SNPs in PLINK, Plot_PCA.R to plot the resulting PCA
SDMs contains: prepare_environmental_data.R to structure and prepare spatial and climate data prior to running species distribution models, -run_SDMs-.sh to run the SDMs in bash, plotSDMs.R to plot the resulting SDM outputs, and README_run_SDMs.R to describe how to run the SDMs in bash (parallelised R code run in a bash shell using the Life on the edge framework - https://cd-barratt.github.io/Life_on_the_edge.github.io/)
sNMF_and_RDA contains: check_collinearity_extract_predictor_data.R - to extract environmental data at the georeferenced sample coordinates and then check the extracted predictors for collinearity (to ensure that it is not problematic for Genotype Environment Association analyses), 01_sNMF_pop_structure_impute_missing_data.R to impute missing genotype data and run a brief population structure analysis using sNMF, 02_RDA.R to perform the RDA GEA, 03_RDA_candidate_loci_categorising_indvs.R to categorise the local adaptation of individuals and populations to the environment based on the RDA results, 04_adaptive diversity.R to plot these local adaptations in geographic space
Stacks_parameter_optimisation contains: Stacks_01a_denovo_map_test_parameters.sh - to run denovo_map.pl in Stacks across a range of test parameters so that different parameter combinations can be compared, Stacks_01b_extract_results.sh - to collect all results from the previous script, Stacks_01c_denovo_map_full.sh to run the optimised Stacks2 parameters for all downstream analyses used in the manuscript. Plotting functions can be found in the associated DRYAD repository for this article: https://www.nature.com/articles/s41437-024-00710-4

Code/software

Admixture - Admixture (1.3.0)

AMOVA - adegenet (2.1.11), poppr (2.98), pegas (1.3)

EEMS - EEMs (0.0.0.9000)

fastStructure - fastStructure (1.0)

FST - diveRsity (1.9.90)

Genetic diversity - Stacks (2.62)

Genomic offset - gradientForest (0.1-37)

Ne - momi2 (2.1.19)

PCA - plink (1.90b.38)

SDMs - biomod2 (4.2-6-2)

sNMF and RDA - LEA (2.0.0), RDA (2.8-0)

Stacks parameter optimisation - Stacks (2.62)

Miscellaneous R packages for data wrangling and tidying - raster (3.6-32), terra (1.8-60), tidyverse (2.0.0), plyr (1.8.9), dplyr (1.1.4), tibble (3.3.0), tmap (4.2), stringr (1.5.2), ggplot2 (4.0.0)

Access information

Other publicly accessible locations of the data:

Raw genomic data are archived in the European Nucleotide Archive (accession PRJEB97160, https://www.ebi.ac.uk/ena/browser/view/PRJEB97160).
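
For scripted access to the archived reads, one possible approach is to request a file report from the ENA Portal API. The Python sketch below only builds the request URL and does not contact the server. The endpoint, the read_run result type, and the field names (run_accession, fastq_ftp) reflect our understanding of the ENA Portal API and are assumptions to verify against the current ENA documentation. The function name ena_filereport_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint; confirm against the ENA documentation.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession, fields=("run_accession", "fastq_ftp")):
    """Build a file-report URL listing the runs under a project accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",        # one row per sequencing run
        "fields": ",".join(fields),  # columns to include in the report
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


url = ena_filereport_url("PRJEB97160")
print(url)
```

Opening the printed URL in a browser or with a download tool should return a tab-separated table from which the FASTQ locations can be read, provided the assumptions above hold.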