Supporting code and data to reproduce analysis for: Genomic signatures of past megafrugivore-mediated dispersal in Malagasy palms
Data files
Apr 29, 2024 version files 203.27 MB
data_genomic_signatures_megafrugivore_dispersal_malagasy_palms.zip (203.26 MB)
README.md (12.98 KB)
Abstract
Seed dispersal affects gene flow and hence genetic differentiation of plant populations. During the Late Quaternary, most fruit-eating and seed-dispersing megafauna went extinct, but whether these animals have left signatures in the population genetics of their food plants, particularly those with large, ‘megafaunal’ fruits (i.e., > 4 cm – megafruits), remains unclear.
Here, we assessed the population history, genetic differentiation, and recent migration among populations of four animal-dispersed palm (Arecaceae) species with large (Borassus madagascariensis), medium-sized (Hyphaene coriacea, Bismarckia nobilis), and small (Chrysalidocarpus madagascariensis) fruits on Madagascar. We integrated double-digest restriction-site-associated DNA sequencing (ddRAD) of 167 individuals from 25 populations with (past) distribution ranges for extinct and extant seed-dispersing animals (e.g., giant lemurs, elephant birds), landscape and human impact data, and applied linear mixed-effects models to explore the drivers of genetic variation in Malagasy palms.
Palm populations that shared more megafrugivore species in the past had lower genetic differentiation than populations that shared fewer megafrugivore species. This suggests that megafrugivore-mediated seed dispersal in the past may have led to frequent gene flow among populations. In comparison, extant frugivore diversity only decreased genetic differentiation in the small-fruited palm. Furthermore, genetic differentiation decreased with landscape connectivity (i.e., environmental suitability, forest cover and river density), and human impact (i.e., road density) has decreased genetic differentiation among populations.
Synthesis: Our results suggest that the legacy of megafrugivores regularly achieving long dispersal distances is still reflected in the population genetics of palms that were formerly dispersed by such animals. Furthermore, low genetic differentiation was possibly maintained after the megafauna extinctions through alternative dispersal (e.g., human- or river-mediated), long generation times, and long lifespans of these megafruit palms. Our study illustrates how species interactions that happened >1000 years ago can leave imprints in population genetics.
README: Data to reproduce analysis for: Genomic signatures of past megafrugivore-mediated dispersal in Malagasy palms
https://doi.org/10.5061/dryad.pc866t1z6
Authors: Méndez L., Barratt C. D., Durka W., Kissling W. D., Eiserhardt W. L., Baker W. J., Randrianasolo V., Onstein R. E.
Contact: Laura Méndez (laura.mendezcue@gmail.com)
Contents
This data repository consists of the following folders and files:
Main folder (data_genomic_signatures_megafrugivore_dispersal_malagasy_palms):
· Word document containing information on each file and instructions for use (ReadMe.docx)
· “input” folder:
o “BayesAss” folder:
§ Contains input files to run BayesAss (BA3-autotune.sh and BA3-SNPs.sh). Specifically, the .immanc files for each species, which were first imputed by LinkImputeR, then converted to .immanc format by PGDSpider, and finally modified by R script 03_modification_immanc_file.R (Bm_modified.immanc, BN_modified.immanc, Cm_modified.immanc, HC_modified.immanc).
o “LinkImputeR” folder:
§ Contains input files to run LinkImputeR. Specifically, .vcf files resulting from ddRAD sequencing data filtering with Stacks and VCFtools, which contain missing data (Bm_miss50.vcf, BN_miss50.vcf, Cm_miss50.vcf, HC_miss50.vcf).
o “SDMs” folder:
§ Contains input files to perform ensemble species distribution models for each palm species (02_SDMs_FigureS2.R). Specifically, it contains a .csv file per species with present-day palm species occurrence data from Méndez et al. (2022), complemented with occurrence records from the Global Biodiversity Information Facility. Coordinates are given in latitude and longitude and the origin of the occurrence point is also shown (Bismarckia_nobilis.csv, Borassus_madagascariensis.csv, Chrysalidocarpus_madagascariensis.csv, Hyphaene_coriacea.csv).
o Coordinates of each palm population sampled, calculated as the centroid of all individuals per population per palm species (25 populations in total of 4 palm species). Coordinates are given in latitude and longitude (centroids_west.csv).
· “output” folder:
o “ADMIXTURE” folder:
§ Contains output (.Q and .P files) from running ADMIXTURE with K=1-10 for each palm species. It also contains .txt files with the results of each of the 20 runs with different seeds per species used to choose the best K.
o “BayesAss” folder:
§ “results” folder:
- Contains a folder per species with the results from running BayesAss (BA3-SNPs.sh). We estimated the 95% credible intervals (95_CI) on migration rates by calculating the mean ± 1.96 × standard deviation.
§ “tracer_files” folder:
- Contains the output trace files to verify chain convergence with Tracer (*_modified.trace.txt).
§ Summary of the BayesAss results on migration among populations, combining all species into one .csv file with both the back and forth migration rates per pair of populations (all_migration_rates.csv).
§ Summary of the BayesAss results combining all species, modified (05_modification_bayesass_ouput.R) to include only the average of the back and forth migration rates per pair of populations (pairwise_bayesass.csv).
§ Summary of the BayesAss results on migration within populations, combining all species into one .csv file (within_pop_migration.csv).
o “extracted_predictors” folder:
§ Contains all the output files from running script 04_extracting_connectivityPreds_Figure1 (every file ending in _results.csv).
§ Summary of all the results (every file ending in _results.csv) from 04_extracting_connectivityPreds_Figure1 used to run the linear mixed models with script 06_models_Figures2_3_S3_S8_S10 (pairwise_predictors.xlsx).
§ Final and scaled summary of results (same as pairwise_predictors.xlsx, but scaled and with predictors to run models with maximum-likelihood population effect included) (pairwise_predictors_scaled.xlsx).
§ Same predictors as in the other files but extracted per population using the centroids_west.csv file (predictors_per_pop.csv).
o “LinkImputeR” folder:
§ “results” folder:
- Contains a folder per species with the results from running LinkImputeR (each of the .ini files included in input/LinkImputeR). Inside each folder there is a sum.dat file with a summary of the results, which helps decide which Case fits best per species (*.accuracy1).
§ Final imputed .vcf files created by running LinkImputeR with the Case chosen after inspecting the sum.dat file in each of the results folders (*.accuracy1). These files are subsequently used to create the .immanc files with PGDSpider (Bm.LinkImputeR_Case18.vcf, BN.LinkImputeR_Case8.vcf, Cm.LinkImputeR_Case10.vcf, HC.LinkImputeR_Case15.vcf).
o “PGDSpider” folder:
§ Contains output from running PGDSpider to convert the .vcf files imputed with LinkImputeR (Bm.LinkImputeR_Case18.vcf, BN.LinkImputeR_Case8.vcf, Cm.LinkImputeR_Case10.vcf, HC.LinkImputeR_Case15.vcf) to .immanc files. These files are modified by R script 03_modification_immanc_file.R and then used in the analyses with BayesAss (*_immanc.inp).
o “RAxML” folder:
§ Contains output from running RAxML, specifically the best trees per species (RAxML_bestTree.*.GTRGAMMA.autoMRE.lewis).
§ Population maps used to create the PHYLIP files to run RAxML where each individual belongs to a different population. They are used by R script 09_plotting_trees_FigureS6.R to specify the names of each individual to plot on the tip labels of each best tree (popmap_*.txt).
o “sdm_R” folder:
§ “ensemble_models” folder:
- Contains final ensemble species distribution models created by running R script 02_SDMs_FigureS2.R for each palm species (Bm_brt.glm.maxent.rf.tif, BN_brt.gam.rf.tif, Dm_brt.gam.glm.maxent.rf.tif, HC_brt.glm.maxent.rf.tif). These files are used by R script 04_extracting_connectivityPreds_Figure1.R to extract connectivity based on environmental suitability and plot the maps from Figure 1.
- Contains files with the importance of each variable, averaged across the resulting ensemble species distribution models calculated in R script 02_SDMs_FigureS2.R (*_average_varImp.csv).
- File summarizing the model performance of each method used to build the ensemble species distribution models of each palm species (model_performace.xlsx).
§ “thinned” folder:
- Contains the thinned occurrence files of each palm species created with R script 02_SDMs_FigureS2.R (*_thinned_5.csv).
§ RasterStack containing all spatial predictors needed to run R script 02_SDMs_FigureS2.R. This .tif file was created by script 01_environmental_predictors.R (predictors_sdm.tif).
§ File accompanying predictors_sdm.tif, used to assign the correct name to each raster layer (names.env.xlsx).
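The two BayesAss summaries described above (the 95% interval as mean ± 1.96 × standard deviation, and the average of the back and forth migration rates per pair of populations) can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function names, the dictionary layout, and the example numbers are hypothetical and do not come from the data files.

```python
from statistics import mean


def interval_95(mean_rate, sd):
    """Return (lower, upper) computed as mean ± 1.96 × standard deviation."""
    half_width = 1.96 * sd
    return (mean_rate - half_width, mean_rate + half_width)


def pairwise_mean(rates):
    """Average the directional rates of each unordered population pair.

    `rates` maps a (population_a, population_b) tuple to a migration rate.
    This layout is an assumption made for the example only.
    """
    grouped = {}
    for (pop_a, pop_b), rate in rates.items():
        # Sort the pair so that both directions share one key.
        key = tuple(sorted((pop_a, pop_b)))
        grouped.setdefault(key, []).append(rate)
    return {pair: mean(values) for pair, values in grouped.items()}


# Made-up numbers for illustration only (not values from the data set).
example_rates = {("pop1", "pop2"): 0.02, ("pop2", "pop1"): 0.04}
lower, upper = interval_95(0.02, 0.01)
averaged = pairwise_mean(example_rates)
```

Note that mean ± 1.96 × standard deviation is a symmetric approximation, so the lower bound can fall below zero for small rates; the sketch does not truncate it because the README does not state how such cases were handled.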
Metadata
File accompanying predictors_sdm.tif, used to assign the correct name to each raster layer (names.env.xlsx):
· BIO1: Annual mean temperature (°C × 10)
· BIO2: Mean diurnal range (mean of monthly (max temp - min temp))
· BIO3: Isothermality (BIO2/BIO7) (× 100)
· BIO4: Temperature seasonality (standard deviation × 100)
· BIO5: Maximum temperature of the warmest month
· BIO6: Minimum temperature of the coldest month (°C × 10)
· BIO7: Temperature annual range (BIO5 - BIO6)
· BIO8: Mean temperature of the wettest quarter
· BIO9: Mean temperature of the driest quarter
· BIO10: Mean temperature of the warmest quarter
· BIO11: Mean temperature of the coldest quarter
· BIO12: Annual mean precipitation (mm × month⁻¹)
· BIO13: Precipitation of the wettest month
· BIO14: Precipitation of the driest month
· BIO15: Precipitation seasonality (coefficient of variation)
· BIO16: Precipitation of the wettest quarter
· BIO17: Precipitation of the driest quarter
· BIO18: Precipitation of the warmest quarter
· BIO19: Precipitation of the coldest quarter
· pet: Annual potential evapotranspiration from the Thornthwaite equation (mm)
· cwd: Annual climatic water deficit (mm)
· ndm: Number of dry months in the year
· alt: Altitude (in m)
· slop: Slope in degrees
· asp: Aspect (clockwise from North, in degrees)
· solar: Solar radiation (in Wh·m⁻²·day⁻¹)
· percfor2010: Forest cover of Madagascar in 2010, derived from the 30 m resolution 2000 forest map by Harper et al. (2007).
· Popdensity2010: “For all locations with more than 1000 people·km−2, we assigned a pressure score of 10. For more sparsely populated areas with densities lower than 1000 people·km−2, we logarithmically scaled the pressure score using: Pressure score = 3.333 × log(populationdensity+1)” (Venter et al. 2016).
· HFP2009: Human footprint for 2009, calculated as a summary value from several other human-related variables: built environments, population density, night-time lights, croplands, pasture, roads, railways, and navigable waterways.
· HFP1993: Human footprint for 1993, calculated as a summary value from several other human-related variables: built environments, population density, night-time lights, croplands, pasture, roads, railways, and navigable waterways.
· Roads: “We mapped the direct and indirect influence of roads by assigning a pressure score of 8 for 0.5 km out for either side of roads, and access pressures were awarded a score of 4 at 0.5 km and decaying exponentially out to 15 km either side of the road” (Venter et al. 2016)
· Clay: Soil clay content (0-2 µm) in g/100g (w%)
· CationEC: Cation exchange capacity (CEC measured in 1 M NH4OAc buffered at pH 7) in cmolc/kg (fine earth)
· OragicCarbon: Soil organic carbon content (fine earth fraction) in dg/kg at 6 standard depths.
· pHinH2O: Soil pH x 10 in H2O at 7 standard depths (to convert to pH values divide by 10) predicted using the global compilation of soil ground observations.
· Sand: Sand content (50/63-2000 µm) mass fraction in ‰ at 6 standard depths.
· Extractable_Aluminum: Extractable aluminium content (Al measured by Mehlich 3) in mg/kg (fine earth)
· Total_Nitrogen: Total nitrogen (N) content in g/kg of the fine earth fraction
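Two of the layer definitions above involve simple arithmetic that can be sketched in code: the population-density pressure score quoted from Venter et al. (2016), and the layers stored as value × 10 (BIO1, BIO6, pHinH2O). The Python sketch below is illustrative only. The function names are hypothetical, and the base-10 logarithm is an assumption: the quoted formula only says "log", and base 10 is the reading under which the score reaches approximately 10 at 1000 people·km⁻².

```python
import math


def population_pressure_score(density_per_km2):
    """Pressure score from population density (people per km^2).

    Densities above 1000 people per km^2 receive a score of 10;
    lower densities are scaled logarithmically.
    """
    if density_per_km2 > 1000:
        return 10.0
    # Assumption: "log" in the quoted formula is the base-10 logarithm.
    return 3.333 * math.log10(density_per_km2 + 1)


def unscale_by_ten(stored_value):
    """Convert a value stored as 'x 10' back to its natural unit."""
    return stored_value / 10.0


# Made-up inputs for illustration only.
score_sparse = population_pressure_score(50)
score_dense = population_pressure_score(5000)
temperature_c = unscale_by_ten(255)  # a stored BIO1 value of 255 means 25.5 °C
```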
Summary of all the results (every file ending in _results.csv) from 04_extracting_connectivityPreds_Figure1 used to run the linear mixed models with script 06_models_Figures2_3_S3_S8_S10 (pairwise_predictors.xlsx):
· pop.pairs: Populations from the same species for which connectivity is calculated with different predictors
· Species: Full species name
· Species: Short species name
· Fst: Genetic differentiation obtained from Stacks output between each population pair
· distance_km: Distance in km between each population pair
· human_uses: Number of different human uses for each species as defined by Rakotoarinivo et al. (2020)
· shared_frug: Number of shared extant frugivore species between each population pair
· shared_EXTINCT_frug: Number of shared extinct frugivore species between each population pair
· human_pop: Human population density (extracted from raster layer Popdensity2010) between each population pair
· forest_cover: Forest cover density (extracted from raster layer percfor2010) between each population pair
· rivers: River density between each population pair
· road_density: Road density (extracted from raster layer Roads) between each population pair
· suitability: Environmental suitability (extracted from the resulting ensemble species distribution models: Bm_brt.glm.maxent.rf.tif, BN_brt.gam.rf.tif, Dm_brt.gam.glm.maxent.rf.tif, HC_brt.glm.maxent.rf.tif) between each population pair
· fruit_class: Palm species with fruits < 4 cm are considered ‘small-fruited’; species with fruits > 4 cm are considered ‘megafruited’. Among megafruited species, we distinguish ‘medium-sized megafruits’ (fruits of 4-6 cm) from ‘large-sized megafruits’ (fruits > 6 cm).
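The fruit_class thresholds can be expressed as a small lookup. This Python sketch is illustrative only: the function name and its single size argument are hypothetical, and the handling of fruits measuring exactly 4 cm or 6 cm (both assigned to the medium-sized class here) is an assumption, because the definition above only states "< 4 cm", "4-6 cm" and "> 6 cm".

```python
def fruit_class(fruit_size_cm):
    """Assign a fruit size class from a fruit size in centimetres."""
    if fruit_size_cm < 4:
        return "small-fruited"
    # Assumption: the boundaries 4 cm and 6 cm belong to the medium class.
    if fruit_size_cm <= 6:
        return "medium-sized megafruit"
    return "large-sized megafruit"


# Made-up sizes for illustration only (not measurements from the study).
classes = [fruit_class(size) for size in (2.5, 5.0, 20.0)]
```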
Methods
We sampled leaf tissue from 25 natural populations of four palm species that differ in fruit sizes (large megafruits: B. madagascariensis; medium-sized megafruits: H. coriacea, B. nobilis; small fruits: C. madagascariensis) throughout their distribution in the western part of Madagascar during July, August and September of 2019 (Figure 1, Table 1, Table S1). More details on sampling for each species are provided in Table S1. The uneven number of sampled populations across the species mirrors the natural distribution and abundance of each species in Madagascar, with B. nobilis, H. coriacea and C. madagascariensis having wider distributions than B. madagascariensis. The latter is restricted and more fragmented in its distribution, and hence endangered according to the IUCN Red List of Threatened Species (IUCN, 2012).