Genomic tools for comparative conservation genetics among three recently diverged stag beetles (Lucanus, Lucanidae)

Huang, Jen-Pan

Published Apr 28, 2023; Updated Jun 06, 2023 on Dryad. https://doi.org/10.5061/dryad.jh9w0vtft

Data files

Apr 28, 2023 version files (104.98 MB):
De_Vivo.2022_Lucanus.zip (104.97 MB)
README.md (8.75 KB)

Jun 06, 2023 version files (114.24 MB):
De_Vivo.2022_Lucanus.zip (114.23 MB)
README.md (9.22 KB)

Jun 06, 2023 version files (114.24 MB):
De_Vivo.2022_Lucanus.zip (114.23 MB)
README.md (9.91 KB)

Abstract

We are witnessing a rapid decline in global biodiversity. International protocols and local conservation laws have been established to counter this unprecedented rate of decline. However, quantitatively evaluating how much biodiversity has been lost due to climatic and anthropogenic effects, and how much has been restored by conservation efforts, remains challenging. We applied a comparative conservation genomic approach to address these questions statistically and quantitatively using three geographical taxa from a stag beetle species complex. We found that the three sky-island taxa formed three independently evolving units without detectable post-divergence gene flow; furthermore, the three taxa, which have been diverging from each other since the mid-Pleistocene, have experienced episodes of demographic decline in the past. More importantly, even though idiosyncratic anthropogenic exploitation has been hypothesized to have affected the recent demographic history (< 100 years) of each taxon differently, we found a shared pattern of continuous decline in effective population size among the three geographical taxa. We argue that, to avoid biased conservation plans, future empirical studies should include additional taxa, beyond the focal species, that may or may not have been affected by the focal historical events.

These folders contain the supplementary data for our study, in which we analyzed the population trends of three sister species of stag beetles (genus Lucanus) in Taiwan.

Description of the data and file structure

bootstrapped_sites: results of the site bootstrapping used to evaluate the parameter estimates (see https://speciationgenomics.github.io/fastsimcoal2/ for how the blocks of sites used for bootstrapping are generated with Bash scripts and how they are used). Each "sites" file contains around 6667 sites from the original Variant Call Format (VCF) file of each population, which is stored in the "AllSites" file. The "header" files contain the header of the original VCF files. Although all files are plain text, we suggest inspecting them with Bash commands (e.g., head or tail) given their large size.
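The tutorial linked above builds bootstrap data sets by cutting the "AllSites" VCF into blocks of consecutive sites and then resampling whole blocks with replacement. A minimal Python sketch of that idea follows, assuming an uncompressed VCF already read into memory as a list of lines; the function names are hypothetical, and the block size of 6667 only mirrors the approximate block size stated above.

```python
import random


def read_vcf(lines):
    """Split VCF text lines into header lines (starting with '#') and site records."""
    header = [ln for ln in lines if ln.startswith("#")]
    sites = [ln for ln in lines if ln and not ln.startswith("#")]
    return header, sites


def make_blocks(sites, block_size=6667):
    """Chop site records into consecutive blocks of block_size sites (the last block may be shorter)."""
    return [sites[i:i + block_size] for i in range(0, len(sites), block_size)]


def bootstrap_replicate(header, blocks, rng):
    """Draw len(blocks) blocks with replacement and concatenate them under the original header."""
    drawn = [rng.choice(blocks) for _ in range(len(blocks))]
    return header + [site for block in drawn for site in block]
```

Resampling blocks rather than single sites keeps linked neighbouring sites together, which is why the files above are organized as blocks.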

bpp: files used for the bpp analyses. "lucanus_a00.ctl" is the control file used for the analysis, "bpp.txt" is the bpp DNA sequence file, and "bppmap.txt" specifies which individual belongs to which population. The remaining files are outputs of the software (the tree file "FigTree.tre" and the MCMC log "mcmc.txt"). The control file, the DNA sequence file and "bppmap.txt" are inputs to bpp and can be edited with a text editor (e.g., gedit). The tree and log files are used with the R package bppr to calculate divergence times. Both can also be viewed with a text editor, and the tree file can be opened in FigTree to draw the tree.
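bpp reports divergence times (tau) and population sizes (theta) in units of expected substitutions per site, with tau = T * mu and, for diploids, theta = 4 * Ne * mu. The bppr package performs the conversion to absolute time over the full posterior sample; the sketch below only shows the point arithmetic. The per-generation mutation rate and generation time are placeholders to be supplied by the user, not values from this study, and the function names are hypothetical.

```python
def tau_to_years(tau, mu_per_generation, generation_time_years):
    """Convert a bpp divergence time tau (substitutions per site) to years.

    tau = T_generations * mu, so T_generations = tau / mu and T_years = T_generations * g.
    """
    return tau / mu_per_generation * generation_time_years


def theta_to_ne(theta, mu_per_generation):
    """Convert a bpp theta (= 4 * Ne * mu for diploids) to an effective population size."""
    return theta / (4.0 * mu_per_generation)
```

For example, with a hypothetical mu of 1e-8 per site per generation and a two-year generation time, tau = 0.002 corresponds to about 400,000 years.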

fineRADstructure: files used for fineRADstructure, with both the ipyrad ("ipyrad") and Stacks ("stacks") outputs. File definitions follow the fineRADstructure website (https://www.milan-malinsky.org/fineradstructure), which also provides the scripts for running the software and for plotting the results in R. The "ipyrad.txt" and "populations.haps.radpainter" files are the input files; the "chunks" files are the outputs of the analyses; the "mcmc" files contain the population assignments; and the "tree" files contain the trees inferred from the "chunks" and "mcmc" files. The "missing" files report the missing data. All these files can be viewed with a text editor or Bash scripts.
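The "chunks" files hold a co-ancestry matrix in which each row is a recipient individual and each column a donor. As a hedged sketch, assuming a whitespace-delimited layout with "#" comment lines, a header row whose first field is a label followed by individual names, and one row per recipient (check the actual files before relying on this), the matrix can be loaded and each individual's strongest donor listed. Both function names are hypothetical.

```python
def read_labeled_matrix(text):
    """Parse a labeled square matrix into {recipient: {donor: value}}, skipping '#' comment lines."""
    rows = [ln.split() for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    names = rows[0][1:]  # first header field is a label, the rest are donor names
    matrix = {}
    for row in rows[1:]:
        matrix[row[0]] = {col: float(v) for col, v in zip(names, row[1:])}
    return matrix


def closest_donor(matrix):
    """For each recipient, return the donor (other than itself) with the highest co-ancestry value."""
    best = {}
    for recipient, donors in matrix.items():
        others = {d: v for d, v in donors.items() if d != recipient}
        best[recipient] = max(others, key=others.get)
    return best
```

Individuals whose strongest donors fall within their own sampling locality are consistent with the population clusters shown in the "tree" and "mcmc" outputs.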

fsc: input files and results of the fastsimcoal analyses for all three populations/species: "HC", "NT" and "ML". "HC" stands for Hsinchu, "ML" for Miaoli and "NT" for Nantou, the counties in Taiwan where we sampled the beetles and from which the three distinct populations come.
"1_pastdecrease" = past decrease (around 15 generations ago).
"2_recdecrease" = recent decrease (around 5 generations ago).
"3_popc" = constant population size.
"4_nudoubledec" = two decrease events (past and recent).
"5_oneevent" = constant decline.
"6_increase" = constant increase.
"7_oldconstant" = constant decline starting 100000 generations ago.
"8_doubleconstant" = two constant declines, one starting 100000 generations ago and another roughly 20 generations ago.
"9_nobounddoubledec" = two decrease events (past and recent), with no time bounds.
Each folder contains the SFS file used for the analysis ("_MAFpop0.obs"), the model template (".tpl") and estimation (".est") files, and all the replicates (see https://speciationgenomics.github.io/fastsimcoal2/ for how they were generated). For each run, the folder also contains the estimates in the parameter (".par") and ".pv" files, the likelihoods (best likelihood estimate, ".bestlhoods", and best observed likelihood, ".brent_lhoods"), the simulation parameters ("*.simparam") and the simulated SFS as a text file ("*MAFpop0.txt"). See the fastsimcoal manual for more information about the file formats used by the software (http://cmpg.unibe.ch/software/fastsimcoal27/man/fastsimcoal27.pdf). As with the other input files (bpp, fineRADstructure), these files can be viewed and edited with a text editor or Bash scripts.
The best run for each model ("bestrun") was selected with a previously released script (available at https://raw.githubusercontent.com/speciationgenomics/scripts/master/fsc-selectbestrun.sh). This folder also contains the AIC computed for each model ("AIC_computed", in the ".AIC" files), the input files and outputs of the AIC weight calculations ("AIC_weights"), and the computed likelihood distributions ("lk_distributions", in the ".lhoods" files). The "AIC_weights" folders are split into "constant declines" (calculations for the "7_oldconstant" and "8_doubleconstant" models) and "models" (calculations for models 1 to 6). Again, these files can be inspected with a text editor.
Because of their size (around 6 GB per population), the bootstrap results for the best models are not included; they are available on request.
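fastsimcoal2 reports likelihoods in log10 units, so computing AIC = 2k - 2 ln(L) requires converting with ln(L) = log10(L) * ln(10); AIC weights then follow w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), where delta_i = AIC_i - min(AIC). The sketch below illustrates these two steps in Python; it is not the authors' R script. It assumes that k is the number of estimated parameters in the model's ".est" file and that a ".bestlhoods" file is a whitespace-separated header line followed by one line of values containing a "MaxEstLhood" column. The function names are hypothetical.

```python
import math


def max_est_lhood(bestlhoods_text):
    """Read MaxEstLhood (log10 units) from a .bestlhoods file: header line + one value line."""
    header, values = [ln.split() for ln in bestlhoods_text.strip().splitlines()[:2]]
    return float(dict(zip(header, values))["MaxEstLhood"])


def aic_from_log10_lhood(log10_lhood, n_params):
    """AIC = 2k - 2 ln(L), converting fastsimcoal2's log10(L) to natural log first."""
    return 2 * n_params - 2 * log10_lhood * math.log(10)


def akaike_weights(aics):
    """Relative support for each model: exp(-delta/2), normalized to sum to 1."""
    best = min(aics.values())
    rel = {model: math.exp(-(aic - best) / 2) for model, aic in aics.items()}
    total = sum(rel.values())
    return {model: r / total for model, r in rel.items()}
```

The weights are only comparable within one set of models fitted to the same SFS, which is why the "AIC_weights" calculations are grouped per model set.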

scripts: the scripts used for each analysis, mostly as .txt files so that they can be opened, copied, pasted and run. The "fsc" folder contains the Bash scripts used to run the fastsimcoal analyses, both for each model ("fsc_scripts.txt") and for the bootstrap used to obtain confidence intervals ("bootstrapping_fsc.txt"). "ipyrad-analysis tools" contains the Python scripts used for several tools of the ipyrad suite: PCA ("scripts_for_PCA.txt"), STRUCTURE ("scripts_structure.txt") and TreeMix ("treemix_scripts.txt"). "r" contains the R scripts used in this study. Specifically, "aic_fsc" contains a modified version of a previously released script (available at https://github.com/speciationgenomics/scripts/blob/master/calculateAIC.sh) for calculating the AICs (those in the "AIC_computed" subfolder of the "fsc" folder); "aic_weights" contains the scripts used to calculate the AIC weights; "bppr" contains the scripts used with the bppr package; "filtering" contains the scripts used to filter the original Stacks VCF file and keep only the SNPs with less than 20% missing data; "lk" contains the scripts used to plot the likelihood distribution of each model; "plot_r_stairwayplot" contains the scripts for plotting the original Stairway Plot 2 output files in R; and "snapclust" contains the scripts for filtering loci from the STRUCTURE files and running the snapclust analyses.
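A confidence interval from parametric bootstrap runs is commonly summarized with the percentile method: the lower and upper (1 - level) / 2 quantiles of the parameter estimates across replicates. The following is a sketch under that assumption; the README does not state which interval method "bootstrapping_fsc.txt" feeds into, and the function name is hypothetical. Quantiles are interpolated linearly between sorted replicates.

```python
import math


def percentile_ci(estimates, level=0.95):
    """Percentile bootstrap interval for one parameter from its replicate estimates."""
    xs = sorted(estimates)
    if not xs:
        raise ValueError("need at least one bootstrap estimate")

    def quantile(q):
        # Position of quantile q among the sorted replicates, interpolated linearly.
        pos = q * (len(xs) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

    alpha = (1.0 - level) / 2.0
    return quantile(alpha), quantile(1.0 - alpha)
```

For instance, feeding in the effective population size estimated in each bootstrap replicate returns the lower and upper bounds of a 95% interval for that parameter.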

snapclust: STRUCTURE input files used for the snapclust analyses, both filtered ("*80.str") and unfiltered, for both ipyrad ("ipyrad") and Stacks ("stacks") outputs. Given their formatting, we suggest viewing the STRUCTURE files with spreadsheet software (e.g., Excel or Calc) or R; R is preferable, since some files may be too large for some spreadsheet programs.
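Filtering loci from a STRUCTURE file amounts to dropping allele columns with too many missing calls. The Python sketch below is not the authors' R code; it assumes a headerless layout with one label column, two rows per diploid individual and "-9" for missing alleles (Stacks files start with extra header lines, and some layouts add a population column, so adjust n_label_cols accordingly). It also assumes, from the file name only, that "*80.str" means loci called in at least 80% of the rows. Both function names are hypothetical.

```python
def read_structure(text):
    """Split a whitespace-delimited STRUCTURE file into rows of fields."""
    return [ln.split() for ln in text.splitlines() if ln.strip()]


def filter_loci(rows, n_label_cols=1, max_missing=0.2, missing="-9"):
    """Keep locus columns whose fraction of missing allele calls is <= max_missing."""
    n_cols = len(rows[0])
    keep = []
    for j in range(n_label_cols, n_cols):
        frac_missing = sum(1 for r in rows if r[j] == missing) / len(rows)
        if frac_missing <= max_missing:
            keep.append(j)
    # Rebuild every row with its label columns plus the retained loci.
    return [r[:n_label_cols] + [r[j] for j in keep] for r in rows]
```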

stairway_plot: blueprints and results (the "summary" files) of the Stairway Plot 2 analyses. The "blueprint" files are the input files for Stairway Plot 2 and can be edited with a text editor. The summary files are tabular, so we suggest viewing them with spreadsheet software or R.
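Because the summary files are tab-separated tables, the median effective population size through time can be extracted directly. This sketch assumes the columns are named "year" (years before present) and "Ne_median", as in typical Stairway Plot 2 summaries; verify the header of each file first. The function names and the fold-change summary are illustrative, not part of the original analyses.

```python
import csv
import io


def ne_trajectory(summary_text):
    """Return (year, Ne_median) pairs from a Stairway Plot 2 summary table, most recent first."""
    reader = csv.DictReader(io.StringIO(summary_text), delimiter="\t")
    return sorted((float(r["year"]), float(r["Ne_median"])) for r in reader)


def fold_change(pairs, recent_year, past_year):
    """Ratio Ne(recent) / Ne(past) at the nearest tabulated years; values below 1 indicate a decline."""
    def ne_at(year):
        return min(pairs, key=lambda p: abs(p[0] - year))[1]
    return ne_at(recent_year) / ne_at(past_year)
```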

treemix: TreeMix results for both ipyrad ("ipyrad") and Stacks ("stacks") outputs, which can be visualized with the ipyrad-tools suite (see https://ipyrad.readthedocs.io/en/master/API-analysis/cookbook-treemix.html). R can also be used (https://speciationgenomics.github.io/Treemix/). ".cov.gz" files are the covariance matrices between populations estimated from the data; ".covse.gz" files are the standard errors of each entry in the covariance matrix; ".treeout.gz" files are the output tree models with migration events; ".vertices.gz" and ".edges.gz" files describe the internal structure of the inferred graph; and ".llik" files contain the likelihood estimates. These files can be viewed with a text editor or Bash scripts, but the TreeMix manual (https://bitbucket.org/nygcresearch/treemix/downloads/treemix_manual_10_1_2012.pdf) advises against modifying the vertices and edges files in any way.
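The gzipped covariance files can be read without decompressing them to disk. A minimal sketch, assuming a layout of one line of population names followed by one line per population that starts with its name and then lists the values; please confirm against an actual ".cov.gz" file. The function name is hypothetical.

```python
import gzip


def read_treemix_matrix(path):
    """Read a gzipped TreeMix matrix (.cov.gz or .covse.gz) into {pop: {pop: value}}."""
    with gzip.open(path, "rt") as fh:
        lines = [ln.split() for ln in fh if ln.strip()]
    pops = lines[0]  # header: population names
    return {row[0]: dict(zip(pops, map(float, row[1:]))) for row in lines[1:]}
```

Reading the files this way leaves them untouched, in line with the manual's advice not to modify the TreeMix outputs.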

vcfs: VCF files and HDF5 databases ("*.snps.hdf5") used in this study, for both ipyrad ("ipyrad") and Stacks ("stacks") outputs. There is also a folder with the Stacks output filtered to keep only the SNPs with less than 20% missing data ("only_snps_less_20_missing"). HDF5 is a format used by ipyrad for its analyses (more information at https://ipyrad.readthedocs.io/en/master/API-analysis/cookbook-vcf2hdf5.html). The VCF files are plain text and can be opened with a text editor, whereas HDF5 is a binary format; given their size and formatting, we suggest using both with the ipyrad-tools suite and R. Alternatively, Bash scripts can be used to view the VCF files.
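The "only_snps_less_20_missing" criterion keeps a site only if fewer than 20% of the samples lack a genotype call. The authors applied it with the R scripts in "scripts/r/filtering"; the following is an independent Python sketch of the same criterion, assuming an uncompressed VCF with a GT field in each FORMAT column and counting a sample as missing only when all of its alleles are "." (partially missing calls such as "./1" count as called). The function names are hypothetical.

```python
def missing_fraction(record):
    """Fraction of samples whose GT call is entirely missing ('.', './.', '.|.')."""
    fields = record.rstrip("\n").split("\t")
    gt_idx = fields[8].split(":").index("GT")  # column 9 (index 8) is FORMAT
    samples = fields[9:]

    def is_missing(sample):
        parts = sample.split(":")
        if gt_idx >= len(parts):
            return True
        alleles = parts[gt_idx].replace("|", "/").split("/")
        return all(a == "." for a in alleles)

    return sum(is_missing(s) for s in samples) / len(samples)


def filter_vcf(lines, max_missing=0.2):
    """Keep header lines and sites with strictly less than max_missing missing genotypes."""
    return [ln for ln in lines if ln.startswith("#") or missing_fraction(ln) < max_missing]
```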

Sharing/Access information

The trimmed Stacks ddRADseq reads are available in the NCBI Sequence Read Archive (BioProject accession number: PRJNA899990).

README: Supplementary data for "Genomic tools for comparative conservation genetics: Objectively evaluating the impacts of anthropogenic exploitation and informing an immediate conservation plan among three recently diverged stag beetles (Lucanus, Lucanidae)."