Data from: Pleistocene divergence and long-term population decline in three endemic Euphorbia species of high conservation concern from the south-western Alps and the mountains of Corsica and Sardinia
Abstract
In this study, we integrate genomic data (ddRADseq) with species distribution models to investigate the evolutionary and demographic history of three narrow-ranged endemic Euphorbia species (E. gayi, E. valliniana and E. variabilis) restricted to the south-western Alps, Corsica, and Sardinia. Using coalescent-based approaches, we infer divergence times and reconstruct long-term demographic trends, while also examining population structure and connectivity. In addition, we project species distributions under multiple global change scenarios to assess future extinction risks.
This data archive contains genomic data derived from ddRADseq. For each analysis, the most relevant input and results files that should allow replication of the workflow are included. Demultiplexed ddRADseq reads are available from NCBI under BioProject PRJNA1336693.
Description of the Data and file structure
Summary of data
Data in this repository are structured in four folders, corresponding to the three species-level datasets (Egayi, Evalliniana, Evariabilis) used to assess intraspecific population structure and demographic history, and a multispecies dataset (Multispecies) used for phylogenetic inference and demographic modelling.
Usage notes
The different data types included in this repository are listed below, along with brief descriptions of how to work with them.
Descriptions of the VCF file formats can be found on the SAMtools file-format specifications page.
.dst: Plain text file containing a Nei’s genetic distance matrix among individuals. It is formatted as a PHYLIP-style distance file, with the first row specifying the number of samples and subsequent lines starting with the sample name followed by the pairwise distances to all other samples. Can be opened with any text editor (e.g., Linux less command, Nano, Vim, or Text Editor) and visualised in SplitsTree.
.input: Plain text file containing the input for STRUCTURE analyses. The first column lists the individual ID and subsequent columns represent different genomic positions. Each individual is represented by a single line and diploid genotypes are represented as integers between 1 and 4. Missing data is encoded as 0. These files can be edited in any text editor.
.log: Computer-generated text files that record all activities, operations, errors, and events that occur within a system or application. In this context, they were produced by the software SNAPP and document properties of the Markov-Chain-Monte-Carlo (MCMC) process, including effective sample size and mixing of chains. They are used to assess convergence of chains and sufficient sampling of the posterior parameter space. Log files produced by SNAPP can be analysed in Tracer.
.nex: A Nexus file is a modular, extensible data format used primarily in phylogenetics. Nexus files always begin with a fixed header #NEXUS followed by multiple blocks. Each block starts with BEGIN block_name; and ends with END;. Blocks can contain taxa names, genomic sequences, phylogenetic trees, distances, character sets and more. Nexus files can be edited in any text editor (e.g., Linux less command, Nano, Vim, or Text Editor).
.newick: NEWICK is a text-based format for representing phylogenetic trees in computer-readable form using (nested) parentheses and commas. The phylogenetic tree is represented in a single line of nested parentheses describing the relationships among the taxa in the tree, terminated by a semicolon. NEWICK files can be edited in any text editor and graphically displayed in software like FigTree, TreeViewer, and drawtree.
.radpainter: Plain text file used as input for fineRADstructure. It contains the coancestry matrix, i.e. the number of nearest-neighbour haplotype “chunks” that each individual shares with every other individual.
.sfs: Two-dimensional site frequency spectrum summarizing the joint distribution of allele frequencies among populations. Plain text file that begins with a header line giving the dimensions of the spectrum (number of allele frequency categories per population). Each cell in the matrix corresponds to the number of sites where the derived (or minor) allele is observed i times in population 1 and j times in population 2. Can be opened and edited in any text editor.
.tre: Maximum clade credibility tree including mean node heights built from individual SNAPP trees in TreeAnnotator. The tree is in Nexus format and can be visualised by any supported tree viewer program like FigTree or iTOL.
.trees: Result of a SNAPP analysis. A set of multiple trees (and corresponding parameter values) generated using a Markov chain Monte Carlo (MCMC) algorithm, each of which is a sample from the posterior distribution of species trees and parameters. The trees are in Nexus format and can be visualised in DensiTree.
.treefile: The maximum-likelihood tree produced by IQ-TREE in NEWICK format, which can be visualised by any supported tree viewer program like FigTree or iTOL.
.vcf.gz: Compressed Variant Call Format (VCF) files, generated using bgzip. These contain detailed information about genetic variants, including their position relative to the reference genome, quality metrics, and individual genotype data. They can be examined, modified, and analyzed with tools such as BCFtools. Header lines in these files begin with #.
.xml: XML files define the data, models, and parameters for an analysis performed by the BEAST software (Bayesian Evolutionary Analysis by Sampling Trees). This XML file serves as the input for the BEAST program, which uses Markov chain Monte Carlo (MCMC) methods to perform phylogenetic analyses. Here, they describe the input data and parameters for a SNAPP analysis, implemented in BEAST. They can be opened and edited using any text editor and graphically displayed in BEAUti.
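As an illustration of the .dst layout described above (first row giving the number of samples, then one row per sample with its name followed by the pairwise distances), a minimal Python sketch for reading such a file is given here. It is not part of the original workflow. It assumes a full square matrix with one row per line and sample names without whitespace; the function name read_phylip_distances is hypothetical.

```python
from pathlib import Path


def read_phylip_distances(path):
    """Parse a PHYLIP-style square distance matrix into (names, rows).

    Assumptions of this sketch: one matrix row per line, a full square
    matrix (not lower-triangular), and sample names without whitespace.
    """
    lines = [ln for ln in Path(path).read_text().splitlines() if ln.strip()]
    n_samples = int(lines[0].split()[0])  # first row: number of samples
    names, rows = [], []
    for line in lines[1:1 + n_samples]:
        fields = line.split()
        names.append(fields[0])  # sample name comes first
        rows.append([float(x) for x in fields[1:]])  # then the distances
    if len(names) != n_samples:
        raise ValueError(f"expected {n_samples} samples, found {len(names)}")
    for name, row in zip(names, rows):
        if len(row) != n_samples:
            raise ValueError(
                f"{name}: expected {n_samples} distances, found {len(row)}"
            )
    return names, rows


# Example call (path relative to the unpacked archive):
# names, dist = read_phylip_distances("Multispecies/NeighbourNet/Endemics_R50.dst")
```

The function returns plain Python lists so that the matrix can be inspected without additional dependencies. If a file wraps rows over several lines or stores only the lower triangle, the length checks raise an error rather than returning a misaligned matrix.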
Egayi
Population_statistics
populations.snps.vcf: VCF file containing SNPs of Euphorbia gayi called in STACKS that are present in at least 80% of individuals and have a maximum heterozygosity of 0.65 and a minimum minor allele count of 3. To reduce linkage, a single random SNP per RAD locus was retained. Input file for computing population summary statistics.
Stairwayplot
populations.snps.vcf: VCF file containing SNPs of Euphorbia gayi called in STACKS that are present in at least 70% of individuals and have a maximum heterozygosity of 0.65. To reduce linkage, a single random SNP per RAD locus was retained. Input file for the analysis of historical population size changes in Stairwayplot.
STRUCTURE
populations.structure: STRUCTURE input file produced in STACKS based on SNPs of Euphorbia gayi present in at least 80% of individuals that have a maximum heterozygosity of 0.65 and a minimum minor allele count of 3. To reduce linkage, a single random SNP per RAD locus was retained.
out_STR.zip: Results of the STRUCTURE analysis for Euphorbia gayi, which was run with the admixture model for 1,000,000 MCMC generations with 100,000 generations as burnin for K (number of groups) ranging from 1 to 10 with 10 replicates each. The .zip file can be uploaded to CLUMPAK for averaging across replicates and summarising the results.
Evalliniana
dadi
populations.snps.vcf: VCF file containing SNPs of Euphorbia valliniana called in STACKS that are present in at least 70% of individuals and have a maximum heterozygosity of 0.65. To reduce linkage, a single random SNP per RAD locus was retained. Input for easySFS to produce 2-D site frequency spectra (2-D SFS).
North-South.sfs: 2-D SFS produced in easySFS containing the joint distribution of allele frequencies between the Northern and Southern populations of Euphorbia valliniana. Used as input for demographic modelling in dadi.
South-North.sfs: 2-D SFS produced in easySFS containing the joint distribution of allele frequencies between the Southern and Northern populations of Euphorbia valliniana. Used as input for demographic modelling in dadi.
Population_statistics
populations.snps.vcf: VCF file containing SNPs of Euphorbia valliniana called in STACKS that are present in at least 80% of individuals and have a maximum heterozygosity of 0.65 and a minimum minor allele count of 3. To reduce linkage, a single random SNP per RAD locus was retained. Input file for computing population summary statistics.
Stairwayplot
populations.snps.vcf: VCF file containing SNPs of Euphorbia valliniana called in STACKS that are present in at least 70% of individuals and have a maximum heterozygosity of 0.65. To reduce linkage, a single random SNP per RAD locus was retained. Input file for the analysis of historical population size changes in Stairwayplot.
STRUCTURE
populations.structure: STRUCTURE input file produced in STACKS based on SNPs of Euphorbia valliniana present in at least 80% of individuals that have a maximum heterozygosity of 0.65 and a minimum minor allele count of 3. To reduce linkage, a single random SNP per RAD locus was retained.
out_STR.zip: Results of the STRUCTURE analysis for Euphorbia valliniana, which was run with the admixture model for 1,000,000 MCMC generations with 100,000 generations as burnin for K (number of groups) ranging from 1 to 10 with 10 replicates each. The .zip file can be uploaded to CLUMPAK for averaging across replicates and summarising the results.
Evariabilis
Population_statistics
populations.snps.vcf: VCF file containing SNPs of Euphorbia variabilis called in STACKS that are present in at least 80% of individuals and have a maximum heterozygosity of 0.65 and a minimum minor allele count of 3. To reduce linkage, a single random SNP per RAD locus was retained. Input file for computing population summary statistics.
Stairwayplot
populations.snps.vcf: VCF file containing SNPs of Euphorbia variabilis called in STACKS that are present in at least 70% of individuals and have a maximum heterozygosity of 0.65. To reduce linkage, a single random SNP per RAD locus was retained. Input file for the analysis of historical population size changes in Stairwayplot.
STRUCTURE
populations.structure: STRUCTURE input file produced in STACKS based on SNPs of Euphorbia variabilis present in at least 80% of individuals that have a maximum heterozygosity of 0.65 and a minimum minor allele count of 3. To reduce linkage, a single random SNP per RAD locus was retained.
out_STR.zip: Results of the STRUCTURE analysis for Euphorbia variabilis, which was run with the admixture model for 1,000,000 MCMC generations with 100,000 generations as burnin for K (number of groups) ranging from 1 to 10 with 10 replicates each. The .zip file can be uploaded to CLUMPAK for averaging across replicates and summarising the results.
Multispecies
dadi
populations.snps.vcf: VCF file containing SNPs of Euphorbia valliniana and Euphorbia variabilis called in STACKS that are present in at least 70% of individuals and have a maximum heterozygosity of 0.65. To reduce linkage, a single random SNP per RAD locus was retained. Input for easySFS to produce 2-D site frequency spectra (2-D SFS).
Eval-Evar.sfs: 2-D SFS produced in easySFS containing the joint distribution of allele frequencies between Euphorbia valliniana and Euphorbia variabilis. Used as input for demographic modelling in dadi.
Evar-Eval.sfs: 2-D SFS produced in easySFS containing the joint distribution of allele frequencies between Euphorbia variabilis and Euphorbia valliniana. Used as input for demographic modelling in dadi.
fineRADStructure
populations.haps.radpainter: Coancestry matrix among the three endemic species produced in STACKS populations based on loci present in at least 50% of individuals that have a maximum heterozygosity of 0.65 and a minimum minor allele count of 3. Used as input for fineRADstructure.
IQ-Tree
Endemics_OG_minDP5_maxDP1000_R50_MAC3.vcf.gz: VCF file containing SNPs of all Euphorbia accessions produced in STACKS that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 5, a maximum genotype read depth (maxDP) of 1000, and a minimum minor allele count (MAC) of 3 that are present in at least 50% of samples (R50).
Endemics_OG_minDP5_maxDP1000_R50_MAC3.min4.phy.varsites.phy: Alignment used to build a maximum likelihood phylogenetic tree in IQ-TREE2. The file contains genotypes for all Euphorbia accessions produced in STACKS that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 5, a maximum genotype read depth (maxDP) of 1000, and a minimum minor allele count (MAC) of 3 that are present in at least 50% of samples (R50). The VCF file was converted to PHYLIP format using the script vcf2phylip.py.
Endemics_OG_minDP5_maxDP1000_R50_MAC3.min4.phy.varsites.phy.treefile: Best scoring maximum likelihood tree based on 23,338 SNPs produced in IQ-TREE2 under the TVM+F+ASC+R2 substitution model using 1000 ultrafast bootstrap replicates, ascertainment bias correction, and correction for overestimating node support.
NeighbourNet
Endemics_noOG_minDP5_maxDP1000_R50.vcf.gz: VCF file containing SNPs of all Euphorbia accessions but the outgroup produced in STACKS that have been filtered to contain only sites with a minimum genotype read depth (minDP) of 5, a maximum genotype read depth (maxDP) of 1000, and a minimum minor allele count (MAC) of 3 that are present in at least 50% of samples (R50).
Endemics_R50.dst: Matrix of Nei's genetic distance among individuals computed in adegenet based on the VCF file Endemics_noOG_minDP5_maxDP1000_R50.vcf.gz. Used to construct the NeighbourNet phylogenetic network in SplitsTree.
SNAPP
SNAPP input files (.xml), log files (.log) and resulting trees (.trees). The analysis was run for 2,200,000 generations and a tree was saved every 100th generation. Input files were generated with snapp_prep.rb by sampling one SNP per RAD locus without missing data across species from the VCF file SNAPP_minDP5_maxDP1000_R80.vcf.gz, containing SNPs present in at least 80% of Euphorbia accessions with a maximum observed heterozygosity of 0.65, a minimum genotype read depth (minDP) of 5 and a maximum genotype read depth (maxDP) of 1000. Finally, 10% of trees were discarded as burnin and a maximum clade credibility tree was constructed in TreeAnnotator (SNAPP_burnin10.tre).
STRUCTURE
populations.structure: STRUCTURE input file produced in STACKS based on SNPs of all three Euphorbia species present in at least 80% of individuals that have a maximum heterozygosity of 0.65 and a minimum minor allele count of 3. To reduce linkage, a single random SNP per RAD locus was retained.
out_STR.zip: Results of the STRUCTURE analysis for all three Euphorbia species, which was run with the admixture model for 1,000,000 MCMC generations with 100,000 generations as burnin for K (number of groups) ranging from 1 to 10 with 10 replicates each. The .zip file can be uploaded to CLUMPAK for averaging across replicates and summarising the results.
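The filtering thresholds quoted in the file descriptions above (for example, SNPs present in at least 80% of individuals with a minimum minor allele count of 3) can be inspected on any of the VCF files with a short Python sketch such as the one below. It is an illustration under stated assumptions, not the STACKS implementation: it takes GT to be the first FORMAT field, skips multiallelic sites, and computes the call rate across all samples in the file rather than per population. The function names site_stats and count_passing are hypothetical.

```python
import gzip


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def site_stats(path):
    """Yield (chrom, pos, call_rate, minor_allele_count) per biallelic site."""
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):  # header lines begin with '#'
                continue
            fields = line.rstrip("\n").split("\t")
            if "," in fields[4]:  # assumption: ignore multiallelic sites
                continue
            samples = fields[9:]
            ref = alt = called = 0
            for sample in samples:
                # assumption: GT is the first colon-separated FORMAT field
                alleles = sample.split(":")[0].replace("|", "/").split("/")
                if "." in alleles:
                    continue  # missing genotype
                called += 1
                ref += alleles.count("0")
                alt += alleles.count("1")
            call_rate = called / len(samples) if samples else 0.0
            yield fields[0], int(fields[1]), call_rate, min(ref, alt)


def count_passing(path, min_call_rate=0.8, min_mac=3):
    """Count sites meeting a call-rate and minor-allele-count threshold."""
    return sum(
        1
        for _, _, rate, mac in site_stats(path)
        if rate >= min_call_rate and mac >= min_mac
    )


# Example call (path relative to the unpacked archive):
# count_passing("Egayi/Population_statistics/populations.snps.vcf")
```

Because the archived files are already filtered, most sites would be expected to pass the thresholds stated for the corresponding dataset. Small discrepancies are possible where the original filters were applied per population or before other filtering steps.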
Data.zip
├── Egayi
│   ├── Population_statistics
│   │   └── populations.snps.vcf
│   ├── Stairwayplot
│   │   └── populations.snps.vcf
│   └── STRUCTURE
│       ├── out_STR.zip
│       └── populations.structure
├── Evalliniana
│   ├── dadi
│   │   ├── North-South.sfs
│   │   ├── populations.snps.vcf
│   │   └── South-North.sfs
│   ├── Population_statistics
│   │   └── populations.snps.vcf
│   ├── Stairwayplot
│   │   └── populations.snps.vcf
│   └── STRUCTURE
│       ├── out_STR.zip
│       └── populations.structure
├── Evariabilis
│   ├── Population_statistics
│   │   └── populations.snps.vcf
│   ├── Stairwayplot
│   │   └── populations.snps.vcf
│   └── STRUCTURE
│       ├── out_STR.zip
│       └── populations.structure
└── Multispecies
    ├── dadi
    │   ├── Eval-Evar.sfs
    │   ├── Evar-Eval.sfs
    │   └── populations.snps.vcf
    ├── fineRADStructure
    │   └── populations.haps.radpainter
    ├── IQ-Tree
    │   ├── Endemics_OG_minDP5_maxDP1000_R50_MAC3.min4.phy.varsites.phy
    │   ├── Endemics_OG_minDP5_maxDP1000_R50_MAC3.min4.phy.varsites.phy.treefile
    │   └── Endemics_OG_minDP5_maxDP1000_R50_MAC3.vcf.gz
    ├── NeighbourNet
    │   ├── Endemics_noOG_minDP5_maxDP1000_R50.vcf.gz
    │   └── Endemics_R50.dst
    ├── SNAPP
    │   ├── SNAPP_burnin10.tre
    │   ├── snapp.log
    │   ├── SNAPP_minDP5_maxDP1000_R80.vcf.gz
    │   ├── snapp.trees
    │   └── snapp.xml
    └── STRUCTURE
        ├── out_STR.zip
        └── populations.structure
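
After unpacking Data.zip, the folder layout listed above can be checked with a short Python sketch like the following. It is offered as a convenience, not as part of the original workflow, and assumes that the four top-level folders sit directly inside the directory passed as root. The names EXPECTED and missing_files are hypothetical.

```python
from pathlib import Path

# Files shared by the three single-species folders, as listed in the tree above.
SPECIES_FILES = [
    "Population_statistics/populations.snps.vcf",
    "Stairwayplot/populations.snps.vcf",
    "STRUCTURE/out_STR.zip",
    "STRUCTURE/populations.structure",
]
IQTREE = "Multispecies/IQ-Tree/Endemics_OG_minDP5_maxDP1000_R50_MAC3"

# Relative paths of every file named in the directory tree.
EXPECTED = (
    [f"{sp}/{rel}"
     for sp in ("Egayi", "Evalliniana", "Evariabilis")
     for rel in SPECIES_FILES]
    + [f"Evalliniana/dadi/{name}"
       for name in ("North-South.sfs", "South-North.sfs", "populations.snps.vcf")]
    + [f"Multispecies/dadi/{name}"
       for name in ("Eval-Evar.sfs", "Evar-Eval.sfs", "populations.snps.vcf")]
    + ["Multispecies/fineRADStructure/populations.haps.radpainter"]
    + [IQTREE + suffix
       for suffix in (".vcf.gz", ".min4.phy.varsites.phy",
                      ".min4.phy.varsites.phy.treefile")]
    + ["Multispecies/NeighbourNet/Endemics_noOG_minDP5_maxDP1000_R50.vcf.gz",
       "Multispecies/NeighbourNet/Endemics_R50.dst"]
    + [f"Multispecies/SNAPP/{name}"
       for name in ("SNAPP_burnin10.tre", "snapp.log",
                    "SNAPP_minDP5_maxDP1000_R80.vcf.gz",
                    "snapp.trees", "snapp.xml")]
    + ["Multispecies/STRUCTURE/out_STR.zip",
       "Multispecies/STRUCTURE/populations.structure"]
)


def missing_files(root):
    """Return the expected relative paths that are absent under root."""
    return sorted(rel for rel in EXPECTED if not (Path(root) / rel).is_file())


# Example call, with "Data" standing in for the unpacked archive directory:
# print(missing_files("Data"))
```

An empty list indicates that every file named in the tree was found. Any path that is returned points to an incomplete download or to an archive that was unpacked into a different directory level.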
