Habitat association predicts population connectivity and persistence in flightless beetles: a population genomics approach within a dynamic archipelago
Data files
Oct 22, 2024 version files 2.16 GB
- 0_data_processing_filters.zip (1.09 GB)
- 1_genetic_variation_statistics.zip (860.21 MB)
- 2_genetic_structure.zip (79.34 MB)
- 3_genealogical_inference_svdquartets.zip (22.08 MB)
- 4_demographic_modelling_fastsimcoal2.zip (103.47 MB)
- README.md (56.03 KB)
Abstract
Habitat association has been proposed to affect evolutionary dynamics through its control on dispersal propensity, which is considered a key trait for lineage survival in habitats of low durational stability. The Habitat Constraint hypothesis predicts different micro- and macroevolutionary patterns for stable vs. dynamic habitat specialists, but the empirical evidence remains controversial and in insects mostly derives from winged lineages. We here use genome-wide SNP data to assess the effect of habitat association on the population dynamics of two closely related flightless lineages of the genus Eutagenia (Coleoptera: Tenebrionidae), which are co-distributed across the Cyclades islands in the Eastern Mediterranean but are associated with habitat types of different presumed stability: the psammophilous lineage is associated with dynamic sandy coastal habitats, while the geophilous lineage is associated with comparatively stable compact-soil habitats. Our comparative population genomic and demographic analyses support higher inter-island gene flow in the psammophilous lineage, presumably due to the physical properties of dynamic sand-dune habitats that promote passive dispersal. We also find consistent bottlenecks in the psammophilous demes, suggesting that lineage evolution in the dynamic habitat is punctuated by local extinction and recolonisation events. The inferred demographic processes are surprisingly uniform among psammophilous demes, but vary considerably among geophilous demes depending on historical island connectivity, indicating more stringent constraints on the dynamic-habitat lineage. This study extends the Habitat Constraint hypothesis by demonstrating that selection on dispersal traits is not the only mechanism that can drive consistent differences in evolutionary dynamics between stable vs. dynamic habitat specialists.
GENERAL INFORMATION
Dataset overview
A detailed description of the general framework and the specific methodology followed to generate and process all included files can be found in the relevant publication (see below).
This dataset has been generated following the double-digest Restriction site Associated DNA sequencing (ddRADseq) protocol of Peterson et al. (2012; PLoS ONE, 7: e37135) with small modifications (Papadopoulou and Knowles, 2017; Evolution, 71: 2901-2917). The constructed libraries of 72-80 individuals each were size-selected (350-450 bp), PCR-amplified (8-10 cycles) and sequenced on the Illumina HiSeq2500 platform (single-end, 150 bp reads). Raw Illumina reads (demultiplexed) and relevant metadata have been deposited in the NCBI Sequence Read Archive (SRA), under BioProject PRJNA951918 (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA951918). Demultiplexing of the raw Illumina reads, de novo assembly to putative loci and genotype calling were performed using the STACKS2 v2.64 pipeline (Rochette et al., 2019; Molecular Ecology, 28: 4737-4754).
Corresponding author information
Name: Emmanouil Meramveliotakis
ORCID: https://orcid.org/0000-0002-6399-575X
Affiliation: Department of Biological Sciences, Faculty of Pure and Applied Sciences, University of Cyprus, Nicosia, Cyprus
email: emeram01@ucy.ac.cy
Alternative contact information
Name: Anna Papadopoulou
ORCID: https://orcid.org/0000-0002-4656-4894
Affiliation: Department of Biological Sciences, Faculty of Pure and Applied Sciences, University of Cyprus, Nicosia, Cyprus
email: papadopoulou.g.anna@ucy.ac.cy
Related publication
Meramveliotakis, E., Ortego, J., Anastasiou, I., Vogler, A. P., Papadopoulou, A. (ACCEPTED; Sep-2024) Habitat association predicts population connectivity and persistence in flightless beetles: a population genomics approach within a dynamic archipelago. Molecular Ecology
Funding information
This work is a product of the project EVOLHAB (EXCELLENCE/0421/0419), which was co-financed by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (RIF).
During the course of this work EM was supported by the A. G. Leventis Foundation Educational Grants Scheme for doctoral students.
This research was also partly supported by the Spanish Ministry of Economy and Competitiveness through the Severo Ochoa Program for Centres of Excellence in R + D + I (SEV-2012-0262).
NOTES FOR FILES
File names
When applicable, file names follow the pattern `eut.(01).R(02).(03).(04).(05).(06).(07).c(08).(09).(10).(file_extension)`, where:
- `(01)` denotes the dataset type and can be defined as:
  - `eph` : for the psammophilous lineage
  - `per` : for all demes of the geophilous lineage
  - `per.nc` : for the eastern (northern and southern sectors, and Donoussa) demes of the geophilous lineage
  - `per.w` : for the western demes of the geophilous lineage
- `(02)` denotes the missing data filtering percent threshold across all individuals in order to keep a RAD locus, as implemented in the `populations` program of the `STACKS2 v2.64` pipeline
- `(03)` denotes the included sites after the `STACKS2` assembly and can be defined as:
  - `all` : for including all variant sites per RAD locus
  - `wss` : for including only the first SNP per RAD locus
  - `tot` : for including both variant and invariant sites for each RAD locus
- `(04)` denotes if loci that are potentially under selection (as identified using the `BAYESCAN` software) have been removed from the dataset and can be defined as:
  - `fSEL` : to state that potential loci under selection have been removed
  - blank : to state that the filter has not been applied on the dataset
- `(05)` denotes if highly variable loci have been removed from the dataset and can be defined as:
  - `fTH` : to state that the highly variable loci have been removed
  - blank : to state that the filter has not been applied on the dataset
- `(06)` denotes if the included variant sites have been filtered based on their sequencing depth and can be defined as:
  - `fDP` : to state that a filter has been applied to remove low and artificially high depth variants
  - blank : to state that the filter has not been applied on the dataset
- `(07)` denotes if the file includes only individuals of a specific deme (or deme pair) and can be defined as:
  - two-letter and one-number pattern : denotes that the file only includes data from the specific deme (e.g., “AD1”)
  - two consecutive two-letter and one-number patterns : denotes that the file only includes data from the specific deme pair (e.g., “AD1AD2”)
  - blank : to state that the file includes all relevant individuals (e.g., individuals from all psammophilous demes)
- `(08)` denotes if an extra filter on variant missingness percentage has been applied (e.g., “c80” for 80% presence across all included individuals to retain a site)
- `(09)` denotes if a minor allele count (mac) threshold has been applied and can be defined as:
  - `mac3` : to state that all variants with a mac < 3 have been excluded
  - blank : to state that the filter has not been applied on the dataset
- `(10)` denotes if the dataset has been thinned to one variant per RAD locus after the application of the aforementioned filters and can be defined as:
  - `onesnp` : to state that the dataset is thinned
  - blank : to state that the dataset has not been thinned
- `(file_extension)` denotes the file type extension
This naming pattern is mostly relevant to input files and especially to `vcf` files.
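The naming pattern can be decoded programmatically. The following is a minimal Python sketch, not part of the deposited dataset: the regular expression, the function name `parse_name` and the returned key names are illustrative assumptions. It accepts a numeric suffix on the `fTH` and `fDP` tokens (as seen in the file listings, e.g., `fTH99`, `fDP95`), and any trailing token that the pattern does not describe (e.g., `mod`) is simply returned as part of the extension.

```python
import re

# Regex mirroring the README naming pattern; every filter token is optional.
NAME_PATTERN = re.compile(
    r"^eut\.(?P<dataset>eph|per\.nc|per\.w|per)"   # (01) dataset type
    r"\.R(?P<missing_threshold>\d+)"               # (02) populations threshold
    r"\.(?P<sites>all|wss|tot)"                    # (03) included sites
    r"(?:\.(?P<selection_filter>fSEL))?"           # (04) BAYESCAN outliers removed
    r"(?:\.(?P<theta_filter>fTH\d*))?"             # (05) highly variable loci removed
    r"(?:\.(?P<depth_filter>fDP\d*))?"             # (06) depth filter applied
    r"(?:\.(?P<deme>(?:[A-Z]{2}\d){1,2}))?"        # (07) deme or deme pair
    r"(?:\.c(?P<site_presence>\d+))?"              # (08) extra missingness filter
    r"(?:\.(?P<mac>mac\d+))?"                      # (09) minor allele count filter
    r"(?:\.(?P<thinned>onesnp))?"                  # (10) one variant per RAD locus
    r"\.(?P<extension>.+)$"                        # remaining tokens and extension
)


def parse_name(file_name):
    """Return the naming tokens as a dict, or None if the name does not match."""
    match = NAME_PATTERN.match(file_name)
    return match.groupdict() if match else None


# Tokens that are absent from the name (blank in the pattern) are reported as None.
print(parse_name("eut.per.nc.R30.all.fSEL.fTH99.fDP95.AD1.c70.onesnp.vcf.gz"))
```

Files that do not follow the pattern (e.g., population maps) return `None`, so the function can also be used to separate pattern-named inputs from auxiliary files.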
VCF files
`vcf` files may include either both variant and invariant sites for each RAD locus or only variant sites. The following columns are included:
- `#CHROM` : RAD locus number/identifier
- `POS` : Position of the SNP on the RAD locus
- `ID` : Identifier of the SNP
- `REF` : Reference base
- `ALT` : Alternate base
- `QUAL` : Quality score
- `FILTER` : Filter status
- `INFO` : Additional information
  - `NS` : Number of samples with data (only included for the variant sites)
  - `AF` : Allele frequency (only included for the variant sites)
- `FORMAT` : Data format
  - `GT` : Genotype
  - `DP` : Read depth (only included for the variant sites)
  - `AD` : Allele depth (only included for the variant sites)
  - `GQ` : Genotype quality (only included for the variant sites)
  - `GL` : Genotype likelihood (only included for the variant sites)
- Sample columns: One per individual, containing genotype information
Note that missing data are coded as `./.`
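As an illustration of how the column layout and the missing-data coding can be used, the Python sketch below extracts the genotypes from one tab-separated VCF data line and reports the fraction of individuals with a called genotype. The function names and the example record are invented for illustration and do not come from the dataset.

```python
def sample_genotypes(vcf_line):
    """Return the GT value of every sample column in one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")      # FORMAT column, e.g. GT:DP:AD:GQ:GL
    gt_index = format_keys.index("GT")
    # Sample columns start at the 10th field, one per individual.
    return [sample.split(":")[gt_index] for sample in fields[9:]]


def presence_fraction(vcf_line):
    """Fraction of individuals whose genotype is not coded as missing (./.)."""
    genotypes = sample_genotypes(vcf_line)
    called = sum(1 for gt in genotypes if gt != "./.")
    return called / len(genotypes)


# Invented record with four individuals, one of them missing.
example_line = "\t".join(
    ["101", "12", "101:12:+", "A", "G", ".", "PASS", "NS=3;AF=0.167",
     "GT:DP", "0/0:10", "0/1:8", "./.:0", "0/0:12"]
)
print(sample_genotypes(example_line), presence_fraction(example_line))
```

A site with a presence fraction of at least 0.8 would, for example, pass a “c80” filter as defined in the file naming pattern.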
DESCRIPTION OF DIRECTORIES AND FILES
0_data_processing_filters.zip
The compressed master directory “0_data_processing_filters” contains a total of 16 directories and 60 files. The directory tree structure is shown below (files not shown):
0_data_processing_filters
├── geophilous
│ ├── 0_stacks2_output
│ ├── 1_bayescan_outliers
│ │ ├── 0_bayescan_input
│ │ ├── 1_bayescan_output
│ │ └── 2_bayescan_outliers_blacklist
│ ├── 2_theta_outliers
│ └── 3_filtered_datasets
└── psammophilous
├── 0_stacks2_output
├── 1_bayescan_outliers
│ ├── 0_bayescan_input
│ ├── 1_bayescan_output
│ └── 2_bayescan_outliers_blacklist
├── 2_theta_outliers
└── 3_filtered_datasets
It is divided into two main directories (i.e., “geophilous” and “psammophilous”), corresponding to the two distinct datasets of Eutagenia lineages with distinct habitat association. Each of the two main directories is further divided into four sub-directories which correspond to different data generation and filtering steps. Specifically:
Sub-directories
0_stacks2_output
##################################################
### GEOPHILOUS lineage dataset
##################################################
0_data_processing_filters/geophilous/0_stacks2_output/
├── eut.per.nc.R30.all.vcf.gz
├── eut.per.nc.R30.wss.vcf.gz
├── eut.per.R30.all.vcf.gz
├── eut.per.R30.wss.vcf.gz
├── eut.per.w.R30.all.vcf.gz
└── eut.per.w.R30.wss.vcf.gz
##################################################
### PSAMMOPHILOUS lineage dataset
##################################################
0_data_processing_filters/psammophilous/0_stacks2_output/
├── eut.eph.R30.all.vcf.gz
└── eut.eph.R30.wss.vcf.gz
It includes `*.vcf` files, as generated by the first run of the `populations` program in the `STACKS2 v2.64` pipeline. All files have been minimally filtered to include variants that are present in at least 30% of all individuals.
1_bayescan_outliers
##################################################
### GEOPHILOUS lineage dataset
##################################################
0_data_processing_filters/geophilous/1_bayescan_outliers/
├── 0_bayescan_input
│ └── eut.per.R30.wss.txt
├── 1_bayescan_output
│ ├── eut.per.R30.wss_n5Kth10nbp20p5Kb50Kpo1000_fst.txt
│ └── eut.per.R30.wss_n5Kth10nbp20p5Kb50Kpo1000.sel
└── 2_bayescan_outliers_blacklist
├── bayescan_eut.per.R30.wss_n5Kth10nbp20p5Kb50Kpo1000_outliers.tsv
├── bayescan_eut.per.R30.wss_n5Kth10nbp20p5Kb50Kpo1000.tsv
├── bayescan_outliers_fSEL.Rmd
├── blacklistBS.txt
├── convergence_plots.pdf
└── plot_R.r
##################################################
### PSAMMOPHILOUS lineage dataset
##################################################
0_data_processing_filters/psammophilous/1_bayescan_outliers/
├── 0_bayescan_input
│ └── eut.eph.R30.wss.txt
├── 1_bayescan_output
│ ├── eut.eph.R30.wss_n5Kth10nbp20p5Kb50Kpo1000_fst.txt
│ └── eut.eph.R30.wss_n5Kth10nbp20p5Kb50Kpo1000.sel
└── 2_bayescan_outliers_blacklist
├── bayescan_eut.eph.R30.wss_n5Kth10nbp20p5Kb50Kpo1000_outliers.tsv
├── bayescan_eut.eph.R30.wss_n5Kth10nbp20p5Kb50Kpo1000.tsv
├── bayescan_outliers_fSEL.Rmd
├── blacklistBS.txt
├── convergence_plots.pdf
└── plot_R.r
Includes all relevant directories and files for the `BAYESCAN` analyses. Specifically:
- The `0_bayescan_input` directories (one for each lineage-specific dataset) include the input files.
- The `1_bayescan_output` directories include the main output of the `BAYESCAN` analyses, which can be used as input to (a) identify outliers (`*_fst.txt` files), and (b) evaluate the convergence of the MCMC chain (`*.sel` files).
- The `2_bayescan_outliers_blacklist` directories include the blacklisted RAD loci (`blacklistBS.txt`) that are potentially under selection and relevant files. Note that (a) the code to generate the files in this directory is included in the R-markdown file (`bayescan_outliers_fSEL.Rmd`), and (b) the included script `plot_R.r` is provided by the authors of `BAYESCAN` in the relevant repository (https://github.com/mfoll/BayeScan).
2_theta_outliers
##################################################
### GEOPHILOUS lineage dataset
##################################################
0_data_processing_filters/geophilous/2_theta_outliers/
├── blacklist95.txt
├── blacklist975.txt
├── blacklist995.txt
├── blacklistIQRextreme.txt
├── blacklistIQRmild.txt
├── blacklistMC.txt
└── theta_outliers_fTH.Rmd
##################################################
### PSAMMOPHILOUS lineage dataset
##################################################
0_data_processing_filters/psammophilous/2_theta_outliers/
├── blacklist95.txt
├── blacklist975.txt
├── blacklist995.txt
├── blacklistIQRextreme.txt
├── blacklistIQRmild.txt
├── blacklistMC.txt
└── theta_outliers_fTH.Rmd
Includes all relevant files (and code) to generate the blacklist of RAD loci with potentially inflated theta values (i.e., highly variable loci based on the number of segregating sites). The input files are located in the `0_stacks2_output` directories, while the different blacklist files are named based on the selected threshold (see `theta_outliers_fTH.Rmd`).
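The general idea of flagging highly variable loci can be sketched as follows. This is an illustrative Python example only: it assumes that a threshold is an upper quantile of the per-locus number of segregating sites, which is a guess and not a description of the deposited code. The authoritative procedure is the one in `theta_outliers_fTH.Rmd`, and the function names below are hypothetical.

```python
import math
from collections import Counter


def segregating_sites_per_locus(variant_records):
    """Count variant sites per RAD locus from (locus_id, position) pairs."""
    return Counter(locus for locus, _position in variant_records)


def upper_quantile_outliers(counts, quantile):
    """Return loci whose count exceeds the nearest-rank quantile of all counts."""
    values = sorted(counts.values())
    rank = max(1, math.ceil(quantile * len(values)))   # nearest-rank definition
    cutoff = values[rank - 1]
    return sorted(locus for locus, n in counts.items() if n > cutoff)


# Invented data: 19 loci with one variant site each, one locus with nine.
records = [(f"locus{i}", 5) for i in range(1, 20)]
records += [("locus20", position) for position in range(1, 10)]
counts = segregating_sites_per_locus(records)
print(upper_quantile_outliers(counts, 0.9))
```

The returned identifiers correspond to what a blacklist file would hold under this assumption, one RAD locus per entry.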
3_filtered_datasets
##################################################
### GEOPHILOUS lineage dataset
##################################################
0_data_processing_filters/geophilous/3_filtered_datasets/
├── blacklist.per.fSEL.fTH99.txt
├── eut.per.nc.R30.all.fSEL.fTH99.fDP95.vcf.gz
├── eut.per.nc.R30.tot.fSEL.fTH99.vcf.gz
├── eut.per.R30.all.fSEL.fTH99.fDP95.vcf.gz
├── eut.per.R30.tot.fSEL.fTH99.vcf.gz
├── eut.per.w.R30.all.fSEL.fTH99.fDP95.vcf.gz
├── eut.per.w.R30.tot.fSEL.fTH99.vcf.gz
├── per-depth-post-filt.pdf
├── per-depth-pre-filt.pdf
├── per.nc-depth-post-filt.pdf
├── per.nc-depth-pre-filt.pdf
├── per.w-depth-post-filt.pdf
├── per.w-depth-pre-filt.pdf
└── variant_depth_filtering_fDP.Rmd
##################################################
### PSAMMOPHILOUS lineage dataset
##################################################
0_data_processing_filters/psammophilous/3_filtered_datasets/
├── blacklist.eph.fSEL.fTH99.txt
├── eph-depth-post-filt.pdf
├── eph-depth-pre-filt.pdf
├── eut.eph.R30.all.fSEL.fTH99.fDP95.vcf.gz
├── eut.eph.R30.tot.fSEL.fTH99.vcf.gz
└── variant_depth_filtering_fDP.Rmd
Includes the filtered `*.vcf` files. Any further filtering (i.e., extra filtering for missing data, minor allele frequency threshold and thinning to one variant per RAD locus) for downstream analyses is applied to these files, depending on the requirements of the relevant software. Additionally, the directory includes:
- The `blacklist.*` text file, which is a combination of the two blacklists generated in the previous steps (i.e., the ones included in directories `1_bayescan_outliers` and `2_theta_outliers`) and it is used in the second run of the `populations` program as implemented in the `STACKS2` pipeline to remove the relevant outliers.
- A set of `*.pdf` files which include box-plot graphs of the pre- and post-filtering variant depth distribution per specimen.
- A `variant_depth_filtering_fDP.Rmd` R-markdown file with the code to apply sample-oriented filtering on variant depth.
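Combining two blacklists amounts to taking the union of their locus identifiers. The Python sketch below illustrates this under the assumption that each blacklist holds one RAD locus identifier per line; the function names are hypothetical and the identifiers in the example are invented.

```python
def read_ids(text):
    """Collect the non-empty, whitespace-stripped lines of a blacklist as a set."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def merge_blacklists(*blacklist_texts):
    """Union of locus IDs across blacklists; numeric IDs sorted numerically first."""
    merged = set()
    for text in blacklist_texts:
        merged |= read_ids(text)
    return sorted(merged, key=lambda x: (0, int(x), "") if x.isdigit() else (1, 0, x))


# Invented contents standing in for a BAYESCAN blacklist and a theta blacklist.
selection_blacklist = "12\n7\n"
theta_blacklist = "7\n105\n"
print("\n".join(merge_blacklists(selection_blacklist, theta_blacklist)))
```

In practice the two text arguments would be read from the relevant blacklist files, and the merged list written back to a single file.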
1_genetic_variation_statistics.zip
The compressed master directory `1_genetic_variation_statistics` contains a total of 30 directories and 220 files. The directory tree structure is shown below (files not shown):
1_genetic_variation_statistics/
├── 0_genetic_diversity_pi_pixy
│ ├── 0_vcf_files
│ │ ├── geophilous
│ │ └── psammophilous
│ ├── 1_info
│ │ ├── geophilous
│ │ └── psammophilous
│ └── 2_pixy_raw_output
│ ├── geophilous
│ └── psammophilous
├── 1_genetic_divergence_Dxy_pixy
│ ├── 0_vcf_files
│ │ ├── geophilous
│ │ └── psammophilous
│ ├── 1_info
│ │ ├── geophilous
│ │ └── psammophilous
│ └── 2_pixy_raw_output
│ ├── geophilous
│ └── psammophilous
└── 2_genetic_differentiation _Fst_pixy
├── 0_vcf_files
│ ├── geophilous
│ └── psammophilous
├── 1_info
│ ├── geophilous
│ └── psammophilous
└── 2_pixy_raw_output
├── geophilous
└── psammophilous
Sub-directories
0_genetic_diversity_pi_pixy
1_genetic_variation_statistics/0_genetic_diversity_pi_pixy/
├── 0_vcf_files
│ ├── geophilous
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.AD1.c50.mod.vcf.gz
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.AD1.c50.mod.vcf.gz.tbi
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.AD2.c50.mod.vcf.gz
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.AD2.c50.mod.vcf.gz.tbi
│ │ ├-- [...]
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.TI2.c50.mod.vcf.gz
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.TI2.c50.mod.vcf.gz.tbi
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.TI3.c50.mod.vcf.gz
│ │ └── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.TI3.c50.mod.vcf.gz.tbi
│ └── psammophilous
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.AD5.c50.mod.vcf.gz
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.AD5.c50.mod.vcf.gz.tbi
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.AD6.c50.mod.vcf.gz
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.AD6.c50.mod.vcf.gz.tbi
│ ├-- [...]
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.TI5.c50.mod.vcf.gz
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.TI5.c50.mod.vcf.gz.tbi
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.TI6.c50.mod.vcf.gz
│ └── eut.eph.R30.tot.fSEL.fTH99.fDP95.TI6.c50.mod.vcf.gz.tbi
├── 1_info
│ ├── geophilous
│ │ ├── popmap.AD1.tsv
│ │ ├── popmap.AD2.tsv
│ │ ├-- [...]
│ │ ├── popmap.TI2.tsv
│ │ └── popmap.TI3.tsv
│ └── psammophilous
│ ├── popmap.AD5.tsv
│ ├── popmap.AD6.tsv
│ ├-- [...]
│ ├── popmap.TI5.tsv
│ └── popmap.TI6.tsv
└── 2_pixy_raw_output
├── geophilous
│ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.AD1.c50.mod_pi.txt
│ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.AD2.c50.mod_pi.txt
│ ├-- [...]
│ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.TI2.c50.mod_pi.txt
│ └── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.TI3.c50.mod_pi.txt
└── psammophilous
├── eut.eph.R30.tot.fSEL.fTH99.fDP95.AD5.c50.mod_pi.txt
├── eut.eph.R30.tot.fSEL.fTH99.fDP95.AD6.c50.mod_pi.txt
├-- [...]
├── eut.eph.R30.tot.fSEL.fTH99.fDP95.TI5.c50.mod_pi.txt
└── eut.eph.R30.tot.fSEL.fTH99.fDP95.TI6.c50.mod_pi.txt
It includes the input and raw output files for calculating genetic diversity (π) using the `PIXY v1.2.7` software. It is divided into:
- The `0_vcf_files` directory, which includes the input `*.vcf` files (one per deme) and the corresponding `*.tbi` files (indexed `*.vcf` files using the `tabix` software). Both variant and invariant sites that are present in at least 50% of the individuals of each deme are included. The files have been generated following the guidelines of the `PIXY` documentation (see https://pixy.readthedocs.io/en/latest/#).
- The `1_info` directory, which includes population maps for each deme.
- The `2_pixy_raw_output` directory, which includes the raw output files of `PIXY`. These files can be used to estimate average genetic diversity per deme.
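A minimal Python sketch of how the raw output could be summarised per deme is shown below. It assumes the tab-separated column names of the standard `pixy` π output (`pop`, `count_diffs`, `count_comparisons`), which should be checked against the actual files, and it aggregates by summing the raw counts across loci rather than averaging the per-locus values, as recommended in the `pixy` documentation. The function name and the example table are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def average_pi(pixy_text):
    """Aggregate pi per population as sum(count_diffs) / sum(count_comparisons)."""
    diffs = defaultdict(float)
    comparisons = defaultdict(float)
    for row in csv.DictReader(io.StringIO(pixy_text), delimiter="\t"):
        # Loci without usable data are reported as NA and are skipped here.
        if "NA" in (row["count_diffs"], row["count_comparisons"]):
            continue
        diffs[row["pop"]] += float(row["count_diffs"])
        comparisons[row["pop"]] += float(row["count_comparisons"])
    return {pop: diffs[pop] / comparisons[pop]
            for pop in comparisons if comparisons[pop] > 0}


# Invented three-locus table using only the columns needed for the summary.
example = (
    "pop\tchromosome\tcount_diffs\tcount_comparisons\n"
    "AD1\t1\t2\t100\n"
    "AD1\t2\t4\t200\n"
    "AD1\t3\tNA\tNA\n"
)
print(average_pi(example))
```

Summing counts before dividing weights each locus by the number of comparisons it contributed, which avoids giving loci with much missing data the same weight as well-covered loci.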
1_genetic_divergence_Dxy_pixy
1_genetic_variation_statistics/1_genetic_divergence_Dxy_pixy/
├── 0_vcf_files
│ ├── geophilous
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz.tbi
│ │ ├── eut.per.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz
│ │ ├── eut.per.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz.tbi
│ │ ├── eut.per.w.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz
│ │ └── eut.per.w.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz.tbi
│ └── psammophilous
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz
│ └── eut.eph.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz.tbi
├── 1_info
│ ├── geophilous
│ │ ├── popmap.per.nc.pairs.tsv
│ │ ├── popmap.per.pairs.tsv
│ │ └── popmap.per.w.pairs.tsv
│ └── psammophilous
│ └── popmap.eph.pairs.tsv
└── 2_pixy_raw_output
├── geophilous
│ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.c70.mod_dxy.txt
│ ├── eut.per.R30.tot.fSEL.fTH99.fDP95.c70.mod_dxy.txt
│ └── eut.per.w.R30.tot.fSEL.fTH99.fDP95.c70.mod_dxy.txt
└── psammophilous
└── eut.eph.R30.tot.fSEL.fTH99.fDP95.c70.mod_dxy.txt
It includes the input and raw output files for calculating pairwise genetic divergence (Dxy) using the `PIXY` software. It is divided into:
- The `0_vcf_files` directory, which includes the input `*.vcf` files and the corresponding `*.tbi` files. Both variant and invariant sites that are present in at least 70% of the individuals are included. The files have been generated following the guidelines of the `PIXY` documentation (see https://pixy.readthedocs.io/en/latest/#).
- The `1_info` directory, which includes population maps.
- The `2_pixy_raw_output` directory, which includes the raw output files of `PIXY`. These files can be used to estimate average pairwise genetic divergence.
2_genetic_differentiation_Fst_pixy
1_genetic_variation_statistics/2_genetic_differentiation _Fst_pixy/
├── 0_vcf_files
│ ├── geophilous
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz
│ │ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz.tbi
│ │ ├── eut.per.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz
│ │ ├── eut.per.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz.tbi
│ │ ├── eut.per.w.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz
│ │ └── eut.per.w.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz.tbi
│ └── psammophilous
│ ├── eut.eph.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz
│ └── eut.eph.R30.tot.fSEL.fTH99.fDP95.c70.mod.vcf.gz.tbi
├── 1_info
│ ├── geophilous
│ │ ├── popmap.per.nc.pairs.tsv
│ │ ├── popmap.per.pairs.tsv
│ │ └── popmap.per.w.pairs.tsv
│ └── psammophilous
│ └── popmap.eph.pairs.tsv
└── 2_pixy_raw_output
├── geophilous
│ ├── eut.per.nc.R30.tot.fSEL.fTH99.fDP95.c70.mod_fstHUD.txt
│ ├── eut.per.R30.tot.fSEL.fTH99.fDP95.c70.mod_fstHUD.txt
│ └── eut.per.w.R30.tot.fSEL.fTH99.fDP95.c70.mod_fstHUD.txt
└── psammophilous
└── eut.eph.R30.tot.fSEL.fTH99.fDP95.c70.mod_fstHUD.txt
It includes the input and raw output files for calculating pairwise genetic differentiation (Fst) using the `PIXY` software. It is divided into:
- The `0_vcf_files` directory, which includes the input `*.vcf` files and the corresponding `*.tbi` files. Both variant and invariant sites that are present in at least 70% of the individuals are included. The files have been generated following the guidelines of the `PIXY` documentation (see https://pixy.readthedocs.io/en/latest/#).
- The `1_info` directory, which includes population maps.
- The `2_pixy_raw_output` directory, which includes the raw output files of `PIXY`. These files can be used to estimate average pairwise genetic differentiation.
2_genetic_structure.zip
The compressed master directory `2_genetic_structure` contains a total of 18 directories and 615 files. It is divided into two main directories (one for each dataset/lineage) with essentially identical sub-directory structure. The directory tree structure is shown below (files not shown):
2_genetic_structure/
├── geophilous
│ ├── 0_structure_analyses
│ │ ├── all_islands
│ │ │ ├── input
│ │ │ └── k_runs
│ │ └── eastern_islands
│ │ ├── input
│ │ └── k_runs
│ └── 1_pca_analyses
│ ├── all_islands
│ │ └── input
│ └── eastern_islands
│ └── input
└── psammophilous
├── 0_structure_analyses
│ ├── input
│ └── k_runs
└── 1_pca_analyses
Sub-directories
0_structure_analyses
##################################################
### GEOPHILOUS lineage dataset
##################################################
2_genetic_structure/geophilous/0_structure_analyses/
├── all_islands
│ ├── input
│ │ ├── eut.per.R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp.str
│ │ ├── extraparams
│ │ ├── mainparams
│ │ └── popmap.per.pairs.tsv
│ └── k_runs
│ ├── eut_per_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K1.1_f
│ ├── eut_per_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K1.2_f
│ ├-- [...]
│ ├── eut_per_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K10.19_f
│ └── eut_per_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K10.20_f
└── eastern_islands
├── input
│ ├── eut.per.nc.R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp.str
│ ├── extraparams
│ ├── mainparams
│ └── popmap.per.nc.pairs.tsv
└── k_runs
├── eut_per.nc_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K1.1_f
├── eut_per.nc_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K1.2_f
├-- [...]
├── eut_per.nc_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K10.19_f
└── eut_per.nc_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K10.20_f
##################################################
### PSAMMOPHILOUS lineage dataset
##################################################
2_genetic_structure/psammophilous/0_structure_analyses/
├── input
│ ├── eut.eph.R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp.str
│ ├── extraparams
│ ├── mainparams
│ └── popmap.eph.pairs.tsv
└── k_runs
├── eut_eph_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K1.1_f
├── eut_eph_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K1.2_f
├-- [...]
├── eut_eph_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K10.19_f
└── eut_eph_R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp_K10.20_f
It includes the input and output files for inferring population genetic structure using the `STRUCTURE v2.3.4` software. For each dataset/lineage, there are two sub-directories:
- The `input` directory, which includes:
  - The input `*.str` file for the `STRUCTURE` analysis that includes one variant per RAD locus (present in at least 80% of all individuals). Each diploid individual is represented by two consecutive rows. The input file includes the following columns:
    - 1st column: Name of individual
    - 2nd column: Population information
    - Loci columns: Genotype data. The coding of information follows the pattern:
      - `-9` = missing data
      - `1` = A
      - `2` = T
      - `3` = G
      - `4` = C
- The `k_runs` directory, which includes the output of 10 different `STRUCTURE` runs (K=1-10) with 20 replicates per run.
Note that for the geophilous lineage there are two directories: `all_islands`, which includes the aforementioned files for the analysis including all geophilous demes, and `eastern_islands`, which includes the same type of files after excluding individuals sampled from the western islands and repeating the analysis.
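The two-rows-per-individual layout and the allele coding described above can be read with a few lines of Python. The sketch below is illustrative: it assumes whitespace-separated fields and no header row of marker names, neither of which is stated in this README, and the function name and example rows are invented.

```python
# Allele coding as described for the STRUCTURE input files; -9 marks missing data.
ALLELE_CODES = {"1": "A", "2": "T", "3": "G", "4": "C", "-9": None}


def read_structure(text):
    """Map each individual to (population, list of (allele_1, allele_2) per locus)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) % 2 != 0:
        raise ValueError("expected two rows per diploid individual")
    individuals = {}
    for first, second in zip(rows[0::2], rows[1::2]):
        if first[0] != second[0]:
            raise ValueError(f"rows for {first[0]} and {second[0]} are not paired")
        alleles_1 = [ALLELE_CODES[code] for code in first[2:]]
        alleles_2 = [ALLELE_CODES[code] for code in second[2:]]
        individuals[first[0]] = (first[1], list(zip(alleles_1, alleles_2)))
    return individuals


# Invented individual with three loci: heterozygous, missing, homozygous.
example = "ind1 1 1 -9 3\nind1 1 2 -9 3\n"
print(read_structure(example))
```

If the actual files do contain a header row, it would need to be dropped before the rows are paired.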
1_pca_analyses
##################################################
### GEOPHILOUS lineage dataset
##################################################
2_genetic_structure/geophilous/1_pca_analyses/
├── all_islands
│ └── input
│ └── eut.per.R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp.vcf.gz
└── eastern_islands
└── input
└── eut.per.nc.R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp.vcf.gz
##################################################
### PSAMMOPHILOUS lineage dataset
##################################################
2_genetic_structure/psammophilous/1_pca_analyses/
└── eut.eph.R30.all.fSEL.fTH99.fDP95.c80.mac3.onesnp.vcf.gz
It includes the `*.vcf` input files to perform complementary PCA analyses using the `adegenet` R package, in order to evaluate the results of the aforementioned `STRUCTURE` analyses.
3_genealogical_inference_svdquartets.zip
The compressed master directory `3_genealogical_inference_svdquartets` contains a total of 2 directories and 4 files. It is divided into two main directories (one for each dataset/lineage). The directory tree structure is shown below:
3_genealogical_inference_svdquartets/
├── geophilous
│ ├── eut.per.R30.all.fSEL.fTH99.fDP95.c30.mac3.onesnp.taxpart.nexus
│ └── eut.per.R30.all.fSEL.fTH99.fDP95.c30.mac3.onesnp.vcf.gz
└── psammophilous
├── eut.eph.R30.all.fSEL.fTH99.fDP95.c30.mac3.onesnp.taxpart.nexus
└── eut.eph.R30.all.fSEL.fTH99.fDP95.c30.mac3.onesnp.vcf.gz
It includes the `*.nexus` input files for the `SVDQuartets` analyses and the corresponding `*.vcf` files that were used to generate them.
4_demographic_modelling_fastsimcoal2.zip
The compressed master directory `4_demographic_modelling_fastsimcoal2` contains a total of 706 directories and 1,404 files. It is divided into two main directories (one for each dataset/lineage) with identical sub-directory structure. The directory tree structure is shown below (files not shown):
4_demographic_modelling_fastsimcoal2/
├── geophilous
│ ├── single_deme_models
│ │ ├── 0_vcf_files
│ │ └── 1_fastsimcoal2
│ │ ├── 0_sfs_data
│ │ ├── 1_info
│ │ └── 2_models
│ │ ├── AD1
│ │ │ ├── per1.constant
│ │ │ ├── per2.expansion
│ │ │ ├── per3.contraction
│ │ │ └── per4.instbot
│ │ ├── AD2
│ │ │ ├── per1.constant
│ │ │ ├── per2.expansion
│ │ │ ├── per3.contraction
│ │ │ └── per4.instbot
│ │ ├-- [...]
│ │ ├── TI2
│ │ │ ├── per1.constant
│ │ │ ├── per2.expansion
│ │ │ ├── per3.contraction
│ │ │ └── per4.instbot
│ │ └── TI3
│ │ ├── per1.constant
│ │ ├── per2.expansion
│ │ ├── per3.contraction
│ │ └── per4.instbot
│ └── two_deme_models
│ ├── 0_vcf_files
│ └── 1_fastsimcoal2
│ ├── 0_sfs_data
│ ├── 1_info
│ └── 2_models
│ ├── AD1AD2
│ │ ├── iso1.complete_iso
│ │ ├── iso2.contemporary_iso
│ │ ├── mig1.contemporary_mig
│ │ └── mig2.continuous_mig
│ ├── AD1AD3
│ │ ├── iso1.complete_iso
│ │ ├── iso2.contemporary_iso
│ │ ├── mig1.contemporary_mig
│ │ └── mig2.continuous_mig
│ ├-- [...]
│ ├── TI2SY3
│ │ ├── iso1.complete_iso
│ │ ├── iso2.contemporary_iso
│ │ ├── mig1.contemporary_mig
│ │ └── mig2.continuous_mig
│ └── TI2TI3
│ ├── iso1.complete_iso
│ ├── iso2.contemporary_iso
│ ├── mig1.contemporary_mig
│ └── mig2.continuous_mig
└── psammophilous
├── single_deme_models
│ ├── 0_vcf_files
│ └── 1_fastsimcoal2
│ ├── 0_sfs_data
│ ├── 1_info
│ └── 2_models
│ ├── AD5
│ │ ├── per1.constant
│ │ ├── per2.expansion
│ │ ├── per3.contraction
│ │ └── per4.instbot
│ ├── AD6
│ │ ├── per1.constant
│ │ ├── per2.expansion
│ │ ├── per3.contraction
│ │ └── per4.instbot
│ ├-- [...]
│ ├── TI5
│ │ ├── per1.constant
│ │ ├── per2.expansion
│ │ ├── per3.contraction
│ │ └── per4.instbot
│ └── TI6
│ ├── per1.constant
│ ├── per2.expansion
│ ├── per3.contraction
│ └── per4.instbot
└── two_deme_models
├── 0_vcf_files
└── 1_fastsimcoal2
├── 0_sfs_data
├── 1_info
└── 2_models
├── AD5AD6
│ ├── iso1.complete_iso
│ ├── iso2.contemporary_iso
│ ├── mig1.contemporary_mig
│ └── mig2.continuous_mig
├── AD5AD8
│ ├── iso1.complete_iso
│ ├── iso2.contemporary_iso
│ ├── mig1.contemporary_mig
│ └── mig2.continuous_mig
├-- [...]
├── TI6SY4
│ ├── iso1.complete_iso
│ ├── iso2.contemporary_iso
│ ├── mig1.contemporary_mig
│ └── mig2.continuous_mig
└── TI6SY5
├── iso1.complete_iso
├── iso2.contemporary_iso
├── mig1.contemporary_mig
└── mig2.continuous_mig
Sub-directories
single_deme_models
##################################################
### GEOPHILOUS lineage dataset
##################################################
4_demographic_modelling_fastsimcoal2/geophilous/single_deme_models/
├── 0_vcf_files
│   ├── eut.per.nc.R30.all.fSEL.fTH99.fDP95.AD1.c70.onesnp.vcf.gz
│   ├── eut.per.nc.R30.all.fSEL.fTH99.fDP95.AD2.c70.onesnp.vcf.gz
│   ├-- [...]
│   ├── eut.per.nc.R30.all.fSEL.fTH99.fDP95.TI2.c70.onesnp.vcf.gz
│   └── eut.per.nc.R30.all.fSEL.fTH99.fDP95.TI3.c70.onesnp.vcf.gz
└── 1_fastsimcoal2
    ├── 0_sfs_data
    │   ├── AD1_MAFpop0.obs
    │   ├── AD2_MAFpop0.obs
    │   ├-- [...]
    │   ├── TI2_MAFpop0.obs
    │   └── TI3_MAFpop0.obs
    ├── 1_info
    │   ├── list.models.txt
    │   ├── list.popfile.txt
    │   ├── template.per1.constant.est
    │   ├── template.per1.constant.tpl
    │   ├── template.per2.expansion.est
    │   ├── template.per2.expansion.tpl
    │   ├── template.per3.contraction.est
    │   ├── template.per3.contraction.tpl
    │   ├── template.per4.instbot.est
    │   └── template.per4.instbot.tpl
    ├── 2_models
    │   ├── AD1
    │   │   ├── per1.constant
    │   │   │   ├── AD1.per1.constant.est
    │   │   │   └── AD1.per1.constant.tpl
    │   │   ├── per2.expansion
    │   │   │   ├── AD1.per2.expansion.est
    │   │   │   └── AD1.per2.expansion.tpl
    │   │   ├── per3.contraction
    │   │   │   ├── AD1.per3.contraction.est
    │   │   │   └── AD1.per3.contraction.tpl
    │   │   └── per4.instbot
    │   │       ├── AD1.per4.instbot.est
    │   │       └── AD1.per4.instbot.tpl
    │   ├── AD2
    │   │   ├── per1.constant
    │   │   │   ├── AD2.per1.constant.est
    │   │   │   └── AD2.per1.constant.tpl
    │   │   ├── per2.expansion
    │   │   │   ├── AD2.per2.expansion.est
    │   │   │   └── AD2.per2.expansion.tpl
    │   │   ├── per3.contraction
    │   │   │   ├── AD2.per3.contraction.est
    │   │   │   └── AD2.per3.contraction.tpl
    │   │   └── per4.instbot
    │   │       ├── AD2.per4.instbot.est
    │   │       └── AD2.per4.instbot.tpl
    │   ├-- [...]
    │   ├── TI2
    │   │   ├── per1.constant
    │   │   │   ├── TI2.per1.constant.est
    │   │   │   └── TI2.per1.constant.tpl
    │   │   ├── per2.expansion
    │   │   │   ├── TI2.per2.expansion.est
    │   │   │   └── TI2.per2.expansion.tpl
    │   │   ├── per3.contraction
    │   │   │   ├── TI2.per3.contraction.est
    │   │   │   └── TI2.per3.contraction.tpl
    │   │   └── per4.instbot
    │   │       ├── TI2.per4.instbot.est
    │   │       └── TI2.per4.instbot.tpl
    │   └── TI3
    │       ├── per1.constant
    │       │   ├── TI3.per1.constant.est
    │       │   └── TI3.per1.constant.tpl
    │       ├── per2.expansion
    │       │   ├── TI3.per2.expansion.est
    │       │   └── TI3.per2.expansion.tpl
    │       ├── per3.contraction
    │       │   ├── TI3.per3.contraction.est
    │       │   └── TI3.per3.contraction.tpl
    │       └── per4.instbot
    │           ├── TI3.per4.instbot.est
    │           └── TI3.per4.instbot.tpl
    └── setup.models.sh
##################################################
### PSAMMOPHILOUS lineage dataset
##################################################
4_demographic_modelling_fastsimcoal2/psammophilous/single_deme_models/
├── 0_vcf_files
│   ├── eut.eph.R30.all.fSEL.fTH99.fDP95.AD5.c70.onesnp.vcf.gz
│   ├── eut.eph.R30.all.fSEL.fTH99.fDP95.AD6.c70.onesnp.vcf.gz
│   ├-- [...]
│   ├── eut.eph.R30.all.fSEL.fTH99.fDP95.TI5.c70.onesnp.vcf.gz
│   └── eut.eph.R30.all.fSEL.fTH99.fDP95.TI6.c70.onesnp.vcf.gz
└── 1_fastsimcoal2
    ├── 0_sfs_data
    │   ├── AD5_MAFpop0.obs
    │   ├── AD6_MAFpop0.obs
    │   ├-- [...]
    │   ├── TI5_MAFpop0.obs
    │   └── TI6_MAFpop0.obs
    ├── 1_info
    │   ├── list.models.txt
    │   ├── list.popfile.txt
    │   ├── template.per1.constant.est
    │   ├── template.per1.constant.tpl
    │   ├── template.per2.expansion.est
    │   ├── template.per2.expansion.tpl
    │   ├── template.per3.contraction.est
    │   ├── template.per3.contraction.tpl
    │   ├── template.per4.instbot.est
    │   └── template.per4.instbot.tpl
    ├── 2_models
    │   ├── AD5
    │   │   ├── per1.constant
    │   │   │   ├── AD5.per1.constant.est
    │   │   │   └── AD5.per1.constant.tpl
    │   │   ├── per2.expansion
    │   │   │   ├── AD5.per2.expansion.est
    │   │   │   └── AD5.per2.expansion.tpl
    │   │   ├── per3.contraction
    │   │   │   ├── AD5.per3.contraction.est
    │   │   │   └── AD5.per3.contraction.tpl
    │   │   └── per4.instbot
    │   │       ├── AD5.per4.instbot.est
    │   │       └── AD5.per4.instbot.tpl
    │   ├── AD6
    │   │   ├── per1.constant
    │   │   │   ├── AD6.per1.constant.est
    │   │   │   └── AD6.per1.constant.tpl
    │   │   ├── per2.expansion
    │   │   │   ├── AD6.per2.expansion.est
    │   │   │   └── AD6.per2.expansion.tpl
    │   │   ├── per3.contraction
    │   │   │   ├── AD6.per3.contraction.est
    │   │   │   └── AD6.per3.contraction.tpl
    │   │   └── per4.instbot
    │   │       ├── AD6.per4.instbot.est
    │   │       └── AD6.per4.instbot.tpl
    │   ├-- [...]
    │   ├── TI5
    │   │   ├── per1.constant
    │   │   │   ├── TI5.per1.constant.est
    │   │   │   └── TI5.per1.constant.tpl
    │   │   ├── per2.expansion
    │   │   │   ├── TI5.per2.expansion.est
    │   │   │   └── TI5.per2.expansion.tpl
    │   │   ├── per3.contraction
    │   │   │   ├── TI5.per3.contraction.est
    │   │   │   └── TI5.per3.contraction.tpl
    │   │   └── per4.instbot
    │   │       ├── TI5.per4.instbot.est
    │   │       └── TI5.per4.instbot.tpl
    │   └── TI6
    │       ├── per1.constant
    │       │   ├── TI6.per1.constant.est
    │       │   └── TI6.per1.constant.tpl
    │       ├── per2.expansion
    │       │   ├── TI6.per2.expansion.est
    │       │   └── TI6.per2.expansion.tpl
    │       ├── per3.contraction
    │       │   ├── TI6.per3.contraction.est
    │       │   └── TI6.per3.contraction.tpl
    │       └── per4.instbot
    │           ├── TI6.per4.instbot.est
    │           └── TI6.per4.instbot.tpl
    └── setup.models.sh
It includes all input files for single-deme demographic analyses as implemented in FASTSIMCOAL2 v27093. For each dataset/lineage, there are two sub-directories and a relevant script:

- The `0_vcf_files` directory, which includes a single `*.vcf` file per deme. These are the input files for the `easySFS.py` script (https://github.com/isaacovercast/easySFS), which was used to generate the deme-specific site frequency spectra (SFS).
- The `1_fastsimcoal2` directory, which is further divided into:
  - The `0_sfs_data` directory includes one SFS file in `fastsimcoal2` format per deme, as estimated by the `easySFS.py` script.
  - The `1_info` directory includes:
    - A `list.models.txt` file with all model names
    - A `list.popfile.txt` file with three columns, where:
      - Column 1: deme identifiers
      - Column 2: deme effective population size (Ne), as estimated based on the Ne = θ/(4μ) equation, approximating θ using the nucleotide diversity (π) values from the PIXY analyses and an assumed mutation rate (μ) of 2.8×10⁻⁹.
      - Column 3: sample size after projecting down, as estimated by the `easySFS.py` script.
    - Template `*.tpl` and `*.est` files for `fastsimcoal2` that are used by the provided `setup.models.sh` bash script to automatically create the four demographic models per deme
  - The `2_models` directory, which includes one sub-directory per deme, with the needed `*.tpl` and `*.est` files to run `fastsimcoal2`.
  - The `setup.models.sh` bash script, which uses the files in the `1_info` directory to automatically create the `2_models` directory and its contents.
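The Ne values in column 2 of `list.popfile.txt` come from Ne = θ/(4μ), with θ approximated by π. A minimal Python sketch of that arithmetic follows. The function name and the example π value are hypothetical and only illustrate the formula; they are not taken from the PIXY results or from the authors' scripts.

```python
# Assumed mutation rate (mu) stated in this README.
MU = 2.8e-9


def effective_population_size(pi: float, mu: float = MU) -> float:
    """Return Ne = theta / (4 * mu), approximating theta by nucleotide diversity (pi)."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    if pi < 0:
        raise ValueError("nucleotide diversity cannot be negative")
    return pi / (4.0 * mu)


# Hypothetical pi value, for illustration only (not a value from the dataset).
example_pi = 0.0056
ne = effective_population_size(example_pi)
print(f"pi = {example_pi}, mu = {MU} -> Ne is about {round(ne)}")
```

With these illustrative inputs, Ne comes out at roughly 500,000. Each real deme would use its own π estimate.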
two_deme_models
##################################################
### GEOPHILOUS lineage dataset
##################################################
4_demographic_modelling_fastsimcoal2/geophilous/two_deme_models/
├── 0_vcf_files
│   ├── eut.per.nc.R30.all.fSEL.fTH99.fDP95.AD1AD2.c70.onesnp.vcf.gz
│   ├── eut.per.nc.R30.all.fSEL.fTH99.fDP95.AD1AD3.c70.onesnp.vcf.gz
│   ├-- [...]
│   ├── eut.per.nc.R30.all.fSEL.fTH99.fDP95.TI2SY3.c70.onesnp.vcf.gz
│   └── eut.per.nc.R30.all.fSEL.fTH99.fDP95.TI2TI3.c70.onesnp.vcf.gz
└── 1_fastsimcoal2
    ├── 0_sfs_data
    │   ├── AD1AD2.c70.onesnp_jointMAFpop1_0.obs
    │   ├── AD1AD3.c70.onesnp_jointMAFpop1_0.obs
    │   ├-- [...]
    │   ├── TI2SY3.c70.onesnp_jointMAFpop1_0.obs
    │   └── TI2TI3.c70.onesnp_jointMAFpop1_0.obs
    ├── 1_info
    │   ├── list.models.txt
    │   ├── list.popfile.txt
    │   ├── template.iso1.complete_iso.est
    │   ├── template.iso1.complete_iso.tpl
    │   ├── template.iso2.contemporary_iso.est
    │   ├── template.iso2.contemporary_iso.tpl
    │   ├── template.mig1.contemporary_mig.est
    │   ├── template.mig1.contemporary_mig.tpl
    │   ├── template.mig2.continuous_mig.est
    │   └── template.mig2.continuous_mig.tpl
    ├── 2_models
    │   ├── AD1AD2
    │   │   ├── iso1.complete_iso
    │   │   │   ├── AD1AD2.iso1.complete_iso.est
    │   │   │   └── AD1AD2.iso1.complete_iso.tpl
    │   │   ├── iso2.contemporary_iso
    │   │   │   ├── AD1AD2.iso2.contemporary_iso.est
    │   │   │   └── AD1AD2.iso2.contemporary_iso.tpl
    │   │   ├── mig1.contemporary_mig
    │   │   │   ├── AD1AD2.mig1.contemporary_mig.est
    │   │   │   └── AD1AD2.mig1.contemporary_mig.tpl
    │   │   └── mig2.continuous_mig
    │   │       ├── AD1AD2.mig2.continuous_mig.est
    │   │       └── AD1AD2.mig2.continuous_mig.tpl
    │   ├── AD1AD3
    │   │   ├── iso1.complete_iso
    │   │   │   ├── AD1AD3.iso1.complete_iso.est
    │   │   │   └── AD1AD3.iso1.complete_iso.tpl
    │   │   ├── iso2.contemporary_iso
    │   │   │   ├── AD1AD3.iso2.contemporary_iso.est
    │   │   │   └── AD1AD3.iso2.contemporary_iso.tpl
    │   │   ├── mig1.contemporary_mig
    │   │   │   ├── AD1AD3.mig1.contemporary_mig.est
    │   │   │   └── AD1AD3.mig1.contemporary_mig.tpl
    │   │   └── mig2.continuous_mig
    │   │       ├── AD1AD3.mig2.continuous_mig.est
    │   │       └── AD1AD3.mig2.continuous_mig.tpl
    │   ├-- [...]
    │   ├── TI2SY3
    │   │   ├── iso1.complete_iso
    │   │   │   ├── TI2SY3.iso1.complete_iso.est
    │   │   │   └── TI2SY3.iso1.complete_iso.tpl
    │   │   ├── iso2.contemporary_iso
    │   │   │   ├── TI2SY3.iso2.contemporary_iso.est
    │   │   │   └── TI2SY3.iso2.contemporary_iso.tpl
    │   │   ├── mig1.contemporary_mig
    │   │   │   ├── TI2SY3.mig1.contemporary_mig.est
    │   │   │   └── TI2SY3.mig1.contemporary_mig.tpl
    │   │   └── mig2.continuous_mig
    │   │       ├── TI2SY3.mig2.continuous_mig.est
    │   │       └── TI2SY3.mig2.continuous_mig.tpl
    │   └── TI2TI3
    │       ├── iso1.complete_iso
    │       │   ├── TI2TI3.iso1.complete_iso.est
    │       │   └── TI2TI3.iso1.complete_iso.tpl
    │       ├── iso2.contemporary_iso
    │       │   ├── TI2TI3.iso2.contemporary_iso.est
    │       │   └── TI2TI3.iso2.contemporary_iso.tpl
    │       ├── mig1.contemporary_mig
    │       │   ├── TI2TI3.mig1.contemporary_mig.est
    │       │   └── TI2TI3.mig1.contemporary_mig.tpl
    │       └── mig2.continuous_mig
    │           ├── TI2TI3.mig2.continuous_mig.est
    │           └── TI2TI3.mig2.continuous_mig.tpl
    └── setup.models.sh
##################################################
### PSAMMOPHILOUS lineage dataset
##################################################
4_demographic_modelling_fastsimcoal2/psammophilous/two_deme_models/
├── 0_vcf_files
│   ├── eut.eph.R30.all.fSEL.fTH99.fDP95.AD5AD6.c70.onesnp.vcf.gz
│   ├── eut.eph.R30.all.fSEL.fTH99.fDP95.AD5AD8.c70.onesnp.vcf.gz
│   ├-- [...]
│   ├── eut.eph.R30.all.fSEL.fTH99.fDP95.TI6SY4.c70.onesnp.vcf.gz
│   └── eut.eph.R30.all.fSEL.fTH99.fDP95.TI6SY5.c70.onesnp.vcf.gz
└── 1_fastsimcoal2
    ├── 0_sfs_data
    │   ├── AD5AD6.c70.onesnp_jointMAFpop1_0.obs
    │   ├── AD5AD8.c70.onesnp_jointMAFpop1_0.obs
    │   ├-- [...]
    │   ├── TI6SY4.c70.onesnp_jointMAFpop1_0.obs
    │   └── TI6SY5.c70.onesnp_jointMAFpop1_0.obs
    ├── 1_info
    │   ├── list.models.txt
    │   ├── list.popfile.txt
    │   ├── template.iso1.complete_iso.est
    │   ├── template.iso1.complete_iso.tpl
    │   ├── template.iso2.contemporary_iso.est
    │   ├── template.iso2.contemporary_iso.tpl
    │   ├── template.mig1.contemporary_mig.est
    │   ├── template.mig1.contemporary_mig.tpl
    │   ├── template.mig2.continuous_mig.est
    │   └── template.mig2.continuous_mig.tpl
    ├── 2_models
    │   ├── AD5AD6
    │   │   ├── iso1.complete_iso
    │   │   │   ├── AD5AD6.iso1.complete_iso.est
    │   │   │   └── AD5AD6.iso1.complete_iso.tpl
    │   │   ├── iso2.contemporary_iso
    │   │   │   ├── AD5AD6.iso2.contemporary_iso.est
    │   │   │   └── AD5AD6.iso2.contemporary_iso.tpl
    │   │   ├── mig1.contemporary_mig
    │   │   │   ├── AD5AD6.mig1.contemporary_mig.est
    │   │   │   └── AD5AD6.mig1.contemporary_mig.tpl
    │   │   └── mig2.continuous_mig
    │   │       ├── AD5AD6.mig2.continuous_mig.est
    │   │       └── AD5AD6.mig2.continuous_mig.tpl
    │   ├── AD5AD8
    │   │   ├── iso1.complete_iso
    │   │   │   ├── AD5AD8.iso1.complete_iso.est
    │   │   │   └── AD5AD8.iso1.complete_iso.tpl
    │   │   ├── iso2.contemporary_iso
    │   │   │   ├── AD5AD8.iso2.contemporary_iso.est
    │   │   │   └── AD5AD8.iso2.contemporary_iso.tpl
    │   │   ├── mig1.contemporary_mig
    │   │   │   ├── AD5AD8.mig1.contemporary_mig.est
    │   │   │   └── AD5AD8.mig1.contemporary_mig.tpl
    │   │   └── mig2.continuous_mig
    │   │       ├── AD5AD8.mig2.continuous_mig.est
    │   │       └── AD5AD8.mig2.continuous_mig.tpl
    │   ├-- [...]
    │   ├── TI6SY4
    │   │   ├── iso1.complete_iso
    │   │   │   ├── TI6SY4.iso1.complete_iso.est
    │   │   │   └── TI6SY4.iso1.complete_iso.tpl
    │   │   ├── iso2.contemporary_iso
    │   │   │   ├── TI6SY4.iso2.contemporary_iso.est
    │   │   │   └── TI6SY4.iso2.contemporary_iso.tpl
    │   │   ├── mig1.contemporary_mig
    │   │   │   ├── TI6SY4.mig1.contemporary_mig.est
    │   │   │   └── TI6SY4.mig1.contemporary_mig.tpl
    │   │   └── mig2.continuous_mig
    │   │       ├── TI6SY4.mig2.continuous_mig.est
    │   │       └── TI6SY4.mig2.continuous_mig.tpl
    │   └── TI6SY5
    │       ├── iso1.complete_iso
    │       │   ├── TI6SY5.iso1.complete_iso.est
    │       │   └── TI6SY5.iso1.complete_iso.tpl
    │       ├── iso2.contemporary_iso
    │       │   ├── TI6SY5.iso2.contemporary_iso.est
    │       │   └── TI6SY5.iso2.contemporary_iso.tpl
    │       ├── mig1.contemporary_mig
    │       │   ├── TI6SY5.mig1.contemporary_mig.est
    │       │   └── TI6SY5.mig1.contemporary_mig.tpl
    │       └── mig2.continuous_mig
    │           ├── TI6SY5.mig2.continuous_mig.est
    │           └── TI6SY5.mig2.continuous_mig.tpl
    └── setup.models.sh
It includes all input files for two-deme demographic analyses as implemented in FASTSIMCOAL2 v27093. For each dataset/lineage, there are two sub-directories and a relevant script:

- The `0_vcf_files` directory, which includes a single `*.vcf` file per deme pair. These are the input files for the `easySFS.py` script, which was used to generate the joint site frequency spectra (jSFS) of the analysed deme pairs.
- The `1_fastsimcoal2` directory, which is further divided into:
  - The `0_sfs_data` directory includes one jSFS file in `fastsimcoal2` format per deme pair, as estimated by the `easySFS.py` script.
  - The `1_info` directory includes:
    - A `list.models.txt` file with all demographic model names
    - A `list.popfile.txt` file with two columns, where:
      - Column 1: deme pair identifiers
      - Column 2: effective population size (Ne), as estimated based on the Ne = θ/(4μ) equation, approximating θ using the nucleotide diversity (π) values from the PIXY analyses and an assumed mutation rate (μ) of 2.8×10⁻⁹. This effective population size corresponds to one of the two demes in the pair and was used as a fixed value in the demographic analyses.
    - Template `*.tpl` and `*.est` files for `fastsimcoal2` that are used by the provided `setup.models.sh` bash script to automatically create the four demographic models per deme pair
  - The `2_models` directory, which includes one sub-directory per deme pair, with the needed `*.tpl` and `*.est` files to run `fastsimcoal2`.
  - The `setup.models.sh` bash script, which uses the files in the `1_info` directory to automatically create the `2_models` directory and its contents.
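The `2_models` directory follows a regular naming pattern: `<deme pair>/<model>/<deme pair>.<model>.est` and `.tpl`. The Python sketch below lists the paths expected under that pattern, which can help when checking that a download is complete. It is not the authors' `setup.models.sh` script, whose contents are not reproduced here. The function name is hypothetical, and the model names and deme pairs are copied from the directory listing above.

```python
from pathlib import PurePosixPath

# Model names as they appear in the two-deme directory listing.
MODELS = [
    "iso1.complete_iso",
    "iso2.contemporary_iso",
    "mig1.contemporary_mig",
    "mig2.continuous_mig",
]


def expected_model_files(pairs, models=MODELS, root="2_models"):
    """List the .est/.tpl paths implied by the <pair>/<model>/<pair>.<model>.<ext> layout."""
    paths = []
    for pair in pairs:
        for model in models:
            for ext in ("est", "tpl"):
                paths.append(PurePosixPath(root) / pair / model / f"{pair}.{model}.{ext}")
    return paths


# Two deme pairs taken from the geophilous listing, for illustration.
files = expected_model_files(["AD1AD2", "TI2TI3"])
for path in files:
    print(path)
```

The same function covers the single-deme layout if it is given deme identifiers and the four `per*` model names instead.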
For a detailed description of data collection and processing, please refer to the relevant manuscript:
Meramveliotakis, E., Ortego, J., Anastasiou, I., Vogler, A. P., Papadopoulou, A. (ACCEPTED; Sep-2024) Habitat association predicts population connectivity and persistence in flightless beetles: a population genomics approach within a dynamic archipelago. Molecular Ecology
Briefly, 318 samples were collected from 47 sampling sites (24 demes of the compact-soil "geophilous" Eutagenia clade and 23 demes of the sand-obligate "psammophilous" Eutagenia clade) across 14 Cyclades islands (Aegean, Greece). Each specimen was individually preserved in 100% ethanol and stored at -20°C. DNA was extracted with a commercial bead-based protocol (Biosprint® 96 DNA Blood kit, Qiagen®) as implemented in the automated KingFisher Flex system (Thermo Fisher Scientific). Library preparation followed the double-digest Restriction site Associated DNA sequencing (ddRADseq) protocol of Peterson et al. (2012) (PLOS One, 7: e37135) with small modifications (Papadopoulou and Knowles, 2017; Evolution, 71: 2901-2917). DNA was double-digested with the restriction enzymes EcoRI and MseI, unique barcodes (10 bp) and adaptors were ligated to the digested fragments, and individually barcoded products were pooled into libraries of 72-80 samples each. Each library was size-selected between 350 and 450 bp using a Pippin Prep™ instrument (Sage Science Inc.). Following size selection, the fragments were PCR-amplified (8-10 amplification cycles) using a high-fidelity DNA polymerase (iProof™, Bio-Rad) and sequenced on the Illumina HiSeq2500 platform (single-end, 150 bp reads) at the Centre for Applied Genomics (SickKids, Toronto, Ontario, Canada). Demultiplexing of the raw Illumina reads, de novo assembly into putative loci, and genotype calling were performed using the STACKS2 v2.64 pipeline (Rochette et al., 2019; Molecular Ecology, 28: 4737-4754).