Parallel evolution despite low genetic diversity in three-spined sticklebacks
Data files
Mar 15, 2024 version files 14.09 GB
- 1-Mapping_and_calling.zip
- 2-ROH.zip
- 3-PCA_and_parallelism.zip
- 4-SFS_Stairway.zip
- 5-VCFs.zip
- README.md
Abstract
The three-spined stickleback (Gasterosteus aculeatus) is a model organism for studies of parallel evolution in the wild; marine stickleback populations have repeatedly colonized and adapted to different brackish and freshwater habitats. Population genetic studies of European three-spined sticklebacks have usually been conducted only in high-latitude areas. Here, we analysed southern and northern European samples of marine and freshwater three-spined stickleback to test two hypotheses. First, southern European freshwater populations – which currently lack or have limited connection to marine populations – have lost genetic diversity due to population bottlenecks and inbreeding compared to their northern European counterparts. Second, the degree of genetic parallelism in response to freshwater colonisation is higher among northern than southern European populations as the latter have been isolated and likely subjected to strong genetic drift. The results show that southern populations exhibit lower genetic diversity but a higher degree of genetic parallelism than northern populations. Hence, they confirm the hypothesis that southern populations have lost genetic diversity, but this loss likely happened after they had already adapted to freshwater conditions, explaining the high degree of genetic parallelism in the south.
README: Parallel evolution despite low genetic diversity in three-spined sticklebacks
https://doi.org/10.5061/dryad.fxpnvx102
The raw data for this manuscript can be found in the NCBI Sequence Read Archive (BioProject PRJNA1074927, runs SRR27925730-SRR27925878). This Dryad repository contains the scripts used to analyse the raw data, as well as the called genotype data in Variant Call Format (VCF):
1- The entire analytical pipeline, consisting of ten bash scripts, starting from genome indexing and ending with genotype calling and filtering. These scripts allow the recreation of the main VCF data files that are used in the analyses reported in the manuscript (see 5-VCFs.zip).
2- Scripts for the runs of homozygosity (ROH) analyses. These include a bash script to run the ROH analyses in bcftools and an R script to run the geographic test reported in the paper.
3- The R scripts to replicate the principal component analyses and the quadratic discriminant analyses reported in the manuscript, as well as their input data. For each linkage disequilibrium cluster analysed, this folder contains the SNP genotype data files (in 012 format, for the projected data), the covariance matrices produced by PCAngsd (for the non-projected data), and the principal component scores (for both the projected and non-projected data).
4- All scripts to generate the site frequency spectra in ANGSD, the generated site frequency spectra for each population, and the blueprint files to run the Stairway Plot analyses.
5- All VCF files used in the different analyses, along with a log file showing the filtering parameters (produced by the pipeline in 1-Mapping_and_calling.zip).
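
For readers who want to rerun the pipeline from the raw reads, the run range quoted above (SRR27925730-SRR27925878) can be expanded into individual accessions. The following is a minimal sketch and not part of the deposited scripts. The function name `expand_run_range` is hypothetical, and the code assumes the range is contiguous, that is, that every accession between the two endpoints belongs to BioProject PRJNA1074927. This should be checked against the BioProject record before downloading.

```python
def expand_run_range(first: str, last: str) -> list[str]:
    """Expand an inclusive SRA run range into individual accessions.

    Assumption: the range is contiguous, so every number between the
    two endpoints is a valid run of the same BioProject.
    """
    prefix = "SRR"
    if not (first.startswith(prefix) and last.startswith(prefix)):
        raise ValueError("expected accessions starting with 'SRR'")

    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    if start > end:
        raise ValueError("first accession must not exceed the last")

    # Keep the numeric part at the same width as in the input accession.
    width = len(first) - len(prefix)
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


# Endpoints as given in the README.
runs = expand_run_range("SRR27925730", "SRR27925878")
print(len(runs))   # 149
print(runs[0])     # SRR27925730
print(runs[-1])    # SRR27925878
```

The resulting list can be written one accession per line to a text file and passed to whichever download tool is in use.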