Raw nucleotide counts for clinal Misty Lake and stream stickleback samples
Data files
Dec 04, 2020 version files 25.36 GB
FreqSumFiles.tar.gz
Abstract
How ecological divergence causes strong reproductive isolation between populations in close geographic contact remains poorly understood at the genomic level. We here study this question in a stickleback fish population pair adapted to contiguous, ecologically different lake and stream habitats. Clinal whole-genome sequence data reveal numerous genome regions (nearly) fixed for alternative alleles over a distance of just a few hundred meters. This strong polygenic adaptive divergence must constitute a genome-wide barrier to gene flow because a steep cline in allele frequencies is observed across the entire genome, and because the cline center co-localizes with the habitat transition. Simulations confirm that such strong divergence can be maintained by polygenic selection despite high dispersal and small per-locus selection coefficients. Finally, comparing samples from near the habitat transition before and after an unusual ecological perturbation demonstrates the fragility of the balance between gene flow and selection. Overall, our study highlights the efficacy of divergent selection in maintaining reproductive isolation without physical isolation, and the analytical power of studying speciation at a fine eco-geographic and genomic scale.
Methods
This data set contains raw nucleotide counts at all genome-wide base positions for 13 clinal stickleback field samples from the Misty Lake watershed on Vancouver Island, Canada (11 different clinal locations; one location is represented by 3 temporal replicates). Each sample represents a DNA pool obtained by combining equimolar DNA from dozens of individuals. The samples were Illumina-sequenced and aligned to the stickleback reference genome. Finally, the nucleotide counts were generated by applying the R function pileup to each of the 13 alignments.
Usage notes
For each sample, the nucleotide count data ('freqSum') are provided separately for each of the 21 chromosomes, plus for a scaffold missing from the stickleback genome assembly (pitx1), for the artificial chromosome combining all unanchored scaffolds (chrUn), and for the mitochondrial chromosome (chrM). Hence, there are 24 nucleotide count files per sample. The sample site identifiers are incorporated in all file names and correspond to the sample site names used in the paper. For the marsh site M1, 'M1' denotes the main sample used for the standard genomic analyses, while 'M1_F' and 'M1_17' represent the same site sampled during the flood and one year later (2017 as opposed to 2016). In each nucleotide count file, the first column specifies the chromosome, the second column gives the genomic position, and the remaining columns indicate how often each of the four nucleotides was observed across the alignment. The data are ordinary text files (.txt) provided as a gzip-compressed tar archive. The compressed archive uses 25.4 GB, the uncompressed archive 139.1 GB of disk space.
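The column layout described above can be read with a few lines of Python. The following is a minimal sketch, not the authors' code, and it rests on several assumptions that should be checked against the actual files: that columns are whitespace-delimited, that there is no header line, and that the four count columns are ordered A, C, G, T. The function names and the example lines are hypothetical and serve only as illustration.

```python
import io

# ASSUMPTION: order of the four nucleotide count columns; verify against the files.
NUCLEOTIDES = ("A", "C", "G", "T")


def parse_freqsum_line(line):
    """Split one line into (chromosome, position, {nucleotide: count}).

    ASSUMPTION: whitespace-delimited fields and no header line.
    """
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])
    counts = dict(zip(NUCLEOTIDES, (int(x) for x in fields[2:6])))
    return chrom, pos, counts


def major_allele_frequency(counts):
    """Return the frequency of the most common nucleotide, or None at zero depth."""
    depth = sum(counts.values())
    if depth == 0:
        return None
    return max(counts.values()) / depth


def read_freqsum(handle):
    """Yield parsed records from an open text handle, skipping blank lines."""
    for line in handle:
        if line.strip():
            yield parse_freqsum_line(line)


# Made-up example lines standing in for a real nucleotide count file.
example = io.StringIO("chrI 1001 12 0 3 0\nchrI 1002 0 0 0 0\n")
records = list(read_freqsum(example))
for chrom, pos, counts in records:
    print(chrom, pos, counts, major_allele_frequency(counts))
```

Because the files are large, reading them line by line as above (rather than loading a whole chromosome into memory) is likely the safer default. For a real file, pass an open file handle to read_freqsum in place of the in-memory example.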