Pooled whole genome sequencing from year 2004 and Early-Late SNP data from year 2014

Citation

Kulmuni, Jonna et al. (2020), Pooled whole genome sequencing from year 2004 and Early-Late SNP data from year 2014, Dryad, Dataset, https://doi.org/10.5061/dryad.w9ghx3fmn

Abstract

Speciation underlies the generation of novel biodiversity. Yet, there is much to learn about how natural selection shapes genomes during speciation. Selection is assumed to act against gene flow at barrier loci, promoting reproductive isolation. However, evidence for gene flow and selection is often indirect and we know very little about the temporal stability of barrier loci. Here we utilize haplodiploidy to identify candidate male barrier loci in hybrids between two wood ant species. As ant males are haploid they are expected to reveal recessive barrier loci, which can be masked in diploid females if heterozygous. We then test for barrier stability in a sample collected ten years later and use survival analysis to provide a direct measure of natural selection acting on candidate male barrier loci. We find multiple candidate male barrier loci scattered throughout the genome. Surprisingly, a proportion of them are not stable after ten years, natural selection apparently switching from acting against to favoring introgression in the later sample. Instability of barrier effect and natural selection for introgressed alleles could be due to environment-dependent selection, emphasizing the need to consider temporal variation in the strength of natural selection and the stability of barrier effect at putative barrier loci in future speciation work.

Methods

Pooled whole genome sequencing

We collected the samples used for pooled sequencing from the Långholmen hybrid population in 2004. The samples were frozen fresh and stored at -20°C until genomic DNA was extracted in 2010 from half a body using a Qiagen kit. We sequenced four pools, each consisting of 24 individuals: R males, R females (unmated queens), W males and W females (unmated queens). Individuals were assigned to the two lineages (R and W) based on six to seven diagnostic microsatellite alleles. Sample concentrations were measured with Qubit, and the samples were pooled in equal amounts into the four pools. Because of haplodiploidy (haploid males, diploid females), this resulted in 24 and 48 chromosomes sampled for male and female pools, respectively.
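The chromosome counts per pool follow directly from ploidy. As a minimal sketch (the function and variable names are illustrative and not part of the dataset), the arithmetic can be written as:

```python
# Ploidy per individual under haplodiploidy: males are haploid, females diploid.
PLOIDY = {"male": 1, "female": 2}


def chromosomes_sampled(n_individuals, sex):
    """Number of chromosome copies represented in a pool of one sex."""
    return n_individuals * PLOIDY[sex]


# The four pools described above, each made up of 24 individuals.
pools = {
    "R males": "male",
    "R females": "female",
    "W males": "male",
    "W females": "female",
}
counts = {name: chromosomes_sampled(24, sex) for name, sex in pools.items()}

for name, n in counts.items():
    print(f"{name}: {n} chromosomes")
```

This matters when comparing allele frequency estimates between pools, since the male pools sample half as many chromosomes as the female pools.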

Each pool was sequenced with 100 bp paired-end reads on its own lane of an Illumina HiSeq2000 at the Institute for Molecular Medicine Finland (FIMM). This yielded between 46,106,000 and 108,204,481 reads per pool. We quality-trimmed reads by removing up to 20 bp with a Phred score < 20 using FASTX-Toolkit. Next, we made de novo assemblies of each of the four samples with SOAPdenovo (Li et al. 2010), trying different k-mer sizes (31, 41, 51, 61, 71) for each assembly. The R male assembly with a k-mer size of 41 was the best in terms of completeness and quality (genome size: 222.6 Mb, 327,480 contigs, average contig length: 679 bp, N50 = 1,748 bp) and was chosen as our reference assembly. After removing contigs shorter than 500 bp from the assembly, we mapped each sample back to the R male reference using Bowtie2 v2.0.2 (Langmead & Salzberg 2012). Reads mapped in proper pairs and with a mapping quality greater than 20 were retained and combined in a single mpileup file using samtools 1.4 (Li et al. 2009). Since coverage of the W male pool was low (mean = 16, s.d. = 38), overlaps between read pairs were kept.
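For readers unfamiliar with the assembly statistics quoted above, the following is a minimal sketch of how N50 is conventionally computed and how a minimum contig length filter works. It is not the authors' code; the function names and the toy contig lengths are hypothetical, and only the 500 bp threshold comes from the text.

```python
def n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def drop_short_contigs(lengths, min_length=500):
    """Keep only contigs at least min_length bp long."""
    return [length for length in lengths if length >= min_length]


# Toy contig lengths in bp (made up for illustration).
toy = [2000, 1500, 900, 600, 450, 300, 120]

print("N50 of all contigs:", n50(toy))
print("Contigs kept at >= 500 bp:", drop_short_contigs(toy))
```

Note that removing short contigs changes the total assembly length, so an N50 computed after filtering will generally differ from one computed before it.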

SNP genotyping

Individuals for genotyping were collected from field colonies in 2014. DNA was extracted using a Qiagen kit and protocol for insects. Genotyping was done at the individual level for a total of 196 individuals. Samples were randomly assigned to 96-well plates for genotyping. Primer design and genotyping were done at LGC Genomics using KASP genotyping chemistry. After removing SNPs and individuals with more than 10% missing data, as well as diploid males and three ambiguous individuals, we were left with 185 individuals genotyped at 300 SNPs.
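The missing-data filter described above can be sketched as follows. This is an illustration only: the text does not state whether SNPs or individuals were filtered first, so the order used here (SNPs, then individuals) is an assumption, and the data layout, function names and toy genotypes are hypothetical.

```python
def missing_fraction(calls):
    """Fraction of genotype calls that are missing (encoded as None)."""
    return sum(call is None for call in calls) / len(calls)


def filter_missing(genotypes, max_missing=0.10):
    """genotypes maps individual ID -> list of calls in a shared SNP order.

    Assumed order: drop SNPs with more than max_missing missing calls,
    then drop individuals exceeding max_missing over the remaining SNPs.
    Returns (filtered genotypes, indices of retained SNPs).
    """
    n_snps = len(next(iter(genotypes.values())))
    kept_snps = [
        j for j in range(n_snps)
        if missing_fraction([calls[j] for calls in genotypes.values()]) <= max_missing
    ]
    kept = {}
    for individual, calls in genotypes.items():
        subset = [calls[j] for j in kept_snps]
        if subset and missing_fraction(subset) <= max_missing:
            kept[individual] = subset
    return kept, kept_snps


# Toy data: 10 individuals, 3 SNPs (made up for illustration).
toy = {f"i{k}": ["A", "C", "T"] for k in range(10)}
toy["i0"][1] = None  # SNP 1 is missing in 2 of 10 individuals (20%)
toy["i1"][1] = None
toy["i9"][2] = None  # SNP 2 is missing in 1 of 10 individuals (10%)

kept, kept_snps = filter_missing(toy)
print("SNPs retained:", kept_snps)
print("Individuals retained:", sorted(kept))
```

In this toy example SNP 1 is dropped, SNP 2 sits exactly at the threshold and is kept, and individual i9 is then removed because half of its remaining calls are missing.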

Usage Notes

Please see ReadMe.