Linked-read sequencing enables haplotype-resolved resequencing at population scale

Cite this dataset

Lutgen, Dave et al. (2020). Linked-read sequencing enables haplotype-resolved resequencing at population scale [Dataset]. Dryad. https://doi.org/10.5061/dryad.9zw3r22bf

Abstract

The ability to sequence entire genomes of virtually any organism provides unprecedented insights into the evolutionary history of populations and species. Nevertheless, many population genomic inferences (including the quantification and dating of admixture, introgression and demographic events, and the inference of selective sweeps) are still limited by the lack of high-quality haplotype information. The newest generation of sequencing technology now promises significant progress. To establish the feasibility of haplotype-resolved genome resequencing at population scale, we investigated properties of linked-read sequencing data of songbirds of the genus Oenanthe across a range of sequencing depths. Our results, based on the comparison of downsampled (25x, 20x, 15x, 10x, 7x, and 5x) with high-coverage data (46-68x) of seven bird genomes mapped to a reference, suggest that phasing contiguities and accuracies adequate for most population genomic analyses can already be reached with moderate sequencing effort. At 15x coverage, phased haplotypes span about 90% of the genome assembly, with 50% and 90% of phased sequence located in phase blocks longer than 1.25-4.6 Mb (N50) and 0.27-0.72 Mb (N90), respectively. Phasing accuracy exceeds 99% from 15x coverage upward. Higher coverages yielded higher contiguities (up to about 7 Mb (N50) and 1 Mb (N90) at 25x coverage), but only marginally improved phasing accuracy. Phase block contiguity improved with input DNA molecule length; thus, higher-quality DNA may help keep sequencing costs down. In conclusion, even for organisms with gigabase-sized genomes like birds, linked-read sequencing at moderate depth opens an affordable avenue towards haplotype-resolved genome resequencing at population scale.
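The N50 and N90 statistics quoted in the abstract can be illustrated with a minimal sketch. It follows the definition given above: the block length L such that phase blocks of length L or longer hold at least 50% (or 90%) of the phased sequence. The function name `nx` and the example block lengths are hypothetical and are not taken from the dataset or from the authors' pipeline.

```python
def nx(block_lengths, x):
    """Return the Nx of a list of phase block lengths.

    Nx is the length L such that blocks of length >= L together
    contain at least x percent of the total phased sequence.
    """
    if not block_lengths:
        raise ValueError("block_lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")

    total = sum(block_lengths)
    running = 0
    # Walk from the longest block to the shortest, accumulating length.
    for length in sorted(block_lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Hypothetical phase block lengths (arbitrary units), for illustration only.
blocks = [5, 4, 3, 2, 1]
n50 = nx(blocks, 50)  # 5 + 4 = 9 reaches half of the total of 15, so N50 is 4
n90 = nx(blocks, 90)  # 5 + 4 + 3 + 2 = 14 reaches 90% of 15, so N90 is 2
print(n50, n90)
```

Because N90 requires accumulating more, and therefore shorter, blocks than N50, N90 is never larger than N50 for the same set of blocks, which matches the ordering of the values reported in the abstract.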

Methods

10x Genomics linked reads (60x coverage) were assembled using the Supernova 2.1 assembler. To remove duplicate scaffolds of at least 99% identity from the pseudohaploid assembly, we ran the dedupe procedure in BBTools (https://sourceforge.net/projects/bbmap/), allowing up to 7,000 edits. This reduced the assembly to 11,030 scaffolds. We then aimed to ensure that all duplicate scaffolds were removed and to retain only scaffolds whose integrity could be confirmed by the presence of syntenic regions in another songbird genome. To this end, we aligned the scaffolds against the collared flycatcher assembly version 1.5, which is the highest-quality assembly available from the family Muscicapidae, using lastz 1.04 with settings M=254, K=4500, L=3000, Y=15000, C=2, T=2, and --matchcount=10000. This resulted in 295 scaffolds with unique hits in the flycatcher assembly.

Funding

German Research Foundation, Award: BU3456/3-1

Science for Life Laboratory Swedish Biodiversity Program, Award: 2015-R14
