Dryad

Improved contiguity of the threespine stickleback genome using long-read sequencing

Cite this dataset

Nath, Shivangi; Shaw, Daniel; White, Michael (2021). Improved contiguity of the threespine stickleback genome using long-read sequencing [Dataset]. Dryad. https://doi.org/10.5061/dryad.qjq2bvqff

Abstract

Although the cost and time required to assemble a genome have decreased drastically, assembling a highly contiguous genome remains a challenge. The integration of long-read sequencing technologies is rapidly overcoming these challenges. Here, we use long-read sequencing to improve the contiguity of the genome of the threespine stickleback fish (Gasterosteus aculeatus), a prominent genetic model species. Using Pacific Biosciences (PacBio) sequencing, we assembled a highly contiguous genome of a freshwater fish from Paxton Lake. Using contigs from this assembly, we filled over 76% of the gaps in the existing reference genome assembly, improving contiguity more than five-fold. Our gap-filling approach was highly accurate, as validated by 10X Genomics long-distance linked reads. In addition to closing a majority of gaps, we assembled segments of telomeres and centromeres throughout the genome. This highlights the power of long sequencing reads for assembling highly repetitive, difficult-to-assemble regions of genomes. This latest genome build has been released through a newly designed community genome browser that aims to consolidate the growing number of genomic datasets available for the threespine stickleback fish.
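To make "gaps" and "contiguity" concrete, here is a minimal sketch in Python. It assumes gaps are encoded as runs of N in scaffold sequences (a common FASTA convention, not stated above) and uses contig N50 as the contiguity metric; the abstract does not say which metric the five-fold figure refers to, so N50 is an illustrative choice only. The helper names count_gaps, contigs, n50 and fraction_filled are hypothetical, not part of the dataset or the authors' pipeline.

```python
import re

# Assumption: assembly gaps are runs of N/n characters inside a scaffold.
GAP_RE = re.compile(r"[Nn]+")


def count_gaps(seq: str) -> int:
    """Return the number of N-runs (gaps) in a scaffold sequence."""
    return len(GAP_RE.findall(seq))


def contigs(seq: str) -> list[str]:
    """Split a scaffold at its gaps, dropping empty pieces at the ends."""
    return [piece for piece in GAP_RE.split(seq) if piece]


def n50(lengths: list[int]) -> int:
    """Smallest length L such that pieces of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def fraction_filled(gaps_before: int, gaps_after: int) -> float:
    """Fraction of the original gaps that were closed."""
    return (gaps_before - gaps_after) / gaps_before


# Toy scaffold before and after closing one of its two gaps (invented data).
old = "ACGTNNNNACGTACGTNNACG"
new = "ACGTTTTTACGTACGTNNACG"
print(count_gaps(old), count_gaps(new))            # 2 1
print(n50([len(c) for c in contigs(old)]))         # 8
print(n50([len(c) for c in contigs(new)]))         # 16
print(fraction_filled(count_gaps(old), count_gaps(new)))  # 0.5
```

Closing a gap merges two contigs into one longer contig, which is why gap filling raises contig N50 even though the total sequence length barely changes.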

Usage notes

paxton_lake_benthic.fa.gz is the de novo assembly constructed from PacBio long reads sequenced from a Paxton Lake benthic male.

GAculeatus_UGA_version5_UN_merged.fasta is the version 5 genome assembly, constructed by filling gaps in the Hi-C-assembled version 4 genome.
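Both files can be opened with standard FASTA tools. As a minimal, standard-library-only sketch, assuming conventional FASTA formatting (header lines starting with ">", sequence possibly wrapped across several lines) and that the .fa.gz file is gzip-compressed as its extension suggests, the hypothetical read_fasta helper below streams (name, sequence) records from either file without loading the whole assembly into memory at once.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a plain or gzip-compressed FASTA file.

    Assumption: a ".gz" suffix means gzip compression; anything else is plain text.
    """
    opener = gzip.open if path.endswith(".gz") else open
    name, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first whitespace-delimited token as the record name.
                name, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)
        if name is not None:
            yield name, "".join(chunks)
```

For example, listing each sequence and its length in the de novo assembly might look like:

```python
for name, seq in read_fasta("paxton_lake_benthic.fa.gz"):
    print(name, len(seq))
```

A generator keeps memory use to one record at a time, which matters for chromosome-scale sequences; for heavier work, an established parser such as Biopython's SeqIO is a reasonable alternative.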