De novo genome assembly of human cell line CHM13 nanopore ultra-long reads using Shasta

Cite this dataset

Simmonds, Sara (2022). De novo genome assembly of human cell line CHM13 nanopore ultra-long reads using Shasta [Dataset]. Dryad. https://doi.org/10.5068/D1GQ3S

Abstract

Advances in Oxford Nanopore Technologies (ONT) sequencing and basecalling, together with updates to Shasta, are outpacing the publishing cycle. We aim to update users on the state of the art by assembling the most recent ONT data with Shasta. This release comprises our latest assembly of CHM13, the accompanying AssemblySummary.html and Assembly.gfa files, and our evaluation, presented in tables and figures. We assembled ultra-long nanopore reads of CHM13 using Shasta 0.9.0 in iterative assembly mode to produce a haploid de novo genome assembly.

Methods

We downloaded publicly available reads generated by the Telomere-to-Telomere (T2T) Consortium to assemble CHM13. For a description of the sequencing methods used by the T2T Consortium, please see https://github.com/marbl/CHM13#oxford-nanopore-data. Release 8 of the data (rel8) was re-basecalled with Guppy v5.0.7 in super accuracy mode.

 

T2T CHM13 rel8 reads

https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/nanopore/rel8-guppy-5.0.7/reads.fastq.gz

 

We assembled the reads using Shasta 0.9.0 (Shafin et al., 2020) in iterative assembly mode with the Nanopore-Sep2020 configuration, plus the additional command line options listed below. We performed the assembly on McCloud, a service that runs Shasta in the cloud.

 

Shasta 0.9.0 command line options

 

--Reads.minReadLength 50000 --Kmers.k 10 --MinHash.minHashIterationCount 100 --Align.minAlignedFraction 0.35 --Align.minAlignedMarkerCount 600 --Align.maxSkip 50 --Align.maxDrift 30 --Align.maxTrim 30 --ReadGraph.creationMethod 0 --ReadGraph.maxAlignmentCount 12 --ReadGraph.crossStrandMaxDistance 0 --MarkerGraph.refineThreshold 0 --MarkerGraph.minCoveragePerStrand 3 --MarkerGraph.simplifyMaxLength 10,100,1000,10000 --Assembly.iterative --Assembly.pruneLength 10000 --Assembly.consensusCaller Bayesian:guppy-5.0.7-a
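As a rough illustration of how these options combine with the base configuration, the Python sketch below parses the option string and builds a Shasta invocation without running it. The executable name shasta, the input file name reads.fastq (the downloaded reads, decompressed), and the output directory ShastaRun are assumptions for illustration; they are not taken from the McCloud run. The --input, --config, and --assemblyDirectory options are standard Shasta options.

```python
import shlex

# Options copied verbatim from the listing above.
SHASTA_OPTIONS = (
    "--Reads.minReadLength 50000 --Kmers.k 10 "
    "--MinHash.minHashIterationCount 100 --Align.minAlignedFraction 0.35 "
    "--Align.minAlignedMarkerCount 600 --Align.maxSkip 50 --Align.maxDrift 30 "
    "--Align.maxTrim 30 --ReadGraph.creationMethod 0 "
    "--ReadGraph.maxAlignmentCount 12 --ReadGraph.crossStrandMaxDistance 0 "
    "--MarkerGraph.refineThreshold 0 --MarkerGraph.minCoveragePerStrand 3 "
    "--MarkerGraph.simplifyMaxLength 10,100,1000,10000 --Assembly.iterative "
    "--Assembly.pruneLength 10000 --Assembly.consensusCaller Bayesian:guppy-5.0.7-a"
)


def parse_options(option_string):
    """Return {flag: value}; flags given without a value map to None."""
    tokens = shlex.split(option_string)
    options = {}
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        if not flag.startswith("--"):
            raise ValueError(f"expected an option name, got {flag!r}")
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        options[flag] = tokens[i + 1] if has_value else None
        i += 2 if has_value else 1
    return options


def build_command(reads_path, assembly_dir,
                  executable="shasta", config="Nanopore-Sep2020"):
    """Assemble the argument list; paths and executable name are hypothetical."""
    command = [executable, "--input", reads_path,
               "--assemblyDirectory", assembly_dir, "--config", config]
    for flag, value in parse_options(SHASTA_OPTIONS).items():
        command.append(flag)
        if value is not None:
            command.append(value)
    return command


# Build the command as text for inspection; nothing is executed here.
command = build_command("reads.fastq", "ShastaRun")
command_text = " ".join(shlex.quote(part) for part in command)
print(command_text)
```

Keeping the options in a dictionary makes it easy to compare them against another run or to override a single value before launching an assembly.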

References

Shafin, K. et al. (2020) Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol., 38, 1044–1053.

 

Usage notes

Files


Genome assembly file

Assembly of CHM13 in FASTA format (one strand only).

shasta_0.9.0_chm13_assembly.fasta
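For a quick look at the assembly before opening the evaluation, the following Python sketch computes the number of contigs, the total length, and N50 (a contiguity metric that QUAST also reports) from FASTA-formatted lines. It is a minimal sketch, not the evaluation procedure used for this dataset; the function names are hypothetical, and the tiny in-memory sample stands in for the real file.

```python
def read_fasta_lengths(lines):
    """Return {sequence name: length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def n50(lengths):
    """Length of the sequence at which the running total, taken over
    sequences sorted longest first, reaches half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Small in-memory sample standing in for the assembly FASTA.
sample = ">a\nACGT\nAC\n>b\nACG\n>c\nA\n".splitlines()
sample_lengths = read_fasta_lengths(sample)
print(len(sample_lengths), sum(sample_lengths.values()), n50(sample_lengths.values()))
```

To apply it to the assembly, pass an open file handle for shasta_0.9.0_chm13_assembly.fasta to read_fasta_lengths in place of the sample lines.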

 

Genome assembly summary file

Assembly summary information in HTML format.

AssemblySummary.html

 

Graphical fragment assembly file

Assembly in GFA format (one strand only).

Assembly.gfa

 

Assembly evaluation

Results of our evaluation of the assembly with QUAST and asmgene, presented in tables and figures.

shasta_0.9.0_chm13_evaluation.pdf

 

Funding

Chan Zuckerberg Initiative (United States)