
De novo genome assembly of human cell line CHM13 nanopore ultra-long reads using Shasta

Cite this dataset

Simmonds, Sara (2022). De novo genome assembly of human cell line CHM13 nanopore ultra-long reads using Shasta [Dataset]. Dryad.


Advances in Oxford Nanopore Technologies (ONT) sequencing, basecalling, and updates to Shasta are outpacing the publishing cycle. We aim to keep users up to date on the state of the art by assembling the latest ONT data with Shasta. This release comprises our latest assembly of CHM13, the accompanying AssemblySummary.html and Assembly.gfa files, and our evaluation, presented in tables and figures. We assembled ultra-long nanopore reads of CHM13 using Shasta 0.9.0 in iterative assembly mode to produce a haploid de novo genome assembly.


We downloaded publicly available reads generated by the "Telomere-to-Telomere" (T2T) Consortium to assemble CHM13. A description of the sequencing methods is available from the T2T Consortium. Release 8 of the data was re-basecalled with Guppy v5.0.7 in super accuracy mode.


T2T CHM13 rel8 reads


We assembled the reads using Shasta 0.9.0 (Shafin et al., 2020) in iterative assembly mode with the Nanopore-Sep2020 configuration, plus the additional command line options listed below. We performed the assembly on McCloud, a service that runs Shasta in the cloud.


Shasta 0.9.0 command line options


--Reads.minReadLength 50000 --Kmers.k 10 --MinHash.minHashIterationCount 100 --Align.minAlignedFraction 0.35 --Align.minAlignedMarkerCount 600 --Align.maxSkip 50 --Align.maxDrift 30 --Align.maxTrim 30 --ReadGraph.creationMethod 0 --ReadGraph.maxAlignmentCount 12 --ReadGraph.crossStrandMaxDistance 0 --MarkerGraph.refineThreshold 0 --MarkerGraph.minCoveragePerStrand 3 --MarkerGraph.simplifyMaxLength 10,100,1000,10000 --Assembly.iterative --Assembly.pruneLength 10000 --Assembly.consensusCaller Bayesian:guppy-5.0.7-a
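The options listed above can be combined with the configuration name into a single invocation. The following is a minimal sketch, not the exact command used for this release: it assumes Shasta's `--input` and `--config` options, and the executable name `shasta`, the reads file `reads.fasta`, and the helper `build_shasta_command` are placeholders chosen for illustration. It only builds and prints the command; it does not run Shasta.

```python
import shlex

# Options copied from the list above. --Assembly.iterative is a flag
# that appears without a value in that list, so its value is None here.
SHASTA_OPTIONS = [
    ("--Reads.minReadLength", "50000"),
    ("--Kmers.k", "10"),
    ("--MinHash.minHashIterationCount", "100"),
    ("--Align.minAlignedFraction", "0.35"),
    ("--Align.minAlignedMarkerCount", "600"),
    ("--Align.maxSkip", "50"),
    ("--Align.maxDrift", "30"),
    ("--Align.maxTrim", "30"),
    ("--ReadGraph.creationMethod", "0"),
    ("--ReadGraph.maxAlignmentCount", "12"),
    ("--ReadGraph.crossStrandMaxDistance", "0"),
    ("--MarkerGraph.refineThreshold", "0"),
    ("--MarkerGraph.minCoveragePerStrand", "3"),
    ("--MarkerGraph.simplifyMaxLength", "10,100,1000,10000"),
    ("--Assembly.iterative", None),
    ("--Assembly.pruneLength", "10000"),
    ("--Assembly.consensusCaller", "Bayesian:guppy-5.0.7-a"),
]


def build_shasta_command(reads_path, config="Nanopore-Sep2020", executable="shasta"):
    """Return the argument list for a Shasta run (hypothetical helper).

    reads_path and executable are placeholders; nothing is executed.
    """
    argv = [executable, "--input", reads_path, "--config", config]
    for flag, value in SHASTA_OPTIONS:
        argv.append(flag)
        if value is not None:
            argv.append(value)
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection.
    print(shlex.join(build_shasta_command("reads.fasta")))
```

Keeping the options in a list of pairs makes it easy to compare one run against another or to pass the result to `subprocess.run` once the paths have been checked.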



Shafin, K. et al. (2020). Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol., 38, 1044–1053.


Usage notes


Genome assembly file

Assembly of CHM13 in FASTA format (one strand only).
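As a quick sanity check after downloading the FASTA file, contig lengths and N50 can be computed with the standard library alone. This is a minimal sketch, not part of the evaluation described in this dataset; the function names are illustrative, and the file name in the usage note is a placeholder for the downloaded file.

```python
def read_fasta_lengths(lines):
    """Yield (name, length) for each record in an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            parts = line[1:].split()
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def n50(lengths):
    """Largest length L such that contigs of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

For example, `with open("Assembly.fasta") as handle: print(n50(length for _, length in read_fasta_lengths(handle)))` reports the N50 of the file, where `Assembly.fasta` stands in for the actual path.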



Genome assembly summary file

Assembly summary information in HTML format.



Graphical fragment assembly file

Assembly in GFA format (one strand only).
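The GFA file can be summarized without specialized tools. The sketch below assumes the file follows the tab-separated GFA 1 layout, where `S` lines carry a segment name and sequence (or `*` with an `LN:i:` length tag) and `L` lines describe links between segments. The function name `summarize_gfa` is illustrative, and the exact tags present in this dataset's file are not verified here.

```python
def summarize_gfa(lines):
    """Count segments and links and sum segment lengths in GFA 1 lines."""
    segment_lengths = {}
    link_count = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, sequence = fields[1], fields[2]
            if sequence != "*":
                segment_lengths[name] = len(sequence)
            else:
                # Sequence omitted: fall back to the optional LN:i: tag.
                tags = [f for f in fields[3:] if f.startswith("LN:i:")]
                segment_lengths[name] = int(tags[0][5:]) if tags else 0
        elif fields[0] == "L":
            link_count += 1
    return {
        "segments": len(segment_lengths),
        "links": link_count,
        "total_length": sum(segment_lengths.values()),
    }
```

Passing an open file handle, as in `summarize_gfa(open("Assembly.gfa"))`, works because the function only iterates over lines.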



Assembly evaluation

Results of our evaluation of the assembly using QUAST and asmgene, presented in tables and figures.




Chan Zuckerberg Initiative (United States)