Adaptively integrated sequencing and assembly of near-complete genomes
Data files
Jul 16, 2025 version files 39.22 GB
- cornetto-animal-asm.tar.gz (5.32 GB)
- cornetto-hg002-asm.tar.gz (33.91 GB)
- README.md (12.31 KB)
Abstract
Recent advances in long-read sequencing (LRS) and assembly algorithms have made it possible to create highly complete genome assemblies for humans, animals, plants, and other eukaryotes. However, there is a need for ongoing development to improve accessibility and affordability of the required data, increase the range of usable sample types, and reliably resolve the most challenging, repetitive genome regions. 'Cornetto' is a new experimental paradigm in which the genome assembly process is adaptively integrated with programmable selective nanopore sequencing, with target regions being iteratively updated to focus LRS data production onto the unsolved regions of a nascent assembly. This improves assembly quality and streamlines the process, both for human individuals and diverse non-human vertebrates, including endemic Australian endangered species, tested here. Cornetto enables us to generate highly complete diploid human genome assemblies using only a single LRS platform, surpassing the quality of previous efforts at a fraction of the cost. Cornetto enables genome assembly from challenging sample types like human saliva, for the first time, further enhancing accessibility. Finally, we obtain complete and accurate assemblies for clinically-relevant repetitive loci at the extremes of the genome, demonstrating valid approaches for genetic diagnosis in facioscapulohumeral muscular dystrophy (FSHD) and MUC1-autosomal dominant tubulointerstitial kidney disease (MUC1-ADTKD) - inherited diseases for which diagnosis is complicated by an inability to sequence the genes involved. In summary, Cornetto will improve, accelerate, and democratise genome assembly, delivering impacts across a range of bioscience domains.
Dataset DOI: 10.5061/dryad.kkwh70sfr
Description of the data and file structure
These assemblies were generated using the cornetto adaptive sampling method described in our manuscript. The documentation and source code used for the process are available at https://github.com/hasindu2008/cornetto.
Files and variables
There are two files, cornetto-hg002-asm.tar.gz and cornetto-animal-asm.tar.gz, which are described below.
File: cornetto-hg002-asm.tar.gz
Description: This tarball contains all the FASTA assemblies for the hg002 sample. The files below are found inside the cornetto-hg002-asm directory created when you download and extract cornetto-hg002-asm.tar.gz.
Assembly name | Assembly Description | Assembly File |
---|---|---|
hg002-Hifi-1 | hg002 DNA; 1x HiFi SMRT cell; primary asm | hg002-Hifi-1/hg002-Hifi-1.fasta |
hg002-Hifi-2 | hg002 DNA; 2x HiFi SMRT cell; primary asm | hg002-Hifi-2/hg002-Hifi-2.fasta |
hg002-Hifi-3 | hg002 DNA; 3x HiFi SMRT cell; primary asm | hg002-Hifi-3/hg002-Hifi-3.fasta |
hg002-NonCornetto-1 | hg002 DNA; 1x HiFi SMRT cell + 3x ONT duplex FC (without cornetto); primary asm | hg002-NonCornetto-1/hg002-NonCornetto-1.fasta |
hg002-NonCornetto-2 | hg002 DNA; 1x HiFi SMRT cell + 9x ONT duplex FC (without cornetto); primary asm | hg002-NonCornetto-2/hg002-NonCornetto-2.fasta |
hg002-Cornetto-1.1 | hg002 DNA; 1x HiFi SMRT cell + 1x ONT duplex FC (cornetto cycle 1); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.1.fasta |
hg002-Cornetto-1.2 | hg002 DNA; 1x HiFi SMRT cell + 1x ONT duplex FC (cornetto cycle 2); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.2.fasta |
hg002-Cornetto-1.3 | hg002 DNA; 1x HiFi SMRT cell + 1x ONT duplex FC (cornetto cycle 3); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.3.fasta |
hg002-Cornetto-1.4 | hg002 DNA; 1x HiFi SMRT cell + 2x ONT duplex FC (cornetto cycle 4); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.4.fasta |
hg002-Cornetto-1.5 | hg002 DNA; 1x HiFi SMRT cell + 2x ONT duplex FC (cornetto cycle 5); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.5.fasta |
hg002-Cornetto-1.6 | hg002 DNA; 1x HiFi SMRT cell + 2x ONT duplex FC (cornetto cycle 6); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.6.fasta |
hg002-Cornetto-1.7 | hg002 DNA; 1x HiFi SMRT cell + 3x ONT duplex FC (cornetto cycle 7); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.7.fasta |
hg002-Cornetto-1.8 | hg002 DNA; 1x HiFi SMRT cell + 3x ONT duplex FC (cornetto cycle 8); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.8.fasta |
hg002-Cornetto-1.9 (final) | hg002 DNA; 1x HiFi SMRT cell + 3x ONT duplex FC (cornetto cycle 9); primary asm | hg002-Cornetto-1/hg002-Cornetto-1.9.fasta |
hg002-Cornetto-2.1 | hg002 DNA; 1x HiFi SMRT cell + 1x ONT duplex FC (cornetto cycle 1); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.1.fasta |
hg002-Cornetto-2.2 | hg002 DNA; 1x HiFi SMRT cell + 1x ONT duplex FC (cornetto cycle 2); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.2.fasta |
hg002-Cornetto-2.3 | hg002 DNA; 1x HiFi SMRT cell + 1x ONT duplex FC (cornetto cycle 3); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.3.fasta |
hg002-Cornetto-2.4 | hg002 DNA; 1x HiFi SMRT cell + 2x ONT duplex FC (cornetto cycle 4); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.4.fasta |
hg002-Cornetto-2.5 | hg002 DNA; 1x HiFi SMRT cell + 2x ONT duplex FC (cornetto cycle 5); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.5.fasta |
hg002-Cornetto-2.6 | hg002 DNA; 1x HiFi SMRT cell + 2x ONT duplex FC (cornetto cycle 6); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.6.fasta |
hg002-Cornetto-2.7 | hg002 DNA; 1x HiFi SMRT cell + 3x ONT duplex FC (cornetto cycle 7); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.7.fasta |
hg002-Cornetto-2.8 | hg002 DNA; 1x HiFi SMRT cell + 3x ONT duplex FC (cornetto cycle 8); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.8.fasta |
hg002-Cornetto-2.9 (final) | hg002 DNA; 1x HiFi SMRT cell + 3x ONT duplex FC (cornetto cycle 9); primary asm | hg002-Cornetto-2/hg002-Cornetto-2.9.fasta |
hg002-Base, primary | hg002 DNA; 1x ONT flow cell (LSK114, simplex reads, SUP); primary asm | hg002-ONT_base/hg002-ONT_base.fasta |
hg002-Base, h1 | hg002 DNA; 1x ONT flow cell (LSK114, simplex reads, SUP); haplotype 1 | hg002-ONT_base/hg002-ONT_base.hap1.fasta |
hg002-Base, h2 | hg002 DNA; 1x ONT flow cell (LSK114, simplex reads, SUP); haplotype 2 | hg002-ONT_base/hg002-ONT_base.hap2.fasta |
hg002-NonCornetto-3, primary | hg002 DNA; 3x ONT flow cell (LSK114, simplex reads, SUP); primary asm | hg002-NonCornetto-3/hg002-NonCornetto-3.fasta |
hg002-NonCornetto-3, h1 | hg002 DNA; 3x ONT flow cell (LSK114, simplex reads, SUP); haplotype 1 | hg002-NonCornetto-3/hg002-NonCornetto-3.hap1.fasta |
hg002-NonCornetto-3, h2 | hg002 DNA; 3x ONT flow cell (LSK114, simplex reads, SUP); haplotype 2 | hg002-NonCornetto-3/hg002-NonCornetto-3.hap2.fasta |
hg002-Cornetto-3.1, primary | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 1; LSK114, simplex reads, SUP); primary asm | hg002-Cornetto-3/hg002-Cornetto-3.1.fasta |
hg002-Cornetto-3.2, primary | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 2; LSK114, simplex reads, SUP); primary asm | hg002-Cornetto-3/hg002-Cornetto-3.2.fasta |
hg002-Cornetto-3.3 (final), primary | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 3; LSK114, simplex reads, SUP); primary asm | hg002-Cornetto-3/hg002-Cornetto-3.3.fasta |
hg002-Cornetto-4.1, h1 | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 1; LSK114, simplex reads, SUP); haplotype 1 | hg002-Cornetto-4/hg002-Cornetto-4.1.hap1.fasta |
hg002-Cornetto-4.1, h2 | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 1; LSK114, simplex reads, SUP); haplotype 2 | hg002-Cornetto-4/hg002-Cornetto-4.1.hap2.fasta |
hg002-Cornetto-4.2, h1 | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 2; LSK114, simplex reads, SUP); haplotype 1 | hg002-Cornetto-4/hg002-Cornetto-4.2.hap1.fasta |
hg002-Cornetto-4.2, h2 | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 2; LSK114, simplex reads, SUP); haplotype 2 | hg002-Cornetto-4/hg002-Cornetto-4.2.hap2.fasta |
hg002-Cornetto-4.3 (final), h1 | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 3; LSK114, simplex reads, SUP); haplotype 1 | hg002-Cornetto-4/hg002-Cornetto-4.3.hap1.fasta |
hg002-Cornetto-4.3 (final), h2 | hg002 DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; cycle 3; LSK114, simplex reads, SUP); haplotype 2 | hg002-Cornetto-4/hg002-Cornetto-4.3.hap2.fasta |
File: cornetto-animal-asm.tar.gz
Description: This tarball contains FASTA assemblies for the animal samples (the base assembly and the assembly after the final cornetto cycle). The files below are found inside the cornetto-animal-asm directory created when you download and extract cornetto-animal-asm.tar.gz.
Assembly name | Assembly Description | Assembly File |
---|---|---|
Petrel-Base, primary | Petrel DNA; 1x HiFi SMRT cell; primary asm | Petrel-Base/Petrel-Base.fasta |
Petrel-Cornetto, primary | Petrel DNA; 1x HiFi SMRT cell + 2x ONT duplex FC (7x cycles); primary asm | Petrel-Cornetto/Petrel-Cornetto.fasta |
Cichlid-Base, primary | Cichlid DNA; 1x HiFi SMRT cell; primary asm | Cichlid-Base/Cichlid-Base.fasta |
Cichlid-Cornetto, primary | Cichlid DNA; 1x HiFi SMRT cell + 2x ONT duplex FC (6x cycles); primary asm | Cichlid-Cornetto/Cichlid-Cornetto.fasta |
Turtle-Base, h1 | Turtle DNA; 1x ONT FC (standard; LSK114, simplex reads, SUP basecalling); haplotype 1 | Turtle-Base/Turtle-Base.hap1.fasta |
Turtle-Base, h2 | Turtle DNA; 1x ONT FC (standard; LSK114, simplex reads, SUP basecalling); haplotype 2 | Turtle-Base/Turtle-Base.hap2.fasta |
Turtle-Cornetto, h1 | Turtle DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; 3x cycles; LSK114, simplex reads, SUP basecalling); haplotype 1 | Turtle-Cornetto/Turtle-Cornetto.hap1.fasta |
Turtle-Cornetto, h2 | Turtle DNA; 1x ONT FC (standard) + 1x ONT FC (cornetto; 3x cycles; LSK114, simplex reads, SUP basecalling); haplotype 2 | Turtle-Cornetto/Turtle-Cornetto.hap2.fasta |
Parrot-Base, h1 | Parrot DNA; 2x ONT FC (standard; LSK114, simplex reads, SUP basecalling); haplotype 1 | Parrot-Base/Parrot-Base.hap1.fasta |
Parrot-Base, h2 | Parrot DNA; 2x ONT FC (standard; LSK114, simplex reads, SUP basecalling); haplotype 2 | Parrot-Base/Parrot-Base.hap2.fasta |
Parrot-Cornetto, h1 | Parrot DNA; 2x ONT FC (standard) + 1x ONT FC (cornetto; 4x cycles; LSK114, simplex reads, SUP basecalling); haplotype 1 | Parrot-Cornetto/Parrot-Cornetto.hap1.fasta |
Parrot-Cornetto, h2 | Parrot DNA; 2x ONT FC (standard) + 1x ONT FC (cornetto; 4x cycles; LSK114, simplex reads, SUP basecalling); haplotype 2 | Parrot-Cornetto/Parrot-Cornetto.hap2.fasta |
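The file paths in both tables above follow a simple pattern: haplotype-resolved assemblies end in `.hap1.fasta` or `.hap2.fasta`, and every other `.fasta` file is a primary assembly. As a minimal sketch, assuming that pattern holds for every file in the extracted directories, the Python below groups the FASTA files by assembly. The function names `classify_fasta` and `index_assemblies` are hypothetical and are not part of the cornetto software. Assembly names are derived from the file names, which differ from the table names in a few rows (for example, `hg002-ONT_base` rather than `hg002-Base`).

```python
from pathlib import Path


def classify_fasta(path):
    """Return (assembly_name, kind), where kind is 'hap1', 'hap2', or 'primary'.

    Assumes the naming pattern seen in the tables: NAME.hap1.fasta,
    NAME.hap2.fasta, or NAME.fasta for a primary assembly.
    """
    name = Path(path).name
    for hap in ("hap1", "hap2"):
        suffix = f".{hap}.fasta"
        if name.endswith(suffix):
            return name[: -len(suffix)], hap
    if name.endswith(".fasta"):
        return name[: -len(".fasta")], "primary"
    raise ValueError(f"not a FASTA file name: {name}")


def index_assemblies(root):
    """Map assembly name -> {kind: path} for every .fasta file under root."""
    index = {}
    for path in sorted(Path(root).rglob("*.fasta")):
        assembly, kind = classify_fasta(path)
        index.setdefault(assembly, {})[kind] = path
    return index
```

For example, `index_assemblies("cornetto-animal-asm")` would be expected to return an entry for `Turtle-Cornetto` holding both a `hap1` and a `hap2` path, provided the tarball has been extracted into the current directory.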
Code/software
The standard Unix tar command is sufficient to extract the tar.gz files. Once extracted, the FASTA files can be opened with any text viewer or editor, or with common genomics software such as samtools, seqtk, or seqkit.
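For users who prefer to script these steps, the following is a minimal Python sketch using only the standard library. It is an illustration rather than part of the cornetto pipeline: the helper names `extract_tarball`, `contig_lengths`, and `n50` are hypothetical, and the summary statistics are a simple example of what can be read from the FASTA files, not the quality metrics used in the manuscript.

```python
import tarfile


def extract_tarball(tar_path, dest="."):
    """Extract a .tar.gz archive into dest (comparable to `tar -xzf`).

    Only use this on archives from a trusted source, such as this dataset.
    """
    with tarfile.open(tar_path, "r:gz") as tar:
        tar.extractall(dest)


def contig_lengths(fasta_path):
    """Return the length of every sequence in a FASTA file, read line by line."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif line and current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the largest length L such that contigs of length >= L cover
    at least half of the total assembly length (0 for an empty list)."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0
```

A typical use would be to call `extract_tarball("cornetto-animal-asm.tar.gz")` and then pass a path from the tables above, such as `cornetto-animal-asm/Petrel-Base/Petrel-Base.fasta`, to `contig_lengths`. The number of contigs is `len(lengths)`, the total assembly size is `sum(lengths)`, and `n50(lengths)` gives the contig N50. The hg002 tarball is large (33.91 GB compressed), so allow ample disk space before extracting it.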
Access information
The raw FASTQ files used to generate these assemblies can be downloaded from ENA project PRJEB86853.