Chromosome-scale thraustochytrid genome assembly
Cite this dataset
Rest, Joshua; Collier, Jackie; Gallot-Lavallée, Lucie; Archibald, John (2023). Chromosome-scale thraustochytrid genome assembly [Dataset]. Dryad. https://doi.org/10.5061/dryad.2fqz612t6
Abstract
We used long-read sequencing to produce a telomere-to-telomere genome assembly for the osmoheterotrophic stramenopile protist Aurantiochytrium limacinum MYA-1381. Its ~62 Mb genome is mainly organized in 26 linear chromosomes with a novel configuration: subtelomeric rDNAs are interspersed with long repeated sequence elements denoted as LOng REpeated - TElomere And Rdna Spacers (LORE-TEARS). These repeats may play a role in chromosome end maintenance. A putative circular mirusvirus genome is present at a high copy number (called circular element 1; CE1). The presence of another mirusvirus genome at the end of chromosome 15 in superposition between two complete sets of rRNA and LORE-TEAR elements suggests a dynamic process at the chromosome ends.
Methods
For Nanopore sequencing, Aurantiochytrium ATCC MYA-1381 and two putative crtIBY knockout mutants (designated KO32 and KO33; Rius et al. 2023) were cultured for three days in 50 ml ATCC 790 By+ medium. Genomic DNA was extracted as described in the manuscript. The precipitated DNA was left to dissolve in water by spontaneous diffusion for at least 48 hours at room temperature to avoid shearing and was subsequently purified using a QIAGEN Genomic-tip 20/G. Agarose gel electrophoresis (1%) was used to confirm the integrity of the high-molecular-weight (≥20 kb) DNA. DNA purity was evaluated using a NanoPhotometer P360 (Implen) to measure the A260/280 (~1.8) and A260/230 (2.0-2.2) ratios. DNA was quantified using a Qubit 2.0 Fluorometer (ThermoFisher Scientific) with the dsDNA broad range assay kit.
A multiplexed Nanopore (MinION, Oxford Nanopore Technologies) sequencing library was prepared using the Oxford Nanopore Technologies (ONT) ligation sequencing kit (SQK-LSK109) and the PCR-free native barcoding expansion kit 1-12 (EXP-NBD103) according to the ONT protocol “1D Native barcoding genomic DNA with EXP-NBD103 and SQK-LSK109” (version NBE_9065_v109_revB_23May2018). Approximately 2 µg of purified genomic DNA per sample was used as input. Unfragmented genomic DNA for the wild-type and putative knockouts was repaired using the NEBNext FFPE DNA repair module (NEB cat. no. M6630) and prepared for adapter ligation using the NEBNext End repair/dA-tailing module (NEB cat. no. E7546) with incubations at 20°C and 65°C for 10 minutes each. The repaired and end-prepped samples were purified with a 1:1 volume of AMPure XP beads (Beckman) and incubated at room temperature for 10 minutes; the pelleted beads were subsequently washed twice with 80% ethanol. The DNA was eluted off the beads in 25 µl nuclease-free water for 10 minutes at 37°C to encourage the elution of long molecules. The native barcodes NB07, NB08, and NB09 were ligated to the WT, KO32, and KO33 repaired and end-prepped DNA samples, respectively, in a 1.36x scaled ligation reaction. The native-barcoded samples were pooled in approximately equimolar amounts (~1.3 µg each). The 1D barcode sequencing adapters (BAM 1D) were then ligated to the pooled, barcoded DNA using a 1-hour incubation at 25°C. The adapter-ligated DNA was purified by a 0.4x AMPure XP bead clean-up that included a 10-minute incubation at room temperature and two washes with the Long Fragment Buffer mix to enrich for DNA fragments >3 kbp. The final adapter-ligated library was incubated in 15 µl Elution Buffer for 10 minutes at 37°C.
A total of 1.2 µg of prepared library was loaded on a single MinION R9.4.1 chemistry SpotON flow cell (FLO-MIN106) and sequenced using Oxford Nanopore Technologies' MinKNOW software (v2.1.12) without live basecalling. Binning of the raw reads was performed in real time using Deepbinner v0.2.0 (Wick, Judd, and Holt 2018; Rius et al. 2023). The raw fast5 MinION data have been deposited in the NCBI SRA database under BioProject PRJNA680238 (WT accession: SRR13108467; KO32 accession: SRR13108466; KO33 accession: SRR13108465).
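For anyone scripting against the deposited data, the sample, barcode, and accession assignments given above can be kept in one small lookup table. The following Python sketch is only an illustration: the `Sample` type and the `find_sample` helper are hypothetical names introduced here, not part of the dataset or of any tool used by the authors. The values themselves are copied from the text.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    """One multiplexed sample (hypothetical record type for illustration)."""
    name: str      # strain label used in the Methods
    barcode: str   # ONT native barcode ligated to the sample
    sra_run: str   # NCBI SRA run accession for the raw fast5 data


# BioProject under which all three runs were deposited.
BIOPROJECT = "PRJNA680238"

# Assignments as stated in the Methods text.
SAMPLES = (
    Sample("WT", "NB07", "SRR13108467"),
    Sample("KO32", "NB08", "SRR13108466"),
    Sample("KO33", "NB09", "SRR13108465"),
)


def find_sample(key: str) -> Optional[Sample]:
    """Look up a sample by strain name, barcode, or SRA run (case-insensitive)."""
    wanted = key.strip().upper()
    for sample in SAMPLES:
        if wanted in (sample.name.upper(), sample.barcode, sample.sra_run):
            return sample
    return None


for sample in SAMPLES:
    print(f"{sample.barcode}\t{sample.name}\t{sample.sra_run}")
```

Keeping the mapping in one place avoids retyping accessions when reads are later grouped by barcode or downloaded by run.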
As described previously (Rius et al. 2023), the demultiplexed fast5 files were basecalled using Albacore v2.3.1, adapters were removed with Porechop v0.2.3 (Wick et al. 2017), and the resulting data were used for preliminary genome assembly with Canu v1.7.1 (Koren et al. 2017) with parameters adjusted to the expected genome size of 60 Mbp. The resulting consensus sequence was improved with Nanopolish v0.10.1 (Loman, Quick, and Simpson 2015) (https://github.com/jts/nanopolish). For the wild-type, the genome assembly totaled 61.9 Mbp in 55 contigs, while the genomes of KO mutants 32 and 33 both assembled as 62.5 Mbp, in 50 and 47 contigs, respectively. Analysis with Mauve v2.4.0 (Darling et al. 2004) revealed ~27 homologous contigs among the three assemblies, ranging from ~0.35 to ~4 Mbp. The wild-type assembly was the least contiguous, perhaps reflecting its lower mean read length (4913 bp vs 8508 bp and 7951 bp for the two KO mutants). To resolve the differences among the three assemblies and gain greater coverage, reads for all three strains were concatenated into one file as input for Canu v1.7.1, and read trimming and assembly were performed with default settings and an estimated genome size of 60 Mbp. The combined read file had 789085 reads >Q7 and 293668 reads >Q10, a mean read length of 7124 bp, and a longest read of 180475 bp (quality score 9.8). This combined Nanopore-based Canu assembly contained 62 contigs totaling 63.71 Mbp. The 27 contigs described in the associated manuscript total 61.77 Mbp. The remaining 35 contigs included five putative mitochondrial contigs and another 30 contigs ranging from 20 kb to 230 kb, of which 14 represented single reads.
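Read-set summaries of the kind reported above (read counts above Q7 and Q10, mean read length, longest read) can be recomputed from a FASTQ file with a few lines of Python. The sketch below is not the authors' procedure, which the text does not describe. It assumes four-line FASTQ records with Phred+33 quality encoding, and it assumes that a read's quality is the Phred value of its mean per-base error probability, a common convention for Nanopore reads. The function names and the toy records are illustrative only.

```python
import io
import math
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        seq = handle.readline().strip()
        handle.readline()       # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header.strip()[1:], seq, qual


def mean_read_quality(qual, offset=33):
    """Phred score of the mean per-base error probability (assumed convention)."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(mean(probs))


def summarize(handle, thresholds=(7, 10)):
    """Count reads, reads above each quality threshold, and length statistics."""
    lengths = []
    passing = {t: 0 for t in thresholds}
    for _name, seq, qual in read_fastq(handle):
        lengths.append(len(seq))
        quality = mean_read_quality(qual)
        for t in thresholds:
            if quality > t:
                passing[t] += 1
    return {
        "reads": len(lengths),
        "mean_length": mean(lengths) if lengths else 0,
        "longest": max(lengths, default=0),
        "passing": passing,
    }


# Toy input: three made-up reads with uniform qualities of Q40, Q4, and Q9.
toy = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGT\n+\n%%%%\n"
    "@r3\nACGTAC\n+\n******\n"
)
print(summarize(toy))
```

On real data, the `io.StringIO` object would be replaced by an open file handle, for example one returned by `gzip.open(path, "rt")` for compressed FASTQ. Averaging error probabilities rather than raw Phred scores weights low-quality bases more heavily, so the two approaches can give different read counts near a threshold.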
References
- Darling, Aaron C. E., Bob Mau, Frederick R. Blattner, and Nicole T. Perna. 2004. “Mauve: Multiple Alignment of Conserved Genomic Sequence with Rearrangements.” Genome Research 14 (7): 1394–1403.
- Koren, Sergey, Brian P. Walenz, Konstantin Berlin, Jason R. Miller, Nicholas H. Bergman, and Adam M. Phillippy. 2017. “Canu: Scalable and Accurate Long-Read Assembly via Adaptive K-Mer Weighting and Repeat Separation.” Genome Research 27 (5): 722–36.
- Loman, Nicholas J., Joshua Quick, and Jared T. Simpson. 2015. “A Complete Bacterial Genome Assembled de Novo Using Only Nanopore Sequencing Data.” Nature Methods 12 (8): 733–35.
- Rius, Mariana, Joshua S. Rest, Gina V. Filloramo, Anna M. G. Novák Vanclová, John M. Archibald, and Jackie L. Collier. 2023. “Horizontal Gene Transfer and Fusion Spread Carotenogenesis among Diverse Heterotrophic Protists.” Genome Biology and Evolution, February. https://doi.org/10.1093/gbe/evad029.
- Wick, Ryan R., Louise M. Judd, Claire L. Gorrie, and Kathryn E. Holt. 2017. “Completing Bacterial Genome Assemblies with Multiplex MinION Sequencing.” Microbial Genomics 3 (10): e000132.
- Wick, Ryan R., Louise M. Judd, and Kathryn E. Holt. 2018. “Deepbinner: Demultiplexing Barcoded Oxford Nanopore Reads with Deep Convolutional Neural Networks.” PLoS Computational Biology 14 (11): e1006583.
Funding
Gordon and Betty Moore Foundation