Dryad

A chromosome-level genome assembly of the orange wheat blossom midge, Sitodiplosis mosellana Géhin (Diptera: Cecidomyiidae)

Cite this dataset

Gong, Zhongjun et al. (2021). A chromosome-level genome assembly of the orange wheat blossom midge, Sitodiplosis mosellana Géhin (Diptera: Cecidomyiidae) [Dataset]. Dryad. https://doi.org/10.5061/dryad.dr7sqv9zx

Abstract

The orange wheat blossom midge Sitodiplosis mosellana Géhin (Diptera: Cecidomyiidae), an important insect pest, has caused serious yield losses in most wheat-growing areas worldwide over the past half-century. In this study, we assembled the first chromosome-level genome for S. mosellana using PacBio long reads, Illumina short reads, and high-throughput chromatin conformation capture (Hi-C) scaffolding. The final genome assembly was 180.69 Mb, with contig and scaffold N50 sizes of 998.71 kb and 44.56 Mb, respectively. Hi-C scaffolding reliably anchored four pseudochromosomes, accounting for 99.67% of the assembled genome. The assembly showed high integrity and quality, with 91.7% of short reads mapped to the genome and a coverage rate of 99.8%. The assembly quality was evaluated using the Core Eukaryotic Genes Mapping Approach and Benchmarking Universal Single-Copy Orthologs. In total, 12,269 protein-coding genes were predicted, of which 91% were functionally annotated. Phylogenetic analysis indicated that S. mosellana and its close relative, the swede midge Contarinia nasturtii, diverged about 32.7 million years ago. The S. mosellana genome showed high chromosomal synteny with the genomes of Drosophila melanogaster and Anopheles gambiae. The key gene families involved in chemosensation and in the detoxification of plant secondary compounds were analysed. The high-quality S. mosellana genome data will provide an invaluable resource for research in a broad range of areas, including the biology, ecology, genetics, and evolution of midges and their relatives more generally, as well as insect-plant interactions and co-evolution.

Methods

High-quality DNA extracted from the larvae was used for library preparation and high-throughput sequencing. Short-insert (350 bp) paired-end libraries were prepared according to the Illumina protocol and sequenced on the Illumina NovaSeq 6000 (Illumina, Inc.) with a paired-end 150 (PE150) read layout. Long-read whole-genome sequencing was performed on the PacBio Sequel sequencer (Pacific Biosciences), for which 20-kb single-molecule real-time (SMRT) bell libraries were constructed using the standard protocol. In total, 46.49 Gb of raw data were obtained for S. mosellana.

All raw reads from the PacBio platform were aligned to each other using ‘daligner’ (Myers, 2014), executed through the main script of the FALCON assembler (Chin et al., 2016). Overlapping reads and raw subreads were processed to generate consensus sequences, and the assembly was polished using the consensus-calling algorithm Quiver (Chin et al., 2013). The paired-end clean reads from the Illumina platform were then used to correct any remaining errors with Pilon (Walker et al., 2014), and the sequences obtained after this strict error correction were used for the subsequent scaffolding.

To assist the chromosome-level assembly, we used the high-throughput chromosome conformation capture (Hi-C) technique to capture genome-wide chromatin interactions (Belaghzal et al., 2017). For Hi-C sequencing, chromosomal structure was fixed by formaldehyde crosslinking, and the restriction enzyme MboI was then used to digest the DNA. A Hi-C library with a 350 bp insert size was constructed and sequenced on the Illumina NovaSeq 6000 according to the manufacturer’s instructions. We then used the pruning, partition, rescue, optimization, and building steps of the ALLHiC pipeline (Dudchenko et al., 2017; Zhang et al., 2019) to construct the chromosome-level scaffolds of the S. mosellana genome.

Usage notes

Genome.tar.gz

The genome file of Sitodiplosis mosellana.
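As a quick sanity check after downloading, the scaffold N50 reported in the abstract could be recomputed from the archive. The sketch below is a minimal, unofficial example: it assumes that Genome.tar.gz contains one or more FASTA files (optionally gzip-compressed) whose names end in .fa, .fasta, or .fna, which is not stated on this page. The function names are hypothetical and only serve the illustration.

```python
import gzip
import io
import tarfile


def fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA text lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for value in ordered:
        running += value
        if running >= half:
            return value
    return 0


def lengths_from_tar(tar_path, suffixes=(".fa", ".fasta", ".fna")):
    """Collect sequence lengths from every FASTA-like member of a tarball.

    The member names and suffixes are assumptions; adjust them to the
    actual contents of the archive.
    """
    lengths = []
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            name = member.name.lower()
            plain = name[:-3] if name.endswith(".gz") else name
            if not member.isfile() or not plain.endswith(suffixes):
                continue
            raw = archive.extractfile(member)
            if name.endswith(".gz"):
                raw = gzip.open(raw)  # binary stream over the compressed member
            with io.TextIOWrapper(raw, encoding="ascii", errors="replace") as text:
                lengths.extend(fasta_lengths(text))
    return lengths
```

Calling lengths_from_tar("Genome.tar.gz") and passing the result to n50 should then give a value that can be compared with the 44.56 Mb scaffold N50, and the sum of the lengths with the 180.69 Mb assembly size, provided the archive layout matches the assumptions above.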

repeat_annotation.tar.gz

The repeat annotation files of Sitodiplosis mosellana.

ncRNA_annotation.tar.gz

The ncRNA annotation files of Sitodiplosis mosellana.

Structure_annotation.tar.gz

The structural (gene structure) annotation files of Sitodiplosis mosellana.
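If the gene structure annotation is distributed in GFF3 format (an assumption; the file format is not stated on this page), the number of annotated genes could be tallied with a short script and compared with the 12,269 protein-coding genes reported in the abstract. The function name below is hypothetical.

```python
from collections import Counter


def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section ends the feature table
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # a valid GFF3 feature line has nine columns
            counts[fields[2]] += 1
    return counts
```

Passing an open text file handle for the extracted annotation file to count_feature_types and reading the "gene" entry of the result would give the gene count, assuming the file uses "gene" as the feature type.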

Function_annotation.tar.gz

The functional annotation files of Sitodiplosis mosellana.

Funding

Agriculture Research System of China, Award: CARS-03

Ministry of Science and Technology of the People's Republic of China, Award: 2017YFD0301104