Genomic insights into evolution and control of Wohlfahrtia magnifica, a widely distributed myiasis-causing fly of warm-blooded vertebrates

Citation

Jia, Zhipeng; Hasi, Surong; Vogl, Claus; Burger, Pamela (2022), Genomic insights into evolution and control of Wohlfahrtia magnifica, a widely distributed myiasis-causing fly of warm-blooded vertebrates, Dryad, Dataset, https://doi.org/10.5061/dryad.qfttdz0j8

Abstract

Wohlfahrtia magnifica is a pest fly species that infests livestock in many European, African and Asian countries, causing heavy agro-economic losses. In the life cycle of this obligate parasite, adult flies infest the host by depositing first-stage larvae into body cavities or open wounds. The feeding larvae cause severe (skin) tissue damage and potentially fatal infections if left untreated. Despite these serious health and agro-economic concerns, genomic resources for understanding the biology of W. magnifica have so far been lacking. Here, we present a complete genome assembly from a single adult female W. magnifica, generated using a Low-DNA Input workflow for PacBio HiFi library preparation. The de novo assembled genome is 753.99 Mb long, has a scaffold N50 of 5.00 Mb, and contains 16,718 predicted protein-coding genes. Comparative genomic analysis revealed that W. magnifica is most closely related to Sarcophaga bullata, followed by Lucilia cuprina. Evolutionary analysis of gene families showed expansions of 173 gene families in W. magnifica that were enriched for gene ontology (GO) categories related to immunity, insecticide-resistance mechanisms, heat stress response and cuticle development. In addition, we identified 45 positively selected genes with diverse functions. This new genomic resource contributes to evolutionary and comparative analyses of dipteran flies and to an in-depth understanding of many aspects of W. magnifica biology. Furthermore, it will facilitate the development of novel tools for controlling W. magnifica infestations in livestock.
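The scaffold N50 reported above is the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. The following minimal Python sketch computes it from a list of sequence lengths; the function name n50 and the toy lengths are illustrative only and are not taken from the W. magnifica assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths in Mb (hypothetical, for illustration only).
print(n50([8, 6, 5, 3, 2, 1]))  # 6
```

In the toy example the total is 25 Mb; the two longest scaffolds (8 + 6 = 14 Mb) are the first to reach half of that, so the N50 is 6 Mb.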

Methods

Genome assembly

To correct sequencing errors and generate highly accurate consensus reads, we converted raw reads into circular consensus sequences (CCS; hereafter HiFi sequences) using the program ccs v. 5.0.0 with default settings (https://github.com/PacificBiosciences/ccs). Next, we used Icecreamfinder v. 38.84 (https://sourceforge.net/projects/bbmap/) with default settings to filter out and/or trim HiFi sequences containing inverted repeats and residual adapter sequences. We then screened the resulting HiFi reads for potential bacterial contamination with SendSketch v. 38.87 (https://sourceforge.net/projects/bbmap/), which compares a reduced representation (sketch) of the trimmed/filtered HiFi reads against drafts from the NCBI nucleotide database; we inspected up to 1,000 records in the results. We used NCBI datasets v. 10.9.0 to retrieve the matching bacterial genome sequences and Seal v. 38.87 (https://sourceforge.net/projects/bbmap/) with k=31 and minkmerfraction=0.5 to assign and remove HiFi sequences in which at least 50% of the 31-mers matched the bacterial genomes. Finally, we assembled the filtered HiFi sequences with the Improved Phased Assembler (IPA) v. 1.3.2 (https://github.com/PacificBiosciences/pbbioconda/wiki/Improved-Phased-Assembler), the official PacBio software for HiFi genome assembly, using default settings.
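The Seal filtering criterion (k=31, minkmerfraction=0.5) can be illustrated with a simplified Python sketch: a read is flagged as bacterial when at least half of its k-mers occur in the set of reference k-mers. This is a toy reimplementation for explanation, not Seal itself; the function names, the use of canonical (strand-independent) k-mers, the skipping of k-mers with non-ACGT bases and the random test sequences are all assumptions of this sketch.

```python
import random

ACGT = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Yield canonical k-mers, skipping windows that contain non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= ACGT:
            # Canonical form: the lexicographically smaller of k-mer and reverse complement.
            yield min(kmer, reverse_complement(kmer))


def build_index(references, k):
    """Collect all canonical k-mers from the reference (e.g. bacterial) genomes."""
    index = set()
    for ref in references:
        index.update(kmers(ref, k))
    return index


def matched_fraction(read, index, k):
    """Fraction of a read's k-mers that occur in the reference index."""
    read_kmers = list(kmers(read, k))
    if not read_kmers:
        return 0.0
    hits = sum(1 for kmer in read_kmers if kmer in index)
    return hits / len(read_kmers)


def filter_reads(reads, references, k=31, min_fraction=0.5):
    """Split (name, sequence) reads into kept and removed (contaminant) names."""
    index = build_index(references, k)
    kept, removed = [], []
    for name, seq in reads:
        if matched_fraction(seq, index, k) >= min_fraction:
            removed.append(name)
        else:
            kept.append(name)
    return kept, removed


# Toy data: random sequences standing in for a bacterial genome and a fly read.
rng = random.Random(0)
bacterium = "".join(rng.choice("ACGT") for _ in range(500))
fly_read = "".join(rng.choice("ACGT") for _ in range(200))
reads = [("contaminant", bacterium[100:250]), ("fly", fly_read)]
kept, removed = filter_reads(reads, [bacterium])
print("kept:", kept, "removed:", removed)
```

Storing the reference k-mers in a hash set makes each lookup constant-time on average; Seal itself additionally assigns each read to specific reference sequences and offers many more options than this sketch.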

Annotation 

We annotated the genome assembly with BRAKER v. 2.1.5, Augustus v. 3.3.3, and GeneMarkES v. 4.6.3. As evidence, we used proteins from Arthropoda v100_odb10 and RNA-Seq alignments generated by aligning RNA-Seq libraries to the genome with HISAT2 v. 2.2.1 using --max-intronlen 100000 and --dta. For BRAKER, we used the softmasking and etpmode options and the following Augustus settings: --alternatives-from-sampling=true --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --sample=100 --maxtracks=3 --temperature=2. We then used MAKER v. 3.01.03 to merge the Augustus and GeneMark annotations. As the protein_gff passed to MAKER, we used the hintsfile.gff produced by BRAKER; as the pred_gff, we used the concatenation of augustus.hints.gtf and GeneMark-ETP’s genemark.f.multi_anchored.gtf, filtered with GFFread v. 0.12.3 using the settings --adj-stop -J --sort-alpha -E --keep-genes. We functionally annotated the MAKER-filtered genes through blastp searches of their proteins against UniProt/Swiss-Prot release 2020_05, run with Diamond v. 2.0.4 using the settings ultra-sensitive, evalue 1e-6 and max-target-seqs 1. The resulting annotations were reformatted with GAG (http://genomeannotation.github.io/GAG/; Geib et al., 2018) and Annie (http://genomeannotation.github.io/annie/).
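The e-value cutoff and single-target setting used for the Diamond search can be mirrored when post-processing tabular hits. The Python sketch below assumes hits were written in the default 12-column BLAST tabular format (Diamond's --outfmt 6) and keeps, for each query, the highest-scoring subject with an e-value of at most 1e-6; the function name best_hits and the example accessions are hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (Diamond --outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-6):
    """Map each query to its highest-bitscore subject passing the e-value cutoff."""
    best = {}  # query -> (subject, bitscore)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        bitscore = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (hit["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Hypothetical hits; accessions are made up for illustration.
rows = [
    ["gene1", "sp|X00001|GENEA_TOY", "80.0", "300", "60", "0", "1", "300", "1", "300", "1e-40", "150"],
    ["gene1", "sp|X00002|GENEB_TOY", "45.0", "280", "150", "4", "5", "285", "2", "280", "1e-20", "90"],
    ["gene2", "sp|X00003|GENEC_TOY", "30.0", "90", "60", "2", "1", "90", "10", "99", "1e-3", "30"],
]
example = "\n".join("\t".join(row) for row in rows)
print(best_hits(io.StringIO(example)))  # {'gene1': 'sp|X00001|GENEA_TOY'}
```

Here gene2 is dropped because its only hit (e-value 1e-3) fails the 1e-6 cutoff, and gene1 keeps the subject with the higher bitscore.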

Funding

China Scholarship Council, Award: 201909150004

Austrian Science Fund, Award: P29623-B25