
Concatenated amino acid (AA) phylogenetic dataset of nuclear gene orthologs for Ephydroidea (Diptera)

Cite this dataset

Wiegmann, Brian (2022). Concatenated amino acid (AA) phylogenetic dataset of nuclear gene orthologs for Ephydroidea (Diptera) [Dataset]. Dryad.


The schizophoran superfamily Ephydroidea (Diptera: Cyclorrhapha) includes eight families, ranging from the well-known vinegar flies (Drosophilidae) and shore flies (Ephydridae) to several small, relatively unusual groups whose phylogenetic placement has been particularly challenging for systematists. Extraordinary diversity in life histories, feeding habits, and morphology is a hallmark of fly biology, and the Ephydroidea are no exception. Extreme specialization can lead to "orphaned" taxa with no clear evidence for their phylogenetic position. To resolve relationships among a diverse sample of Ephydroidea, including the highly modified flies in the families Braulidae and Mormotomyiidae, we conducted phylogenomic sampling. Using exon capture from Anchored Hybrid Enrichment and transcriptomics to obtain 320 orthologous nuclear genes sampled for 32 species of Ephydroidea and 11 outgroups, we evaluate a new phylogenetic hypothesis for representatives of the superfamily. These data strongly support the monophyly of Ephydroidea, with Ephydridae as an early-branching radiation, and the placement of Mormotomyiidae as a family-level lineage sister to all remaining families. We confirm the placement of Cryptochetidae as the sister taxon to a large clade containing both Drosophilidae and Braulidae, the latter a family of honeybee ectoparasites. Our results reaffirm that sampling of both taxa and characters is critical in hyperdiverse clades and that these factors have a major influence on phylogenomic reconstruction of the history of the schizophoran fly radiation.


These data were obtained by anchored hybrid enrichment using the NCSU_Wiegmann Diptera Probes, or from transcriptome data available in public databases. Each locus in the concatenated dataset (327 genes) was identified as a single-copy ortholog with the Orthograph program by comparison against a reference library of brachyceran Diptera orthologs (Brachybase), which was designed from genomic resources for Brachycera in the OrthoDB database. Each locus partition was aligned separately in MAFFT.
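The concatenation of separately aligned loci into one amino acid supermatrix can be sketched as follows. This is a minimal Python illustration, assuming each locus is already aligned (for example by MAFFT) and available as FASTA text; the function names `parse_fasta` and `concatenate`, the locus names, and the toy sequences are hypothetical and are not part of this dataset or of Orthograph. Taxa absent from a locus are padded with gaps, and the 1-based column range of each locus is recorded so it can later serve as a partition.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {taxon: sequence}, joining wrapped sequence lines."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # taxon label = first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in records.items()}


def concatenate(
    loci: Dict[str, Dict[str, str]]
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Concatenate aligned loci (in dict order) into a supermatrix.

    Taxa missing from a locus are padded with '-' for that locus.
    Returns the supermatrix and 1-based inclusive (locus, start, end) ranges.
    """
    taxa = sorted({t for aln in loci.values() for t in aln})
    matrix: Dict[str, List[str]] = {t: [] for t in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for locus, aln in loci.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: empty or unaligned (unequal lengths)")
        length = lengths.pop()
        for t in taxa:
            matrix[t].append(aln.get(t, "-" * length))
        partitions.append((locus, start, start + length - 1))
        start += length
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


# Toy example with hypothetical loci and taxon labels.
locus_a = parse_fasta(">Drosophila\nMKV-L\n>Ephydra\nMKVAL\n")
locus_b = parse_fasta(">Drosophila\nQW\nE\n>Braula\nQWD\n")
supermatrix, parts = concatenate({"locus_a": locus_a, "locus_b": locus_b})
for taxon, seq in supermatrix.items():
    print(taxon, seq)  # Braula -----QWD / Drosophila MKV-LQWE / Ephydra MKVAL---
print(parts)  # [('locus_a', 1, 5), ('locus_b', 6, 8)]
```

Padding missing taxa with gaps keeps every row the same length, which phylogenetic programs require of a concatenated alignment; the recorded ranges are what a partition file needs.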

Usage notes

Phylogenetic analysis was conducted in IQ-TREE v1.44 with the partition file ''.
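A partition file for IQ-TREE lists the column range that each locus occupies in the concatenated alignment. The sketch below assumes NEXUS `charset` syntax and uses hypothetical locus names and ranges; the function `nexus_partitions` is illustrative only and does not reproduce the partition file distributed with this dataset.

```python
import re
from typing import List, Tuple


def nexus_partitions(partitions: List[Tuple[str, int, int]]) -> str:
    """Render (locus, start, end) 1-based inclusive column ranges as a NEXUS sets block."""
    lines = ["#nexus", "begin sets;"]
    for name, start, end in partitions:
        # Conservative check: plain word characters avoid NEXUS quoting issues.
        if not re.fullmatch(r"\w+", name):
            raise ValueError(f"unsupported charset name: {name!r}")
        if not 1 <= start <= end:
            raise ValueError(f"{name}: invalid range {start}-{end}")
        lines.append(f"    charset {name} = {start}-{end};")
    lines.append("end;")
    return "\n".join(lines) + "\n"


# Hypothetical loci occupying columns 1-5 and 6-8 of a supermatrix.
example = nexus_partitions([("locus_a", 1, 5), ("locus_b", 6, 8)])
print(example, end="")
```

IQ-TREE 1.x reads such a file through its partition options (for example `-q`, `-spp`, or `-sp`, which differ in how branch lengths are shared across partitions); which option and which substitution models were used for this dataset is not stated here.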


National Science Foundation, Award: DEB-1257960

National Science Foundation, Award: DEB-2030345