Data from: Study of mitogenomes provides implications for the phylogenetics and evolution of the infraorder Muscomorpha in Diptera
Data files
Jan 10, 2025 version files 1.11 MB
- File_S1.txt (116.15 KB)
- matrix.zip (994.42 KB)
- README.md (3.61 KB)
Abstract
The Muscomorpha is one of the most species-rich brachyceran groups in Diptera, with many species serving as important disease vectors; however, its high-level phylogenetic relationships have long been controversial and unresolved. This study comparatively analyzed the mitogenomes of 131 species representing 18 superfamilies in Muscomorpha, of which the mitogenomes of 16 species were newly sequenced and annotated, demonstrating that their gene composition, gene order, AT bias, length variation, and codon usage are consistent with documented dipteran mitogenomes. The phylogenetic topologies robustly supported the monophyly of Muscomorpha and of the major clades within it: Cyclorrhapha, Schizophora, and Calyptratae. A clade of Empidoidea was recovered as the sister group to Cyclorrhapha. Within Cyclorrhapha, Platypezoidea and Syrphoidea were sequentially placed as basal groups. The remaining cyclorrhaph superfamilies grouped into two main clades. Ephydroidea were, in most cases, placed as the sister group to Calyptratae. Within Calyptratae, Hippoboscoidea were sister to an assemblage of lineages composed of an oestroid grade and Muscoidea. The Muscomorpha was estimated to have originated in the Early Jurassic, and the main clade diversified near the Cretaceous–Paleogene extinction event, based on an MCMCtree analysis with six fossil calibration points. The ancestral area of origin of Muscomorpha was inferred to be the Palaearctic region with 56.9% probability, using the RASP software on a dated tree.
README: Study of mitogenomes provides implications for the phylogenetics and evolution of the infraorder Muscomorpha in Diptera
https://doi.org/10.5061/dryad.wwpzgmsvm
Description of the data and file structure
FILE LIST
File_S1.txt BioGeoBEARS result file (output generated by BioGeoBEARS, which was used to perform the biogeographical analysis on the phylogenetic trees).
Fig.s1.tiff Predicted secondary structures of the 22 tRNAs in the mitogenome of Loxoneura sp. The tRNAs are labelled with their corresponding amino acids.
Fig.s2.tiff Predicted secondary structures of the 22 tRNAs in the mitogenome of Systropus daiyunshanus. The tRNAs are labelled with their corresponding amino acids.
Fig.s3.pdf Phylogenetic relationships of 131 species inferred from the PCGsRNA dataset. Numbers beside the nodes are the ultrafast bootstrap value (BS) / the approximate likelihood ratio test support (SH-aLRT) for the partitioned maximum likelihood analysis.
Fig.s4.pdf Phylogenetic relationships of 131 species inferred from the PCGsRNA dataset. Numbers beside the nodes are the ultrafast bootstrap value (BS) / the Bayes test support (PP) for the GTR+FO*H4 maximum likelihood analysis.
Fig.s5.pdf Phylogenetic relationships of 131 species inferred from the PCGs dataset. Numbers beside the nodes are the ultrafast bootstrap value (BS) / the approximate likelihood ratio test support (SH-aLRT) for the partitioned maximum likelihood analysis.
Fig.s6.pdf Phylogenetic relationships of 131 species inferred from the PCGs dataset. Numbers beside the nodes are the ultrafast bootstrap value (BS) / the Bayes test support (PP) for the GTR+FO*H4 maximum likelihood analysis.
Fig.s7.pdf Phylogenetic relationships of 131 species inferred from the PCGs12RNA dataset. Numbers beside the nodes are the ultrafast bootstrap value (BS) / the approximate likelihood ratio test support (SH-aLRT) for the partitioned maximum likelihood analysis.
Table S1.doc Collection information for the 16 newly sequenced species in Muscomorpha.
Table S2.xlsx Mitogenomes of the 131 species selected for comparative and phylogenetic analyses in the infraorder Muscomorpha, including 16 species in 11 families newly sequenced in this study.
Table S3.xlsx Calibration points, fossil information, calibration scheme, and references used in the analyses for estimating divergence dates.
Table S4.xlsx Distribution information used for ancestral state estimation in the present study. Areas are coded as A (Afrotropical region), B (Palaearctic region), C (Oriental region), D (Australasian region), E (Nearctic region), and F (Neotropical region).
Table S5.xlsx Results of the model test for inferring biogeographic history.
Table S6.xlsx Long-branch scores of the phylogenetic tree based on the PCGsRNA dataset.
Table S7.xlsx Summary of support values for specific taxonomic groups in the 6 phylogenetic analyses.
matrix.zip All matrices used in this paper (the archive contains 4 NEX files: PCGsRNA, AA, PCGs, and PCGs12RNA).
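The area codes listed for Table S4 can be expanded programmatically when reading the distribution data. The following is a minimal sketch, assuming that a multi-area range is written as a string of concatenated letters (for example `BC`); that encoding and the function name `decode_range` are assumptions for illustration, not something stated in this README.

```python
# Area codes as given in the description of Table S4.
AREA_CODES = {
    "A": "Afrotropical",
    "B": "Palaearctic",
    "C": "Oriental",
    "D": "Australasian",
    "E": "Nearctic",
    "F": "Neotropical",
}


def decode_range(code: str) -> list[str]:
    """Expand a range string such as 'BC' into region names.

    Assumption: each character of `code` is one area letter.
    """
    unknown = [c for c in code if c not in AREA_CODES]
    if unknown:
        raise ValueError(f"unknown area code(s): {unknown}")
    return [AREA_CODES[c] for c in code]


print(decode_range("BC"))  # ['Palaearctic', 'Oriental']
```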
Abbreviations
Mitogenome: mitochondrial genome; PCGs: protein-coding genes; rRNAs: ribosomal RNA genes; tRNAs: transfer RNA genes; CR: control region; PCGsRNA: dataset of the 13 protein-coding genes plus the two rRNA genes; PCGs12RNA: dataset of the 13 protein-coding genes with third codon positions removed, plus the two rRNA genes; AA: amino acid sequences translated from the 13 protein-coding genes; ML: maximum likelihood; BP: bootstrap value; SH-aLRT: SH-like approximate likelihood ratio test support value; K-Pg: Cretaceous–Paleogene; HPD: highest posterior density.
Methods
Our phylogenetic analysis included published sequences from 131 Muscomorpha species, representing 53 families from 18 superfamilies. Multiple sequence alignment preceded matrix generation: we used the codon-aware program MACSE v2.06 for the 13 PCGs and MAFFT version 7.0 with the G-INS-i strategy for the 2 rRNAs. The 13 PCG alignments were then trimmed using Gblocks under the invertebrate mitochondrial genetic code, while the two rRNA alignments were trimmed using trimAl v1.2rev57. Subsequently, all individual alignments were concatenated into a supermatrix using the PhyloSuite v1.2.3 platform with default settings.
We constructed 4 datasets for phylogenetic analyses: (1) PCGsRNA, the combination of the 13 protein-coding genes plus the two rRNA genes, with a total sequence length of 12,641 nucleotides; (2) PCGs12RNA, in which the third codon positions of the 13 PCGs were excluded to mitigate substitution saturation; (3) PCGs, all codon positions of the 13 PCGs; and (4) AA, the amino acids translated from the PCGs. Before phylogenetic analyses, the substitution saturation of each codon position of the 13 mitochondrial PCGs was assessed using the index of substitution saturation (Iss) in DAMBE v.6, the completeness of the multiple sequence alignments was quantified with AliStat, and sequence heterogeneity was visualized using AliGROOVE v.1.08.
To determine the optimal partitioning schemes and corresponding nucleotide substitution models for each dataset, we employed ModelFinder to select the best-fit substitution model for each partition in the maximum likelihood (ML) analysis. To reduce the influence of heterotachous sequence evolution on phylogenetic inference, we used the single-topology (GHOST) model in IQ-TREE. The Bayesian information criterion (BIC) and the 'greedy' algorithm were used, with branch lengths estimated as 'unlinked', to search for the best-fit scheme in the partition model.
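The construction of a dataset without third codon positions, followed by concatenation with the rRNA alignments, can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the PhyloSuite workflow used in the study: the function names are hypothetical, the toy sequences are invented, and missing taxa are padded with `?` as one common convention.

```python
def drop_third_positions(cds: str) -> str:
    """Keep codon positions 1 and 2 of an in-frame, aligned coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("alignment length is not a multiple of 3")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


def concatenate(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments into one supermatrix.

    Taxa absent from a gene are padded with '?' (an assumed convention).
    """
    taxa = sorted({name for aln in alignments for name in aln})
    matrix = {name: "" for name in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment differ in length")
        length = lengths.pop()
        for name in taxa:
            matrix[name] += aln.get(name, "?" * length)
    return matrix


# Toy data: one coding gene for two taxa, one rRNA gene for a single taxon.
pcg = {"sp1": "ATGAAATTT", "sp2": "ATGAAGTTC"}
rrna = {"sp1": "ACGT"}

pcg12 = {name: drop_third_positions(seq) for name, seq in pcg.items()}
supermatrix = concatenate([pcg12, rrna])
print(supermatrix)  # {'sp1': 'ATAATTACGT', 'sp2': 'ATAATT????'}
```

Dropping the third position shortens each coding alignment by one third, so the resulting matrix is necessarily shorter than one built from all codon positions.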
To mitigate the effects of long-branch attraction (LBA) artefacts, the posterior mean site frequency (PMSF) model was also applied in IQ-TREE. In the concatenated analyses, support values were assessed using the ultrafast bootstrap (UFBoot), the SH-like approximate likelihood ratio test (SH-aLRT), and a Bayes test.
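When several support measures are computed together, they are commonly written on each node as a slash-separated label. The sketch below parses such a label and flags well-supported nodes. It rests on assumptions: the label order `SH-aLRT/aBayes/UFBoot` and the helper names are illustrative, and the default thresholds (SH-aLRT >= 80, UFBoot >= 95) follow guidance commonly cited for IQ-TREE rather than cut-offs reported in this README. The supplementary figures list the values in a different order, so the `names` argument should be adjusted to match the tree being read.

```python
def parse_support(
    label: str,
    names: tuple[str, ...] = ("SH-aLRT", "aBayes", "UFBoot"),
) -> dict[str, float]:
    """Split a node label such as '98.2/1/100' into named support values.

    Assumption: the values appear in the order given by `names`.
    """
    parts = label.split("/")
    if len(parts) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(parts)}")
    return {name: float(part) for name, part in zip(names, parts)}


def well_supported(
    support: dict[str, float],
    alrt_min: float = 80.0,
    ufboot_min: float = 95.0,
) -> bool:
    """Return True when both SH-aLRT and UFBoot reach the chosen thresholds."""
    return support["SH-aLRT"] >= alrt_min and support["UFBoot"] >= ufboot_min


strong = parse_support("98.2/1/100")
weak = parse_support("75/0.9/96")
print(well_supported(strong), well_supported(weak))  # True False
```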