Exploring the phylogeny of the Leptanillinae (Hymenoptera: Formicidae) through high-throughput sequencing and Bayesian total-evidence inference, with accommodation of systematic bias
Data files
Feb 10, 2026 version, files total 1.46 GB
anomaly_finder_output.zip (193.80 KB)
data.zip (29.26 MB)
Phylogenomics.zip (1.43 GB)
README.md (4.18 KB)
TE_output.zip (110.64 KB)
Abstract
Ants of the subfamily Leptanillinae (Hymenoptera: Formicidae) are minute and subterranean, with taxonomy impeded by the dissociation of disparate male and female forms. The advent of phylogenomic inference can remedy this dissociation while resolving leptanilline phylogeny with strong statistical support. However, genome-scale molecular data are vulnerable to systematic biases that may result in decisive statistical support for erroneous conclusions. Previous phylogenomic study of the Leptanillinae did not address this, nor did it attempt to resolve the phylogenetic positions of the several aberrant leptanilline lineages for which ultra-conserved elements (UCEs) cannot be obtained. I query the phylogeny of the Leptanillinae from UCEs using a Bayesian framework and mitigate the effects of several potential systematic biases by replicating analyses using multiple curated UCE alignments. In addition, I implement Bayesian total-evidence inference from UCEs and male morphological data to resolve the phylogenetic positions of the monotypic former genus Scyphodon and three other enigmatic terminals for which molecular data are unavailable. The phylogeny of the Leptanillinae inferred herein is robust to compositional biases in phylogenomic data. The synonymy of Scyphodon with Leptanilla is confirmed, with the former being recovered within the Leptanilla havilandi species-group with high Bayesian posterior probability, but the phylogenetic positions of the remaining three morphology-only terminals were unresolved by Bayesian total-evidence inference due to insufficient phylogenetic information or inferential artifacts.
Description of data and file structure
The phylogeny of the Leptanillinae was inferred under model-based statistical frameworks from 5 curated alignments of ultra-conserved elements (UCEs), which were generated in phyluce v. 1.7.1 from contigs assembled in SPAdes. Each of these 5 alignments was trimmed for potential misalignment using Spruceup at each of 4 predetermined lognormal cutoff thresholds, resulting in 25 alignments, including the 5 that were not trimmed in Spruceup.
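The count of 25 alignments follows from crossing the 5 curated matrices with 5 Spruceup treatments (untrimmed plus 4 cutoffs). A minimal Python sketch of that bookkeeping is given here. The letters A-E follow the matrix labels used in this README (e.g. 0.90B), whereas the `untrimmed` label and the three placeholder cutoff names are assumptions for illustration only, since 0.90 is the only cutoff named in this README.

```python
from itertools import product

# Matrix letters follow the labels used in this README (e.g. "0.90B").
MATRICES = ["A", "B", "C", "D", "E"]

# Only the 0.90 cutoff is named in this README. "untrimmed" and the
# three "cutoff_N" entries are hypothetical placeholder labels.
TREATMENTS = ["untrimmed", "0.90", "cutoff_2", "cutoff_3", "cutoff_4"]

# Cross every treatment with every matrix: 5 x 5 = 25 alignments.
alignments = [f"{t}{m}" for t, m in product(TREATMENTS, MATRICES)]
print(len(alignments))  # 25

# The 5 alignments trimmed at cutoff 0.90 (the set used for ExaBayes).
bayesian_subset = [a for a in alignments if a.startswith("0.90")]
print(bayesian_subset)  # ['0.90A', '0.90B', '0.90C', '0.90D', '0.90E']
```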
Maximum-likelihood phylogenetic inference was implemented in IQ-Tree v. 2.2.2.5, and coalescent-based inference in ASTRAL-III, using all 25 alignments. Bayesian phylogenetic inference was implemented in ExaBayes v. 1.5.1 for the 5 alignments trimmed with lognormal cutoff = 0.90 in Spruceup. Bayesian total-evidence inference was implemented in RevBayes v. 1.2.2 using a subsampled version of Matrix 0.90B and 65 binary male morphological characters.
Scripts used for Bayesian inference are available on Zenodo (DOI: 10.5281/zenodo.16781993).
Data location
Raw sequence reads are stored in the following locations:
NCBI, PRJNA629360: "Systematic revision of the ant subfamily Leptanillinae (Hymenoptera: Formicidae) grounded in phylogenomic inference"
NCBI, PRJEB48742: "Ant phylogenomics reveal a natural selection hotspot preceding the origin of advanced eusociality"
NCBI, PRJNA379583: "Phylogenomic insights into the evolution of stinging wasps and the origins of ants and bees"
NCBI, PRJNA360290: "Enriching the ant tree of life: enhanced UCE bait set for genome-scale phylogenetics of ants and other Hymenoptera"
NCBI, PRJNA759281: "Systematics and biogeography of Dorymyrmex (Hymenoptera: Formicidae)"
NCBI, PRJNA248919: "Hymenopteran UCE sequences"
Hymenoptera Genome Database: (https://hymenoptera.elsiklab.missouri.edu/)
Additional data used in this study are cited in Table S1 of Romiguier et al. (2022) (https://www.cell.com/current-biology/fulltext/S0960-9822(22)00760-6).
Analytical software
Programs are numbered in each workflow according to order of use.
Maximum-Likelihood Phylogenomics, concatenated
1. PartitionUCE
   - SWSC-EN
2. IQ-Tree v. 2.2.2.5
   - ModelFinder
PartitionUCE was not used for by-locus partitioning.
All analysis parameters for step 2 (.log files) are located in Phylogenomics\ML\by-locus or Phylogenomics\ML\within-locus.
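File locations in this README are written with backslash separators. On Linux or macOS the same locations can be resolved after conversion to forward slashes. The sketch below, using only Python's standard `pathlib`, shows one way to do this. It assumes that the extracted archive reproduces the folder layout described here, and `to_posix` is a hypothetical helper name.

```python
from pathlib import PureWindowsPath


def to_posix(readme_path: str) -> str:
    """Convert a backslash-separated path from this README to forward slashes."""
    return PureWindowsPath(readme_path).as_posix()


# Paths copied from this README; raw strings keep the backslashes literal.
print(to_posix(r"Phylogenomics\ML\by-locus"))
# Phylogenomics/ML/by-locus
print(to_posix(r"Phylogenomics\ML\by-locus\IQT_by-locus_0.90D.log"))
# Phylogenomics/ML/by-locus/IQT_by-locus_0.90D.log
```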
Two-Step Coalescent-based Inference
1. IQ-Tree v. 2.1.2
2. ASTRAL-III
   - λ = 0.5
All analysis parameters for step 2 (.log files) are located in Phylogenomics\ML\SH-aLRT.
Bayesian Phylogenomics
1. PartitionUCE
   - SWSC-EN
2. IQ-Tree v. 2.2.2.5
   - ModelFinder
3. ExaBayes v. 1.5.1
   - consense
     - threshold = 50
     - burnin value = 0.25
All analysis parameters for step 2 are located under Phylogenomics\ML\within-locus\ (.log files) for Matrices 0.90A-C and 0.90E, and in Phylogenomics\ML\by-locus\IQT_by-locus_0.90D.log for Matrix 0.90D.
All analysis parameters for step 3 are located in the configuration file EB_1e6.conf.
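To illustrate what the two consense settings mean, the following sketch discards the first 25% of sampled trees as burn-in and retains the splits found in at least 50% of the remaining trees. It is a toy reimplementation of the general idea, not the code of ExaBayes: trees are reduced to sets of splits, the taxon names and samples are invented, and treating the burn-in value as a proportion and the threshold as inclusive are assumptions.

```python
from collections import Counter


def majority_rule_splits(sampled_trees, burnin=0.25, threshold=50):
    """Return {split: percent support} for splits meeting the threshold.

    sampled_trees: list of trees, each a set of splits (frozensets of taxa).
    burnin: proportion of initial samples to discard (assumed relative).
    threshold: minimum percent of retained trees containing a split
               (assumed inclusive).
    """
    n_discard = int(len(sampled_trees) * burnin)
    retained = sampled_trees[n_discard:]
    if not retained:
        raise ValueError("no trees left after burn-in")
    counts = Counter(split for tree in retained for split in tree)
    support = {s: 100.0 * c / len(retained) for s, c in counts.items()}
    return {s: p for s, p in support.items() if p >= threshold}


# Invented toy sample of 8 trees over hypothetical taxa A-D.
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})
trees = [{ac}, {ac}, {ab, cd}, {ab, cd}, {ab, cd}, {ab, cd}, {ab}, {ac}]

# The first 2 trees are discarded; 6 remain.
result = majority_rule_splits(trees)
for split, pct in sorted(result.items(), key=lambda kv: -kv[1]):
    print(sorted(split), round(pct, 1))
# ['A', 'B'] 83.3
# ['C', 'D'] 66.7
```

In this toy sample the split {A, C} appears in only 1 of the 6 retained trees (16.7%) and is therefore excluded from the consensus.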
Total-Evidence Bayesian Phylogenetic Inference
1. IQ-Tree v. 2.2.2.2
2. spruceup.py
3. IQ-Tree v. 2.1.2
   - ModelFinder
4. genesortR
5. RevBayes v. 1.2.2
All analysis parameters for step 3 are located in Phylogenomics\ML\SH-aLRT\gene_trees_0.90B.log.
All analysis parameters for step 5 are located in TE_phylogeny_no-partitioning.rev.
Folder list
For all folder contents, see the readme.md file contained within each folder.
