Phylogenomics resolve the systematics and biogeography of the ant tribe Myrmicini and tribal relationships within the hyperdiverse ant subfamily Myrmicinae
Data files
Jun 12, 2025 version files (463.65 MB total):
- model_adequacy.zip: 334.53 MB
- morphometrics.zip: 15.44 KB
- phylogenies.zip: 129.09 MB
- README.md: 18.61 KB
Abstract
Ants are a globally distributed and highly diverse group of eusocial animals, playing key ecological roles in most of the world’s terrestrial ecosystems. Our understanding of the processes involved in the evolution of this diverse family is contingent upon our knowledge of the phylogeny of the ants. While relationships among most subfamilies have come into resolution recently, several of the tribal relationships within the hyperdiverse subfamily Myrmicinae persistently conflict between or within studies, mirroring the controversial relationships of the Leptanillinae and Martialinae to the remaining ant subfamilies. Another persistent issue of debate in ant phylogenetics is the timing of major evolutionary events as inferred via divergence dating. Here, we test the topology of the myrmicine tribes using genome-scale data, inspect gene tree-species tree concordance, and use posterior predictive checks and tests of compositional heterogeneity to infer sequence characteristics which potentially introduce systematic bias in myrmicine tribal topology. Furthermore, we test the placement of the fossil †Manica andrannae by integrating phylogenomic and morphological data from nearly all species within the genus Manica, and a broad sampling of its sister genus Myrmica. Subsequently, we demonstrate the effect of fossil placement on overall divergence times in the Myrmicinae. We then re-evaluate the historical biogeography of the Myrmicini and Pogonomyrmecini considering newly generated genetic data and insights from our phylogenomic results. We find that our current understanding of tribal topology in the Myrmicinae is strongly supported, but this topology is highly sensitive to compositional heterogeneity and gene-tree species-tree conflict.
Our fossil placement analyses strongly suggest that †Manica andrannae is a stem Manica species, and that placement of this fossil in the crown group affects not only divergence dates within the tribe Myrmicini, but also has broad implications for divergence times throughout the formicoid clade. The results of our biogeographic reconstructions indicate a South American origin for the Pogonomyrmecini + Myrmicini, with the MRCA of Myrmica inhabiting the western Nearctic in the early Miocene prior to repeated dispersal across Beringia throughout the Miocene and Pliocene. The MRCA of Manica, on the other hand, was inferred to have a Holarctic range prior to vicariance during the Pliocene. Unexpectedly, we found strong support in the Pogonomyrmecini for three coordinated dispersal events from South to Central America during the early Miocene, which has been previously proposed as an early biotic interchange event prior to the more commonly accepted 3.5 Ma closure of the Isthmus of Panama.
Supplementary materials for “Phylogenomics resolve the systematics and biogeography of the ant tribe Myrmicini and tribal relationships within the hyperdiverse ant subfamily Myrmicinae”
https://doi.org/10.5061/dryad.c59zw3rhg
Authors: Matthew Prebus & Christian Rabeling
Year: 2025
Contact: Matthew Prebus, mprebus@gmail.com
Description of the data and file structure
This dataset contains three compressed directories:
model_adequacy.zip
morphometrics.zip
phylogenies.zip
The file structure of model_adequacy.zip is as follows:
model_adequacy
├── matrices
│   ├── full_matrix
│   ├── max_PD_pass
│   ├── mean_GC_pass
│   ├── p4_corrected
│   ├── p4_uncorrected
│   └── var_GC_pass
└── posterior_predictive_checks
    ├── data
    │   ├── GTR_Gamma
    │   ├── GTR_Inv
    │   ├── HKY_Gamma
    │   ├── HKY_Inv
    │   ├── JC_Gamma
    │   ├── JC_Inv
    │   ├── K80_Gamma
    │   ├── K80_Inv
    │   ├── K81_Gamma
    │   ├── K81_Inv
    │   ├── TIM_Gamma
    │   ├── TIM_Inv
    │   ├── TrN_Gamma
    │   ├── TrN_Inv
    │   ├── TVM_Gamma
    │   └── TVM_Inv
    └── scripts
This directory contains all scripts, matrices, and results used for posterior predictive checks on the Myrmicinae_176t dataset:
matrices contains data and results of phylogenetic analyses of datasets filtered following posterior predictive checks or analysis of compositional heterogeneity:
- full_matrix contains the full unfiltered Myrmicinae_176t dataset.
- max_PD_pass contains the Myrmicinae_176t dataset filtered to include only loci that passed the max PD statistic.
- max_GC_pass contains the Myrmicinae_176t dataset filtered to include only loci that passed the max GC statistic.
- p4_corrected contains the Myrmicinae_176t dataset filtered to include only loci that passed the phylogeny-corrected p4 analysis.
- p4_uncorrected contains the Myrmicinae_176t dataset filtered to include only loci that passed the standard chi-squared p4 analysis.
- var_GC_pass contains the Myrmicinae_176t dataset filtered to include only loci that passed the var GC statistic.
Each directory in matrices contains four files:
- *_b10.trees: Newick formatted gene partition trees estimated with IQTREE v2.1.1; all nodes with < 10 bootstrap support have been collapsed into polytomies using newick_utils.
- *_models.nex: NEXUS formatted matrix partitions with associated substitution models estimated with IQTREE2.
- *.trees: Newick formatted gene partition trees estimated with IQTREE.
- *.nex: NEXUS formatted ultraconserved elements (UCE) nucleotide sequence alignments.
posterior_predictive_checks contains data and scripts used to run posterior predictive checks on UCE alignments:
- data contains subdirectories named by best-fitting nucleotide substitution model inferred with ModelTest in IQTREE, with each directory containing UCE alignments in FASTA (*.fas) format that have been partitioned with the SWSC-EN algorithm.
- rb_pps_batch.sh is a bash file used to run posterior predictive checks with RevBayes v1.2.1.
- scripts contains all RevBayes and R v4.1.3 scripts used to run posterior predictive checks.
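The four file patterns listed for each matrices subdirectory overlap: a name ending in _b10.trees also matches *.trees, and a name ending in _models.nex also matches *.nex. A minimal sketch of how the files could be told apart is shown below, assuming only the suffixes given in this README. The function name, the role labels, and the example file names are hypothetical and are not part of the dataset.

```python
from pathlib import PurePath

# Suffixes taken from the README, ordered most specific first so that
# "_b10.trees" is tested before ".trees" and "_models.nex" before ".nex".
SUFFIX_ROLES = [
    ("_b10.trees", "collapsed gene trees"),
    ("_models.nex", "partition models"),
    (".trees", "gene trees"),
    (".nex", "alignment"),
]


def classify_matrix_file(filename):
    """Return a role label (hypothetical wording) for a file in a matrices subdirectory."""
    name = PurePath(filename).name
    for suffix, role in SUFFIX_ROLES:
        if name.endswith(suffix):
            return role
    return "unknown"


# Hypothetical file names, used only to exercise the ordering of the checks.
for example in ["example_b10.trees", "example_models.nex", "example.trees", "example.nex"]:
    print(example, "->", classify_matrix_file(example))
```

Testing the longer suffixes first is the only design choice here; a plain *.trees glob would otherwise pick up both the collapsed and the uncollapsed gene trees.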
The file structure for morphometrics.zip is as follows:
morphometrics
├── morphometrics_HW_scaled_pca.R
├── morphometrics_HW_scaled.csv
├── morphometrics.csv
└── morphometrics.R
This directory contains all scripts and datasets used to generate figures S2 and S3 in Supplementary_File_1_extended_methods_morphometrics.pdf:
morphometrics_HW_scaled_pca.R: an R script for principal components analysis of morphometric data from extant Manica, extant Myrmica, and the fossil †Manica andrannae scaled to head width.
morphometrics_HW_scaled.csv: morphometric data from 63 samples of extant Manica, extant Myrmica, and the fossil †Manica andrannae scaled to head width. The fields in this file are as follows:
- SpecimenCode: a unique alphanumeric code physically associated with an individual specimen, primarily used for databasing.
- Genus: genus of the specimen.
- Species: species of the specimen.
- DNA extract: a unique alphanumeric code physically associated with an individual specimen if DNA has been extracted from it for analysis.
The following measurements were taken in millimeters:
- HL: head length: the head must be carefully tilted to the position that provides the true maximum. If excavations of the posterior margin of the head capsule and/or anterior margin of the clypeus are present, then the measurement is taken from an imaginary line that spans the excavations from the posterior- or anterior-most margins.
- HW: head width: maximum width of the head, measured directly behind the compound eyes.
- FW: frontal lobe width: the maximum width measured between the frontal lobes.
- MdL: mandible length: length of the masticatory margin of the mandible.
- SL: scape length: maximum scape length, excluding the basal neck and the articular condyle.
- PdL: antennal pedicel length, excluding basal condyle.
- FI1: the length of the first flagellomere.
- FI2: the length of the second flagellomere.
- OL: ocular length: the maximum diameter of the compound eye.
- WL: Weber’s length: distance between the caudalmost point of the propodeal lobe and the inflection point between the pronotal neck and the pronotal declivity.
- PrdH: propodeum height: the height of the propodeum in profile, measured as the orthogonal distance from the ventral edge of the metapleuron to the highest point of the propodeum.
- ESL: propodeal spine length: distance between the center of the propodeal spiracle and tip of the propodeal spine.
- PtL: petiole length: diagonal petiolar length in lateral view; measured from anterior corner of subpetiolar process to dorso-caudal corner of caudal cylinder.
- PtH: petiole height: the height of the petiolar node in profile, measured as the perpendicular distance from an imaginary line joining the base of the subpetiolar tooth and the ventral junction of the petiole and postpetiole to the highest point of the petiolar node.
- PPL: postpetiole length: the longest distance, perpendicular to the posterior margin of the postpetiole, between the posterior postpetiolar margin and the anterior postpetiolar margin.
- PPH: postpetiole height: the height of the postpetiole in profile, measured as the perpendicular distance from the ventral edge to the highest point of the postpetiole.
- PNW: pronotum width: the maximum width of the pronotum in dorsal view.
- PrdL: propodeum length: the maximum length of the propodeum in dorsal view, excluding the propodeal lobes.
- ESD: propodeal spine distance: distance between the tips of propodeal tubercles/spines in dorsal view.
- PtW: petiole width: the maximum width of the petiolar node in dorsal view.
- PPW: postpetiole width: the maximum width of the postpetiole in dorsal view.
- HFL: hind femur length: the maximum length of hind femur, measured in dorsal view.
- HTL: hind tibia length: the maximum length of hind tibia, measured in dorsal view, excluding the proximal condyle.
The following are indices calculated from the above measurements:
- CI: cephalic index: HL/HW*100.
- SI1: scape length index 1: SL/HL*100.
- SI2: scape length index 2: SL/HW*100.
- FLI: frontal lobe index: FW/HW*100.
- OI1: ocular index 1: OL/HL*100.
- OI2: ocular index 2: OL/HW*100.
- PI1: petiole height index: PtL/PtH*100.
- PI2: petiole width index: PtL/PtW*100.
- PPI1: postpetiole height index: PPL/PPH*100.
- PPI2: postpetiole width index: PPH/PPW*100.
- PPI3: postpetiole-petiole index: PPW/PtW*100.
- ESLI: propodeal spine length index: ESL/HW*100.
- ESDI: propodeal spine distance index: ESD/HW*100.
- MI: mesosoma index: WL/PNW*100.
- PRI: propodeum index: PrdL/PrdH*100.
The following are binary characters:
- antennal club count > 3: 1; ≤ 3: 0.
- mandibular teeth > 10: 1; ≤ 10: 0.
morphometrics.csv: raw morphometric data from 63 samples of extant Manica, extant Myrmica, and the fossil †Manica andrannae. The variables in this dataset are identical to those in morphometrics_HW_scaled.csv, but raw measurements and indices are not scaled to head width.
morphometrics.R: an R script for comparison of mean morphometrics between extant Manica, extant Myrmica, and the fossil †Manica andrannae.
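The indices and binary characters defined in the field list above are simple ratios and threshold tests on the raw measurements. The sketch below restates them in Python as an illustration only; it is not the authors' R code, the function names are hypothetical, and the example measurements are made-up values rather than data from morphometrics.csv.

```python
# Index definitions copied from the field list: index -> (numerator, denominator).
# Each index is numerator / denominator * 100.
INDEX_DEFINITIONS = {
    "CI": ("HL", "HW"),
    "SI1": ("SL", "HL"),
    "SI2": ("SL", "HW"),
    "FLI": ("FW", "HW"),
    "OI1": ("OL", "HL"),
    "OI2": ("OL", "HW"),
    "PI1": ("PtL", "PtH"),
    "PI2": ("PtL", "PtW"),
    "PPI1": ("PPL", "PPH"),
    "PPI2": ("PPH", "PPW"),
    "PPI3": ("PPW", "PtW"),
    "ESLI": ("ESL", "HW"),
    "ESDI": ("ESD", "HW"),
    "MI": ("WL", "PNW"),
    "PRI": ("PrdL", "PrdH"),
}


def compute_indices(measurements):
    """Compute every index whose two measurements (in mm) are both present."""
    result = {}
    for index, (numerator, denominator) in INDEX_DEFINITIONS.items():
        if numerator in measurements and denominator in measurements:
            result[index] = measurements[numerator] / measurements[denominator] * 100
    return result


def binary_character(count, threshold):
    """Code a count as 1 if it exceeds the threshold, else 0 (e.g. threshold 3 or 10)."""
    return 1 if count > threshold else 0


# Made-up measurements for illustration; not taken from the dataset.
example = {"HL": 1.5, "HW": 1.25, "SL": 1.0}
print(compute_indices(example))
print(binary_character(4, 3), binary_character(10, 10))
```

Keeping the definitions in one table makes it easy to compare them against the README line by line, and missing measurements simply produce no index instead of raising an error.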
The file structure for phylogenies.zip is as follows:
phylogenies
├── astral
├── iqtree
├── mcmctree
│   ├── MandrannaeManica_mcmctree
│   └── MandrannaeMyessensis_mcmctree
├── mrbayes
└── revbayes
    ├── biogeography_ase_v1_no_dispersal
    │   ├── data
    │   ├── output
    │   └── scripts
    ├── biogeography_ase_v2_dispersal
    │   ├── data
    │   ├── output
    │   └── scripts
    ├── divergence_dating
    │   ├── data
    │   ├── output
    │   └── scripts
    └── topology
        ├── data
        ├── output
        └── scripts
This directory contains all scripts and datasets used to generate the phylogenies in this study.
astral contains input data used to generate summary coalescent trees with ASTRAL III v5.7.4 as well as the output trees.
- Myrmicinae_176t_2255l_BS10_astral.tre: a Newick tree file containing summary coalescent analysis output from ASTRAL inferred from 176 taxa in the Myrmicinae with 2255 UCE loci. Node support is in local posterior probability.
- Myrmicinae_176t_2255l_BS10.tre: a Newick tree file containing 2255 UCE trees, one for each locus, inferred from 176 taxa in the Myrmicinae with IQTREE. Nodes supported by a maximum likelihood bootstrap score of less than 10 were collapsed into polytomies using newick_utils.
- Myrmicini_62t_2177l_BS10.tre: a Newick tree file containing summary coalescent analysis output from ASTRAL inferred from 62 taxa in the Myrmicini with 2177 UCE loci. Node support is in local posterior probability.
- Myrmicini_62t_2177l_BS10.tre: a Newick tree file containing 2177 UCE trees, one for each locus, inferred from 62 taxa in the Myrmicini with IQTREE. Nodes supported by a maximum likelihood bootstrap score of less than 10 were collapsed into polytomies using newick_utils.
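The gene trees above had nodes with bootstrap support below 10 collapsed into polytomies using newick_utils. The pure-Python sketch below only illustrates what that operation does to a Newick string; it is not the authors' command. It assumes internal node labels are plain numeric support values, and it assumes that a collapsed branch's length is added to its child branches, which is a choice made for this illustration.

```python
import re

SPECIAL = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse a Newick string into nested dicts with children, label, and length."""
    tokens = re.findall(r"[(),;]|[^(),;]+", text.strip())
    pos = 0

    def parse_node():
        nonlocal pos
        node = {"children": [], "label": "", "length": None}
        if tokens[pos] == "(":
            pos += 1  # consume "("
            while True:
                node["children"].append(parse_node())
                if tokens[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # consume ")"
                break
        # Optional "label:length" following a leaf name or a closing parenthesis.
        if pos < len(tokens) and tokens[pos] not in SPECIAL:
            label, _, length = tokens[pos].partition(":")
            node["label"] = label.strip()
            node["length"] = float(length) if length else None
            pos += 1
        return node

    return parse_node()


def is_low_support(node, threshold):
    """True if the node label is a number below the threshold; unlabeled nodes are kept."""
    try:
        return float(node["label"]) < threshold
    except ValueError:
        return False


def collapse(node, threshold):
    """Splice out internal nodes with low support, attaching their children to the parent."""
    kept = []
    for child in node["children"]:
        collapse(child, threshold)
        if child["children"] and is_low_support(child, threshold):
            for grandchild in child["children"]:
                if child["length"] is not None:
                    grandchild["length"] = (grandchild["length"] or 0.0) + child["length"]
                kept.append(grandchild)
        else:
            kept.append(child)
    node["children"] = kept
    return node


def to_newick(node):
    """Serialize a parsed tree back to a Newick string."""
    def fmt(n):
        text = "(" + ",".join(fmt(c) for c in n["children"]) + ")" if n["children"] else ""
        text += n["label"]
        if n["length"] is not None:
            text += f":{n['length']:g}"
        return text

    return fmt(node) + ";"


# Toy tree (not from the dataset): the (A,B) node has support 5 and is collapsed.
toy = "((A:1,B:1)5:1,(C:1,D:1)90:1);"
print(to_newick(collapse(parse_newick(toy), 10)))
```

On the toy tree the (A,B) clade disappears and A and B attach directly to the root, while the well-supported (C,D) clade is left untouched.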
iqtree contains input data for IQTREE, as well as output trees, from the following analyses:
- the constrained trees for the topology test analysis of the Myrmicinae dataset (files beginning with Myrmicinae_176t_2255l*). Constraint trees used to compare topologies are denoted with the author names from each relevant study (Branstetter et al. 2017; Romiguier et al. 2022; Ward et al. 2015). They consist of two files: *_constraint.tre is the constraint file used in the analysis of the Myrmicinae_176t_2255l.nex alignment with the Myrmicinae_176t_2255l_SWSCEN_partitions.txt.best_scheme.nex partition scheme, which resulted in the second file: *_constrained.tre. Myrmicinae_176t_2255l_SWSCEN_partitions.txt is a partition file used to find the best partitioning scheme in an unconstrained analysis of the Myrmicinae_176t_2255l.nex alignment, which resulted in Myrmicinae_176t_2255l.contree.
- comparison of the topology of the full 62 taxon dataset (files beginning with Myrmicini_62t_2177l*) and the reduced dataset (files beginning with Myrmicini_62t_50l*). Three files are associated with each analysis: *_SWSCEN_partitions.txt are UCE partitions found by the SWSC-EN algorithm; *.nex are the UCE alignments; *.contree are the resulting trees.
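Many file names in phylogenies.zip start with a clade name followed by a taxon count and a locus count, for example Myrmicinae_176t_2255l (176 taxa, 2255 UCE loci) or Myrmicini_62t_50l. A minimal sketch for reading that prefix is given below, assuming the pattern holds wherever both the "t" and "l" suffixes are present. The function name and the regular expression are illustrative and not part of the dataset.

```python
import re

# Clade name, then "<number>t" for taxa and "<number>l" for loci, as used in this README.
PREFIX_PATTERN = re.compile(r"^(?P<clade>[A-Za-z]+)_(?P<taxa>\d+)t_(?P<loci>\d+)l(?=[_.]|$)")


def parse_dataset_prefix(filename):
    """Return (clade, taxa, loci) or None if the name does not follow the pattern."""
    match = PREFIX_PATTERN.match(filename)
    if match is None:
        return None
    return match["clade"], int(match["taxa"]), int(match["loci"])


for name in [
    "Myrmicinae_176t_2255l.nex",
    "Myrmicini_62t_50l_mrbayes.nex",
    "Myrmicinae_176t_50_best.phy",  # no "l" suffix, so it is reported as None
]:
    print(name, "->", parse_dataset_prefix(name))
```

Names that omit the locus suffix, such as Myrmicinae_176t_50_best.phy, return None instead of a guessed value, so they can be handled explicitly.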
mcmctree contains input data for MCMCTREE used to generate divergence dating trees for the Myrmicinae comparing alternate fossil calibrations.
- MandrannaeManica_mcmctree contains a branch variation file in.BV generated by mcmctree and the mcmctree control file mcmctree.ctl to run the divergence dating analysis assuming that the fossil †Manica andrannae was a stem Manica species. MandrannaeManica_mcmctree.tre is the resulting divergence dated tree.
- MandrannaeMyessensis_mcmctree contains a branch variation file in.BV generated by mcmctree and the mcmctree control file mcmctree.ctl to run the divergence dating analysis assuming that the fossil †Manica andrannae is sister to Manica yessensis. MandrannaeMyessensis_mcmctree.tre is the resulting divergence dated tree.
- Myrmicinae_176t_50_best.phy is a phylip file containing the 50 best performing UCE loci in the 176 taxon Myrmicinae dataset found using SortaDate.
- Myrmicinae_176t_50l_SWSCEN_partitions_calibrated_MandrannaeMyessensis.tre is a Newick tree file used as input for the MandrannaeMyessensis_mcmctree analysis.
- Myrmicinae_176t_50l_SWSCEN_partitions_calibrated.tre is a Newick tree file used as input for the MandrannaeManica_mcmctree analysis.
mrbayes contains nexus formatted input files used to generate the fossil placement analyses for †Manica andrannae with MRBAYES v3.2.6, as well as their output trees, all of which end in *.con.tre.
- Myrmicini_9t_50l_combined_igr_mrbayes.nex is used to infer the diversified sampling analysis for fossil placement using a cutoff of 9 Ma.
- Myrmicini_13t_50l_combined_igr_mrbayes.nex is used to infer the diversified sampling analysis for fossil placement using a cutoff of 5 Ma.
- Myrmicini_62t_50l_mrbayes.nex is used to infer the topology of extant Manica and Myrmica using only molecular data, without a clock.
- Myrmicini_63t_50l_combined_igr_mrbayes_MandrannaeManica.nex is a combined DNA sequence and morphology alignment used to infer divergence dates constraining †Manica andrannae to be a stem species of Manica.
- Myrmicini_63t_50l_combined_igr_mrbayes_MandrannaeMyessensis.nex is a combined DNA sequence and morphology alignment used to infer divergence dates constraining †Manica andrannae to be sister to Manica yessensis.
- Myrmicini_63t_50l_combined_igr_mrbayes_unconstrained.nex is a combined DNA sequence and morphology alignment used to infer divergence dates without constraining †Manica andrannae to any position.
- Myrmicini_63t_50l_combined_mrbayes_no_clock.nex is used to infer the topology of extant Manica, Myrmica, and the fossil species †Manica andrannae using molecular and morphological data, without a clock.
- Myrmicini_63t_50l_morphology.nex is used to infer the topology of extant Manica, Myrmica, and the fossil species †Manica andrannae using only morphological data, without a clock.
revbayes contains input data used to generate the historical biogeography analyses for the Myrmicini and Pogonomyrmecini using RevBayes.
- biogeography_ase_v1_no_dispersal contains files used in the analysis of historical biogeography without over-water dispersal; it contains three subdirectories.
  - data contains files used as input for running an epoch model using 10 time periods.
    - myrmicini_pogonomyrmecini_biogeo_dating.mcc.tre is a Newick file containing the results of the divergence dating analysis (see below).
    - myrmicini_pogonomyrmecini_bioregions.nex is a nexus file containing bioregion occupancy data for each taxon in the analysis.
    - myrmicini_pogonomyrmecini_divergence_dating.mcc.tre is a Newick file containing the results of the divergence dating analysis (see below), without node posterior probability or confidence intervals.
    - files beginning with myrmicini_pogonomyrmecini.connectivity.* are landmass connectivity files, one for each time period.
    - files beginning with myrmicini_pogonomyrmecini.distances.* are landmass distance files, one for each time period.
    - myrmicini_pogonomyrmecini.range_colors.txt contains unique colors for each bioregion (or combination of bioregions) in the analysis.
    - myrmicini_pogonomyrmecini.times.txt defines each time period used in the epoch analysis.
  - output contains the raw ancestral state estimation tree output.
  - scripts contains files used for running and analyzing the analysis.
    - make_anc_state.Rev is a RevBayes script used to analyze the output of Myrmicini_Pogonomyrmecini_run_epoch.Rev.
    - Myrmicini_Pogonomyrmecini_run_epoch.Rev runs the epoch model with RevBayes.
    - plot_anc_range.epoch_phy.R is an R script that plots the results of Myrmicini_Pogonomyrmecini_run_epoch.Rev and make_anc_state.Rev.
    - plot_anc_range.util.R is a utility script used by plot_anc_range.epoch_phy.R.
- biogeography_ase_v2_dispersal contains files used in the analysis of historical biogeography with over-water dispersal; it is structured identically to the biogeography_ase_v1_no_dispersal directory above.
- divergence_dating contains files used to generate the divergence dated tree used as input for the two historical biogeography analyses above.
  - data contains nexus formatted DNA sequence alignment files, each named for the best-fitting nucleotide substitution model found by ModelFinder in IQTREE2 for that alignment.
  - output contains the divergence dating tree output.
  - scripts contains RevBayes scripts called by the Myrmicini_Pogonomyrmecini_dating_master.Rev script.
    - myrmicini_pogonomyrmecini_181t_substitution_partitioned.Rev specifies the nucleotide substitution models used in the analysis.
    - myrmicini_pogonomyrmecini_clock.Rev specifies the molecular clock.
    - Myrmicini_Pogonomyrmecini_dating_master.Rev is a RevBayes script used to run the divergence dating analysis.
    - myrmicini_pogonomyrmecini_tree.Rev specifies the tree model.
- topology contains files used to infer the topology of the Myrmicini and Pogonomyrmecini tree without a molecular clock.
  - data contains nexus formatted DNA sequence alignment files, each named for the best-fitting nucleotide substitution model found by ModelFinder in IQTREE2 for that alignment.
  - output contains the topology tree output.
  - scripts contains Myrmicini_Pogonomyrmecini_topology.Rev, a RevBayes script that runs the analysis.