Evolutionary history, novel lineages, and symbiont coevolution in the ant tribe Camponotini (Hymenoptera: Formicidae)
Data files
Feb 26, 2025 version files 6.08 GB
- 1_Assembled_ant_contigs.zip (1.61 GB)
- 2_Phylogenetic_input_and_result_files_for_ant_analyses.zip (4.46 GB)
- 3_Biogeography.zip (33.46 KB)
- 4_Blochmannia_datasets.zip (2.10 MB)
- 5_Blochannia_MLsearches.zip (2.12 MB)
- 6_Blochmannia_topology_tests.zip (1.83 MB)
- README.md (15.77 KB)
Abstract
Many insect groups have acquired obligate microbial symbionts, and the resulting associations can have important ecological and evolutionary consequences. A notable example among ants is the species-rich tribe Camponotini, whose members derive nutritional benefits from a vertically inherited, bacterial endosymbiont Blochmannia. We generate ultraconserved element (UCE) phylogenomic data for 220 ingroup and 5 outgroup taxa to reconstruct a detailed evolutionary history of the Camponotini, including inference of divergence times and dispersal events. Under multiple modes of analysis, including both concatenation and species-tree approaches, we recover a well-supported backbone phylogeny comprising eight lineages: three large genera (Camponotus, Colobopsis, Polyrhachis) and several smaller genera or clusters of genera. Three novel lineages are uncovered that cannot be placed in any existing genus: Lathidris gen. n., from the mountains of Mesoamerica; Retalimyrma gen. n., from the Indian Himalayas; and Uwari gen. n., from eastern Asia. The species in these new genera were described and placed erroneously in Camponotus. The tribe Camponotini is estimated to have a crown origin in the Eocene (median age 38.4 Ma), with successively younger crown ages for Colobopsis (22.5 Ma), Camponotus (18.6 Ma), and Polyrhachis (18.5 Ma). We infer an Australasian or Indomalayan origin for the tribe, with multiple dispersal events to the Afrotropics, Palearctic region, and New World. Phylogenetic analysis of selected Blochmannia genes from a subset of 97 camponotine taxa yields results that are largely congruent with the ant host phylogeny, at least for well-supported nodes, but we find evidence that Blochmannia from some old lineages—especially Lathidris—may have discordant histories, suggesting possible lability of this symbiosis in the early evolution of camponotine ants.
https://doi.org/10.5061/dryad.hx3ffbgqd
Description of the data and file structure
The data in this repository were collected as part of a phylogenetic study on the ant tribe Camponotini and their Blochmannia symbionts, based on ultraconserved element (UCE) phylogenomic data for 225 ant species and selected genes from 97 symbiont taxa. We also estimated the divergence times and biogeographic history of the ants and examined patterns of coevolution between the two symbiotic partners.
Sequence data from target capture and WGS of ants were combined and assembled into contigs using SPAdes, and subsequently processed with Phyluce to extract ultraconserved element (UCE) loci. Loci were aligned with MAFFT in Phyluce. Poor-quality sequence regions were trimmed further using the program Spruceup. Partitioning was implemented using the SWSC-EN algorithm.
Blochmannia sequence data were captured from de novo assemblies of WGS data. De novo assemblies were performed using Unicycler; contigs >2500 bp were blasted against all protein-coding genes of published Blochmannia genome sequences, using blastx in NCBI-BLAST 2.12.0. Gene regions were extracted from contig sequences based on the coordinates of highly significant blast matches; the main analysis is based on seven concatenated protein-coding genes (dnaE, gidA, groEL, gyrA, gyrB, rpoB, and rpoC). Poorly aligned regions were trimmed using trimAl.
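The contig filtering and coordinate-based extraction described above can be illustrated with a minimal Python sketch. It is not the script used in the study: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive (as in BLAST tabular output), and a hit whose start exceeds its end is assumed to lie on the reverse strand.

```python
# Minimal sketch; helper names are hypothetical and not taken from the study.
MIN_CONTIG_LEN = 2500  # contigs must be longer than this (">2500 bp")

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def long_contigs(contigs: dict[str, str], min_len: int = MIN_CONTIG_LEN) -> dict[str, str]:
    """Keep only contigs strictly longer than min_len."""
    return {name: seq for name, seq in contigs.items() if len(seq) > min_len}


def extract_region(contig: str, start: int, end: int) -> str:
    """Extract a hit region given 1-based, inclusive coordinates.

    Assumption: start > end marks a hit on the reverse strand, so the
    region is returned as its reverse complement.
    """
    if start <= end:
        return contig[start - 1:end]
    return reverse_complement(contig[end - 1:start])
```

For example, `extract_region("ATGAAACCC", 1, 6)` returns the forward-strand region, and swapping the two coordinates returns the same region reverse-complemented.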
Maximum likelihood phylogenies were inferred for both ants and symbionts in IQ-TREE. Divergence times for ants were estimated in MCMCtree and biogeographic history was estimated in BioGeoBEARS.
Files and variables
File: 1_Assembled_ant_contigs.zip
Description: SPAdes assemblies of UCE+WGS data for all 225 ant taxa analyzed in the study.
File: 2_Phylogenetic_input_and_result_files_for_ant_analyses.zip
Description:
2.1_ML_IQTREE_analyses: Eight subfolders contain files for each maximum likelihood (ML) analysis performed with IQ-TREE (v2.1.3).
2.1.1_90%-0.97: Unpartitioned analysis performed on the 90% taxon-complete nucleotide matrix after employing 0.97 spruceup trimming. Contains .phylip, .log, and .tre files for ML analysis performed with IQ-TREE (v2.1.3).
2.1.2_90%-0.97-SWSC: SWSC-partitioned analysis performed on the 90% taxon-complete nucleotide matrix after employing 0.97 spruceup trimming. The directory contains a subfolder “SWSC+PartitionFinder” with the .charsets and .nex input files for analysis with the Sliding-Window Site Characteristics Entropy algorithm (SWSC-EN). The output from this analysis was used to find the best data partitioning scheme using PartitionFinder2 (v2.1.1) (output file _best_scheme.txt) and run the subsequent ML analysis using these data partitions in IQ-TREE (v2.1.3). Included are .phylip and .nex input files, and .log and .tre output files for this partitioned ML analysis performed in IQ-TREE.
2.1.3_90%-0.98: Unpartitioned analysis performed on the 90% taxon-complete nucleotide matrix after employing 0.98 spruceup trimming. Contains .phylip, .log, and .tre files for ML analysis performed with IQ-TREE (v2.1.3).
2.1.4_90%-0.98-SWSC: SWSC-partitioned analysis performed on the 90% taxon-complete nucleotide matrix after employing 0.98 spruceup trimming. The directory contains a subfolder “SWSC+PartitionFinder” with the .charsets and .nex input files for analysis with the Sliding-Window Site Characteristics Entropy algorithm (SWSC-EN). The output from this analysis was used to find the best data partitioning scheme using PartitionFinder2 (v2.1.1) (output file _best_scheme.txt), and run the subsequent ML analysis using these data partitions in IQ-TREE (v2.1.3). Included are .phylip and .nex input files, and .log and .tre output files for this partitioned ML analysis performed in IQ-TREE.
2.1.5_80%-0.97: Unpartitioned analysis performed on the 80% taxon-complete nucleotide matrix after employing 0.97 spruceup trimming. Contains .phylip, .log, and .tre files for ML analysis performed with IQ-TREE (v2.1.3).
2.1.6_80%-0.97-SWSC: SWSC-partitioned analysis performed on the 80% taxon-complete nucleotide matrix after employing 0.97 spruceup trimming. The directory contains a subfolder “SWSC+PartitionFinder” with the .charsets and .nex input files for analysis with the Sliding-Window Site Characteristics Entropy algorithm (SWSC-EN). The output from this analysis was used to find the best data partitioning scheme using PartitionFinder2 (v2.1.1) (output file _best_scheme.txt) and run the subsequent ML analysis using these data partitions in IQ-TREE (v2.1.3). Included are .phylip and .nex input files, and .log and .tre output files for this partitioned ML analysis performed in IQ-TREE.
2.1.7_80%-0.98: Unpartitioned analysis performed on the 80% taxon-complete nucleotide matrix after employing 0.98 spruceup trimming. Contains .phylip, .log, and .tre files for ML analysis performed with IQ-TREE (v2.1.3).
2.1.8_80%-0.98-SWSC: SWSC-partitioned analysis performed on the 80% taxon-complete nucleotide matrix after employing 0.98 spruceup trimming. The directory contains a subfolder “SWSC+PartitionFinder” with the .charsets and .nex input files for analysis with the Sliding-Window Site Characteristics Entropy algorithm (SWSC-EN). The output from this analysis was used to find the best data partitioning scheme using PartitionFinder2 (v2.1.1) (output file _best_scheme.txt) and run the subsequent ML analysis using these data partitions in IQ-TREE (v2.1.3). Included are .phylip and .nex input files, and .log and .tre output files for this partitioned ML analysis performed in IQ-TREE.
2.2_Genetrees_ASTRAL-III: Contains two subfolders with ML input trees (.trees) estimated with IQ-TREE (v2.1.3), as well as .log files and output trees (.tre) for each species tree analysis performed with ASTRAL-III v5.7.8.
2.2.1_90%: Input and output files for analyses based on 90% taxon completeness and 1440 loci.
2.2.2_80%: Input and output files for analyses based on 80% taxon completeness and 2076 loci.
2.3_Dating_MCMCTREE: Contains two subfolders with alignments (.phylip), configuration files (.ctl), and input tree files (input.tre), as well as the resulting MCMC file and chronogram (.tre) from each analysis performed with MCMCTREE in PAML v4.9.
2.3.1_90%-0.98: Analyses constrained using the ML best tree from IQ-TREE analysis of the 90%-0.98 matrix as input topology.
2.3.2_80%-0.98: Analyses constrained using the ML best tree from IQ-TREE analysis of the 80%-0.98 matrix as input topology.
File: 3_Biogeography.zip
Description:
Contains input files and R code to run the eight variations of biogeographic reconstructions with BioGeoBEARS in R presented in Ward et al. Specifically, included are an R notebook (.Rmd) containing the code to run the analyses, four chronograms (.phy), two distribution matrices in .csv format (geo.data.csv = excluding outgroups; geo.data_with_out.csv = including outgroups), and a file with dispersal constraints (.txt). As an example, the R notebook specifies the code for the constrained analysis of the 80%-0.98 matrix while excluding the outgroup; for the remaining seven analyses, the code needs to be modified as indicated in the comments of the notebook. The tree files are labeled as follows:
Campo220_80_98.phy=chronogram from analysis of the 80%-0.98 matrix, excluding outgroup;
Campo220_90_98.phy=chronogram from analysis of the 90%-0.98 matrix, excluding outgroup;
Campo221_80_98_with_out.phy=chronogram from analysis of the 80%-0.98 matrix, including outgroup;
Campo221_90_98_with_out.phy=chronogram from analysis of the 90%-0.98 matrix, including outgroup.
File: 4_Blochmannia_datasets.zip
Description:
4.1_Bloch_fastas: Unaligned FASTA files of Blochmannia gene sequences assembled and analyzed in this study. Genes include 16S rDNA, 23S rDNA, and seven protein-coding genes (dnaE, gidA, groEL, gyrA, gyrB, rpoB, and rpoC).
4.2_Bloch_trimmedAlns: Quality-trimmed alignments used for Blochmannia phylogenetic analyses. Alignments include Blochmannia sequences newly obtained in the current study, and when applicable, previously published gene sequences from Blochmannia genomes and related bacterial taxa such as closely related endosymbionts and more distant bacterial outgroups. Accession numbers for any previously published sequences are noted in the corresponding tree legends in the text. These alignments are the phylip (.phy) or fasta (.fas) input files used in the phylogenetic analyses below.
File: 5_Blochannia_MLsearches.zip
Description:
Five subfolders contain files for each maximum likelihood (ML) analysis of Blochmannia alignments performed with IQ-TREE (v. 2.2.2.6 or 2.2.2.7). Each folder contains a .phy or .fas input file (the alignments that are also posted within “4.2_Bloch_trimmedAlns”) and the resulting .log and .tre files for the ML analysis performed with IQ-TREE. The resulting ML tree and the bootstrap consensus tree are noted in the output file .iqtree. The command line executed is noted at the top of the .log file. When a specific model is noted in the command line, this model selection was based on a previous IQ-TREE run of the same data using the ModelFinder tool (-m MFP).
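To recover the command line from a .log file programmatically, a short Python sketch such as the following may help. It assumes that the log contains a header line beginning with "Command:", which should be checked against the posted files; the function name is hypothetical.

```python
from typing import Optional


def find_iqtree_command(log_text: str) -> Optional[str]:
    """Return the command line recorded in an IQ-TREE log, if present.

    Assumption: the log header contains a line of the form
    'Command: <executable> <arguments...>'.
    """
    for line in log_text.splitlines():
        if line.startswith("Command:"):
            return line[len("Command:"):].strip()
    return None
```

The function can be applied to the text of any of the posted .log files, for example via `find_iqtree_command(open(path).read())`, where `path` is a placeholder for the log file location.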
A note about IQ-TREE log files: For alignments of protein-coding genes, we specified a codon sequence type (-st CODON1) within IQ-TREE and used the optimal codon model determined by ModelFinder. For these codon datasets, the log file notes that sequences “failed” an initial chi-square test for homogeneity of character composition. This was expected, as this test is an “exploratory tool” applied as a first step, before accounting for models of sequence evolution (as noted by IQ-TREE developers: http://www.iqtree.org/doc/Frequently-Asked-Questions#what-is-the-purpose-of-composition-test). This exploratory test treated the data as simple DNA sequences, without recognizing triplet codon structure and varying base compositions across codon positions. Thus, we fully expected sequences to fail this exploratory test and were unconcerned by the result. The subsequent selection of the optimal codon model in ModelFinder and the ML phylogenetic analysis do account for the codon structure.
5.1_16S+23SrDNA_MLsearch: Analysis of concatenated 16S and 23S rDNA sequences of Blochmannia and closely related endosymbiont relatives. For certain taxa, only 16S rDNA sequences were available (such that 23S rDNA is represented by gaps).
5.2_50taxa_7ProtCodingGenes_MLsearch
Analysis of seven concatenated protein-coding genes for select Blochmannia, closely related endosymbiont relatives, and Pseudomonas aeruginosa as an outgroup. This 50-taxon dataset includes Blochmannia of Camponotus rectithorax_D2057, representing the new Lathidris genus within Camponotini.
5.3_49taxa_7ProtCodingGenes_MLsearch
Analysis of seven concatenated protein-coding genes for select Blochmannia, closely related endosymbiont relatives, and Pseudomonas aeruginosa as an outgroup. This 49-taxon dataset is identical to the 50-taxon dataset above, except it excludes Blochmannia of Camponotus rectithorax.
5.4_97Bloch_7ProtCodingGenes_MLsearch
Analysis of seven concatenated protein-coding genes for 97 Blochmannia newly sampled in the current study.
5.5_95Bloch_7ProtCodingGenes_MLsearch
Analysis of seven concatenated protein-coding genes for 95 Blochmannia newly sampled in the current study. This 95-taxon dataset is identical to the 97-taxon dataset above, except it excludes Blochmannia of Camponotus rectithorax_D2057 and Blochmannia of Camponotus_melinus_D2826, representing the new Lathidris genus within Camponotini.
File: 6_Blochmannia_topology_tests.zip
Description:
Four subfolders contain files for various topology tests performed using IQ-TREE (v. 2.2.2.6 or 2.2.2.7), labeled by the corresponding section of Supplementary Table S17 in the article, where the statistics are presented. Each folder contains a .phy input file (the alignments that are also posted within “4.2_Bloch_trimmedAlns”) and the resulting output files. The resulting test statistics are noted in the output file .iqtree. The command line executed is noted at the top of the .log file. When a specific model is noted in the command line, this model selection was based on a previous IQ-TREE run of the same data using the ModelFinder tool (-m MFP).
6.1_50Taxa_AUtest_TableS17a: Contains the input (.phy) file and test results comparing the log-likelihood of the 50taxa_7ProtCodingGenes dataset across two alternative tree topologies: 1) the ML tree estimated for this dataset, 2) the ML tree estimated for this dataset under the topological constraint that Colobopsis is basal within the Blochmannia clade. As noted above, this 50-taxon dataset includes Blochmannia of Camponotus rectithorax_D2057, representing the new Lathidris genus within Camponotini. Results for the Approximately Unbiased (AU) test and related topology tests appear in Supplementary Table S17a.
6.2_49Taxa_AUtest_TableS17b: Contains the input (.phy) file and test results comparing the log-likelihood of the 49taxa_7ProtCodingGenes dataset across two alternative tree topologies: 1) the ML tree estimated for this dataset, 2) the ML tree estimated for this dataset under the topological constraint that Colobopsis is basal within the Blochmannia clade. As noted above, this 49-taxon dataset excludes Blochmannia of Camponotus rectithorax_D2057. Results for the AU test and related topology tests appear in Supplementary Table S17b.
6.3_97Bloch_AUtest_TableS17c: Contains the input (.phy) file and test results comparing the log-likelihood of the 97Bloch_7ProtCodingGenes dataset across two alternative tree topologies: 1) the ML tree estimated for this dataset, 2) the ML tree of the ant host. This 97-taxon ant host tree was generated by pruning taxa from the host tree presented in Figure 2 of the text. This 97-taxon dataset includes Blochmannia of Camponotus rectithorax_D2057 and Blochmannia of Camponotus_melinus_D2826, representing the new Lathidris genus within Camponotini. Results for the AU test and related topology tests appear in Supplementary Table S17c.
6.4_95Bloch_AUtest_TableS17d: Contains the input (.phy) file and test results comparing the log-likelihood of the 95Bloch_7ProtCodingGenes dataset across two alternative tree topologies: 1) the ML tree estimated for this dataset, 2) the ML tree of the ant host. This 95-taxon ant host tree was generated by pruning taxa from the host tree presented in Figure 2 of the text. The dataset is identical to the 97-taxon dataset above, except it excludes Blochmannia of Camponotus rectithorax_D2057 and Blochmannia of Camponotus_melinus_D2826. Results for the AU test and related topology tests appear in Supplementary Table S17d.
Code/software
Bioinformatic processing:
- Phyluce v1.7.1 or higher
- Unicycler/0.5.0
- blastx in NCBI-BLAST 2.12.0
Phylogenetic analyses:
- IQ-TREE v2.1.3 or higher
- AMAS v1.0
- ASTRAL-III v5.7.8
- MCMCtree in PAML v4.9
- BioGeoBEARS in R
Visualization:
- Tracer
- FigTree
- R/Rstudio
Access information
Other publicly accessible locations of the data:
- Raw sequence data associated with this study are available under NCBI BioProject PRJNA1136826.
Phylogenomics
Taxon sampling and UCE data generation
Our taxon set comprises 220 species of Camponotini, representing all genera and most subgenera, and five outgroup species belonging to related genera in the subfamily Formicinae (Table S1). Camponotine ants were sampled roughly in proportion to the number of described species in each genus. Smaller taxon sets were employed for analyses comparing Blochmannia and ant phylogenies.
DNA was extracted from single ants, either adults or pupae, using the DNeasy Blood and Tissue Kit (Qiagen, Valencia, CA) and quantified with a Qubit fluorometer (HS Assay Kit, Life Technologies Inc., Carlsbad, CA). We sheared 5–50 ng input DNA to a target size of ~600 bp using either a Diagenode BioRuptor (Diagenode Inc., Denville, NJ) or QSonica Q800R3-110 (Qsonica Inc., Newtown, CT). This product served as input for the generation of ultraconserved element (UCE) sequence data, following a protocol described by Branstetter et al. (2017a), and involving the following steps: dual-indexed library preparation, library pooling, UCE-targeted enrichment, qPCR quantification of DNA concentrations, final pooling, and multiplex sequencing. UCE enrichment was carried out using a set of custom-designed probes, Hymenoptera 2.5Kv2A (MYcroarray, Inc., now ArborBiosciences, Ann Arbor, MI), targeting 2524 UCE loci (Branstetter et al., 2017a). Sequencing was performed on an Illumina HiSeq 2500 at the University of Utah Huntsman Cancer Center. For 142 samples, library preparation, target enrichment, and sequencing were performed by RAPiD Genomics (Gainesville, FL) with similar protocols. Most Blochmannia assemblies and analyses were based on whole genome sequencing (WGS) runs using the platforms above. For select specimens requiring deeper sequencing, WGS was performed using DNA-seq library construction and 300 bp PE sequencing with an Illumina MiSeq v3 at the Duke University Sequencing and Genome Technologies Center.
Processing of UCE data
Demultiplexed FASTQ data were cleaned and trimmed with Illumiprocessor, a wrapper program using Trimmomatic (Bolger et al., 2014), in PHYLUCE v. 1.7.1 (Faircloth, 2016). Most cleaned reads were assembled with SPAdes v. 3.12.0 (Bankevich et al., 2012); a minority of older samples, identified by a “D” extraction code less than D1700, were assembled with Trinity v2013-02-25 (Grabherr et al., 2011). Sequence statistics are given in Table S2. Matching of UCE loci to probes, alignment with MAFFT, and internal alignment trimming with Gblocks (Castresana, 2000) were carried out within PHYLUCE as described in Blaimer et al. (2015; 2016). We then filtered the aligned, trimmed UCE loci based on the representation of UCE loci across taxa. We chose two subsets for further analyses: a dataset of 1440 loci in which each locus was represented in >90% of the taxa, and a dataset of 2076 loci in which each locus was represented in >80% of the taxa. These two subsets were then concatenated and further trimmed for misaligned sequences using the program Spruceup (Borowiec, 2019). We set the cutoff initially to 0.95, 0.97, and 0.98 and kept all other parameters at the default values. For the resulting spruceup-trimmed alignments (90%-0.95-spruceup, 90%-0.97-spruceup, 90%-0.98-spruceup, 80%-0.95-spruceup, 80%-0.97-spruceup, 80%-0.98-spruceup hereafter), as well as the untrimmed 90% and 80% alignments, we calculated alignment statistics, such as amount of missing data, number of parsimony-informative sites (PIC), and base composition, using the program AMAS v1.0 (Borowiec, 2016) (Table S3).
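The taxon-completeness filter can be illustrated with a small Python sketch. In the study this step was carried out within PHYLUCE; the function below is a hypothetical stand-in that only shows the selection rule (a locus is kept when it is present in more than the stated percentage of taxa), using integer arithmetic to avoid rounding issues at the threshold.

```python
def loci_above_completeness(presence: dict[str, set[str]],
                            n_taxa: int,
                            min_percent: int) -> list[str]:
    """Return loci represented in strictly more than min_percent of taxa.

    presence maps a locus name to the set of taxa in which it was recovered.
    The comparison is strict (">"), mirroring the ">90%" and ">80%" wording.
    """
    return sorted(
        locus for locus, taxa in presence.items()
        if len(taxa) * 100 > min_percent * n_taxa
    )
```

Under this rule, with 225 taxa a locus would need to be present in at least 203 taxa to pass the 90% threshold and at least 181 taxa to pass the 80% threshold.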
Extraction of Blochmannia sequence data
Most Blochmannia analyses were based on de novo assemblies of WGS data. In these cases, demultiplexed, paired-end FASTQ data were cleaned and trimmed using Illumiprocessor, which invokes Trimmomatic (Bolger et al., 2014). De novo assemblies were performed using Unicycler/0.5.0, an assembly pipeline for bacterial genomes that functions as a SPAdes-optimizer when assembling Illumina data (Wick et al., 2017). For each assembly, contigs >2500 bp were blasted against all protein-coding genes of published Blochmannia genome sequences, using blastx in NCBI-BLAST 2.12.0. Gene regions were extracted from contig sequences based on the coordinates of highly significant blast matches. Blochmannia protein-coding genes were consistently the 'best hits,' typically with e-values of 0.0. Our main analysis is based on seven concatenated protein-coding genes (dnaE, gidA, groEL, gyrA, gyrB, rpoB, and rpoC), with an alignment totaling 20,019 bp positions. These genes were selected based on their distribution across the Blochmannia genome and their central role in bacterial functions. While assemblies varied in their completeness (ranging from numerous, shorter Blochmannia contigs to complete or near-complete genomes), this analysis is restricted to the 97 samples that are included in the host phylogeny and for which we could confidently detect Blochmannia genes. For select, deeper lineages with complete or near-complete Blochmannia genomes, we also extracted 16S rDNA and 23S rDNA genes. Alignments of protein-coding genes were performed in MUSCLE (Edgar, 2004) as translated amino acid sequences and, post-alignment, back-translated to nucleotide codons. Alignments of 16S rDNA and 23S rDNA were performed using the SINA rRNA aligner hosted by the SILVA project (Pruesse et al., 2012). Poorly aligned regions were trimmed using trimAl version 1.5.10 (Capella-Gutiérrez et al., 2009). All datasets were examined by eye to remove any remaining ambiguous alignment regions.
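The back-translation step (amino acid alignment to codon alignment) can be sketched as follows. This is an illustrative Python function, not the tool used in the study; it assumes that each unaligned coding sequence has had its stop codon removed and contains exactly three nucleotides per aligned residue.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein.

    Each residue is replaced by its original codon and each protein gap
    by three gap characters, so codon structure is preserved.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be three times the residue count")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

For instance, the aligned protein `M-K` with coding sequence `ATGAAA` yields the codon alignment `ATG---AAA`.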
To test the monophyly of Blochmannia and to evaluate root positions within Blochmannia, certain analyses included outgroups selected from published sequences of close relatives based on prior phylogenetic studies (Wernegreen et al., 2009; Jackson et al., 2022). These outgroups include closely related endosymbionts of other (non-camponotine) ant groups, as well as endosymbionts of mealybugs, psyllids, and various other insects as described in the legends of the supplementary figures.
Ant phylogenomic analyses
Phylogenetic analyses were performed both with and without employing data partitioning on four concatenated, spruceup-trimmed matrices (90%-0.97-spruceup, 90%-0.98-spruceup, 80%-0.97-spruceup, 80%-0.98-spruceup). We did not proceed with the 0.95 cutoff as this amount of trimming proved too stringent for these datasets. We partitioned our datasets using the Sliding-Window Site Characteristics (SWSC-EN) algorithm (Tagliacollo & Lanfear, 2018), which models patterns of rate variation within and among UCE loci by dividing loci into core and flanking regions. The rcluster algorithm (Lanfear et al., 2014) in PartitionFinder2 (Lanfear et al., 2017) was then used to combine subsets with similar properties. We analyzed these concatenated data matrices with 1142 (90%-0.98-spruceup), 1081 (90%-0.97-spruceup), 1587 (80%-0.98-spruceup), and 1585 (80%-0.97-spruceup) partitions, as well as their unpartitioned counterparts, using Maximum Likelihood (ML) best-tree and 1000 ultrafast bootstrap searches in IQ-TREE v2.1.3 (Minh et al., 2020; Hoang et al., 2018). We employed ModelFinder in IQ-TREE (Kalyaanamoorthy et al., 2017) for unpartitioned matrices while implementing a GTR+G model for data subsets in partitioned matrices. Analyses specified the most distantly related taxon, Formica neogagates, as an outgroup. To perform coalescent analyses, we also estimated the best ML gene tree for each of the 2076 and 1440 UCE loci with >80% and >90% of taxa present, respectively, using IQ-TREE. These two sets of ML best trees were then used to perform coalescent species-tree analyses in ASTRAL-III v5.7.8 (Zhang et al., 2018).
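As a rough guide to how such analyses might be invoked, the Python sketch below assembles an IQ-TREE 2 argument list for the unpartitioned and partitioned cases. It is an assumption-laden illustration: the exact commands are recorded in the posted .log files, the choice of the -p partition option and the executable name iqtree2 are guesses, and the file names and outgroup label are placeholders.

```python
from typing import Optional


def iqtree_args(alignment: str,
                outgroup: str,
                partition_file: Optional[str] = None,
                prefix: Optional[str] = None) -> list[str]:
    """Build an IQ-TREE 2 argument list (sketch; not the authors' command).

    Unpartitioned: ModelFinder selects the model (-m MFP).
    Partitioned:   GTR+G applied to each subset of a partition file (-p).
    Both:          1000 ultrafast bootstrap replicates (-B) and an outgroup (-o).
    """
    args = ["iqtree2", "-s", alignment]
    if partition_file is None:
        args += ["-m", "MFP"]
    else:
        args += ["-p", partition_file, "-m", "GTR+G"]
    args += ["-B", "1000", "-o", outgroup]
    if prefix is not None:
        args += ["--prefix", prefix]
    return args
```

The returned list can be passed to `subprocess.run` once the placeholders have been replaced with real file names and the taxon label used in the alignment.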
Divergence dating
Until recently the tribe Camponotini contained two monotypic fossil genera, one fossil species attributed to Polyrhachis F. Smith, and about 30 fossil species assigned to Camponotus. The descriptions and illustrations of most of these fossils, however, inspire little confidence in their placement in the tribe Camponotini, since key features of the mandibles, antennal insertions, frontoclypeal complex, and metapleural gland (Bolton, 2003; Ward et al., 2016) are not discernable. Even for extant species, distinctions between Camponotus and some other genera in the same tribe are subtle and difficult to capture (Ward et al., 2016; Ward & Boudinot, 2021). For fossils, the uncertainty is much greater. Accordingly, we concur with Boudinot et al. (2024) that most of these fossils should be treated as incertae sedis in Formicinae. We are left with two described fossils that can be placed in the Camponotini with high confidence. (1) Eocamponotus mengei (Mayr), the only camponotine-like ant known from Baltic amber, shows a good degree of preservation of morphological features and was recovered as crown Camponotini by Boudinot et al. (2022) with strong support. It was not recovered in crown Camponotus, however. (2) Polyrhachis annosa Wappler et al., an impression fossil from late Miocene deposits of Greece (Wappler et al., 2009), can be reasonably assigned to its genus. Hence, for the purpose of divergence dating, we are limited to two fossil calibrations in the ingroup (Table S4): 1) a calibration on crown-group Camponotini with a minimum age of 36 Ma (Aleksandrova & Zaporozhets, 2008), and 2) a calibration on crown-group Polyrhachis with a minimum age of 5.3 Ma.
We performed divergence dating using approximate likelihood in MCMCTREE and codeml as included in PAML v4.9 (Yang, 2007), using both the 80%-0.98-spruceup and 90%-0.98-spruceup matrices and the best maximum likelihood tree resulting from SWSC-EN partitioned analysis of these matrices. We pruned these matrices and trees to exclude all outgroups except the most closely related taxon, Myrmoteras iriodum Moffett, to prevent possible artifacts resulting from an imbalance in taxon sampling and rate heterogeneity between ingroup and outgroup (Duchêne et al., 2015; Spasojevic et al., 2021), and to reduce computational cost. In addition to the two fossil calibrations outlined above, we further applied a secondary calibration on the root node based on divergence ages between Camponotini and Myrmoteras estimated across Formicinae by Blaimer et al. (2015). We applied a broad age bracket of 61.8–95.6 Ma representing the 95% HPD intervals estimated across three analyses in that study. We used the default settings for the calibration priors, a heavy-tailed density based on a truncated Cauchy distribution with an offset p=0.1, a scale parameter c=1, and a left tail probability of a=0.025. By default, all calibrations in MCMCTREE are implemented as soft bounds. We set up four independent runs using the independent-rates model as a clock model, a GTR model for substitutions, and otherwise default parameters. Assessing MCMC convergence and effective sample sizes using Tracer v1.7.2 (Rambaut et al., 2018), we achieved convergence (i.e., most ESS >200) using nsamples=500,000 with samplefreq=100 and burnin=100,000, and summarized across all four runs for each dataset (2,000,000 samples excluding burnin).
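The sampling figures above can be checked with a short calculation. The sketch assumes the usual MCMCTREE convention that the chain length per run equals the burn-in plus the number of samples multiplied by the sampling frequency; the function name is hypothetical.

```python
def mcmc_budget(nsample: int, sampfreq: int, burnin: int, n_runs: int) -> tuple[int, int]:
    """Return (iterations per run, retained samples pooled over runs).

    Assumes iterations per run = burnin + nsample * sampfreq, and that
    burn-in iterations are not among the retained samples.
    """
    iterations_per_run = burnin + nsample * sampfreq
    pooled_samples = nsample * n_runs
    return iterations_per_run, pooled_samples


# Settings reported in the text: four runs per dataset.
iterations, pooled = mcmc_budget(nsample=500_000, sampfreq=100,
                                 burnin=100_000, n_runs=4)
```

Under these assumptions each run comprises 50,100,000 iterations, and pooling the four runs gives the 2,000,000 post-burn-in samples stated above.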
Biogeographic analyses
The biogeographic history of Camponotini was inferred using BioGeoBEARS (Matzke, 2013), following the tutorials available on the BioGeoBEARS PhyloWiki (http://phylo.wikidot.com/biogeobears). We constructed a distribution matrix by scoring all taxa for six designated biogeographic areas: Neotropical, Nearctic, Palearctic, Afrotropical (including Malagasy), Indomalayan, and Australasian (after Cox 2001) (Table S5). For the separation of Indomalaya and Australasia, we referred to the Wallace line, except that Sulawesi was included in Indomalaya in our analysis, which is more consistent with general ant distribution patterns. We first used the chronograms resulting from MCMCTREE analyses of the 80%-0.98-spruceup and 90%-0.98-spruceup datasets, including all ingroup taxa and the outgroup Myrmoteras iriodum, and the distribution matrix for a set of unconstrained analyses without dispersal constraints between biogeographic areas. A second set of constrained analyses was then performed, implementing dispersal constraints defined based on the level of connectivity between these biogeographic areas: 1.0 for adjacent areas connected by a landmass, 0.5 for adjacent areas separated by large water gaps, i.e., Neotropical/Afrotropical, Neotropical/Australasian, and Afrotropical/Australasian, and 0.0001 for non-adjacent areas (Table S6). Finally, we ran both these sets of unconstrained and constrained analyses again using modified input chronograms, in which we pruned the outgroup taxon Myrmoteras iriodum. For each combination (total of eight), we tested the three main models implemented in BioGeoBEARS: the dispersal and extinction cladogenesis (DEC) model (Ree & Smith, 2008), the DIVALIKE model, a likelihood version of the Dispersal-Vicariance model (Ronquist, 1997), and the BAYAREA-LIKE model, a likelihood version of the Bayesian Analysis of Biogeography model (Landis et al., 2013).
We did not incorporate the jump dispersal parameter “j” into our models due to doubts about its statistical performance (Ree & Sanmartín, 2018). We defined max_range_size = 2, as no taxon in our analyses occupies more than two areas. We summarized log-likelihoods as well as AIC and AICc scores of all models and plotted results for the best-fitting model (lowest AICc score) in each set of analyses.
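Two quantities in this setup can be made concrete with a brief Python sketch: the number of geographic ranges implied by six areas with a maximum range size of two, and the AICc used for model comparison. The sketch uses the standard small-sample formula and the usual convention that lower AICc indicates better fit; whether a null range is counted, and what is used as the sample size n (commonly the number of tips), are assumptions to be checked against the R notebook.

```python
from itertools import combinations

AREAS = ["Neotropical", "Nearctic", "Palearctic",
         "Afrotropical", "Indomalayan", "Australasian"]


def enumerate_ranges(areas, max_range_size, include_null=True):
    """List all ranges of up to max_range_size areas (optionally plus the null range)."""
    ranges = [()] if include_null else []
    for size in range(1, max_range_size + 1):
        ranges.extend(combinations(areas, size))
    return ranges


def aicc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """AICc = 2k - 2lnL + 2k(k+1)/(n-k-1)."""
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def best_model(scores: dict[str, float]) -> str:
    """Return the model name with the lowest AICc."""
    return min(scores, key=scores.get)
```

With six areas and at most two areas per range there are 6 single-area and 15 two-area ranges, giving 21 ranges (22 if the null range is counted). Without the “j” parameter, each of the three models has two free parameters (dispersal and extinction rates), so `n_params` would be 2 in each case.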
Blochmannia phylogenetic analyses
All Blochmannia phylogenies were estimated using IQ-TREE 2.2.2.7 for Linux or IQ-TREE 2.2.2.6 for macOS (Minh et al., 2020). For each dataset, we used the ModelFinder option (Kalyaanamoorthy et al., 2017) to determine the best-fit substitution model for the sequence format analyzed (DNA or codon sequences). Under the best-fit model, we estimated the maximum likelihood (ML) best tree and bootstrap consensus tree based on an ultrafast bootstrap approximation with 1000 replicates (Hoang et al., 2018). To analyze possible root positions or congruence with host relationships, we also performed tree topology tests within IQ-TREE. These tests compare the log-likelihood of a dataset under the ML best tree with its log-likelihood when constrained to one or more alternative topologies.