Ultraconserved element data from Integrating morphology with phylogenomics to describe four Siculo-Maltese endemic Temnothorax species (Hymenoptera, Formicidae)
Abstract
Temnothorax (Myrmicinae, Crematogastrini) is one of the most diverse Holarctic ant genera, and new taxonomic advancements are still frequent worldwide. The Mediterranean region, a global biodiversity hotspot characterized by a complex geographic history, is home to a substantial portion of its described diversity. Sicily is the region’s largest island and, as ongoing investigations are revealing, it is inhabited by a long-overlooked but highly diverse ant fauna that combines multiple biogeographic influences. We combined qualitative and quantitative morphology of multiple castes with phylogenomic analysis based on ultraconserved elements (UCEs) to describe four Temnothorax species endemic to Sicily and the neighboring Maltese Islands (Sicilian Channel). Three of these species, T. marae sp. nov., T. poldii sp. nov. and T. vivianoi sp. nov., are new to science, while a redescription clarifies the identity of T. lagrecai (Baroni Urbani, 1964). These descriptions provide an opportunity to discuss the current difficulties of delimiting monophyletic species-groups of Temnothorax based on morphological characters. The intra-insular endemicity patterns we revealed highlight the importance of Mediterranean paleogeography to contemporary ant diversity and distribution.
README: Ultraconserved Element data from Integrating morphology with phylogenomics to describe four Siculo-Maltese endemic Temnothorax species (Hymenoptera, Formicidae)
https://doi.org/10.25338/B8K63Z
Description of the data and file structure
The dataset contains three directories:
- 'input_alignment' contains the alignment used for both analyses (partition clustering and tree inference).
- 'IQTREE_partition_clustering' contains the input partition file (from the SWSC-EN analysis) and the output of the clustering analysis.
- 'IQTREE_tree_inference' contains the input partition file from the clustering analysis and the output of tree inference in IQ-TREE.
Methods
DNA was extracted nondestructively from adult worker ants using a DNeasy Blood & Tissue Kit (Qiagen, Inc.) following the manufacturer’s protocols. Up to 50 ng of DNA was sheared to a target fragment size of 400–600 bp and used as input into a genomic DNA library preparation protocol for targeted enrichment of ultraconserved elements (UCEs), following Faircloth et al. (2015) as modified by Branstetter et al. (2017), with a unique combination of iTru barcoding adapters for each sample (Glenn et al. 2019; see Supplementary Table S2 for a list of the adapters used). Enrichments were performed on pooled libraries using the custom version of the Hym 2.5Kv2A ant-specific RNA probes (Branstetter et al. 2017; Arbor Biosciences, Ann Arbor, MI), which target 2524 UCE loci in the Formicidae. The library enrichment procedures for the probe kit were followed, except that the RNA probe concentration was reduced to 0.1X (this step is only necessary for the custom kit; the currently available catalog kit is already diluted to 0.1X concentration), custom adapter blockers were used instead of the standard blockers, and enriched DNA was left bound to the streptavidin beads during PCR, as described in Faircloth et al. (2015). Following post-enrichment PCR, the resulting pools were purified using SpeedBead magnetic carboxylate beads (Rohland & Reich 2012; Sigma-Aldrich), and their volume was adjusted to 22 μL.
Enrichment success was verified, and size-adjusted DNA concentrations of each pool were measured by qPCR using a SYBR® FAST qPCR kit (Kapa Biosystems) and a Bio-Rad CFX96 RT-PCR thermal cycler (Bio-Rad Laboratories); all pools were then combined into an equimolar final pool. The final pool was sequenced on a single lane of an Illumina HiSeq 2500 (125-cycle paired-end sequencing, v4) at the High Throughput Genomics Facility at the University of Utah.
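Combining pools in equimolar amounts reduces to simple arithmetic once size-adjusted molar concentrations are known: because 1 nM equals 1 fmol/μL, the volume of each pool needed to contribute a fixed amount is that amount divided by the pool's concentration. The Python sketch below illustrates this; the pool names, concentrations, and the 20 fmol target are hypothetical and are not values from this study.

```python
"""Sketch: volumes needed to combine pools in equimolar amounts.

Assumes size-adjusted concentrations (nM) have already been estimated by qPCR.
All pool names and numbers in the example are hypothetical.
"""


def equimolar_volumes(conc_nm: dict[str, float], target_fmol: float) -> dict[str, float]:
    """Return the volume (uL) of each pool that contributes `target_fmol`.

    Since 1 nM == 1 fmol/uL, volume (uL) = amount (fmol) / concentration (nM).
    """
    volumes = {}
    for name, conc in conc_nm.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {name}: {conc}")
        volumes[name] = target_fmol / conc
    return volumes


if __name__ == "__main__":
    # Hypothetical size-adjusted concentrations (nM) for three enriched pools.
    pools = {"pool_A": 12.5, "pool_B": 4.0, "pool_C": 8.0}
    for name, vol in equimolar_volumes(pools, target_fmol=20.0).items():
        print(f"{name}: {vol:.2f} uL")
```

With these hypothetical inputs the sketch prints 1.60, 5.00 and 2.50 uL for pool_A, pool_B and pool_C, so each pool contributes the same 20 fmol.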
Following sequencing, raw reads were trimmed of adapter contamination, Illumina sequencing artefacts, and low-quality bases using the program illumiprocessor, which is included in PHYLUCE v1.7.1 (Faircloth 2016). Cleaned reads were assembled de novo with SPAdes v3.12.0 (Bankevich et al. 2012), run through PHYLUCE. All newly generated raw sequence reads have been submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (BioProject PRJNA770978).
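Quality trimming of this kind is commonly done with a sliding window: the read is scanned from the 5′ end and cut where the mean Phred score within a window first drops below a threshold. illumiprocessor is a wrapper around Trimmomatic, whose SLIDINGWINDOW step works along these lines, but the Python sketch below is only a simplified illustration of the technique, not illumiprocessor's or Trimmomatic's code. The 4-base window, the threshold of 20, and Phred+33 quality encoding are assumptions.

```python
"""Simplified sketch of sliding-window quality trimming for a single read.

Not illumiprocessor's or Trimmomatic's implementation; window size,
threshold and Phred+33 encoding are assumptions.
"""


def phred_scores(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in qual]


def sliding_window_trim(
    seq: str, qual: str, window: int = 4, min_mean: float = 20.0
) -> tuple[str, str]:
    """Cut the read at the start of the first window (scanning 5' to 3') whose
    mean quality is below `min_mean`. Reads shorter than `window` are returned
    unchanged."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(qual)
    for start in range(max(len(scores) - window + 1, 0)):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
    print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))
```

With this toy read, the window starting at position 6 (scores 40, 2, 2, 2; mean 11.5) is the first to fall below 20, so the read is cut to ACGTAC.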
The standard PHYLUCE protocol was followed to process UCEs for phylogenomic analysis. The monolithic unaligned FASTA file was aligned with the phyluce_align_seqcap_align command, using MAFFT (Katoh & Standley 2013) as the aligner (--aligner mafft) and without edge-trimming (--no-trim). The resulting alignments were then trimmed with the PHYLUCE command phyluce_align_get_gblocks_trimmed_alignments_from_untrimmed, which uses Gblocks v0.91b (Castresana 2000), with the following settings: b1 0.5, b2 0.5, b3 12, b4 7. After UCE locus information was removed from taxon labels with the command phyluce_align_remove_locus_name_from_nexus_lines, alignment statistics were examined with phyluce_align_get_align_summary_data, and a dataset in which each locus contains at least 85% of all taxa was generated with phyluce_align_get_only_loci_with_min_taxa.
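The 85% occupancy criterion is a simple per-locus rule: a locus is kept if the number of taxa present in its alignment is at least 85% of the total number of taxa. The minimal Python sketch below assumes each locus is summarized as the set of taxa that have a sequence for it; it is not PHYLUCE's implementation, the locus and taxon names are hypothetical, and the real command's handling of the threshold may differ.

```python
"""Sketch: keep loci whose alignments include at least a given share of taxa.

Illustrates the idea behind phyluce_align_get_only_loci_with_min_taxa; it is
not PHYLUCE's code. Locus and taxon names are hypothetical.
"""


def filter_loci_by_occupancy(
    loci: dict[str, set[str]], n_taxa: int, min_percent: int = 85
) -> list[str]:
    """Return sorted names of loci present in >= min_percent of n_taxa taxa.

    `loci` maps each locus name to the set of taxa with a sequence for it.
    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    if n_taxa <= 0:
        raise ValueError("n_taxa must be positive")
    return sorted(
        name for name, taxa in loci.items()
        if 100 * len(taxa) >= min_percent * n_taxa
    )


if __name__ == "__main__":
    taxa = [f"taxon_{i:02d}" for i in range(20)]  # hypothetical 20-taxon matrix
    loci = {
        "uce-1": set(taxa),       # 20/20 = 100%, kept
        "uce-2": set(taxa[:17]),  # 17/20 = 85%, kept
        "uce-3": set(taxa[:16]),  # 16/20 = 80%, dropped
    }
    print(filter_loci_by_occupancy(loci, n_taxa=len(taxa)))
```

Here the sketch prints ['uce-1', 'uce-2']: a locus sitting exactly at 85% is retained, while one at 80% is dropped.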
Because the assumption that evolutionary rates are homogeneous across sites is often violated in empirical data (Buckley et al. 2001), we partitioned our UCE loci into sets of similarly evolving sites. To achieve this, we used the command phyluce_align_format_nexus_files_for_raxml to concatenate loci into a single alignment and to generate a partition file for input into the SWSC-EN method (Tagliacollo & Lanfear 2018). The resulting data blocks were used as input for partitioning in IQ-TREE v2.1.2 (Nguyen et al. 2015) with the option -m TESTNEWMERGEONLY. The substitution model was set to general time reversible (-mset GTR), and the candidate rate-heterogeneity models were restricted to -mrate E,I,G (equal rates, a proportion of invariable sites, or gamma-distributed rates). This excludes the combination of gamma and a proportion of invariable sites (+I+G), which has been shown to cause anomalies in likelihood estimation (Sullivan & Swofford 2001; Yang 2006). Partition merging used the fast relaxed clustering algorithm (-rclusterf 10). The resulting partitioned dataset was used as input for maximum likelihood tree inference in IQ-TREE, with 1000 ultrafast bootstrap replicates (-bb 1000).
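SWSC-EN, as we understand it from Tagliacollo & Lanfear (2018), exploits the typical UCE pattern of a conserved core flanked by more variable regions: it computes the Shannon entropy of each alignment column and splits each locus into a left flank, a core, and a right flank at the two breakpoints that minimize the within-region sum of squared errors (SSE) of site entropies. The Python sketch below is a simplified, brute-force illustration of that idea for a single locus, not the published implementation; ignoring gaps and ambiguity codes when computing entropy, and the minimum region length `min_len`, are assumptions made here. The resulting regions correspond to the data blocks that are then merged in IQ-TREE.

```python
"""Simplified sketch of the SWSC-EN idea for one UCE locus.

Not the published implementation. Assumptions: only A/C/G/T are counted when
computing entropy (gaps and ambiguity codes are ignored), and each region must
be at least `min_len` sites long (a hypothetical parameter).
"""
import math
from collections import Counter


def site_entropies(alignment: list[str]) -> list[float]:
    """Shannon entropy (bits) of nucleotide frequencies in each alignment column."""
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all sequences must have the same length")
    entropies = []
    for j in range(n_sites):
        counts = Counter(c for c in (row[j].upper() for row in alignment) if c in "ACGT")
        total = sum(counts.values())
        h = 0.0  # an all-gap column keeps entropy 0
        for n in counts.values():
            p = n / total
            h -= p * math.log2(p)
        entropies.append(h)
    return entropies


def sse(values: list[float]) -> float:
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_three_way_split(entropies: list[float], min_len: int = 2) -> tuple[int, int]:
    """Return (i, j) such that entropies[:i], [i:j], [j:] (left flank, core,
    right flank) minimize the total within-region SSE. Brute force, O(n^3)."""
    n = len(entropies)
    if n < 3 * min_len:
        raise ValueError("locus too short for the requested minimum region length")
    best = None
    for i in range(min_len, n - 2 * min_len + 1):
        for j in range(i + min_len, n - min_len + 1):
            total = sse(entropies[:i]) + sse(entropies[i:j]) + sse(entropies[j:])
            if best is None or total < best[0]:
                best = (total, i, j)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy alignment: variable flanks (columns 0-2 and 6-8), conserved core (3-5).
    aln = ["AAAAAAAAA", "CCCAAACCC", "GGGAAAGGG", "TTTAAATTT"]
    ent = site_entropies(aln)
    print(ent)  # [2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 2.0, 2.0, 2.0]
    print(best_three_way_split(ent, min_len=2))  # (3, 6)
```

The brute-force search is chosen for clarity; for real loci of several hundred sites, a sliding-window or incremental SSE computation would be far faster.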