Phylogenomics of the major lineages of Bembidion and related ground beetles (Coleoptera: Carabidae: Bembidiini)
Data files
Aug 26, 2024 version files 597.80 MB
Abstract
Bembidion Latreille is a genus of small ground beetles containing about 1,380 species. To test previous phylogenetic hypotheses about deeper lineages of Bembidion and near relatives, we examine over 1,800 nuclear protein-coding loci from 33 species representing the main lineages of Bembidion, 10 species of other bembidiine genera, and seven outgroups. We find that Bembidion exclusive of subgenus Phyla Motschulsky is monophyletic, and we reclassify Phyla as a separate genus. Within Bembidion we find two dominant clades, the Bembidion superseries (containing about 490 species in the subgenera Eupetedromus Netolitzky and Lindrochthus Maddison, the Philochthus Stephens complex, and the Bembidion series), and the Ocydromus Clairville superseries (containing almost all other Bembidion, representing about 840 species). The only known lineages within Bembidion outside of these superseries are subgenus Hoquedela Müller-Motzfeld and the Desarmatocillenus Netolitzky complex, which combined contain fewer than 30 species. Most clades are insensitive to variations in analyses, and hold up under different sets of taxa and loci, analyses at the nucleotide or amino acid levels, and different analytical methods (maximum likelihood, including PMSF analyses, Bayesian analyses, invariant-based methods, and those that consider incomplete lineage sorting). Despite the clarity achieved in most aspects of the phylogeny, there are several unresolved regions, notably the relationships of Desarmatocillenus, Hoquedela, and Phyla to other bembidiines. A divergence dating analysis suggests that crown Bembidion is about 48 million years old (95% confidence intervals 40–58 Ma), and that the two large superseries are about 38 million years old (95% confidence intervals about 29–47 Ma).
README: Phylogenomics of the major lineages of Bembidion and related ground beetles (Coleoptera: Carabidae: Bembidiini)
https://doi.org/10.5061/dryad.2jm63xsxz
These files contain alignments of between 1,390 and 1,728 nuclear protein-coding genes, derived from transcriptomes and genomes, for between 42 and 50 taxa, as well as inferred phylogenetic trees for most of the data sets analyzed.
Description of the data and file structure
All files are Mesquite files in the NEXUS file format.
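Because NEXUS files are plain text organized into named blocks (each opened by a `BEGIN <name>;` line), a quick way to see what a file holds without opening it in Mesquite is to count its block types. The following is a minimal sketch, not part of the deposited data: the function names are hypothetical, and it assumes each block opener sits at the start of its own line, which is typical but not guaranteed.

```python
import re
from collections import Counter

# Matches the start of a NEXUS block, e.g. "BEGIN TAXA;" or "begin characters;".
# Assumption: the opener is at the start of a line (after optional whitespace).
BLOCK_START = re.compile(r"^\s*begin\s+([A-Za-z_][\w.]*)\s*;", re.IGNORECASE)


def count_nexus_blocks(lines):
    """Count block types (TAXA, CHARACTERS, TREES, ...) in an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = BLOCK_START.match(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


def count_blocks_in_file(path):
    """Convenience wrapper that streams a file from disk line by line."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_nexus_blocks(handle)
```

For example, `count_blocks_in_file("50Taxa.N123_All_Loci.nex")` would report how many CHARACTERS blocks (one per locus matrix) the file contains. A line inside a comment or data matrix that happens to begin with `begin` would be miscounted, so treat the result as a rough inventory.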
Data set names are of the following format:
- Eight Mesquite files containing separate matrices for every locus for eight of the data sets (and from which other data sets can be derived):
  - 42Taxa.N123_All_Loci.nex
  - 46Taxa.N123_All_Loci.nex
  - 46Taxa.Occ66.AA_All_Loci.nex
  - 46Taxa.Occ66.N123_All_Loci.nex
  - 50Taxa.AA_All_Loci.nex
  - 50Taxa.N123_All_Loci.nex
  - 50Taxa.Occ66.AA_All_Loci.nex
  - 50Taxa.Occ66.N123_All_Loci.nex
- Loci.with.Paralogy.Problems.nex: A Mesquite file containing alignments and gene trees for the five loci that were removed after visual inspection of the gene trees and alignments suggested they contained paralogs.
- BembidiiniPhylogenomics.Trees.nex: A Mesquite file containing all inferred trees for concatenated data and the ASTRAL analyses, including bootstrap and ultrabootstrap trees.
- Maximum.Likelihood.Gene.Trees.nex: A Mesquite file containing 10 tree blocks. Each tree block contains between 1,390 and 1,728 trees, corresponding to the maximum likelihood tree for each locus for the 50Taxa.N123, 50Taxa.Occ66.AA, 50Taxa.Occ66.N12, 50Taxa.Occ66.N123, 46Taxa.Occ66.AA, 46Taxa.Occ66.N12, 46Taxa.Occ66.N123, 42Taxa.Occ66.AA, 42Taxa.Occ66.N12, and 42Taxa.Occ66.N123 data sets.
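
The data set names used above appear to follow a regular pattern: a taxon count (`42Taxa`, `46Taxa`, `50Taxa`), an optional occupancy filter (`Occ66`), and a data type (`N123`, `N12`, or `AA`). The sketch below splits a name into those parts. The reading of the components (N123 as all nucleotide positions, N12 as first and second positions, AA as amino acids, Occ66 as loci present in at least 66% of taxa) is inferred from the Methods section rather than stated with the file list, and the class and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DataSetName(NamedTuple):
    n_taxa: int                        # e.g. 50 for "50Taxa"
    min_occupancy_pct: Optional[int]   # e.g. 66 for "Occ66"; None if unfiltered
    data_type: str                     # "N123", "N12", or "AA"


# Accepts either a bare data set label ("50Taxa.Occ66.N12") or the
# corresponding matrix file name ("50Taxa.Occ66.N123_All_Loci.nex").
NAME_PATTERN = re.compile(
    r"^(\d+)Taxa(?:\.Occ(\d+))?\.(N123|N12|AA)(?:_All_Loci\.nex)?$"
)


def parse_data_set_name(name):
    """Split a data set label or file name into its components."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized data set name: {name!r}")
    taxa, occupancy, data_type = match.groups()
    return DataSetName(
        n_taxa=int(taxa),
        min_occupancy_pct=int(occupancy) if occupancy else None,
        data_type=data_type,
    )
```

Files that do not follow this pattern, such as `Loci.with.Paralogy.Problems.nex`, raise a `ValueError` rather than being guessed at.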
Methods
Transcriptomes of 39 species and low-coverage genomes of six species were sequenced using Illumina sequencers. Combined with previously published transcriptomes and genomes, these provided data for the 50 species analyzed. After read trimming (Trimmomatic) and assembly (Trinity) of transcriptomes, a suite of 5,197 purported single-copy orthologs was found using Agalma. These target loci were then sought in the low-coverage genome assemblies using Exonerate (trimming and assembly of the genomes were done in CLC Genomics Workbench). We removed low information-content loci using MARE, removed anomalous sequences using TreeShrink, discarded loci with overly heterogeneous base compositions using BaCoCa, and trimmed anomalous sequence ends. This filtered the set down to 1,875 loci. We then inferred maximum likelihood trees of each locus, visually inspected each gene tree and the corresponding alignment for signs of paralogy, and discarded five loci that clearly had at least two paralogs. We then formed three sets of matrices: one with all 50 taxa; one with the three longest-branch outgroup taxa and one ingroup taxon with limited data removed (yielding 46 taxa); and one with the four remaining non-Bembidiini removed from the 46-taxon set (yielding 42 taxa). We also analyzed matrices of all loci and of only those loci present in at least 66% of the taxa, and analyzed matrices of all nucleotides, of only first and second codon positions, and of sequences translated into amino acids. Trees were inferred from concatenated data using maximum likelihood, PMSF likelihood, SVD Quartets, and Bayesian methods, and from the maximum likelihood trees of all loci using ASTRAL.
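
Since the all-loci, all-nucleotide matrices are the ones from which other data sets can be derived, the two derivations described above (keeping loci present in at least 66% of taxa, and keeping only first and second codon positions) can be illustrated in a few lines. This is a hedged sketch and not the authors' pipeline: the function names are hypothetical, each locus is assumed to be a dictionary mapping taxon name to an aligned, in-frame sequence starting at codon position 1, and a taxon is counted as present when its sequence has at least one character other than `-` or `?`. The exact presence criterion and threshold rounding used in the study are not stated here.

```python
# Characters treated as "no data" (assumption: gap and missing symbols only).
MISSING = frozenset("-?")


def occupancy(alignment, n_taxa):
    """Fraction of the n_taxa taxa that have any real data for this locus."""
    present = sum(
        1 for seq in alignment.values()
        if any(ch not in MISSING for ch in seq)
    )
    return present / n_taxa


def filter_by_occupancy(loci, n_taxa, threshold=0.66):
    """Keep loci whose occupancy is at least `threshold` (0.66 mirrors Occ66)."""
    return {
        name: alignment
        for name, alignment in loci.items()
        if occupancy(alignment, n_taxa) >= threshold
    }


def first_and_second_positions(seq):
    """Drop every third site, assuming the sequence starts at codon position 1."""
    return "".join(ch for i, ch in enumerate(seq) if i % 3 != 2)
```

Applying `first_and_second_positions` to every sequence in an N123 matrix gives the corresponding N12 matrix under the in-frame assumption. If an alignment does not start at codon position 1, the index test would need an offset.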