Beetles, barcodes, and big data: a deep dive into the phylogeny of Harpalinae (Carabidae)
Data files
Mar 04, 2026 version, files total 30.10 MB
archive_scripts_carabidae_phylogeny_260302.zip (29.68 KB)
File_R1_raw_sequences_2024.zip (2.92 MB)
File_R2_raw_sequences_2025.zip (6.89 MB)
File_S1_methods.docx (38.71 KB)
File_S4_sequences_models.zip (14.80 MB)
File_S5_trees.zip (203.23 KB)
README.md (4.70 KB)
Table_S1_taxonomy_mitogenome.csv (273.07 KB)
Table_S2_taxonomy_7000.csv (4.94 MB)
Abstract
The ground beetles (Carabidae) are a highly species-rich lineage of the Coleoptera, with over half of their diversity concentrated in the ~20,000 described species in the subfamily Harpalinae sensu lato. As a presumed recent radiation lacking deeply distinct morphological divisions, their taxonomic classification has been challenging, while molecular studies remain limited in the number of genes and taxa sampled. Using ~450 mitochondrial genome sequences from across the Carabidae and the major biogeographic realms we investigate the tribal relationships in Harpalinae. Our phylogenetic analysis supports a revised system that broadly divides the harpalines into two major reciprocally monophyletic lineages, corresponding to a narrowly defined Harpalinae sensu novo and a distinct Lebiinae. Within Harpalinae, we recover well-supported subclades that mostly represent existing tribes (e.g., Harpalini, Pterostichini, Licinini, Platynini), while clades in Lebiinae required the recognition of three new or redefined clades: Lebiini, Agrini, and Odacanthini. We also establish the polyphyletic status of the ‘Truncatipennes’ defined by truncated elytra and traditionally encompassing most ‘lebiomorphs’, which are split into the Lebiinae and at least two additional lineages, corresponding to the Dryptinae and Brachininae (bombardier beetles) branching below the Harpalinae + Lebiinae clade. The mitogenome data were extended to include ~7000 species of Carabidae by adding all available cytochrome c oxidase subunit I (COI) barcodes and other legacy sequences. The resulting phylogeny broadly concurs with the tribal boundaries defined by mitogenomes and provides a curated barcode reference library for species identification. The unprecedented scale of mitogenome sequencing, combined with dense taxon sampling of barcodes, resolves a particularly complex portion of the beetle tree-of-life.
Authors: Beulah Garner, Aileen Scott and Alfried Vogler
Journal: Systematic Entomology
Dataset DOI: 10.5061/dryad.tht76hfdk
This dataset contains multiple-sequence alignments, partition schemes, phylogenies and metadata from the above paper.
archive_scripts_carabidae_phylogeny_260302.zip
An archive of all original scripts used for this paper. See the README file within the archive for more detail. The scripts are also available at https://github.com/beetlephylo/carabidae_phylogeny
File_R1_raw_sequences_2024.zip
The unprocessed sequence files resulting from the BLAST and BOLD searches carried out in 2024. These sequences were processed and used for the mitogenome phylogeny as described in File_S1_methods.docx.
File_R2_raw_sequences_2025.zip
The unprocessed sequence files resulting from the GenBank and BOLD searches carried out in 2025. These sequences were processed and used for the species-level phylogeny as described in File_S1_methods.docx.
File_S1_methods.docx
Description of the materials and methods including software and commands.
File_S4_sequences_models.zip
Novel sequences, supermatrices and partition schemes used for the mitogenome and species-level phylogenies. A description of each file is given below.
Novel Sequences
0_novel_mitogenomes.gb
GenBank format file of 287 mitogenomes sequenced by the Vogler Lab.
Mitogenome Phylogeny Dataset
1_mitogenome_ry_supermatrix.fasta
1_mitogenome_ry_partition_scheme.txt
2_mitogenome_nt_supermatix.fasta
2_mitogenome_nt_partition_scheme.txt
3_mitogenome_aa_supermatix.fasta
3_mitogenome_aa_partition_scheme.txt
Binary RY-recoded (ry), nucleotide (nt) and amino acid (aa) supermatrices and partition schemes for the 451 sequences in the mitogenome dataset.
Species-level Phylogeny Dataset
4_nt_ptp_supermatrix.fasta
4_nt_ptp_partition_schemes.txt
5_ry_ptp_supermatrix.fasta
5_ry_ptp_partition_schemes.txt
Nucleotide (nt) and RY-recoded (ry) supermatrices and partition files for the 7196 sequences in the species-level phylogeny.
Profile Sequences
6_aa_profiles
6_nt_profiles
Amino acid (aa) and nucleotide (nt) profile alignments for each gene
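A quick sanity check after unpacking the archive is to confirm that a supermatrix holds the stated number of sequences (451 for the mitogenome dataset, 7196 for the species-level dataset) and that all rows have the same length. The sketch below uses only the Python standard library; the helper names `read_fasta` and `check_alignment` are hypothetical, and it assumes plain FASTA with one record per `>` header.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}; the id is the first word of the header."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            if current in records:
                raise ValueError(f"duplicate record id: {current!r}")
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def check_alignment(records: Dict[str, str], expected_count: int) -> int:
    """Check the record count and equal row lengths; return the alignment length."""
    if len(records) != expected_count:
        raise ValueError(f"expected {expected_count} records, found {len(records)}")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"rows differ in length: {sorted(lengths)}")
    return lengths.pop()


# Toy alignment (not from the dataset) to show the calls.
toy = [">a first record", "ACGT", "AC", ">b", "ACGTAC"]
print(check_alignment(read_fasta(toy), expected_count=2))
```

With the real files the call would look like `check_alignment(read_fasta(open("1_mitogenome_ry_supermatrix.fasta")), expected_count=451)`, assuming the file has been extracted into the working directory.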
File_S5_trees.zip
This file contains the six mitogenome phylogenies (RY, NT and AA, each constrained and unconstrained), the species-level phylogeny, and the nuclear backbone constraint that was used for the constrained mitogenome phylogenies. A description of each file is given below.
0_vasilikopoulos_backbone_constraint.nwk
Backbone constraint based on the Adephaga phylogeny from Vasilikopoulos et al. (2021), used for three of the mitogenome trees (one for each data type).
1_mitogenome.ry_vas.tbe.nwk
Mitogenome phylogeny from binary RY-coded supermatrix with backbone constraint
2_mitogenome.ry_free.tbe.nwk
Unconstrained mitogenome phylogeny from binary RY-coded supermatrix
3_mitogenome.nt_vas.tbe.nwk
Mitogenome phylogeny from nucleotide supermatrix with backbone constraint
4_mitogenome.nt_free.tbe.nwk
Unconstrained mitogenome phylogeny from nucleotide supermatrix
5_mitogenome.aa_vas.tbe.nwk
Mitogenome phylogeny from amino acid supermatrix with backbone constraint
6_mitogenome.aa_free.tbe.nwk
Unconstrained mitogenome phylogeny from amino acid supermatrix
7_species_level_tree.nwk
Species-level phylogeny from binary RY-coded supermatrix with backbone constraint. Backbone constraint was based on 1_mitogenome.ry_vas.tbe.nwk.
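To list the tips of any of these Newick files without installing a phylogenetics library, a small regular-expression sketch such as the one below may be enough. It is a simplification, not a full Newick parser: it assumes unquoted tip labels, no bracketed comments, and at least one pair of parentheses in the tree. The function name `tip_labels` is hypothetical.

```python
import re

# A tip label is the text that directly follows "(" or "," and is not itself
# an opening parenthesis. Labels and support values that follow ")" belong to
# internal nodes and are therefore skipped.
_TIP = re.compile(r"(?<=[(,])\s*([^(),:;\s][^(),:;]*)")


def tip_labels(newick: str) -> list:
    """Return the tip labels of a Newick string in left-to-right order."""
    return [match.group(1).strip() for match in _TIP.finditer(newick)]


# Toy tree (not from the dataset) with support values on internal nodes.
toy_tree = "((A:0.1,B:0.2)0.95:0.3,(C:0.1,D:0.4)0.80:0.2);"
print(tip_labels(toy_tree))
```

The resulting labels can then be compared with the identifiers in the metadata tables, for example to confirm that every tip has a metadata row.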
Metadata
Table_S1_taxonomy_mitogenome.csv
Metadata for all sequence records included in the mitogenome phylogeny including source, original and standardised taxonomy, genes, and genes in tree after concatenation (sequences from different sources were concatenated based on Linnean binomials). Flickr image IDs are available for novel sequences that have been imaged, and clade numbers are included for taxa in Harpalinae s.l., as assigned in the associated manuscript (Figure 3).
Table_S2_taxonomy_7000.csv
Metadata for all sequences included in the species-level phylogeny including source, original and standardised taxonomy, genes, genes in tree after concatenation (sequences from different sources were concatenated based on Linnean binomials), recovered clade, and location data where available. Different IDs were used for the mitogenome tree due to different methods of processing, so sequences that were present in the mitogenome phylogeny have these IDs listed to allow cross-referencing. These IDs also indicate which sequences were constrained.
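The concatenation by Linnean binomial mentioned in both tables can be pictured with the following sketch, which groups gene sequences from different sources under one binomial. It illustrates the idea only: the function name `concatenate_by_binomial`, the record layout, and the rule of keeping the longest sequence when two sources supply the same gene are all assumptions, not the authors' documented procedure (see File_S1_methods.docx for that).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def concatenate_by_binomial(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, str]]:
    """Group (binomial, gene, sequence) records into {binomial: {gene: sequence}}.

    Assumption of this sketch: if several sources supply the same gene for a
    binomial, the longest sequence is kept.
    """
    merged = defaultdict(dict)
    for binomial, gene, sequence in records:
        best = merged[binomial].get(gene, "")
        if len(sequence) > len(best):
            merged[binomial][gene] = sequence
    return {name: dict(genes) for name, genes in merged.items()}


# Made-up names and toy sequences, not taken from the dataset.
toy_records = [
    ("Examplus primus", "COX1", "ACGT"),
    ("Examplus primus", "CYTB", "ACG"),
    ("Examplus primus", "COX1", "ACGTAC"),
    ("Examplus secundus", "COX1", "AC"),
]
print(concatenate_by_binomial(toy_records))
```

In the tables, the "genes" and "genes in tree after concatenation" fields record the outcome of this kind of merging for each sequence record.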
