Phylogenomics-based click-beetle classification tackles multiple origins of phenotypic modifications
Data files
Oct 06, 2025 version files 1.23 GB
-
AA_Q_ASTRAL_SG_H_Fig4.astral
15.08 KB
-
AA_SB.fasta
42.20 MB
-
AA_SB.part
31.68 KB
-
AA_SG.fasta
264 MB
-
AA_SG.part
134.36 KB
-
gc-violin-heatmap.py
4.79 KB
-
NT_IQ_ST_GOOD_3345_partitioned_Fig_3.treefile
13.01 KB
-
NT_SB.fasta
126.59 MB
-
NT_SB.part
36.03 KB
-
NT_SG.fasta
791.97 MB
-
NT_SG.part
153.35 KB
-
README.md
2.29 KB
Abstract
Click beetles (Elateridae) have garnered scientific interest due to their diversity, bioluminescence, and ontogenetic modifications. However, their classification is still poorly resolved. Here, we explore the internal relationships by analyzing 46 tribes and ~4,200 orthologs. We classify elaterids into 17 subfamilies and 52 tribes. We propose new ranks for Protelaterinae stat. nov., Thylacosternini stat. nov. (Lissominae), Semiotina stat. nov. (Dendrometrini), and Aplastini stat. nov. (Elaterinae). We resurrect Drapetini stat. nov. (Lissominae), Pachyderini stat. nov. (Agrypninae), and Corymbitina stat. nov. (Dendrometrini). and synonymize Pomachiliini syn. nov., and Synaptini syn. nov. to Agriotini and Quasimusini syn. nov. to Negastriini. Hypnoidinae retains its subfamily rank, regardless of its documented affinities to Dendrometrinae. Our research provides a phylogenomic framework for understanding the evolution of morphological disparity. The non-clicking forms have evolved repeatedly in separate groups since the early evolution of click beetles. Cebrionini, Pleonomini, and Plastocerini represent soil-dwelling click beetles living in seasonally arid areas. They exhibit high sexual dimorphism, but either both sexes are winged (Pleonomini, Plastocerini), or females have slightly shortened elytra (some Cebrionini). Omalisinae and Drilini (Agrypninae) have neotenic females with a modified thoracic morphology, vestigial to absent elytra, and always absent wings. Aplastine females resemble Omalisinae but share biological traits with Cebrionini. We propose that genomic data suggest different relationships among clicking and non-clicking elaterids than earlier morphology-based hypotheses, which suggested the placement of some modified click beetles in Dascilloidea and Cantharoidea, respectively, within a cantharoid clade of Elateroidea. Future research should investigate the molecular background of ontogenetic modifications, concentrating on potential differences between slightly modified groups from seasonally arid regions (e.g., Cebrio spp.) and predators with highly modified neotenic females (e.g., Omalisus spp. and Drilus spp.).
Dataset DOI: 10.5061/dryad.xksn02vts
Description of the data and file structure
Data description
Format of the data
XX_YY.ZZZZ
X - data type AA or NT
Y - genes tested by SymTest; SG - genes not rejected or SB genes rejected by SymTest
Z - file type: .PART for partition file or .FAS for matrix
Files and variables
File: NT_SB.part
Description: Partition file for SymTest refused genes at nucleotide level.
File: NT_SG.part
Description: Partition file for SymTest valid genes at nucleotide level.
File: NT_SB.fasta
Description: Fasta with with SymTest refused genes at nucleotide level.
File: AA_SB.part
Description: Partition file for SymTest refused genes at amino acid level.
File: AA_SB.fasta
Description: Fasta with with SymTest refused genes at amino acid level.
File: AA_SG.part
Description: Partition file for SymTest valid genes at amino acid level.
File: AA_SG.fasta
Description: Fasta with with SymTest valid genes at amino acid level.
File: NT_SG.fasta
Description: Fasta with with SymTest valid genes at nucleotide level.
File: NT_IQ_ST_GOOD_3345_partitioned_Fig_3.treefile
Description: Nexus tree file used in Figure 3.
File: AA_Q_ASTRAL_SG_H_Fig4.astral
Description: Nexus tree file used in Figure 4.
Code/software
Programs & scripts used for Phylogenomics-based click-beetle classification tackles multiple origins of phenotypic modifications
FasConCat.pl
Repository & Authors: https://github.com/PatrickKueck/FASconCAT
astral.5.15.5.jar
Repository & Authors: https://github.com/smirarab/ASTRAL
iqtree2.3.4
Repository & Authors: https://github.com/iqtree/iqtree2
gc-violin-heatmap.py
Custom scrip to analyse GC content with visualization
Access information
Other publicly accessible locations of the data:
- Raw sequencing reads have been deposited in GenBank under BioProject accession number PRJNA1331533.
Data was derived from the following sources:
- As stated in Supplementary part of the study.
For references, see the final manuscript!
Material and Methods
Data Collection, Assembly, and Processing
Whole genome sequencing (WGS) and transcriptomic (TSA) data were explicitly generated for this study to enhance the coverage of principal lineages (42% of taxa). Some genomic data were sourced from previous large-scale phylogenies and other genomic studies (4% of taxa; (Amaral et al., 2019; Fallon et al., 2018; McKenna et al., 2019). This was supplemented by newly assembled data from primary anchored hybrid enrichment (AHE) reads published by Douglas et al. (Douglas et al., 2021) (Table 1, S1). The AHE data were newly processed following the methods of Kusy et al. (2019, 2021). Raw reads were trimmed using fastp (parameters: -q 15 -u 40 -n 5 -l 50) and assembled using IDBA-UD (parameters: --step 5 --maxk 120). This approach substantially increased the volume of information compared to the earlier published dataset. Six polyphagan outgroup taxa were selected to root the tree (Table S1; reference genomes, see Zdobnov et al., 2017). The 4,425 single-copy ortholog set was utilized for the Orthograph v.0.6.3 search with default settings. Nucleotide (NT) alignments were produced using Pal2Nal. The amino acid (AA) sequences were aligned with Mafft v.7.407 (L-INS-i algorithm).
Using the OliInSeq v.0.9.5 software (https://github.com/cmayer/OliInSeq), integrated into the TEnriAn workflow (https://github.com/ZFMK/TEnriAn), outlier sequences were identified and masked at the amino acid level. The analysis utilized a sliding window approach with window sizes of 30 and the following parameters: --remove-gap-ambig-lowerCase-sites --windowsSize 30 --IQR-factor 1.5 -e 0.25. The software soft-masked identified outlier sequence segments using lowercase symbols. Entire sections were removed where outliers were detected in more than 25% of the valid windows. OliInSeq then automatically applied this masking to the corresponding nucleotide alignments. When outliers were excluded, all sequences shorter than 50 amino acids and 150 nucleotides were discarded. Additionally, all genes represented in fewer than 30 taxa were omitted.
We used Aliscore v.2.2 to identify randomly misaligned or ambiguous sequences. These segments were masked using ALICUT v.2.3. Data completeness was analyzed with AliStat v.1.11 (Figure S1, Wong et al., 2020)). The software SymTest v.2.0.49 (https://github.com/ottmi/symtest) was employed to test for deviations from stationarity, reversibility, and homogeneity (SRH) (Figure S2, Jermiin et al., 2008).
We assembled three datasets, each at NT and AA levels, to evaluate the robustness of alternative topologies. The datasets are designated All_4165 (all 4,165 identified genes), SG_3345 (3,345 genes compliant with SRH criteria), and SB_820 (820 genes rejected by SymTest; Figure S3). Furthermore, to assess the effect of guanine-cytosine (GC) content, all genes included in the SG_3345 dataset were categorized into one of five groups (concatenated length ~970 000 bp), designated as SG_Extr_Low, SG_Low, SG_Med, SG_High, and SG_Extr_High.
Phylogenetic analyses
Phylogenetic reconstructions were performed using the maximum-likelihood (ML) criterion with IQ-TREE v.2.3.4 (Minh et al., 2020). ModelFinder (Chernomor et al., 2016; Kalyaanamoorthy et al., 2017) was utilized to assess models of molecular evolution with the parameters -m MFP -mrate E, I, G, I+G, R -merit BIC -gmedian -cmin 4. The selected models of amino acid (AA) evolution were evaluated with the parameters -mset Dayhoff, DCmut, JTT, JTTDCmut, LG, WAG -madd LG4X, LG4M. The search for model parameters at the nucleotide (NT) level was restricted to the GTR model with -mset GTR. Branch support was assessed with the Ultrafast bootstrap (Hoang et al., 2018) and the SH-like approximate likelihood ratio test (SH-aLRT) using the options -bb 10 000 and -alrt 10 000.
Alternatively, we used the coalescent method to account for gene tree heterogeneity properly. This approach reflects biological processes such as incomplete lineage sorting and can result in conflicting phylogenetic signals pointing to ambiguities linked to various methods. We used the Accurate Species Tree Algorithm (ASTRAL-III v.5.6.3, (Zhang et al., 2018a). We calculated ML trees for every gene using the models that had already been identified in the analysis of the concatenated dataset. Poorly resolved topologies in single-gene trees can negatively impact the ASTRAL analysis. Therefore, we applied Newick utilities v.1.6 (Junier & Zdobnov, 2010) and collapsed all splits with Ultrafast bootstraps values below 50.
Additionally, we tested splits of interest using the four-cluster likelihood mapping (FcLM) approach implemented in IQ-TREE v.2.3.4 (Minh et al., 2020). To reduce the computation burden associated with the size of the dataset, we created a subset of taxa with the highest data completeness (high gene count, the best representation of all subfamilies, altogether 42 taxa). We defined clusters of taxa to test critical relationships (Figure 4C–E).
