Dryad

Supplementary data from: Integrating secondary structure information enhances phylogenetic signal in mitochondrial protein coding genes

Data files

Mar 19, 2026 version files 6.01 MB


Abstract

Accurate phylogenetic inference requires models that account for heterogeneity in molecular evolution. Mitochondrial protein-coding genes, which encode membrane-bound proteins composed of multiple transmembrane α-helices, exhibit considerable compositional and functional variation across structural regions, variation that is often overlooked in standard partitioning strategies. Here, we introduce TRAMPO (TRAnsMembrane Protein Order), a novel pipeline that incorporates predicted secondary structural features (i.e., matrix-facing, transmembrane, and intermembrane-facing domains) into phylogenetic partitioning schemes. We applied TRAMPO to seven mitochondrial datasets, spanning crustaceans, hexapods, and vertebrates, and evaluated eight partitioning strategies based on combinations of codon position, strand, and secondary structure. Transmembrane helices showed pronounced thymine enrichment at second codon positions and hydrophobic amino-acid composition, reflecting domain-specific evolutionary constraints. To assess whether these structural patterns influence phylogenetic reconstruction, we performed maximum likelihood analyses under Markov models with various degrees of complexity (ranging from standard Markov models, via Lie Markov and General Heterogeneous evolution on a Single Topology Markov models, to profile mixture Markov models). We also evaluated different models of rate heterogeneity across sites (including the invariable sites model, gamma-distribution model, and FreeRate model) to examine their interaction with partitioning strategies and overall model performance. Incorporating structural information into partitioning schemes consistently improved model fit and reduced apparent heterogeneity, as reflected in lower AIC values and more compositionally homogeneous partitions. These improvements translated into more consistent and topologically congruent phylogenetic trees across most datasets, while also reducing computational time.
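As a rough illustration of what a structure-aware partition looks like, the sketch below groups alignment columns by structural domain and codon position, then measures thymine frequency in one partition. It is not TRAMPO's code: the function names, the one-letter domain labels ("M", "T", "I"), and the toy sequences are all hypothetical, and strand is left out for brevity.

```python
from collections import Counter, defaultdict


def partition_sites(codon_domains):
    """Map (domain, codon position) -> list of 0-based nucleotide columns.

    codon_domains holds one hypothetical label per codon:
    "M" = matrix-facing, "T" = transmembrane, "I" = intermembrane-facing.
    """
    partitions = defaultdict(list)
    for codon_index, domain in enumerate(codon_domains):
        for pos in (1, 2, 3):
            partitions[(domain, pos)].append(3 * codon_index + pos - 1)
    return dict(partitions)


def base_frequency(sequences, columns, base="T"):
    """Fraction of unambiguous nucleotides (A, C, G, T) in `columns` equal to `base`."""
    counts = Counter(seq[col].upper() for seq in sequences for col in columns)
    total = sum(counts[b] for b in "ACGT")
    return counts[base] / total if total else 0.0


# Toy input, invented for illustration only (not taken from the dataset).
codon_domains = ["M", "T", "T", "I"]
sequences = ["ATGCTTTTAGGA", "ATGATTCTCGGC"]

parts = partition_sites(codon_domains)
tm_second = parts[("T", 2)]  # second codon positions inside transmembrane codons
print(tm_second)
print(base_frequency(sequences, tm_second))
```

Each key of `parts` corresponds to one candidate partition; a real analysis would write these column sets to a partition file in whatever format the chosen tree-inference program expects.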
Notably, second codon positions in DNA that encode transmembrane helices were consistently retained as distinct partitions during model optimization, even in Mammals and Vertebrates, where secondary structure contributed little to overall model performance, underscoring their strong and conserved evolutionary signal. Surveys of tree space using quartet distances further supported these findings, with structurally informed models yielding more tightly clustered and internally consistent tree topologies. The benefits of structural partitioning were most pronounced in lineages of intermediate evolutionary depth and declined in ancient vertebrate and mammalian clades, where substitutional saturation accumulates with evolutionary time and strand asymmetry tends to emerge more frequently. In some cases, models with the lowest AIC did not yield the most congruent topologies, underscoring the limitations of information criteria when comparing models of different complexity. Overall, our findings demonstrate that secondary structural features, particularly the repetitive architecture of transmembrane helices, harbour meaningful phylogenetic signal. Incorporating this information into partitioning schemes improves tree reconstruction and mitigates underlying heterogeneity. TRAMPO provides a scalable, open-source tool to implement this approach in mitochondrial phylogenetics.
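For readers unfamiliar with the information criterion mentioned above, the following minimal sketch computes AIC as 2k - 2 ln L and ranks two partitioning schemes. The scheme names, log-likelihoods, and parameter counts are invented placeholders, not results from this study.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical (log-likelihood, number of free parameters) per scheme.
candidates = {
    "codon_only": (-10500.0, 30),
    "codon_plus_structure": (-10420.0, 48),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)
delta = {name: score - scores[best] for name, score in scores.items()}

for name in sorted(scores, key=scores.get):
    print(f"{name}: AIC={scores[name]:.1f}, delta={delta[name]:.1f}")
```

The penalty term grows only linearly with the number of parameters, which is one reason a richer scheme can win on AIC without producing the most congruent topology.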