Dryad

Data from: Nearly complete rRNA genes from 371 Animalia: updated structure-based alignment and phylogenetic analysis

Cite this dataset

Mallatt, Jon M.; Craig, Catherine Waggoner; Yoder, Matthew J. (2012). Data from: Nearly complete rRNA genes from 371 Animalia: updated structure-based alignment and phylogenetic analysis [Dataset]. Dryad. https://doi.org/10.5061/dryad.1v62kr3q

Abstract

This study presents a manually constructed alignment of nearly complete rRNA genes from most animal clades (371 taxa from ∼33 of the ∼36 metazoan phyla), expanded from the 197 sequences in a previous study. This thorough, taxon-rich alignment, available at http://www.wsu.edu/~jmallatt/research/rRNAalignment.html and in the Dryad Repository (doi: http://dx.doi.org/10.5061/dryad.1v62kr3q), is based rigidly on the secondary structure of the SSU and LSU rRNA molecules and is annotated in detail, including labeling of erroneous sequences (contaminants). The alignment can be used in future studies of the molecular evolution of rRNA. Here, we use it to explore whether the larger number of sequences produces an improved phylogenetic tree of animal relationships. Disappointingly, the resolution did not improve, either when the standard maximum-likelihood (ML) method was used or with more sophisticated methods that partitioned the rRNA into paired and unpaired sites (stem, loop, bulge, junction) or accounted for the evolution of the paired sites. For example, no doublet model of paired-site substitutions (16-state, 16A and 16B, 7A–F, or 6A–C models) corrected the placement of any rogue taxa or increased resolution. The following findings come from the simplest, standard ML analysis. The 371-taxon tree only imperfectly supported the bilaterian clades Lophotrochozoa and Ecdysozoa, and this problem remained after 17 taxa with unstably positioned sequences were omitted from the analysis. The problem seems to stem from base-compositional heterogeneity across taxa and from an overrepresentation of highly divergent sequences among the newly added taxa (e.g., sequences from Cephalopoda, Rotifera, Acoela, and Myxozoa). The rogue taxa continue to concentrate in two locations in the rRNA tree: near the base of Arthropoda and near the base of Bilateria.
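To make the partitioning into paired and unpaired sites, and the recoding of paired sites into doublet states, more concrete, here is a minimal Python sketch. It assumes the consensus secondary structure is available as a dot-bracket string aligned column-for-column with each sequence; the alignment's actual annotation format may differ, and every function name here is hypothetical. It separates only paired from unpaired columns (not the finer stem, loop, bulge, and junction classes). The 7- and 6-state branches follow the general idea of those model families (canonical Watson-Crick and G·U pairs kept as distinct states, with mismatches lumped into a single state or treated as missing). They are an illustration, not a reimplementation of the specific 7A–F or 6A–C models used in the study.

```python
from typing import List, Tuple

# Canonical Watson-Crick and G·U wobble pairs (assumed state set for 6/7-state recoding).
CANONICAL_PAIRS = ("AU", "UA", "GC", "CG", "GU", "UG")
NUCLEOTIDES = "ACGU"


def pair_columns(structure: str) -> List[Tuple[int, int]]:
    """Return sorted (i, j) paired columns from a dot-bracket string; '(' pairs with ')'."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)


def unpaired_columns(structure: str) -> List[int]:
    """Columns not involved in any base pair (loops, bulges, junctions, etc.)."""
    paired = {col for pair in pair_columns(structure) for col in pair}
    return [pos for pos in range(len(structure)) if pos not in paired]


def doublet_state(doublet: str, n_states: int) -> str:
    """Recode one doublet under a 16-, 7- or 6-state scheme; '?' means missing data."""
    if n_states not in (16, 7, 6):
        raise ValueError("n_states must be 16, 7 or 6")
    if len(doublet) != 2 or any(base not in NUCLEOTIDES for base in doublet):
        return "?"  # gaps or ambiguity codes
    if n_states == 16 or doublet in CANONICAL_PAIRS:
        return doublet  # 16 states keep every pair; canonical pairs are always distinct
    return "MM" if n_states == 7 else "?"  # 7-state: one mismatch state; 6-state: missing


def recode_paired_sites(seq: str, structure: str, n_states: int) -> List[str]:
    """Recode each paired column pair of one aligned sequence into doublet states."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style T as U
    return [doublet_state(seq[i] + seq[j], n_states) for i, j in pair_columns(structure)]
```

For a toy hairpin `((..))` and the aligned sequence `GAAAGT`, the outer pair is a G·U wobble and the inner pair is an A·G mismatch, so the 16-state recoding keeps both doublets, the 7-state recoding maps the mismatch to `MM`, and the 6-state recoding marks it as missing.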
The approximately unbiased (AU) test refuted the monophyly of Mollusca and of Chordata, probably because long-branch attraction pulled the highly divergent cephalopod and urochordate sequences out of those clades. These refutations are unlikely to be correct, but they show for the first time that rRNA phylogeny can support some ‘wrong’ clades. Along with its weaknesses, the rRNA tree has strengths: it recovers many clades that are supported by independent evidence (e.g., Metazoa, Bilateria, Hexapoda, Nonoculata, Ambulacraria, Syndermata, and Thecostraca with Malacostraca) and shows good resolution within certain groups (e.g., Platyhelminthes, Insecta, Cnidaria). As another strength, the newly added rRNA sequences yielded the first rRNA-based support for Carnivora and Cetartiodactyla (dolphin + llama) in Mammalia, for the basic subdivisions of Bryozoa (‘Gymnolaemata + Stenolaemata’ versus Phylactolaemata), and for Oligostraca (ostracods + branchiurans + pentastomids + mystacocarids). Future improvement could come from better sequence-evolution models that account for base-compositional heterogeneity and from combining rRNA with protein-coding genes in phylogenetic reconstruction.
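Base-compositional heterogeneity, identified in this abstract as a likely cause of weak support, is often screened with a chi-square test on a taxa × bases table of counts. The sketch below is a generic illustration of that diagnostic, not the analysis reported in the study. The function names are hypothetical, gaps and ambiguity codes are ignored, and T is read as U. Because the test treats taxa as independent samples, it ignores their shared ancestry and is best read as a rough screen.

```python
from typing import Dict, List, Tuple

BASES = "ACGU"


def base_counts(seq: str) -> List[int]:
    """Counts of A, C, G, U in one aligned sequence (gaps/ambiguity codes ignored)."""
    seq = seq.upper().replace("T", "U")
    return [seq.count(base) for base in BASES]


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases; NaN if there are none."""
    a, c, g, u = base_counts(seq)
    total = a + c + g + u
    return (g + c) / total if total else float("nan")


def composition_chi2(alignment: Dict[str, str]) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom for base homogeneity across taxa."""
    table = [base_counts(seq) for seq in alignment.values()]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(BASES) - 1)
    return stat, dof
```

If SciPy is available, a p-value can be obtained from the returned pair with `scipy.stats.chi2.sf(stat, dof)`. Per-taxon `gc_content` values are also a quick way to spot which taxa drive any heterogeneity.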

Usage notes

Location

Worldwide