Dryad

Data from: Phylotranscriptomics: saturated third codon positions radically influence the estimation of trees based on next-gen data

Cite this dataset

Breinholt, Jesse W.; Kawahara, Akito Y. (2014). Data from: Phylotranscriptomics: saturated third codon positions radically influence the estimation of trees based on next-gen data [Dataset]. Dryad. https://doi.org/10.5061/dryad.r5cq0

Abstract

Recent advances in molecular sequencing techniques have led to a surge in phylogenetic studies that incorporate large amounts of genetic data. We test the assumption that analyzing large amounts of genetic data will improve tree resolution and branch support, using moths in the superfamily Bombycoidea, a group in which some inter-familial relationships have been difficult to resolve. Specifically, we examine how codon position and saturation might influence resolution and node support among three key families using a next-gen dataset that included 19 taxa and 938 genes (~1.2M bp). Maximum likelihood, parsimony, and species tree analyses using gene-tree parsimony, on numerous nucleotide and amino acid datasets, resulted in largely congruent topologies with high bootstrap support, compared to prior studies that included fewer loci. However, for a few shallow nodes, nucleotide and amino acid data provided high support for conflicting relationships. The third codon position was saturated, and phylogenetic analysis of this position alone supported a completely different, potentially misleading sister group relationship. We used the program RADICAL to assess the number of genes needed to fix some of these difficult nodes. One such node needed a total of 850 genes, but only 250 when synonymous signal was removed. While transcriptomics can provide the large amounts of data needed to resolve many difficult phylogenetic relationships, it remains important to assess the effect of synonymous substitutions and third codon positions in next-gen datasets.

Usage notes