dc.contributor.author Liu, Yang
dc.contributor.author Cox, Cymon J.
dc.contributor.author Wang, Wei
dc.contributor.author Goffinet, Bernard
dc.date.accessioned 2014-07-29T13:44:47Z
dc.date.available 2014-07-29T13:44:47Z
dc.date.issued 2014-07-28
dc.identifier doi:10.5061/dryad.7b470
dc.identifier.citation Liu Y, Cox CJ, Wang W, Goffinet B (2014) Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias. Systematic Biology 63(6): 862-878.
dc.identifier.uri http://hdl.handle.net/10255/dryad.58784
dc.description Phylogenetic analyses using concatenation of genomic-scale data have been seen as the panacea for resolving the incongruences among inferences from few or single genes. However, phylogenomics may also suffer from systematic errors, due to the, perhaps cumulative, effects of saturation, among-taxa compositional (GC content) heterogeneity, or codon-usage bias plaguing the individual nucleotide loci that are concatenated. Here we provide an example of how these factors affect the inferences of the phylogeny of early land plants based on mitochondrial genomic data. Mitochondrial sequences evolve slowly in plants and hence are thought to be suitable for resolving deep relationships. We newly assembled mitochondrial genomes from 20 bryophytes, complemented these with 40 other streptophytes (land plants plus algal outgroups), compiling a data matrix of 60 taxa and 41 mitochondrial genes. Homogeneous analyses of the concatenated nucleotide data resolve mosses as sister-group to the remaining land plants. However, the corresponding translated amino acid data support the liverwort lineage in this position. Both results receive weak to moderate support in maximum likelihood analyses, but strong support in Bayesian inferences. Tests of alternative hypotheses using either nucleotide or amino-acid data provide implicit support for the respective optimal topologies. By analyzing the nucleotide data, we found that the 3rd codon positions are more saturated than the 1st and 2nd codon positions, and excluding these from the analyses leads to a topology congruent with that obtained using amino-acid data. Further, we determined that land plant lineages differ in their nucleotide composition, and in their usage of synonymous codon variants.
Composition-heterogeneous Bayesian analyses employing a non-stationary model that accounts for among-lineage variation in composition, and inferences from degenerated nucleotide data that avoid the effects of the synonymous mutations underlying codon-usage bias, again recovered liverworts as sister to the remaining land plants. These analyses indicate that the discrepancy between the nucleotide-based and the amino acid-based trees is caused by lineage-specific, parallel compositional bias, or synonymous mutations driving codon-usage bias, as well as saturation in the 3rd codon positions. While genomic data may generate highly supported phylogenetic trees, these inferences may be artifacts. We suggest that phylogenomic analyses should assess the possible impact of potential biases through comparisons of protein-coding gene data and their amino-acid translations, by modeling compositional bias when analyzing the data, and by excluding noisy nucleotide signals due to saturation or codon-usage bias. We caution against relying on any one presentation of the data (nucleotide or amino acid) or any one type of analysis even when analyzing large-scale data sets, no matter how well-supported, without fully exploring the effects of substitution models.
dc.relation.haspart doi:10.5061/dryad.7b470/1
dc.relation.haspart doi:10.5061/dryad.7b470/2
dc.relation.haspart doi:10.5061/dryad.7b470/3
dc.relation.haspart doi:10.5061/dryad.7b470/4
dc.relation.haspart doi:10.5061/dryad.7b470/5
dc.relation.haspart doi:10.5061/dryad.7b470/6
dc.relation.haspart doi:10.5061/dryad.7b470/7
dc.relation.haspart doi:10.5061/dryad.7b470/8
dc.relation.haspart doi:10.5061/dryad.7b470/9
dc.relation.isreferencedby doi:10.1093/sysbio/syu049
dc.relation.isreferencedby PMID:25070972
dc.subject early land plants
dc.subject phylogenomics
dc.subject mitochondrial genome
dc.subject evolutionary saturation
dc.subject composition heterogeneity
dc.subject synonymous codon-usage bias
dc.title Data from: Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias
dc.type Article
prism.publicationName Systematic Biology

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0).

Title Matrix_nt_nex
Downloaded 45 times
Description Matrix of nucleotide sequences of 41 mitochondrial genes for 60 streptophytes.
Download Matrix_nt.nex (1.891 Mb)
Title Matrix_aa
Downloaded 21 times
Description Matrix of amino acid translations of 41 mitochondrial genes for 60 streptophytes
Download Matrix_aa.nex (632.3 Kb)
Title Matrix_nt_degenerate
Downloaded 27 times
Description Matrix of degenerated nucleotide sequences of 41 mitochondrial genes for 60 streptophytes
Download Matrix_nt_degenerate.nex (1.892 Mb)
Title stmtREV_model
Downloaded 31 times
Description Substitutional rate matrix of amino acid translations of mitochondrial genes of streptophytes
Download stmtREV_model.txt (2.962 Kb)
Title extractdata
Downloaded 37 times
Description Script used for extracting the data matrix from the SEQ-GEN output files
Download extractdata.py (1.349 Kb)
Title extract_likelihood_script
Downloaded 16 times
Description Script to extract likelihood scores from RAxML result
Download extractlikelihood (695.7 Kb)
Title Extract-CDS
Downloaded 27 times
Description A Perl script for extracting mitochondrial coding genes from a GenBank file.
Download Extract-CDS.pl (1.089 Kb)
Title extractdata
Downloaded 20 times
Description Script used for extracting the data matrix from the SEQ-GEN output files
Download extractdata.py (1.406 Kb)
Title Supplementary_Material_Liu_et_al
Downloaded 139 times
Description Table S1. List of 60 taxa sampled for the mitochondrial genomic dataset in this study. Table S2. Characteristics of 41 mitochondrial genes. Table S3. Characteristics of data matrices and statistics of the best-scoring ML trees inferred from each partitioning strategy. Table S4. Average GC percentage of each plant group, and difference significance test among groups. Figure S1. The average pairwise distances of the 41 mitochondrial genes. Figure S2. A summary of single gene ML tree topologies. Figures S3-S19. Supplementary trees.
Download Supplementary_Material_Liu_et_al.pdf (2.227 Mb)
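
The description above mentions two simple manipulations of the nucleotide data: excluding the 3rd codon positions, and comparing nucleotide (GC) composition among lineages. The following is a minimal Python sketch of both, not the authors' pipeline. It assumes each sequence is an in-frame coding alignment row whose length is a multiple of three; the function names (drop_third_positions, gc_content, gc_by_codon_position) are hypothetical and are not part of any file in this package.

```python
def drop_third_positions(cds: str) -> str:
    """Return only the 1st and 2nd codon positions of an in-frame sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    # Take the first two characters of every consecutive triplet.
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Gaps, missing data, and ambiguity codes are ignored; an input with
    no unambiguous bases yields 0.0.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_by_codon_position(cds: str) -> tuple:
    """GC fraction at the 1st, 2nd, and 3rd codon positions, in that order."""
    return tuple(gc_content(cds[pos::3]) for pos in range(3))


if __name__ == "__main__":
    example = "ATGGCCTAA"  # three toy codons: ATG GCC TAA
    print(drop_third_positions(example))
    print(gc_by_codon_position(example))
```

Applied row by row to a matrix such as Matrix_nt.nex (after parsing it with a NEXUS reader of your choice), this would give a 1st+2nd-position matrix and per-taxon GC values that can then be compared among plant groups.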