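dc.contributor.author Wu, Dongying
dc.contributor.author Wu, Martin
dc.contributor.author Halpern, Aaron
dc.contributor.author Rusch, Douglas B.
dc.contributor.author Yooseph, Shibu
dc.contributor.author Frazier, Marvin
dc.contributor.author Venter, J. Craig
dc.contributor.author Eisen, Jonathan A.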
dc.coverage.spatial Sargasso Sea
dc.date.accessioned 2011-01-18T15:31:39Z
dc.date.available 2011-01-18T15:31:39Z
dc.date.issued 2011-03-18
dc.identifier doi:10.5061/dryad.8384
dc.identifier.citation Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. PLoS ONE 6(3): e18011.
dc.description BACKGROUND: Most of our knowledge about the ancient evolutionary history of organisms has been derived from data associated with specific known organisms (i.e., organisms that we can study directly such as plants, metazoans, and culturable microbes). Recently, however, a new source of data for such studies has arrived: DNA sequence data generated directly from environmental samples. Such metagenomic data has enormous potential in a variety of areas including, as we argue here, in studies of very early events in the evolution of gene families and of species. METHODOLOGY/PRINCIPAL FINDINGS: We designed and implemented new methods for analyzing metagenomic data and used them to search the Global Ocean Sampling (GOS) Expedition data set for novel lineages in three gene families commonly used in phylogenetic studies of known and unknown organisms: small subunit rRNA and the recA and rpoB superfamilies. Though the methods available could not accurately identify very deeply branched ss-rRNAs (largely due to difficulties in making robust sequence alignments for novel rRNA fragments), our analysis revealed the existence of multiple novel branches in the recA and rpoB gene families. Analysis of available sequence data likely from the same genomes as these novel recA and rpoB homologs was then used to further characterize the possible organismal source of the novel sequences. CONCLUSIONS/SIGNIFICANCE: Of the novel recA and rpoB homologs identified in the metagenomic data, some likely come from uncharacterized viruses while others may represent ancient paralogs not yet seen in any cultured organism. A third possibility is that some come from novel cellular lineages that are only distantly related to any organisms for which sequence data is currently available. If there exist any major, but so-far-undiscovered, deeply branching lineages in the tree of life, we suggest that methods such as those described herein currently offer the best way to search for them.
dc.relation.haspart doi:10.5061/dryad.8384/1
dc.relation.isreferencedby doi:10.1371/journal.pone.0018011
dc.relation.isreferencedby PMID:21437252
dc.subject genomics
dc.subject evolution
dc.subject metagenomics
dc.subject phylogeny
dc.subject tree of life
dc.subject RecA
dc.subject RpoB
dc.subject GOS
dc.title Data from: Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes
dc.type Article
dc.contributor.correspondingAuthor Eisen, Jonathan A.
prism.publicationName PLoS ONE

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data (CC0).

Title Wu_2011_Data
Downloaded 410 times
Description Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. Dongying Wu, Martin Wu, Aaron Halpern, Doug Rusch, Shibu Yooseph, Marvin Frazier, J. Craig Venter, Jonathan A. Eisen.

Supplementary Data:

(1) recA data: recA.tgz. recA.tgz contains the following files:
    recA_GOS.pep -- Amino acid sequences for GOS RecAs
    recA_ref.pep -- Amino acid sequences for RecAs from NRAA and genome sequences
    recA_cluster.txt -- Lek clusters of RecA sequences (Table 1)
    recA.ali -- Original alignment for the RecA tree (Figure 1)
    recA.trim.ali -- Trimmed RecA alignment that the RecA tree is built upon (Figure 1)
    recA.tre -- RecA tree in Newick format (Figure 1)
    -- The assembly IDs of the recA encoding GOS assemblies (Table 2)
    recA_linked.pep -- The Amino Acid sequences of the genes that share assemblies with the GOS novel recA (Table 2)

(2) rpoB data: rpoB.tgz. rpoB.tgz contains the following files:
    rpoB_GOS.pep -- Amino acid sequences for GOS RpoBs
    rpoB_ref.pep -- Amino acid sequences for RpoBs from NRAA and genome sequences
    rpoB_cluster.txt -- Lek clusters of RpoB sequences (Table 3)
    rpoB.tre.ali -- Original alignment for the RpoB tree (Figure 3)
    rpoB.tre.trim -- Trimmed RpoB alignment that the RpoB tree is built upon (Figure 3)
    rpoB.tre -- RpoB tree in Newick format (Figure 3)

(3) ss-rRNA data: ssu.tgz. ssu.tgz contains the following files:
    SSU_GOSreads.fa -- GOS ss-rRNA sequences
    SSU_GOSreads_deepbrach.fa -- Potential GOS deep-branching ss-rRNA

(4) Lek Clustering Program: lek.tgz. lek.tgz contains scripts for the Lek clustering protocol. Instructions can be found in the included README file.
Download (3.501 MB)
Download README.pdf (54.98 KB)
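
The package description above lists amino acid sequence files (`*.pep`) inside gzipped tar archives. The following Python sketch shows one way such a file might be read without unpacking the archive to disk. It assumes the `.pep` files are FASTA-formatted text, which the record does not state explicitly, and that the inner archive (for example `recA.tgz`) has already been obtained from the download. The function names `read_fasta` and `read_fasta_from_tgz` are hypothetical helpers introduced here for illustration; they are not part of the deposited scripts.

```python
import io
import tarfile
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence id: sequence} mapping.

    Assumption: the input follows the usual FASTA layout, with header
    lines starting with '>' followed by one or more sequence lines.
    """
    records: Dict[str, str] = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[current_id] += line
    return records


def read_fasta_from_tgz(archive_path: str, member_suffix: str) -> Dict[str, str]:
    """Read the first regular file in a .tgz whose name ends with member_suffix."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                raw = archive.extractfile(member)
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
                return read_fasta(text)
    raise FileNotFoundError(
        f"no member ending in {member_suffix!r} found in {archive_path}"
    )
```

Under these assumptions, a call such as `read_fasta_from_tgz("recA.tgz", "recA_GOS.pep")` would return the GOS RecA sequences keyed by identifier. Matching on a file-name suffix rather than a full path is a deliberate choice, since the directory layout inside the archives is not described in the record.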