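dc.contributor.author Lynch, Erin A.
dc.contributor.author Langille, Morgan G. I.
dc.contributor.author Darling, Aaron
dc.contributor.author Wilbanks, Elizabeth G.
dc.contributor.author Haltiner, Caitlin
dc.contributor.author Shao, Katie S. Y.
dc.contributor.author Starr, Michael O.
dc.contributor.author Teiling, Clotilde
dc.contributor.author Harkins, Timothy T.
dc.contributor.author Edwards, Robert A.
dc.contributor.author Eisen, Jonathan A.
dc.contributor.author Facciotti, Marc T.
dc.contributor.author Randau, Lennart
dc.date.accessioned 2012-08-08T18:46:58Z
dc.date.available 2012-08-08T18:46:58Z
dc.date.issued 2012-07-24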
dc.identifier doi:10.5061/dryad.j08jp
dc.identifier.citation Lynch EA, Langille MGI, Darling A, Wilbanks EG, Haltiner C, Shao KSY, Starr MO, Teiling C, Harkins TT, Edwards RA, Eisen JA, Facciotti MT, Randau L (2012) Sequencing of seven haloarchaeal genomes reveals patterns of genomic flux. PLoS ONE 7(7): e41389.
dc.description We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ~20× coverage and assembled to an average of 50 contigs (range: 5 scaffolds to 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism revealed genus-specific expansions of the general transcription factor TATA-binding protein, as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology.
dc.relation.haspart doi:10.5061/dryad.j08jp/1
dc.relation.haspart doi:10.5061/dryad.j08jp/2
dc.relation.haspart doi:10.5061/dryad.j08jp/3
dc.relation.isreferencedby doi:10.1371/journal.pone.0041389
dc.relation.isreferencedby PMID:22848480
dc.subject halophilic archaea
dc.subject archaea
dc.subject phylogenomics
dc.subject genomics
dc.subject evolution
dc.title Data from: Sequencing of seven haloarchaeal genomes reveals patterns of genomic flux
dc.type Article
dwc.ScientificName Haloferax
dwc.ScientificName Haloarcula
dc.contributor.correspondingAuthor Lynch, Erin A.
prism.publicationName PLOS ONE
dryad.dansTransferDate 2018-04-12T22:49:06.948+0000
dryad.dansArchiveDate 2018-04-15T23:49:37.781+0000

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data. CC0 Open Data

Title Dataset S3 - Genome assemblies in fasta format
Downloaded 35 times
Description Genome assemblies of six halophilic archaea. Organisms were obtained from culture collections and grown, and each genome was then shotgun sequenced using 454 pyrosequencing. The genomes were then assembled into contigs and scaffolds. Data for each organism are in a separate file in FASTA format. Each scaffold is labelled with a header line beginning with the '>' character.
Download (16.96 Mb)
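Because each scaffold in these files starts with a '>' header line, the assemblies can be read with a few lines of code. The sketch below is a minimal, hypothetical Python reader (the names `parse_fasta` and `scaffold_lengths` are illustrative and not part of the dataset). It treats the header text as an opaque identifier, since the exact header format is not documented here, and assumes that sequence lines follow their header.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted lines.

    A record starts at a line beginning with '>'; all following non-blank
    lines up to the next header are joined into that record's sequence.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()  # header text without the '>'
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the last record
    return records


def scaffold_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each scaffold header to its sequence length in bases."""
    return {header: len(seq) for header, seq in parse_fasta(lines)}
```

For a downloaded file, `scaffold_lengths(open(path))` would give per-scaffold lengths, where `path` is whatever the file is called locally. Duplicate headers, if any, would collapse to a single dictionary entry.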
Title Dataset_S1 Syntenic Halophilic Tribes matrix
Downloaded 22 times
Description To determine the phylogenetic distribution of haloarchaeal genes, a gene presence/absence matrix was constructed as follows. Independent multi-genome alignments were made for the Haloferax and Haloarcula genera using the whole-genome alignment method progressiveMauve [64]. The contigs for each alignment were reordered to match the published genomes of Haloferax volcanii [19] and Haloarcula marismortui [18], respectively, using Mauve’s built-in contig reordering program (Figures S3 and S4). Sets of functionally homologous genes (orthologs), referred to hereafter as Syntenic Halophile Tribes (SHTs), were determined from the alignments and joined across the two genera as follows. The proteins in each SHT from the Haloferax alignment were searched against all proteins in each SHT from the Haloarcula genomes using BLAST [37], and a bit score for each pair of SHTs was calculated by averaging the bit scores of the individual BLAST hits. A traditional reciprocal best hit (RBH) BLAST approach was then used to produce one-to-one mappings between SHTs in the two genera. Each joined SHT was assigned a function using the most commonly occurring functional annotation among the protein products of its genes. Of the resulting SHTs, 398 were present in all nine genomes. Hidden Markov Models (HMMs) were generated for each SHT using HMMER 3, resulting in 13,276 HMMs. The 1,303 completed archaeal and bacterial genomes available from NCBI as of March 15, 2011 were downloaded, and a single genome was selected at random from each genus, resulting in 396 genomes. Each SHT HMM was searched with HMMER 3 against these 396 genomes and the eight halophile genomes generated for this study. A gene was counted as belonging to an HMM if the hit had an E-value below 0.0001 and covered more than 80% of the length of both the gene and the HMM. If a gene hit more than one HMM, it was counted only for the HMM with the best (lowest) E-value.
These hits were then used to generate a 13,276 × 405 presence/absence matrix. The genomes and HMMs were clustered using the ‘ctc’ library in R [65] with Manhattan distance and complete-linkage clustering. The clustering was viewed with the Java TreeView program [66]. The cluster file can be accessed at our website [60] and as Dataset S1 and Figure S5.
Download Dataset_S1.cdt (6.315 Mb)
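The SHT-joining step in the Dataset_S1 description (averaging BLAST bit scores for each pair of SHTs, then keeping reciprocal best hits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: BLAST results are assumed to be already parsed into (Haloferax protein, Haloarcula protein, bit score) tuples, the average is taken over all hits between members of the two SHTs, score ties are broken by name order, and every identifier is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (Haloferax protein ID, Haloarcula protein ID, BLAST bit score); IDs are hypothetical
Hit = Tuple[str, str, float]


def mean_pair_scores(
    hits: Iterable[Hit],
    haloferax_sht: Dict[str, str],
    haloarcula_sht: Dict[str, str],
) -> Dict[Tuple[str, str], float]:
    """Average the bit scores of all hits between members of each SHT pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for query, subject, bits in hits:
        if query not in haloferax_sht or subject not in haloarcula_sht:
            continue  # protein not assigned to any SHT
        pair = (haloferax_sht[query], haloarcula_sht[subject])
        totals[pair] += bits
        counts[pair] += 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Keep SHT pairs where each side is the other's highest-scoring partner."""
    best_a: Dict[str, Tuple[str, float]] = {}
    best_b: Dict[str, Tuple[str, float]] = {}
    # Sorted iteration plus strict '>' breaks score ties by name order.
    for (a, b), score in sorted(scores.items()):
        if a not in best_a or score > best_a[a][1]:
            best_a[a] = (b, score)
        if b not in best_b or score > best_b[b][1]:
            best_b[b] = (a, score)
    return sorted((a, b) for a, (b, _) in best_a.items() if best_b[b][0] == a)
```

An SHT whose best partner prefers a different SHT is simply left unjoined, which is what makes the resulting mapping one-to-one.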
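The gene-to-HMM assignment rule from the Dataset_S1 description (E-value below 0.0001, more than 80% coverage of both the gene and the HMM, and only the best-E-value HMM kept when a gene hits several) is sketched below. `HmmHit` is a hypothetical, already-parsed record whose coordinate fields loosely mirror the alignment and model coordinates that HMMER 3 reports in its per-domain tabular output; parsing that output is not shown, and each hit is treated as a single aligned region, which the source does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

E_VALUE_CUTOFF = 1e-4  # "E-value below 0.0001"
MIN_COVERAGE = 0.80    # "greater than 80% of the length of both the gene and the HMM"


@dataclass(frozen=True)
class HmmHit:
    gene: str
    hmm: str
    evalue: float
    gene_len: int   # full protein length (residues)
    hmm_len: int    # model length (match states)
    gene_from: int  # 1-based inclusive aligned coordinates on the gene
    gene_to: int
    hmm_from: int   # 1-based inclusive aligned coordinates on the HMM
    hmm_to: int


def passes_filter(hit: HmmHit) -> bool:
    """Apply the E-value and two-sided coverage thresholds."""
    gene_cov = (hit.gene_to - hit.gene_from + 1) / hit.gene_len
    hmm_cov = (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_len
    return (
        hit.evalue < E_VALUE_CUTOFF
        and gene_cov > MIN_COVERAGE
        and hmm_cov > MIN_COVERAGE
    )


def assign_genes(hits: Iterable[HmmHit]) -> Dict[str, str]:
    """Map each gene to the single passing HMM with the lowest E-value."""
    best: Dict[str, HmmHit] = {}
    for hit in hits:
        if not passes_filter(hit):
            continue
        current = best.get(hit.gene)
        if current is None or hit.evalue < current.evalue:
            best[hit.gene] = hit
    return {gene: hit.hmm for gene, hit in best.items()}
```

Coverage is required on both sides so that a short gene matching a small part of a long model (or the reverse) is not counted.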
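The presence/absence matrix and its clustering with Manhattan distance and complete linkage, as described for Dataset_S1, can be illustrated in pure Python. The study used the R ‘ctc’ library [65]; the naive agglomerative routine below is a swapped-in stand-in written only to show what complete linkage computes (the distance between two clusters is the largest pairwise distance between their members). It scales poorly, so a 13,276 × 405 matrix would call for an optimized library. Input shapes and names are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple


def presence_absence(
    assignments: Dict[str, Dict[str, str]],
    hmms: Sequence[str],
    genomes: Sequence[str],
) -> List[List[int]]:
    """Rows are HMMs, columns are genomes; 1 if any gene in that genome was assigned to the HMM.

    assignments[genome][gene] -> HMM ID (hypothetical input layout).
    """
    present = {g: set(assignments.get(g, {}).values()) for g in genomes}
    return [[1 if h in present[g] else 0 for g in genomes] for h in hmms]


def manhattan(u: Sequence[int], v: Sequence[int]) -> int:
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def complete_linkage(rows: Sequence[Sequence[int]]) -> List[Tuple[int, int, int]]:
    """Naive agglomerative clustering with complete linkage.

    Returns merges as (cluster_a, cluster_b, distance). Leaves are numbered
    0..n-1 and each merge creates a new cluster numbered n, n+1, ...
    """
    clusters: Dict[int, List[int]] = {i: [i] for i in range(len(rows))}
    merges: List[Tuple[int, int, int]] = []
    next_id = len(rows)
    while len(clusters) > 1:
        best: Optional[Tuple[int, int, int]] = None
        for a, b in combinations(sorted(clusters), 2):
            # Complete linkage: cluster distance = largest member-to-member distance.
            d = max(manhattan(rows[i], rows[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        assert best is not None
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges
```

Clustering the genomes instead of the HMMs only requires transposing the matrix first, for example `[list(col) for col in zip(*matrix)]`.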
Title Dataset S2. Full alignment of Proliferating Cell Nuclear Antigen (PCNA) homologs
Downloaded 34 times
Description Untrimmed MUSCLE alignment of sixty-one PCNA homologs from fifty-seven archaeal and eukaryotic species.
Download journal-2.pone.0041389.s013.txt (32.41 Kb)
