Data from: Sequencing of seven haloarchaeal genomes reveals patterns of genomic flux

Lynch EA, Langille MGI, Darling A, Wilbanks EG, Haltiner C, Shao KSY, Starr MO, Teiling C, Harkins TT, Edwards RA, Eisen JA, Facciotti MT, Randau L

Date Published: August 8, 2012

DOI: http://dx.doi.org/10.5061/dryad.j08jp

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data. License: CC0 (Open Data).

Title Dataset S3 - Genome assemblies in FASTA format
Downloaded 28 times
Description Genome assemblies of six halophilic archaea. Organisms were obtained from culture collections and grown, and each genome was then shotgun sequenced using 454 pyrosequencing. The genomes were then assembled into contigs and scaffolds. The data for each organism are in a separate file in FASTA format. Each scaffold is labelled with a header line that begins with the '>' character.
Download Dataset_S3.zip (16.96 Mb)
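Because each scaffold in these files is introduced by a '>' header line, the assemblies can be read with a few lines of code. The following is a minimal sketch in Python, not part of the data package; the function name `read_fasta` and the example headers `scaffold_1` and `scaffold_2` are hypothetical, and the real headers in Dataset_S3.zip may look different.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each '>' header (without the '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:].strip()  # start a new scaffold record
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Hypothetical two-scaffold input; not taken from the actual dataset.
example = """>scaffold_1
ACGTAC
GTTA
>scaffold_2
GGCC
""".splitlines()

scaffolds = read_fasta(example)
for name, seq in scaffolds.items():
    print(name, len(seq))
```

To read one of the unpacked files, pass an open file handle instead of the example list, for instance `read_fasta(open(path))`, where `path` points to a file extracted from the archive.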
Title Dataset_S1 Syntenic Halophilic Tribes matrix
Downloaded 19 times
Description To determine the phylogenetic distribution of haloarchaeal genes, a gene presence/absence matrix was constructed as follows. Independent multi-genome alignments were made for the Haloferax and Haloarcula genera using the whole-genome alignment method progressiveMauve [64]. The contigs for each alignment were reordered to match the published genomes of Haloferax volcanii [19] and Haloarcula marismortui [18], respectively, using Mauve’s built-in contig reordering program (Figures S3 and S4). Sets of functionally homologous genes (orthologs), referred to hereafter as Syntenic Halophile Tribes (SHTs), were determined from the alignments and joined as follows. The proteins in each SHT from the Haloferax alignment were searched against all proteins in each SHT from the Haloarcula genomes using BLAST [37], and a bit score for each pair of SHTs was calculated by averaging the bit scores from each BLAST hit. A traditional reciprocal best hit (RBH) BLAST approach was used to produce one-to-one mappings between SHTs in the two genera. Each joined SHT was assigned a function using the most commonly occurring functional annotation of the protein products of the genes in the SHT. This resulted in a set of 398 SHTs present in all nine genomes. Hidden Markov Models (HMMs) were generated for each SHT using HMMER 3, resulting in 13,276 HMMs. The 1,303 completed archaeal and bacterial genomes available from NCBI as of March 15, 2011 were downloaded, and a single genome from each genus was selected at random, resulting in 396 genomes. Each SHT HMM was searched against these 396 genomes and the eight halophile genomes generated for this study using HMMER 3. Each gene was counted as belonging to the HMM if it had an E-value below 0.0001 and the hit covered more than 80% of the length of both the gene and the HMM. If a gene hit more than one HMM, it was counted only for the HMM with the best E-value.
These hits were then used to generate a 13,276 × 405 presence/absence matrix. The genomes and HMMs were clustered using the ‘ctc’ library in R [65] with Manhattan distance and complete linkage clustering. The clustering was viewed with the Java Treeview program [66]. The cluster file can be accessed at our website [60] and as Dataset S1 and Figure S5.
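The hit-counting rule described above (E-value below 0.0001, more than 80% coverage of both the gene and the HMM, best E-value wins when a gene hits several HMMs) can be illustrated with a short Python sketch. This is not the authors' code: the `Hit` record, its field names, the function names, and the example values are all assumptions made for illustration, and parsing of real HMMER 3 output is not shown.

```python
from typing import Dict, Iterable, List, NamedTuple, Sequence, Tuple


class Hit(NamedTuple):
    """Hypothetical record for one gene-versus-HMM search hit."""
    genome: str
    gene: str
    hmm: str
    evalue: float
    gene_cov: float  # fraction of the gene length covered by the hit
    hmm_cov: float   # fraction of the HMM length covered by the hit


def assign_hits(hits: Iterable[Hit], max_evalue: float = 1e-4,
                min_cov: float = 0.8) -> Dict[Tuple[str, str], str]:
    """Keep hits passing both thresholds; give each gene its best-E-value HMM."""
    best: Dict[Tuple[str, str], Hit] = {}
    for h in hits:
        if h.evalue >= max_evalue:
            continue  # E-value must be below the cutoff
        if h.gene_cov <= min_cov or h.hmm_cov <= min_cov:
            continue  # hit must cover more than 80% of gene and HMM
        key = (h.genome, h.gene)
        if key not in best or h.evalue < best[key].evalue:
            best[key] = h
    return {key: h.hmm for key, h in best.items()}


def presence_matrix(assignments: Dict[Tuple[str, str], str],
                    hmms: Sequence[str],
                    genomes: Sequence[str]) -> List[List[int]]:
    """Rows are HMMs, columns are genomes; 1 means at least one gene assigned."""
    present = {(hmm, genome) for (genome, _gene), hmm in assignments.items()}
    return [[1 if (hmm, genome) in present else 0 for genome in genomes]
            for hmm in hmms]


# Made-up hits for two genomes and two HMMs.
hits = [
    Hit("gA", "g1", "SHT1", 1e-30, 0.95, 0.90),  # passes, best for g1
    Hit("gA", "g1", "SHT2", 1e-10, 0.90, 0.85),  # passes, but worse E-value
    Hit("gA", "g2", "SHT2", 1e-3, 0.99, 0.99),   # fails the E-value cutoff
    Hit("gB", "g3", "SHT2", 1e-8, 0.85, 0.70),   # fails HMM coverage
    Hit("gB", "g4", "SHT2", 1e-20, 0.90, 0.95),  # passes
]
assignments = assign_hits(hits)
matrix = presence_matrix(assignments, ["SHT1", "SHT2"], ["gA", "gB"])
print(matrix)
```

In this toy input, gene g1 is counted only for SHT1 because that hit has the better E-value, so the resulting matrix marks SHT1 as present in gA and SHT2 as present in gB.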
Download Dataset_S1.cdt (6.315 Mb)
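The description of Dataset S1 names Manhattan distance and complete linkage as the clustering settings. The sketch below shows what those two choices mean on a tiny presence/absence matrix. It is written in plain Python for illustration only; the authors used the 'ctc' library in R, and the function names and the four example rows here are hypothetical. The naive pairwise search is suitable only for small inputs, not for a matrix of the size described above.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def manhattan(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences; for 0/1 rows, the number of mismatches."""
    return sum(abs(x - y) for x, y in zip(a, b))


def complete_linkage(rows: Sequence[Sequence[int]]
                     ) -> List[Tuple[List[int], List[int], int]]:
    """Agglomerative clustering; returns (cluster_a, cluster_b, distance) merges.

    Under complete linkage, the distance between two clusters is the
    largest pairwise distance between their members.
    """
    clusters: List[List[int]] = [[i] for i in range(len(rows))]
    merges: List[Tuple[List[int], List[int], int]] = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(manhattan(rows[p], rows[q])
                    for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)  # ties keep the first pair found
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Made-up presence/absence rows (for example, four HMMs across four genomes).
rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
merges = complete_linkage(rows)
for a, b, d in merges:
    print(a, b, d)
```

Rows 0 and 1 differ in one column, as do rows 2 and 3, so each pair merges first at distance 1; the two resulting clusters then merge at distance 4, the largest mismatch count between any of their members.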
Title Dataset S2. Full alignment of Proliferating Cell Nuclear Antigen (PCNA) homologs
Downloaded 30 times
Description Untrimmed alignment of sixty-one PCNA homologs from fifty-seven archaeal and eukaryotic species constructed with MUSCLE.
Download journal-2.pone.0041389.s013.txt (32.41 Kb)

When using this data, please cite the original publication:

Lynch EA, Langille MGI, Darling A, Wilbanks EG, Haltiner C, Shao KSY, Starr MO, Teiling C, Harkins TT, Edwards RA, Eisen JA, Facciotti MT, Randau L (2012) Sequencing of seven haloarchaeal genomes reveals patterns of genomic flux. PLoS ONE 7(7): e41389. http://dx.doi.org/10.1371/journal.pone.0041389

Additionally, please cite the Dryad data package:

Lynch EA, Langille MGI, Darling A, Wilbanks EG, Haltiner C, Shao KSY, Starr MO, Teiling C, Harkins TT, Edwards RA, Eisen JA, Facciotti MT, Randau L (2012) Data from: Sequencing of seven haloarchaeal genomes reveals patterns of genomic flux. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.j08jp