Data from: Sequencing of seven haloarchaeal genomes reveals patterns of genomic flux
Lynch, Erin A., University of California, Davis
Langille, Morgan G. I., Dalhousie University
Darling, Aaron, University of California, Davis
Wilbanks, Elizabeth G., University of California, Davis
Haltiner, Caitlin, University of California, Davis
Shao, Katie S. Y., Davis Senior High School, Davis, California, United States of America
Starr, Michael O., University of California, Davis
Teiling, Clotilde, 454 Life Sciences, a Roche Company, Branford, Connecticut, United States of America
Harkins, Timothy T., Life Technologies, Beverly, Massachusetts, United States of America
Edwards, Robert A., San Diego State University, Argonne National Laboratory
Eisen, Jonathan A., University of California, Davis
Facciotti, Marc T., University of California, Davis
We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ~20× coverage and assembled to an average of 50 contigs (range: 5 scaffolds to 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genus-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology.
RAST_Annotations
RAST-generated annotations for all genomes included in the analyses in this manuscript.
Raw_sequence_reads
Fragment libraries were constructed for eight species of the family Halobacteriaceae, three from the genus Haloarcula (Har. californiae, Har. sinaiiensis, Har. vallismortis) and five from the genus Haloferax (Hfx. denitrificans, Hfx. mediterranei, Hfx. mucosum, Hfx. sulfurifontis, and Hfx. volcanii), and sequenced on a single GS FLX Titanium run following standard protocols (454 Life Sciences - http://454.com/). Hfx. volcanii was included as a sequencing control, as its genome had been completed previously [19]. Additionally, for Har. sinaiiensis and Hfx. mediterranei, 8 kb paired-end libraries were constructed and the terminal 100 bp of each end was sequenced, according to standard protocols. The paired-end information and any trimming information are specified using annotation strings on the description line of each read. Reads were assembled using the Genome Sequencer De Novo assembler (454 Life Sciences - http://www.my454.com/).
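Because the pairing and trimming information is stored on the description line of each read, it can be recovered with a small parser before any downstream analysis. The sketch below is a minimal illustration, not the authors' procedure. It assumes the reads are in FASTA format and that each annotation is a whitespace-separated key=value token following the read identifier. The key names in the usage example (template, dir, trim) are hypothetical placeholders; check the actual files for the keys they use.

```python
def parse_description(header):
    """Split one FASTA description line into (read_id, annotations).

    Assumption: annotations are whitespace-separated key=value tokens
    that follow the read identifier. Tokens without '=' are ignored.
    """
    fields = header.strip().lstrip(">").split()
    if not fields:
        return "", {}
    read_id = fields[0]
    annotations = {}
    for token in fields[1:]:
        key, sep, value = token.partition("=")
        if sep:  # keep only tokens that actually contain '='
            annotations[key] = value
    return read_id, annotations


def read_headers(path):
    """Yield (read_id, annotations) for every record in a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield parse_description(line)


if __name__ == "__main__":
    # Hypothetical description line; the real key names may differ.
    example = ">read0001 template=pair17 dir=F trim=5-100"
    read_id, annotations = parse_description(example)
    print(read_id, annotations)
```

Grouping reads by whichever annotation identifies the shared template would then recover the mate pairs from the 8 kb paired-end libraries, under the same assumptions.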