Data from: Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus

Kashtan, Nadav; Roggensack, Sara E.; Rodrigue, Sébastien; Thompson, Jessie W.; Biller, Steven J.; Coe, Allison; Ding, Huiming; Marttinen, Pekka; Malmstrom, Rex R.; Stocker, Roman; Follows, Michael J.; Stepanauskas, Ramunas; Chisholm, Sallie W.

Published Mar 19, 2015 on Dryad. https://doi.org/10.5061/dryad.9r0p6

Data files

Mar 19, 2015 version files 353.98 MB

Abstract

Extensive genomic diversity within coexisting members of a microbial species has been revealed through selected cultured isolates and metagenomic assemblies. Yet, the cell-by-cell genomic composition of wild uncultured populations of co-occurring cells is largely unknown. In this work, we applied large-scale single-cell genomics to study populations of the globally abundant marine cyanobacterium Prochlorococcus. We show that they are composed of hundreds of subpopulations with distinct “genomic backbones,” each backbone consisting of a different set of core gene alleles linked to a small distinctive set of flexible genes. These subpopulations are estimated to have diverged at least a few million years ago, suggesting ancient, stable niche partitioning. Such a large set of coexisting subpopulations may be a general feature of free-living bacterial species with huge populations in highly mixed habitats.

Mapping between ITS and MDA names

This file maps between the ITS sequence names and the MDA sequence names. ITS sequence names are described by their well in the ITS Sanger sequencing plates (96-well plates). MDA sequence names are described by their well in the MDA reaction plates (384-well plates). The ITS sequence name format is as follows: for example, >B245a_520_F02_p18 means BATS cruise 245a, MDA plate 520, ITS plate 18, well F02. To find the relevant MDA well, use the mapping file ITS_to_MDA_mapping_final.xlsx. In this example, the MDA well is plate 520, well O2.

ITS_to_MDA_mapping_final.xlsx
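The ITS name format described above can be split into its fields programmatically. The following is a minimal Python sketch, assuming every ITS name follows the pattern of the single documented example (>B245a_520_F02_p18). The function name parse_its_name, the class ItsName, and the field names are illustrative and not part of the dataset.

```python
import re
from typing import NamedTuple


class ItsName(NamedTuple):
    """Fields of an ITS sequence name (illustrative field names)."""
    cruise: str     # BATS cruise, e.g. "245a"
    mda_plate: str  # MDA plate, e.g. "520"
    its_well: str   # well in the 96-well ITS plate, e.g. "F02"
    its_plate: str  # ITS plate, e.g. "18"


# Pattern inferred from the one documented example; this is an assumption,
# not a specification published with the dataset.
_ITS_NAME = re.compile(
    r"^>?B(?P<cruise>[^_]+)"
    r"_(?P<mda_plate>[^_]+)"
    r"_(?P<its_well>[A-H]\d{2})"
    r"_p(?P<its_plate>\d+)$"
)


def parse_its_name(name: str) -> ItsName:
    """Split an ITS sequence name into its fields, or raise ValueError."""
    match = _ITS_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized ITS name: {name!r}")
    return ItsName(**match.groupdict())
```

The parser recovers only the fields encoded in the ITS name. The well in the 384-well MDA plate is not encoded there and still has to be looked up in the mapping file.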

ITS-rRNA multi-alignment of all single cells

Alignment of all ITS sequences used to build the trees and heatmaps in Fig. 1. Sequence names follow the ITS sequence name format described under "Mapping between ITS and MDA names" (for example, >B245a_520_F02_p18).

ITS_all_algn.fasta
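For readers who want to inspect the alignment without extra dependencies, the following is a minimal Python sketch of a FASTA reader. It assumes a standard FASTA layout (a header line starting with ">" followed by one or more sequence lines) and that all records in an alignment have the same length. The function names read_fasta and alignment_length are illustrative.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence name: sequence} from FASTA-formatted lines.

    The name is the first whitespace-delimited token after ">".
    A repeated name overwrites the earlier record.
    """
    parts: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def alignment_length(records: Dict[str, str]) -> int:
    """Return the common length of all aligned sequences, or raise ValueError."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are empty or differ in length")
    return lengths.pop()
```

An open file handle can be passed directly as the lines argument, for example read_fasta(open("ITS_all_algn.fasta")), since iterating over a text file yields its lines.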

ITS-rRNA multi-alignment of the 96 single cells

Multi-alignment of the ITS of the 96 single cells (as well as of 5 HLII strains) used to generate the tree in Fig. 2A.

ITS_96_algn.fasta

Mapping between ITS and MDA names (96 single cells)

ITS names and MDA names of the 96 single cells.

ITS_to_MDA_96cells.xlsx

C1 composite genome

The composite genome used as the reference for the reference-guided assembly. It was built from large overlapping contigs of single cells within the cN2-C1 clade.

C1_composite_genome.gbk

Whole genome alignment of the 96 single cell partial genomes

Multi-alignment of the partial genomes of the 96 single cells used to generate the tree in Fig. 2B.

WG_96_algn.fasta
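Because the genomes in this alignment are partial, it can be useful to check how much of the alignment each cell actually covers. The following is a minimal Python sketch, assuming that missing positions are encoded as "-", "N", "n", or "?". How missing data is encoded in WG_96_algn.fasta is not stated here, so the default for the missing parameter is an assumption, and the function name covered_fraction is illustrative.

```python
def covered_fraction(aligned_seq: str, missing: str = "-Nn?") -> float:
    """Return the fraction of alignment columns where this sequence has a base.

    `missing` lists the characters treated as "no data"; the default is an
    assumption about the file, not something documented with the dataset.
    """
    if not aligned_seq:
        raise ValueError("empty sequence")
    covered = sum(1 for char in aligned_seq if char not in missing)
    return covered / len(aligned_seq)
```

Applied to each record of the alignment, this gives a per-cell coverage value that can be used to decide which cells or columns to include in a downstream comparison.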

Whole genome alignment of 8 clonal E. coli partial genomes

Multi-alignment of the partial genomes of the 8 clonal E. coli single cells, used as a control to estimate the error rate involved in single-cell genomics.

Ecoli_WG_8_algn.fasta
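One simple way to get an error estimate from a clonal control is to count mismatches between every pair of aligned sequences, using only columns where both cells have a called base. The following Python sketch illustrates that idea. It is not necessarily the procedure the authors used, and the function name pairwise_mismatch_rate and the choice to count only A, C, G, and T are assumptions.

```python
from itertools import combinations
from typing import Sequence


def pairwise_mismatch_rate(seqs: Sequence[str], bases: str = "ACGT") -> float:
    """Return mismatches / compared sites, summed over all sequence pairs.

    A site is compared only when both sequences have a character in `bases`,
    so gaps and ambiguous calls in partial genomes are skipped.
    """
    mismatches = 0
    compared = 0
    for first, second in combinations(seqs, 2):
        if len(first) != len(second):
            raise ValueError("aligned sequences must have equal length")
        for x, y in zip(first.upper(), second.upper()):
            if x in bases and y in bases:
                compared += 1
                if x != y:
                    mismatches += 1
    if compared == 0:
        raise ValueError("no sites covered by two sequences")
    return mismatches / compared
```

Since the cells are clonal, a mismatch between two of them can come from an error in either cell, so at low error rates this pairwise value is roughly twice the per-cell, per-base error rate.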

Classification of genes into Clusters of Orthologous Genes (COGs).

COGS.fasta