Extensive genomic diversity within coexisting members of a microbial species has been revealed through selected cultured isolates and metagenomic assemblies. Yet, the cell-by-cell genomic composition of wild uncultured populations of co-occurring cells is largely unknown. In this work, we applied large-scale single-cell genomics to study populations of the globally abundant marine cyanobacterium Prochlorococcus. We show that they are composed of hundreds of subpopulations with distinct “genomic backbones,” each backbone consisting of a different set of core gene alleles linked to a small distinctive set of flexible genes. These subpopulations are estimated to have diverged at least a few million years ago, suggesting ancient, stable niche partitioning. Such a large set of coexisting subpopulations may be a general feature of free-living bacterial species with huge populations in highly mixed habitats.
Mapping between ITS and MDA names
This file maps between the ITS sequence name and the MDA sequence name.
ITS sequence names are described by their well in the ITS Sanger sequencing plates (96-well plates).
MDA sequence names are described by their well in the MDA reaction plates (384-well plates).
The ITS sequence name format is shown by the following example:
>B245a_520_F02_p18
means BATS cruise 245a, MDA plate 520, ITS plate 18 well F02.
To find the relevant MDA well, use the mapping file ITS_to_MDA_mapping_final.xlsx.
In this example, the MDA well is thus plate 520, well O2.
ITS_to_MDA_mapping_final.xlsx
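The naming convention above can be split into its parts programmatically. The following is a minimal sketch, not part of the original dataset: the function name `parse_its_name`, the `ITSName` record, and the regular expression are hypothetical, and the pattern assumes every name follows the single example given (cruise, MDA plate, ITS well on a 96-well plate with rows A-H and a two-digit column, then `p` plus the ITS plate number).

```python
import re
from typing import NamedTuple


class ITSName(NamedTuple):
    """Fields of an ITS sequence name (hypothetical record type)."""
    cruise: str     # e.g. "B245a", meaning BATS cruise 245a
    mda_plate: int  # MDA plate number, e.g. 520
    its_well: str   # well on the 96-well ITS plate, e.g. "F02"
    its_plate: int  # ITS plate number, e.g. 18


# Assumed pattern, inferred only from the example ">B245a_520_F02_p18".
_ITS_NAME = re.compile(
    r"^>?(?P<cruise>[^_]+)_(?P<mda_plate>\d+)"
    r"_(?P<its_well>[A-H]\d{2})_p(?P<its_plate>\d+)$"
)


def parse_its_name(name: str) -> ITSName:
    """Split an ITS sequence name into cruise, MDA plate, ITS well and ITS plate."""
    match = _ITS_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"Unexpected ITS sequence name: {name!r}")
    return ITSName(
        cruise=match["cruise"],
        mda_plate=int(match["mda_plate"]),
        its_well=match["its_well"],
        its_plate=int(match["its_plate"]),
    )


if __name__ == "__main__":
    print(parse_its_name(">B245a_520_F02_p18"))
```

The parsed ITS plate and well can then be looked up in the mapping spreadsheet to obtain the MDA well; the spreadsheet's column layout is not described here, so that step is left out.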
ITS-rRNA multi-alignment of all single cells
Alignment of all ITS sequences used to build the trees and heatmaps in Fig. 1.
The format of sequence names is as follows:
ITS_all_algn.fasta
ITS-rRNA multi-alignment of the 96 single cells
Multi-alignment of the ITS of the 96 single cells (as well as of 5 HLII strains) used to generate the tree in Fig. 2A.
ITS_96_algn.fasta
Mapping between ITS and MDA names (96 single cells)
ITS names and MDA names of the 96 single cells.
ITS_to_MDA_96cells.xlsx
C1 composite genome
Used as the reference genome for the reference-guided assembly. It was built from large overlapping contigs of single cells within the cN2-C1 clade.
C1_composite_genome.gbk
Whole genome alignment of the 96 single cell partial genomes
Multi-alignment of the partial genomes of the 96 single cells used to generate the tree in Fig. 2B.
WG_96_algn.fasta
Whole genome alignment of 8 clonal E. coli partial genomes
Multi-alignment of the partial genomes of the 8 clonal E. coli single cells used as a control to estimate the error rate involved in single cell genomics.
Ecoli_WG_8_algn.fasta
Classification of genes into Clusters of Orthologous Genes (COGs).
Each entry is a gene (or partial gene sequence) in one of the 96 single-cell de novo assembled genomes or in the genome of a cultured strain.
The number at the beginning of each header is the COG ID.
The FASTA headers for genes in a single-cell genome have the following format:
>COG ID|MDA name|Detailed name|contig__|Start bp|End bp|+/- Strand|Method|Description
The FASTA headers for genes in a complete genome of a cultured strain have the following format:
>COG ID|Short name|Detailed name|Start bp|End bp|+/- Strand|Method|
COGS.fasta
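The two header layouts described for COGS.fasta can be told apart by their number of pipe-separated fields. The sketch below is illustrative only: the function name `parse_cog_header` and the dictionary keys are hypothetical, and it assumes that no field contains a `|` character and that the trailing `|` in cultured-strain headers leaves an empty final field.

```python
from typing import Dict

# Field names are hypothetical labels for the header parts listed above.
SINGLE_CELL_FIELDS = [
    "cog_id", "mda_name", "detailed_name", "contig",
    "start_bp", "end_bp", "strand", "method", "description",
]
CULTURED_FIELDS = [
    "cog_id", "short_name", "detailed_name",
    "start_bp", "end_bp", "strand", "method",
    "extra",  # whatever follows the trailing "|" (assumed usually empty)
]


def parse_cog_header(header: str) -> Dict[str, str]:
    """Split a COGS.fasta header line into named fields.

    Assumes fields never contain "|". Nine fields are treated as a
    single-cell gene; seven or eight as a cultured-strain gene.
    """
    fields = header.strip().lstrip(">").split("|")
    if len(fields) == 9:
        names, kind = SINGLE_CELL_FIELDS, "single_cell"
    elif len(fields) in (7, 8):
        names, kind = CULTURED_FIELDS, "cultured_strain"
    else:
        raise ValueError(f"Unexpected number of fields ({len(fields)}): {header!r}")
    record = dict(zip(names, fields))
    record["kind"] = kind
    return record
```

Counting fields keeps the sketch short, but it would misclassify a header whose description itself contains a pipe; checking whether the fourth field is numeric would be a stricter alternative.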