Data from: Population genomic datasets describing the post-vaccine evolutionary epidemiology of Streptococcus pneumoniae

Croucher NJ, Finkelstein JA, Pelton SI, Parkhill J, Bentley SD, Hanage WP, Lipsitch M

Date Published: October 26, 2015

DOI: http://dx.doi.org/10.5061/dryad.t55gq

 

Files in this package

Content in the Dryad Digital Repository is offered "as is." By downloading files, you agree to the Dryad Terms of Service. To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this data. CC0, Open Data.

Title: Maximum likelihood phylogeny based on the core genome alignment of 616 Streptococcus pneumoniae isolates
Downloaded: 8 times
Description: Newick-format maximum likelihood phylogeny generated using the 106,196 polymorphic sites in a core genome alignment of 616 Streptococcus pneumoniae isolates. The tree was produced by RAxML using the general time-reversible substitution model, with a four-category gamma distribution to correct for rate heterogeneity.
Download: SPARC.core_genes.tree (33.94 Kb)
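
As a quick sanity check on the tree file, the leaf labels of a Newick string can be listed with the Python standard library alone. This is a minimal sketch, not part of the data package: the function names are hypothetical, and it assumes unquoted leaf labels with no bracketed comments, which may not hold for every Newick file.

```python
import re

# A leaf label is a name that directly follows "(" or ",".
# Internal node labels and support values follow ")" and are skipped.
# Assumption: labels are unquoted and the file has no [comments].
_LEAF_RE = re.compile(r"(?<=[(,])\s*([^\s(),:;]+)")


def leaf_names(newick: str) -> list:
    """Return the leaf labels of a Newick string, in order of appearance."""
    return _LEAF_RE.findall(newick)


def read_leaf_names(path: str) -> list:
    """Read a Newick file from disk and return its leaf labels."""
    with open(path, encoding="ascii") as handle:
        return leaf_names(handle.read())
```

If the file matches the description above, len(read_leaf_names("SPARC.core_genes.tree")) should equal the number of isolates, 616.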
Title: Core genome codon alignment of 616 Streptococcus pneumoniae isolates
Downloaded: 14 times
Description: FASTA-format 1.14 Mb codon alignment generated by concatenating the individual alignments of the 1,194 coding sequences found to be present as a single copy in each of 616 Streptococcus pneumoniae isolates sampled from Massachusetts between 2001 and 2007. The file is compressed using tar and gzip.
Download: SPARC.core_genes.aln.tar.gz (215.4 Mb)
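
Because the alignment archive is large, it can be useful to check it without unpacking it to disk. The sketch below streams FASTA records out of a .tar.gz file and reports the record count and the distinct sequence lengths. The function names are hypothetical, and the code assumes plain ASCII FASTA files inside the archive; the internal layout of the archive is not documented on this page.

```python
import io
import tarfile
from typing import Iterable, Iterator, Set, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (name, sequence_length) for each FASTA record in `lines`."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            name, length = line[1:], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def alignment_summary(archive_path: str) -> Tuple[int, Set[int]]:
    """Return (record count, distinct lengths) over all regular files in a .tar.gz."""
    count, lengths = 0, set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories and links
            # Assumption: members are ASCII text FASTA files.
            text = io.TextIOWrapper(tar.extractfile(member), encoding="ascii")
            for _, length in iter_fasta(text):
                count += 1
                lengths.add(length)
    return count, lengths
```

If the archive matches the description, alignment_summary("SPARC.core_genes.aln.tar.gz") would be expected to report 616 records sharing a single length that is a multiple of three, since this is a codon alignment.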
Title: Whole genome alignments of 15 sequence clusters of similar isolates
Downloaded: 8 times
Description: Fifteen FASTA-format whole genome alignments, each corresponding to one of the monophyletic sequence clusters identified through population clustering and phylogenetics, as described in Croucher et al. (2013) Nat. Genet. 45:656-663. Alignments were generated by using SMALT to map paired Illumina reads to a reference sequence, which is itself omitted from the alignment. The files are compressed as a single archive using tar and gzip.
Download: SPARC.sequenceClusters.aln.tar.gz (307.3 Mb)
Title: Software and reference sequence for inferring serotype from Illumina sequence data
Downloaded: 24 times
Description: A simple script that uses BWA mapping to identify the likely serogroup of a pneumococcal isolate from paired-end FASTQ data.
Download: README.readme (2.984 Kb)
Download: pneumococcalSerotyper.tar.gz (408.5 Kb)
Title: Predicted protein coding sequences from 616 S. pneumoniae isolates
Downloaded: 8 times
Description: This compressed archive comprises a FASTA file containing the DNA sequences of all predicted protein coding sequences from 616 S. pneumoniae isolates collected from Massachusetts between 2001 and 2007. Each sequence is labelled with a unique identifier of the form “ERSX_Y”, where “ERSX” is the sample accession code in the European Nucleotide Archive and Y is an incrementing index. Where applicable, it is also labelled with the COG of the translated protein, of the form “SPARC1_CLSZ” or “SPARC1_CLSTZ”, where Z is a number.
Download: SPARC_CDS_dna_sequences.fasta.tar.gz (345.7 Mb)
Title: Predicted protein sequences from 616 S. pneumoniae isolates
Downloaded: 4 times
Description: This compressed archive comprises a FASTA file containing the amino acid sequences translated from all predicted protein coding sequences from 616 S. pneumoniae isolates collected from Massachusetts between 2001 and 2007. Each sequence is labelled with a unique identifier of the form “ERSX_Y”, where “ERSX” is the sample accession code in the European Nucleotide Archive and Y is an incrementing index. Where applicable, it is also labelled with the COG to which the protein belongs, of the form “SPARC1_CLSZ” or “SPARC1_CLSTZ”, where Z is a number.
Download: SPARC_CDS_protein_sequences.fasta.tar.gz (226.1 Mb)
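
The labelling scheme described for the two sequence archives above can be picked apart with a pair of regular expressions. This is a hedged sketch rather than a documented parser: the names are hypothetical, X and Z are assumed to be runs of digits, and the identifier and COG label are assumed to appear somewhere in the FASTA header line, since the exact header layout and separators are not specified here. The example headers in the usage note are invented for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: "ERSX" is "ERS" followed by digits, and Y is an integer index.
ID_RE = re.compile(r"(ERS\d+)_(\d+)")
# Assumption: Z is a run of digits after "SPARC1_CLS" or "SPARC1_CLST".
COG_RE = re.compile(r"SPARC1_CLST?\d+")


class CdsLabel(NamedTuple):
    sample: str          # ENA sample accession, e.g. "ERS012345"
    index: int           # incrementing index within the sample
    cog: Optional[str]   # COG label, or None when the header has none


def parse_header(header: str) -> Optional[CdsLabel]:
    """Extract sample, index and optional COG label from a FASTA header line."""
    match = ID_RE.search(header)
    if match is None:
        return None
    cog = COG_RE.search(header)
    return CdsLabel(match.group(1), int(match.group(2)),
                    cog.group(0) if cog else None)
```

For a made-up header such as ">ERS012345_17 SPARC1_CLS00042", parse_header returns the sample "ERS012345", the index 17 and the COG label "SPARC1_CLS00042". A header without a COG label yields None in the cog field.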
Title: Draft reference genome sequences for each of the 15 sequence clusters
Downloaded: 3 times
Description: This compressed archive contains 15 FASTA-format draft de novo assemblies used to generate the whole genome alignments within each sequence cluster. The files are named according to the sequence cluster and taxon identifier of the isolate to which the contigs relate.
Download: SPARC_reference_sequences.tar.gz (9.633 Mb)

When using this data, please cite the original publication:

Croucher NJ, Finkelstein JA, Pelton SI, Parkhill J, Bentley SD, Lipsitch M, Hanage WP (2015) Population genomic datasets describing the post-vaccine evolutionary epidemiology of Streptococcus pneumoniae. Scientific Data 2:150058. http://dx.doi.org/10.1038/sdata.2015.58

Additionally, please cite the Dryad data package:

Croucher NJ, Finkelstein JA, Pelton SI, Parkhill J, Bentley SD, Hanage WP, Lipsitch M (2015) Data from: Population genomic datasets describing the post-vaccine evolutionary epidemiology of Streptococcus pneumoniae. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.t55gq