Data from: Ribosomal DNA sequence heterogeneity reflects intra-species phylogenies and predicts genome structure in two contrasting yeast species

West, Claire, Norwich Research Park

James, Stephen A., Norwich Research Park

Davey, Robert P., Norwich Research Park

Dicks, Jo, Norwich Research Park

Roberts, Ian N., Norwich Research Park

Published Mar 10, 2014 on Dryad. https://doi.org/10.5061/dryad.0674n

Cite this dataset

West, Claire et al. (2014). Data from: Ribosomal DNA sequence heterogeneity reflects intra-species phylogenies and predicts genome structure in two contrasting yeast species [Dataset]. Dryad. https://doi.org/10.5061/dryad.0674n

Abstract

The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the ITS region, frequently used in plant phylogenetics, is now recognised as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analysed datasets from the Saccharomyces Genome Resequencing Project, characterising rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterisation of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intra-species phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridisation events and show their consistency with whole-genome Structure analyses. 
We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of closely related organisms, and discuss how it could be extended to future studies of multi-locus rDNA systems.

Usage notes

Appendix 1 - S. paradoxus Variation Table

pSNP and SNP frequencies for SGRP sequence reads of 26 S. paradoxus strains and the S288c S. cerevisiae strain compared with the rDNA consensus sequence of the CBS432 S. paradoxus type strain.

West_rDNA_Appendix_1.xlsx

Appendix 2 - S. cerevisiae Variation Table

pSNP and SNP frequencies for SGRP sequence reads of 34 S. cerevisiae strains and the Q32.3 S. paradoxus strain compared with the rDNA consensus sequence of the S288c S. cerevisiae type strain.

West_rDNA_Appendix_2.xlsx

Appendix 3 - Phylogenetic Networks of S. paradoxus and S. cerevisiae Strains

Both a) and b) show an enlargement of the main population structure in the network, with the small boxed inset showing the whole network including the outgroup. a) The S. paradoxus network shows a clear separation of each geographic population. b) The S. cerevisiae network shows a more complex network structure, consistent with our knowledge of this population.

West_rDNA_Appendix_3.pdf

Appendix 4 - Bar Charts of the pSNP Percentage Occupancy in S. cerevisiae by Population Type

a) Bar chart for the S. cerevisiae structured strains, plotting the number of pSNPs against pSNP occupancy. The boxed section highlights pSNPs with occupancies greater than 10% and less than 90%. The Malaysian, North American and West African strains have very few pSNPs within this boxed area, and we denote these as structured clean strains. Strains with a number of pSNPs within this boxed area show a degree of mosaicism, and we classify them as structured mosaic strains. b) Bar chart for the S. cerevisiae mosaic strains, which have a large number of pSNPs within the 10% to 90% occupancy range.

West_rDNA_Appendix_4.pdf
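
The occupancy summary described for Appendix 4 can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the 10-point bin width and the example values are assumptions made for illustration. Only the 10% and 90% bounds come from the description above, and no cut-off for "very few" pSNPs is given there, so the sketch only counts and bins.

```python
def count_intermediate(occupancies, low=10.0, high=90.0):
    """Count pSNPs whose percentage occupancy is strictly between low and high."""
    return sum(1 for occ in occupancies if low < occ < high)


def occupancy_histogram(occupancies, bin_width=10):
    """Bin percentage occupancies into equal-width bins covering 0 to 100.

    The bin width is an assumption; a value of exactly 100 goes in the last bin.
    """
    n_bins = 100 // bin_width
    counts = [0] * n_bins
    for occ in occupancies:
        if not 0 <= occ <= 100:
            raise ValueError(f"occupancy out of range: {occ}")
        index = min(int(occ // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical occupancies (percent) for one strain, not taken from the dataset.
example = [5, 10, 50, 89.9, 90, 100]
n_boxed = count_intermediate(example)
bins = occupancy_histogram(example)
```

A strain with a low count from `count_intermediate` would fall on the "structured clean" side of the description, and a high count on the mosaic side. Where to draw that line is left to the reader.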

S. paradoxus CE distance matrix

Cavalli-Sforza and Edwards rDNA-based distance matrix for 26 S. paradoxus strains plus S. cerevisiae strain S288c

S_paradoxus_CE_Dist.nex
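
For readers unfamiliar with the Cavalli-Sforza and Edwards chord distance, the sketch below shows one common way to compute it from per-site variant frequencies. It makes several assumptions: each polymorphic site is treated as biallelic (variant frequency f, reference frequency 1 - f), the multi-site form follows the one documented for PHYLIP's gendist program, and the strain names and frequencies are invented. The exact procedure used to build the deposited matrices is not stated here.

```python
import math


def chord_distance(freqs_a, freqs_b):
    """Cavalli-Sforza and Edwards chord distance for biallelic sites.

    Assumed form: D^2 = 4 * sum_m (1 - sum_i sqrt(p_mi * q_mi)) / sum_m (a_m - 1),
    where a_m = 2 alleles per site, so the denominator is the number of sites.
    """
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("frequency lists must be non-empty and of equal length")
    total = 0.0
    for fa, fb in zip(freqs_a, freqs_b):
        if not (0.0 <= fa <= 1.0 and 0.0 <= fb <= 1.0):
            raise ValueError("frequencies must lie in [0, 1]")
        cos_theta = math.sqrt(fa * fb) + math.sqrt((1.0 - fa) * (1.0 - fb))
        # Clamp to guard against floating-point values slightly above 1.
        total += 1.0 - min(1.0, cos_theta)
    return math.sqrt(4.0 * total / len(freqs_a))


def distance_matrix(strain_freqs):
    """Return {(strain_a, strain_b): distance} for every ordered pair of strains."""
    names = sorted(strain_freqs)
    return {
        (a, b): chord_distance(strain_freqs[a], strain_freqs[b])
        for a in names
        for b in names
    }


# Hypothetical variant frequencies at three sites for three made-up strains.
strains = {
    "strainA": [0.0, 0.25, 1.0],
    "strainB": [0.0, 0.25, 1.0],
    "strainC": [1.0, 0.25, 0.0],
}
matrix = distance_matrix(strains)
```

Under this form, two strains with identical frequencies have distance 0, and two strains fixed for opposite alleles at every site have the maximum distance of 2.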

S. cerevisiae CE distance matrix

Cavalli-Sforza and Edwards rDNA-based distance matrix for 34 S. cerevisiae strains plus S. paradoxus strain Q32.3

S_cerevisiae_CE_Dist.nex

S. paradoxus NJ tree

Neighbor-Joining phylogenetic tree derived from the S. paradoxus CE distance matrix

S_paradoxus_tree.nex

S. cerevisiae NJ tree

Neighbor-Joining phylogenetic tree derived from the S. cerevisiae CE distance matrix

S_cerevisiae_tree.nex

Perl script for coverage/copy number estimation

Perl script to calculate the rDNA unit coverage from a sequence read dataset and to estimate the number of rDNA units (copy number) in an rDNA tandem array.

coverage_v2.pl
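
The idea behind coverage-based copy number estimation can be sketched in a few lines of Python. This is not a port of coverage_v2.pl, whose internals are not described here. It assumes that copy number is approximated by the ratio of mean read depth over a single rDNA unit to mean read depth over the single-copy genome. All function names and numbers are hypothetical.

```python
def mean_coverage(total_mapped_bases, region_length):
    """Mean read depth: total bases mapped to a region divided by its length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return total_mapped_bases / region_length


def estimate_copy_number(rdna_coverage, genome_coverage):
    """Estimate rDNA copy number as rDNA unit depth over single-copy genome depth.

    Assumption: reads from every rDNA repeat map to one reference unit, so
    depth there scales with the number of units in the tandem array.
    """
    if genome_coverage <= 0:
        raise ValueError("genome_coverage must be positive")
    return rdna_coverage / genome_coverage


# Invented example values, not measurements from the SGRP data.
rdna_depth = mean_coverage(total_mapped_bases=1_500_000, region_length=10_000)
genome_depth = mean_coverage(total_mapped_bases=120_000_000, region_length=12_000_000)
copies = estimate_copy_number(rdna_depth, genome_depth)
```

With these made-up inputs the rDNA unit is covered 150 times and the genome 10 times, giving an estimate of 15 units.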

Updated Appendix 3 - Phylogenetic Networks of S. paradoxus and S. cerevisiae Strains

Both a) and b) show an enlargement of the main population structure in the network, with the small grey inset showing the whole network including the outgroup. a) The S. paradoxus network shows a clear separation of each geographic population. b) The S. cerevisiae network shows a more complex network structure, consistent with our knowledge of this population.

West_rDNA_Appendix_3.svg

Updated S. paradoxus CE distance matrix

Cavalli-Sforza and Edwards rDNA-based distance matrix for 26 S. paradoxus strains plus S. cerevisiae strain S288c

S_paradoxus_CE_dist.nex

Updated S. cerevisiae CE distance matrix

Cavalli-Sforza and Edwards rDNA-based distance matrix for 34 S. cerevisiae strains plus S. paradoxus strain Q32.3

S_cerevisiae_CE_dist.nex

Updated S. paradoxus NJ tree

Neighbor-Joining phylogenetic tree derived from the S. paradoxus CE distance matrix

S_paradoxus_tree.nex

Updated S. cerevisiae NJ tree

Neighbor-Joining phylogenetic tree derived from the S. cerevisiae CE distance matrix

S_cerevisiae_tree.nex