The domestication of the wine yeast Saccharomyces cerevisiae is thought to be contemporary with the development and expansion of viticulture along the Mediterranean basin. Until now, the unavailability of wild lineages prevented the identification of the closest wild relatives of wine yeasts. Here, we enlarge the collection of natural lineages and employ whole-genome data of oak-associated wild isolates to study a balanced number of anthropic and natural S. cerevisiae strains. We identified industrial variants and new geographically delimited populations, including a novel Mediterranean oak population. This population is the closest relative of the wine lineage as shown by a weak population structure and further supported by genomewide population analyses. A coalescent model considering partial isolation with asymmetrical migration, mostly from the wild group into the Wine group, and population growth, was found to be best supported by the data. Importantly, divergence time estimates between the two populations agree with historical evidence for winemaking. We show that three horizontally transmitted regions, previously described to contain genes relevant to wine fermentation, are present in the Wine group but not in the Mediterranean oak group. This represents a major discontinuity between the two populations and is likely to denote a domestication fingerprint in wine yeasts. Taken together, these results indicate that Mediterranean oaks harbour the wild genetic stock of domesticated wine yeasts.
S. cerevisiae whole-genome SNP alignment
This is a gzip/tar compressed data archive containing the concatenated SNP alignment of S. cerevisiae.
SNP alignments for each chromosome are also provided, together with the corresponding list of SNP positions.
Scer_SNP_alignment.tgz
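The archives listed here are gzip-compressed tar files, so their contents can be inspected before extraction. The following is a minimal sketch using only the Python standard library; the member names in the demo archive are hypothetical placeholders and are not taken from the actual deposit.

```python
import io
import tarfile


def list_archive_members(path_or_fileobj):
    """Return the sorted names of regular files in a gzip-compressed tar archive."""
    if isinstance(path_or_fileobj, str):
        tar = tarfile.open(name=path_or_fileobj, mode="r:gz")
    else:
        tar = tarfile.open(fileobj=path_or_fileobj, mode="r:gz")
    with tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


def build_demo_archive():
    """Build a tiny in-memory .tgz; the member names are hypothetical."""
    buffer = io.BytesIO()
    members = [
        ("chr01.snp.fasta", b">strain1\nACGT\n"),
        ("chr01.positions.txt", b"12\n57\n"),
    ]
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


demo_members = list_archive_members(build_demo_archive())
print(demo_members)
```

For a downloaded file, the same function can be called with its path, for example `list_archive_members("Scer_SNP_alignment.tgz")`.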
S. cerevisiae (large dataset) whole-genome SNP alignment with outgroup S. paradoxus, related to Figure S3
This is a gzip/tar compressed data archive containing the concatenated SNP alignment of the large dataset with S. paradoxus as outgroup, related to Figure S3.
SNP alignments for each chromosome are also provided, together with the corresponding list of SNP positions.
SNP-alignment-largeDataset+Spar.tgz
S. cerevisiae (restricted dataset) whole-genome SNP alignment with outgroup S. paradoxus, related to Figure 3
This is a gzip/tar compressed data archive containing the concatenated SNP alignment of the restricted dataset with S. paradoxus as outgroup, related to Figure 3.
SNP alignments for each chromosome are also provided, together with the corresponding list of SNP positions.
SNP-alignment-restrictedDataset+Spar.tgz
Structure input file used in Figure 2A
This is a gzip/tar compressed data archive containing the input file used to run Structure v2.3.4 (Pritchard et al. 2000), related to Figure 2A.
A list file with the positions of the parsimony informative sites used is also provided.
Structure-input.tgz
FineStructure input files used in Figure 2B
This is a gzip/tar compressed data archive containing the idfile, phase and recombination input files used to run FineSTRUCTURE Version 2.0.2 (Lawson et al. 2012), related to Figure 2B.
fineStructure-input.tgz
dadi input file used to infer demographic history of Wine and MO populations
The joint allele frequency spectrum of Wine and Mediterranean oak (MO) populations used as input to run the demographic inference in dadi (Gutenkunst et al. 2009).
Populations are defined as in the comment lines at the beginning of the file.
allChr_scer_subset_nonCDS_wine-randMO.dadi
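Since the population definitions are stored in the comment lines, they can be read without parsing the rest of the file. The sketch below assumes that comment lines start with `#`, as is conventional for dadi input files; that marker and the example lines are assumptions made for illustration and do not reproduce the contents of the deposited file.

```python
def leading_comment_lines(lines, marker="#"):
    """Collect the comment lines found before the first data line.

    Assumption: comments are marked with a leading '#'.
    """
    comments = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines in the header
        if not stripped.startswith(marker):
            break  # first data line reached
        comments.append(stripped.lstrip(marker).strip())
    return comments


# Hypothetical header, for illustration only.
example_lines = [
    "# population 1: Wine (hypothetical label)\n",
    "# population 2: MO (hypothetical label)\n",
    "placeholder data line\n",
    "# a later comment that is not part of the header\n",
]

header = leading_comment_lines(example_lines)
print(header)
```

With the real file, an open file handle can be passed directly, for example `leading_comment_lines(open("allChr_scer_subset_nonCDS_wine-randMO.dadi"))`.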
Raw results from VariScan, related to Table 3
This is a gzip/tar compressed data archive containing the results of VariScan v2.0 (Hutter et al. 2006) for the whole-genome alignments considered, related to Table 3.
variscan-results.tgz
Raw results from compute (libsequence analysis package), related to Table S2
This is a gzip/tar compressed data archive containing the results of compute from the libsequence analysis package (http://molpopgen.org/) (Thornton 2003), related to Table S2.
The archive contains results for coding sequences (CDS) and non-CDS in the Mediterranean oak population (MO) and the wine population (wine), with one file per locus.
The SGD systematic name is used to identify CDS and the nucleotide positions are used to identify non-CDS.
libsequence-analysis-compute-results.tgz
Raw results from polydNdS (libsequence analysis package), related to Table S2
This is a gzip/tar compressed data archive containing the results of polydNdS from the libsequence analysis package (http://molpopgen.org/) (Thornton 2003), related to Table S2.
The archive contains results for coding sequences (CDS) in the Mediterranean oak population (MO) and the wine population (wine), with one file per locus.
The SGD systematic name is used to identify CDS.
libsequence-analysis-polyDnDs-results.tgz
De novo draft assemblies
This is a gzip/tar compressed data archive containing the draft de novo contig assemblies generated in this study.
deNovo-draft-assemblies.tgz
Whole-genome SNP phylogeny (restricted dataset), related to Figure 3
RAxML phylogenetic tree obtained from the whole-genome SNP alignment of the restricted dataset and with S. paradoxus as outgroup, related to Figure 3.
raxml-tree-snp-restricted-dataset.nwk
Whole-genome SNP phylogeny (large dataset), related to Figure S3
RAxML phylogenetic tree in newick format, obtained from the whole-genome SNP alignment of the large dataset and with S. paradoxus as outgroup, related to Figure S3.
raxml-tree-snp-large-dataset.nwk
Multilocus tree including the Chinese strains, related to Figure S4
Neighbor-Joining tree in newick format, inferred from a concatenated alignment of 13 loci and including Chinese strains, related to Figure S4.
multilocus-tree-with-Chinese-isolates.nwk
Raw data of microsatellite genotypes used in Figure S1
This table contains three columns giving the strain name, the locus name and the allele values (size bins). When only one allele was found, the value has been doubled because the strain is considered to be diploid. For some strains (e.g. Bread strains), more than two values can be observed. In such cases, additional lines are added to describe all values found at that locus.
DTfinaleOctobre2010forArchive.xls
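The layout described above (one or more lines per strain and locus, with doubled values for single-allele calls) can be collapsed into one allele set per strain and locus. The following is a minimal sketch under stated assumptions: each row is taken to be a (strain, locus, allele values) triple already read from the spreadsheet, and the strain names, locus names and allele sizes are hypothetical.

```python
from collections import defaultdict


def distinct_alleles(rows):
    """Merge all lines for a (strain, locus) pair into a sorted list of distinct alleles.

    Assumption: each row is (strain, locus, iterable of allele size bins).
    """
    merged = defaultdict(set)
    for strain, locus, alleles in rows:
        merged[(strain, locus)].update(alleles)
    return {key: sorted(values) for key, values in merged.items()}


def more_than_two_alleles(genotypes):
    """Return the (strain, locus) pairs carrying more than two distinct alleles."""
    return sorted(key for key, alleles in genotypes.items() if len(alleles) > 2)


# Hypothetical rows, for illustration only.
example_rows = [
    ("strainA", "locus1", (120, 120)),  # single allele, value doubled
    ("strainB", "locus1", (120, 126)),
    ("strainB", "locus1", (132,)),      # additional line for an extra allele
]

genotypes = distinct_alleles(example_rows)
extra = more_than_two_alleles(genotypes)
print(genotypes)
print(extra)
```

Collapsing doubled values into a set makes single-allele calls easy to tell apart from genotypes with two or more distinct alleles.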