The molecular and evolutionary processes underlying fungal domestication remain largely unknown despite the importance of fungi to bioindustry and for comparative adaptation genomics in eukaryotes. Wine fermentation and biological aging are performed by strains of S. cerevisiae with, respectively, pelagic fermentative growth on glucose and aerobic biofilm growth utilizing ethanol. Here, we use environmental samples of wine and flor yeasts to investigate the genomic basis of yeast adaptation to contrasting anthropogenic environments. Phylogenetic inference and population structure analysis based on single nucleotide polymorphisms (SNPs) revealed a group of flor yeasts separated from wine yeasts. A combination of methods revealed several highly differentiated regions between wine and flor yeasts, and analyses using codon-substitution models for detecting molecular adaptation identified sites under positive selection in the high-affinity zinc transporter gene ZRT1. The Cross Population Composite Likelihood Ratio (XP-CLR) test revealed selective sweeps at three regions, including the hexose transporter gene HXT7, the yapsin gene YPS6 and the membrane protein coding gene MTS27. Our analyses also revealed that the biological aging environment has led to the accumulation of numerous mutations in proteins from several networks, including Flo11 regulation and divalent metal transport. Together, our findings suggest that the tuning of FLO11 expression and of zinc transport networks is a distinctive feature of the genetic changes underlying the domestication of flor yeasts. Our study highlights the multiplicity of genomic changes underlying yeast adaptation to man-made habitats, and shows that the flor and wine yeast lineages can serve as a useful model for studying the genomics of adaptive divergence.
VinVoile.VQSR.HighQualitybiall.vcf
Compressed (tgz) VCF file containing the subset of high-quality biallelic variants extracted from the GATK genotyping output file VQSR.vcf by keeping only variants that passed filtering (FILTER = PASS). Population differentiation measures (Dxy and Da) are computed on a second subset of this file, obtained by filtering genotypes on read depth and genotype quality with vcftools using the parameters --minGQ 30 --minDP 10 --maf 0.001. This filtered data set is used for Figure 2C and Figure S7.
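The genotype-level filtering described above can be sketched in Python. This is a minimal illustration that assumes genotypes have already been parsed from the VCF into hypothetical (GT, GQ, DP) tuples with alleles coded 0 (REF) / 1 (ALT); the function names are illustrative, and the code mirrors the documented thresholds rather than reproducing vcftools' own implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical parsed genotype: (alleles or None if missing, GQ, DP); alleles coded 0/1.
Genotype = Tuple[Optional[Tuple[int, int]], int, int]

MIN_GQ = 30      # mirrors vcftools --minGQ 30
MIN_DP = 10      # mirrors vcftools --minDP 10
MIN_MAF = 0.001  # mirrors vcftools --maf 0.001


def mask_genotypes(site: List[Genotype]) -> List[Optional[Tuple[int, int]]]:
    """Set genotypes below the GQ or depth threshold to missing (None)."""
    return [gt if gt is not None and gq >= MIN_GQ and dp >= MIN_DP else None
            for gt, gq, dp in site]


def minor_allele_frequency(calls: List[Optional[Tuple[int, int]]]) -> float:
    """Minor allele frequency over the non-missing alleles at one biallelic site."""
    alleles = [a for gt in calls if gt is not None for a in gt]
    if not alleles:
        return 0.0
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def keep_site(site: List[Genotype]) -> bool:
    """Keep a site if its MAF, computed after genotype masking, reaches the threshold."""
    return minor_allele_frequency(mask_genotypes(site)) >= MIN_MAF
```

Masking genotypes before computing the allele frequency matters: a site that looks polymorphic in the raw calls can become monomorphic once low-quality genotypes are set to missing, and is then dropped.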
Evaluation of variant impact on protein function according to SIFT4G: high-quality biallelic variant set
Compressed file containing the impact of each variant on each annotated transcript, inferred with SIFT4G, for all variants in VinVoile.VQSR.HighQualitybiall.vcf.
only_annotated_VinVoile.VQSR.HighQuality_SIFT_multiTranscripts.tsv.tgz
Table of genes retained by each test (dyad)
This table lists every gene identified by at least one of the following tests: PCA differentiating flor yeasts from other strains (Allyeast-set); Dxy (VinVoile.VQSR.HighQualitybiall.vcf filtered for a minimum genotype quality of 30 and a minimum sequencing depth of 10); Da (same filtering as Dxy); XPCLR (for the Jura and Spanish populations); the 20% most potentially damaging SNPs (according to SIFT) differentiating wine and flor yeasts in the PCA; and codeml values. This table provides the data for Table S4 and Figure 3.
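A table of this kind (the union of several per-test gene lists, with one column per test) can be sketched as follows. This is an illustrative sketch only: `genes_by_test` is a hypothetical helper, the test labels are examples, and the gene names in the usage lines are placeholders, not results from this study.

```python
from typing import Dict, Set


def genes_by_test(gene_lists: Dict[str, Set[str]]) -> Dict[str, Dict[str, bool]]:
    """Union of all per-test gene lists, with one presence/absence flag per test."""
    tests = sorted(gene_lists)
    all_genes = sorted(set().union(*gene_lists.values()))
    return {gene: {test: gene in gene_lists[test] for test in tests} for gene in all_genes}


# Usage with placeholder gene names (not results from this study):
table = genes_by_test({"Dxy": {"GENE_A", "GENE_B"}, "XPCLR": {"GENE_B"}})
for gene, flags in table.items():
    print(gene, flags)
```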
DTdadi
Demographic inferences (Figure S5a and b): SNP data sets for wine and Jura/Hungary flor strains, and for wine and Spain/Italy flor strains, in dadi format for demographic inference. Generated with the Vcf-to-dadi.pl script.
Vcf-to-dadi perl script
Demographic inferences (Figure S5a and b): Perl script that converts a VCF file including 2 or 3 populations, with an Ancestral Allele field (not necessary if the spectrum is folded), into an input file for demographic inference by diffusion approximation (dadi, Gutenkunst et al. 2009, PLoS Genetics). Requires BioPerl, a reference sequence FASTA file, and a variant file in VCF format containing only the strains required for the analysis.
Vcf-to-dadi.pl
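The target file layout can be illustrated with a short Python sketch. It assumes the SNP data format described in the dadi manual (ingroup and outgroup 3-bp contexts with the SNP at the middle base, then Allele1 with one count column per population, Allele2 with the same columns, then free annotation columns); the function names, population labels and annotation columns here are hypothetical, and the sketch does not reproduce Vcf-to-dadi.pl itself (for example its handling of ancestral alleles or folded spectra).

```python
from typing import Dict, List


def dadi_header(pops: List[str]) -> str:
    """Header row: two context columns, allele/count blocks, then annotation columns."""
    return " ".join(["Ingroup", "Outgroup", "Allele1", *pops, "Allele2", *pops, "Chrom", "Position"])


def dadi_line(ingroup_ctx: str, outgroup_ctx: str, allele1: str, allele2: str,
              counts1: Dict[str, int], counts2: Dict[str, int],
              pops: List[str], chrom: str, pos: int) -> str:
    """One SNP row; contexts are 3-bp strings with the SNP at the middle base."""
    if len(ingroup_ctx) != 3 or len(outgroup_ctx) != 3:
        raise ValueError("contexts must be 3 bp long")
    fields = [ingroup_ctx, outgroup_ctx, allele1]
    fields += [str(counts1[p]) for p in pops]   # Allele1 counts, one column per population
    fields.append(allele2)
    fields += [str(counts2[p]) for p in pops]   # Allele2 counts, same population order
    fields += [chrom, str(pos)]                 # extra columns identify the SNP
    return " ".join(fields)


pops = ["wine", "flor"]  # hypothetical population labels
print(dadi_header(pops))
print(dadi_line("ACG", "ATG", "C", "T", {"wine": 29, "flor": 24}, {"wine": 1, "flor": 0}, pops, "chrI", 289))
```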
TranslocationsCompilation_outputs_delly
Compilation of DELLY v0.1.1 outputs for each strain. This file contains only events that occurred more than 40 times.
XPCLR selection test: script that prepares ms data and runs XPCLR
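The "more than 40 times" criterion can be sketched as a simple occurrence count. This is a minimal illustration under a loudly stated assumption: occurrences are counted as identical event keys pooled over all per-strain DELLY outputs. The event key layout and the function name are hypothetical; the original compilation may define an occurrence differently.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event key: (SV type, chrom1, pos1, chrom2, pos2).
Event = Tuple[str, str, int, str, int]

MIN_OCCURRENCES = 40  # events must occur MORE than this many times


def recurrent_events(calls_per_strain: Iterable[Iterable[Event]],
                     threshold: int = MIN_OCCURRENCES) -> List[Event]:
    """Pool event calls from all strains and keep those seen more than `threshold` times."""
    counts = Counter(event for calls in calls_per_strain for event in calls)
    return sorted(event for event, n in counts.items() if n > threshold)
```

The comparison is strict (`n > threshold`), so an event seen exactly 40 times is excluded.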
mstoXPCLR.zip
XPCLR data for wine and Spain/Italy flor strains
XPCLR selection test: data for XPCLR of wine and Spanish/Italian flor strains. Recombination rate 403 M; window = 0.1 cM; window = 1 kb.
DTXPCLR-Wine-Spain-italyFlor.zip
XPCLR data for wine and France/Hungary flor strains
XPCLR selection test: data for XPCLR of wine and French (Jura) and Hungarian flor strains. Recombination rate 403 M; window = 0.1 cM; window = 1 kb.
DTXPCLR-Wine_Jura-HungaryFlor.zip
Combined set of SNPs including wine and flor strains and strains of other origins (Allyeast-set)
Data for phylogeny and population structure (Figure 1, Figure 2A and B). Compressed VCF file containing biallelic variants with less than 20% missing genotypes across all strains, obtained by genotyping 20 flor and wine strains together with 82 strains with available genome sequences, retrieved from the Saccharomyces Genome Database (http://www.yeastgenome.org/) or from the SGRP project (http://www.sanger.ac.uk/research/projects/genomeinformatics/sgrp.html) (Liti et al. 2009, Population genomics of domestic and wild yeasts. Nature, 458, 337–41).
AssembliesGenowine20%maxmiss.tar.gz
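The site-level missingness cut-off described above can be sketched in Python. This is a minimal illustration assuming each site is given as a list of VCF GT strings (one per strain), where "./." or "." marks a missing call; the function names are hypothetical and this is not the pipeline used to build the file.

```python
from typing import List, Optional, Sequence

MAX_MISSING = 0.20  # keep sites with strictly less than 20% missing genotypes


def is_missing(gt: Optional[str]) -> bool:
    """A VCF GT string such as './.' or '.' (or None) counts as missing."""
    return gt is None or gt.startswith(".")


def missing_fraction(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of strains without a genotype call at this site."""
    return sum(is_missing(g) for g in genotypes) / len(genotypes)


def filter_sites(sites: List[Sequence[Optional[str]]],
                 max_missing: float = MAX_MISSING) -> List[Sequence[Optional[str]]]:
    """Keep only sites whose missing fraction is below the cut-off."""
    return [site for site in sites if missing_fraction(site) < max_missing]
```

Note that vcftools expresses this kind of cut-off the other way round: its --max-missing option takes the minimum proportion of non-missing data, so a value of 0.8 roughly corresponds to allowing up to 20% missing genotypes.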
Data for differentiation-based methods (PCA, Dxy, Da)
- PLINK-format files AssembliesGenowine.raw and .map, used for the PCA performed with adegenet
- phased VCF files for each chromosome of the flor and wine populations
- an R script using PopGenome to compute theta, pi, Dxy and Da (Da = Dxy - (pi_wine + pi_flor)/2) on each chromosome data file
Differentiation-methods.zip
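The relation between pi, Dxy and Da given above can be illustrated with a small Python sketch on phased 0/1 haplotypes. This is not the PopGenome script: the function names are hypothetical, and normalizing by a user-supplied number of sites (`n_sites`, assumed to include monomorphic sites in the region) is an assumption made here to obtain per-site values.

```python
from itertools import combinations
from typing import List, Sequence

Haplotype = Sequence[int]  # 0/1 alleles at the same set of biallelic sites


def n_diffs(h1: Haplotype, h2: Haplotype) -> int:
    """Number of sites at which two haplotypes differ."""
    return sum(a != b for a, b in zip(h1, h2))


def pi(pop: List[Haplotype], n_sites: int) -> float:
    """Mean pairwise differences within one population, per site."""
    pairs = list(combinations(pop, 2))
    return sum(n_diffs(h1, h2) for h1, h2 in pairs) / len(pairs) / n_sites


def dxy(pop1: List[Haplotype], pop2: List[Haplotype], n_sites: int) -> float:
    """Mean pairwise differences between the two populations, per site."""
    total = sum(n_diffs(h1, h2) for h1 in pop1 for h2 in pop2)
    return total / (len(pop1) * len(pop2)) / n_sites


def da(pop1: List[Haplotype], pop2: List[Haplotype], n_sites: int) -> float:
    """Net divergence: Da = Dxy - (pi_1 + pi_2) / 2."""
    return dxy(pop1, pop2, n_sites) - (pi(pop1, n_sites) + pi(pop2, n_sites)) / 2


# Toy haplotypes (not study data): 4 sites, 2 haplotypes per population.
wine = [[0, 0, 0, 0], [0, 0, 0, 1]]
flor = [[1, 1, 0, 0], [1, 1, 0, 0]]
print(pi(wine, 4), pi(flor, 4), dxy(wine, flor, 4), da(wine, flor, 4))
```

Subtracting the mean within-population diversity removes the ancestral polymorphism shared by both groups, so Da reflects divergence accumulated since the split rather than Dxy's total between-population differences.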
DT FineStructure Flor Wine Champagne
Data files used for the FineStructure analysis of the hybrid nature of strains in the Champagne group (Figure S1).
DT-FineStructure-Flor-Champagne.tgz
Data for linkage disequilibrium analysis (Figure S3)
PED and MAP files for the flor and wine populations, prepared from the phased VCF file. The data were analysed with PLINK using the options --r2 --ld-window 1000 --ld-window-kb 50 --ld-window-r2 0.
Data_LD.tgz
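The pairwise r² statistic and the window options above can be sketched in Python. This is a minimal illustration, not PLINK's implementation: it computes r² from phased 0/1 haplotypes (PLINK's --r2 works from genotype data and may give slightly different values), it assumes positions are sorted in ascending order, and the function names are hypothetical. The window logic mirrors the documented settings: pairs fewer than 1000 variants apart, at most 50 kb apart, and with r² of at least 0 (that is, all pairs) are reported.

```python
from typing import List, Optional, Sequence, Tuple


def r_squared(site_a: Sequence[int], site_b: Sequence[int]) -> Optional[float]:
    """Squared correlation between two biallelic sites from phased 0/1 haplotypes."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None  # a monomorphic site has undefined r^2
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def ld_pairs(positions: Sequence[int], sites: Sequence[Sequence[int]],
             window_variants: int = 1000, window_kb: int = 50,
             min_r2: float = 0.0) -> List[Tuple[int, int, float]]:
    """Report (pos_i, pos_j, r2) for pairs within the variant-count and distance windows."""
    out = []
    for i in range(len(sites)):
        for j in range(i + 1, min(i + window_variants, len(sites))):
            if positions[j] - positions[i] > window_kb * 1000:
                break  # positions are sorted, so later sites are even farther away
            r2 = r_squared(sites[i], sites[j])
            if r2 is not None and r2 >= min_r2:
                out.append((positions[i], positions[j], r2))
    return out
```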
Data for estimation of recombination rates (Figure S4)
These data were used for the genome scan of recombination rate (Figure S4), using the model with hotspots implemented in the "INTERVAL" module of the LDhat package. A global recombination rate was also estimated for flor strains with the "PAIRWISE" module of the same package.
Wine-Flor-DT-LDhat.tgz
Data for Figure 4: biofilm images used to evaluate the impact of each wine and flor allele on biofilm formation
The compressed archive contains the photographs and a notice describing the layout of each image.
Biofilm_images-figure4.tgz
de-novo draft assemblies
This is a gzip/tar compressed data archive containing the draft de novo contig assemblies generated in this study. These data are used for phylogenetic trees and gene sequence alignments (Figures S2, S6, S8).
draft-assemblies.tgz