
Barcoded Bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast

Cite this dataset

Nguyen Ba, Alex N. et al. (2022). Barcoded Bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast [Dataset]. Dryad.


Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits.


Dataset accompanying the research article "Barcoded Bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast" by Alex N. Nguyen Ba, Katherine R. Lawrence, Artur Rego-Costa, Shreyas Gopalakrishnan, Daniel Temko, Franziska Michor, and Michael M. Desai (2021).

The dataset contains genotype and phenotype information for a panel of approximately 100,000 F1 offspring from a cross between strains BY and RM of Saccharomyces cerevisiae. Genotypes are encoded at each of the approximately 42,000 single-nucleotide polymorphisms (SNPs) that differ between the two parental strains, as the inferred posterior probability that the strain carries the RM allele at that locus. Phenotypic values are the inferred relative fitness of each strain in liquid fed-batch culture in eighteen different growth media. This dataset was created from raw sequencing data stored in the NCBI SRA repository.
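
Because genotypes are stored as posterior probabilities rather than discrete calls, some analyses may first need to convert them to hard allele calls. The following is a minimal sketch of one way to do that, not the authors' procedure. The function name `call_genotype`, the 0.9 confidence threshold, and the toy values are all assumptions made for illustration.

```python
def call_genotype(p_rm, threshold=0.9):
    """Convert one posterior probability of the RM allele into a hard call.

    Returns "RM", "BY", or None when the posterior is too ambiguous.
    The default threshold is an illustrative assumption, not a value
    taken from the dataset or the manuscript.
    """
    if not 0.0 <= p_rm <= 1.0:
        raise ValueError(f"not a probability: {p_rm}")
    if p_rm >= threshold:
        return "RM"
    if p_rm <= 1.0 - threshold:
        return "BY"
    return None


# Toy posteriors for three loci of one hypothetical strain (not real data).
posteriors = [0.99, 0.02, 0.55]
calls = [call_genotype(p) for p in posteriors]
```

Keeping the probabilities instead of hard calls preserves genotyping uncertainty, so thresholding is best reserved for steps that genuinely require discrete alleles.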

Usage notes

> Dataset description
|- SNP_list.txt
|    Table with index, genome position, and sequence information of the approximately 42,000
|    single-nucleotide polymorphisms between the two parental strains. The genome reference sequence
|    on which this dataset is based is S288C_reference_genome_R64-3-1_20210421 by the Saccharomyces
|    Genome Database Project. The RM alleles were inferred from sequencing of the parental RM strain.
|- segregant_info.txt
|    Unique indices, library location (Batch, Set, Plate, Well), and barcode sequence of each of the
|    approximately 100,000 F1 strains in our panel.
|- geno_data_*.txt.gz
|    Inferred posterior probability of each strain in our panel having the RM allele at each of the
|    approximately 42,000 single-nucleotide polymorphisms considered. Each row contains the genotype
|    of one strain. The first column contains the identifying number of that strain, as in the 'Number'
|    column of segregant_info.txt. Each file contains the genotype of 20,000 strains.
|- pheno_data_*.txt.gz
|    Each file contains the inferred fitness with its associated standard error for each strain in our panel
|    in a given environment, identified in the file name. Missing data from strains for which we could
|    not infer fitness is coded as 'nan'.
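
As a rough illustration of how a phenotype file might be read, the sketch below parses gzip-compressed text and separates out strains whose fitness is coded as 'nan'. The exact file layout is not specified above, so the column order (strain number, fitness, standard error), the whitespace delimiter, the optional header row, and the helper names `read_pheno` and `drop_missing` are assumptions. The toy table is synthetic and stands in for a real pheno_data_*.txt.gz file.

```python
import gzip
import io
import math


def read_pheno(stream):
    """Parse text lines of 'strain_number fitness std_err' into a dict.

    Assumed layout: whitespace-delimited, strain number in the first column.
    Lines whose first field is not an integer (for example a header row)
    are skipped.
    """
    records = {}
    for line in stream:
        fields = line.split()
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        # float() parses the string 'nan' into a floating-point NaN.
        records[int(fields[0])] = (float(fields[1]), float(fields[2]))
    return records


def drop_missing(records):
    """Keep only strains whose fitness value is not NaN."""
    return {k: v for k, v in records.items() if not math.isnan(v[0])}


# Synthetic stand-in for one compressed phenotype file (not real data).
toy = "Number\tFitness\tStdErr\n0\t0.012\t0.003\n1\tnan\tnan\n2\t-0.050\t0.004\n"
buf = io.BytesIO(gzip.compress(toy.encode()))

with gzip.open(buf, "rt") as fh:
    pheno = read_pheno(fh)

measured = drop_missing(pheno)
```

For a real file, replacing `buf` with the file path in `gzip.open` should be enough, provided the layout assumptions hold. The strain numbers can then be matched against the 'Number' column of segregant_info.txt.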

> Additional information

See readme.txt for general dataset information.

See manuscript for more information on data generation, processing, and access.



Funding

Natural Sciences and Engineering Research Council

Fannie & John Hertz Foundation Graduate Fellowship Award

National Science Foundation

NSF-Simons Center for Mathematical and Statistical Analysis of Biology, Award: #1764269

National Science Foundation, Award: PHY-1914916

National Institute of General Medical Sciences, Award: GM104239