Data for: Raw count data, transcribed variant count data, and reference genomic annotation files for Boocock et al. 2024
Data files
May 21, 2024 version files 4.74 GB
Abstract
Expression quantitative trait loci (eQTLs) provide a key bridge between noncoding DNA sequence variants and organismal traits. The effects of eQTLs can differ among tissues, cell types, and cellular states, but these differences are obscured by gene expression measurements in bulk populations. We developed a one-pot approach to map eQTLs in Saccharomyces cerevisiae by single-cell RNA sequencing (scRNA-seq) and applied it to over 100,000 single cells from three crosses. We used scRNA-seq data to genotype each cell, measure gene expression, and classify the cells by cell-cycle stage. We mapped thousands of local and distant eQTLs and identified interactions between eQTL effects and cell-cycle stages. We took advantage of single-cell expression information to identify hundreds of genes with allele-specific effects on expression noise. We used cell-cycle stage classification to map 20 loci that influence cell-cycle progression. One of these loci influenced the expression of genes involved in the mating response. We showed that the effects of this locus arise from a common variant (W82R) in the gene GPA1, which encodes a signaling protein that negatively regulates the mating pathway. The 82R allele increases mating efficiency at the cost of slower cell-cycle progression and is associated with a higher rate of outcrossing in nature. Our results provide a more granular picture of the effects of genetic variants on gene expression and downstream traits.
README
Description of the data and file structure
Includes cell-cycle assignments in cell_cycle_feb02152022.tsv. Each row contains the cell-cycle assignment for cells analyzed in the experiment. Columns include the data set, cell barcode, cell-cycle assignment, and Seurat-based cluster assignments prior to manual cell-cycle assignment.
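As a quick check of the cell-cycle table, the sketch below counts rows per value of a chosen column using only the Python standard library. It assumes the file is tab-separated with a header row. The column name used in the usage comment is hypothetical, since the exact header names are not listed here; inspect the header first and substitute the real name.

```python
import csv
from collections import Counter


def read_header(tsv_path):
    """Return the column names from the first row of a tab-separated file."""
    with open(tsv_path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))


def count_by_column(tsv_path, column):
    """Count rows per distinct value of `column`.

    Assumes the file has a header row; `column` must match a header name exactly.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if column not in (reader.fieldnames or []):
            raise KeyError(f"{column!r} not in header: {reader.fieldnames}")
        return Counter(row[column] for row in reader)


# Usage (the column name "cell_cycle" is a hypothetical placeholder):
#   print(read_header("cell_cycle_feb02152022.tsv"))
#   print(count_by_column("cell_cycle_feb02152022.tsv", "cell_cycle"))
```

Counting per stage or per data set in this way is a cheap sanity check before joining the assignments to the count matrices by cell barcode.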
Additionally, output data structures from Cell Ranger and Vartrix are included in processed.tar.gz. When expanded, processed/.*/filtered_feature_bc_matrix/ contains the following files for each single-cell experiment (experiments are indicated by the folder names matched by .* and are described in the R code on GitHub referenced below):
- barcodes.tsv: cell barcodes used in analysis; each line is a cell barcode that was used for downstream analysis
- features.tsv.gz: gene features used in analysis; each line is a transcript (systematic gene name and common gene name are provided)
- matrix.mtx.gz: UMI counts per transcript in sparse matrix format as output by Cell Ranger
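The three files above can be read together without third-party packages. The following is a minimal sketch, assuming the standard Cell Ranger layout in which matrix rows are features and columns are barcodes, and assuming the matrix is in Matrix Market coordinate format. The function names are illustrative and are not taken from the authors' analysis code; the experiment folder is passed in as an argument because the folder names are not listed here.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "rt")


def find_file(folder, stem):
    """Return folder/stem or folder/stem.gz, whichever exists."""
    for name in (stem, stem + ".gz"):
        candidate = Path(folder) / name
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"{stem}[.gz] not found in {folder}")


def read_lines(path):
    """Return the non-empty lines of a text file without trailing newlines."""
    with open_text(path) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def read_mtx(path):
    """Parse a Matrix Market coordinate file into (shape, {(row, col): count}).

    Indices are converted from 1-based (file) to 0-based (Python).
    """
    shape, entries = None, {}
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue  # header and comment lines
            fields = line.split()
            if shape is None:
                shape = (int(fields[0]), int(fields[1]))  # size line: rows cols nnz
                continue
            entries[(int(fields[0]) - 1, int(fields[1]) - 1)] = int(float(fields[2]))
    return shape, entries


def load_experiment(folder):
    """Load barcodes, features, and UMI counts from one filtered_feature_bc_matrix folder."""
    barcodes = read_lines(find_file(folder, "barcodes.tsv"))
    features = [line.split("\t") for line in read_lines(find_file(folder, "features.tsv"))]
    shape, counts = read_mtx(find_file(folder, "matrix.mtx"))
    if shape != (len(features), len(barcodes)):
        raise ValueError(f"matrix shape {shape} does not match features x barcodes")
    return barcodes, features, counts


def umis_per_cell(barcodes, counts):
    """Sum UMI counts over all features for each cell barcode."""
    totals = dict.fromkeys(barcodes, 0)
    for (_, col), value in counts.items():
        totals[barcodes[col]] += value
    return totals
```

A dictionary of non-zero entries keeps the example dependency-free; for the full data set, a sparse-matrix library would be a more practical choice.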
Includes output from Vartrix, quantifying allele-specific expression counts, in processed/.*/ as alt_counts.mtx, ref_counts.mtx, and out_var.txt. Each of the mtx files contains the UMI counts for reference or alternate alleles per variant in sparse matrix format as output by Vartrix. out_var.txt contains the chromosome name and position for each variant.
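To illustrate how the two Vartrix matrices relate to out_var.txt, the sketch below pools counts across cells and reports the alternate-allele fraction per variant. It assumes that matrix rows correspond to variants in the same order as the lines of out_var.txt and that columns correspond to cells; verify this against the matrix dimensions before relying on it. This is a simple summary for orientation, not the per-cell genotyping procedure used in the paper, and the function names are illustrative.

```python
from collections import defaultdict


def row_totals(mtx_path):
    """Sum a Matrix Market coordinate file over columns.

    Returns {row_index: total} with 0-based row indices.
    """
    totals = defaultdict(int)
    seen_size_line = False
    with open(mtx_path) as handle:
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue  # header and comment lines
            if not seen_size_line:
                seen_size_line = True  # skip the "rows cols nnz" line
                continue
            row, _col, value = line.split()[:3]
            totals[int(row) - 1] += int(float(value))
    return dict(totals)


def alt_allele_fractions(ref_path, alt_path, variants_path):
    """Return {variant label: alt / (ref + alt)} pooled over all cells.

    Variants with no reads for either allele are omitted.
    Assumes row i of each matrix matches line i of the variants file.
    """
    with open(variants_path) as handle:
        labels = [line.strip() for line in handle if line.strip()]
    ref, alt = row_totals(ref_path), row_totals(alt_path)
    fractions = {}
    for index, label in enumerate(labels):
        ref_count, alt_count = ref.get(index, 0), alt.get(index, 0)
        if ref_count + alt_count > 0:
            fractions[label] = alt_count / (ref_count + alt_count)
    return fractions
```

In a two-parent cross, pooled fractions far from 0.5 at many variants can hint at a row-ordering mismatch or at skewed parental representation, which makes this a useful first look before any per-cell analysis.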
reference.tar.gz includes gene and genome annotation information when expanded to reference/:
- cross.list.RData: R structure of genetic maps from Bloom et al. 2019
- parents.list.RData: R structure of expected parental variants from Bloom et al. 2019
- genes.gtf: gtf file of gene annotations
- sacCer3.fasta: reference fasta file
- saccharomyces_cerevisiae.gff: reference genome gff
Code/Software
Analysis code can be found at https://github.com/joshsbloom/single_cell_eQTL/tree/master/yeast/code