Data from: An adaptive biomolecular condensation response is conserved across environmentally divergent species
Data files
Aug 22, 2023 version files 231.13 MB
- keyportkik2023-conservation-condensation-dryad-package-20230817.zip (231.11 MB)
- README.md (13.69 KB)
Jan 11, 2024 version files 241.83 MB
Abstract
Cells must sense and respond to sudden maladaptive environmental changes—stresses—to survive and thrive. Across eukaryotes, stresses such as heat shock trigger conserved responses: growth arrest, a specific transcriptional response, and biomolecular condensation of protein and mRNA into structures known as stress granules under severe stress. The composition, formation mechanism, adaptive significance, and even evolutionary conservation of these condensed structures remain enigmatic. Here we provide an unprecedented view into stress-triggered condensation, its evolutionary conservation and tuning, and its integration into other well-studied aspects of the stress response. Using three morphologically near-identical budding yeast species adapted to different thermal environments and diverged by up to 100 million years, we show that proteome-scale biomolecular condensation is tuned to species-specific thermal niches, closely tracking corresponding growth and transcriptional responses. In each species, poly(A)-binding protein—a core marker of stress granules—condenses in isolation at species-specific temperatures, with conserved molecular features and conformational changes modulating condensation. From the ecological to the molecular scale, our results reveal previously unappreciated levels of evolutionary selection in the eukaryotic stress response, while establishing a rich, tractable system for further inquiry.
README
This README file is for the data published in "An adaptive biomolecular condensation response is conserved across environmentally divergent species" by Keyport Kik et al. (2023) (preprint: https://www.biorxiv.org/content/10.1101/2023.07.28.551061v1). These folders also contain custom R (version 4.2.2) scripts for data processing, analysis, and figure generation, organized into separate experimental folders. Some datasets have separate R code for processing and exporting data. The ordering of the figures may differ from that shown in the paper. The code used to process raw RNA-seq data can be found on GitHub (https://github.com/skeyport/conservation-of-condensation-2023). The rest of the data and scripts can be found on Dryad (https://doi.org/10.5061/dryad.w3r2280w6). Raw RNA-seq data can be found under GEO accession code GSE234499. Below is a description of the contents of each folder.
Please note: The most recent version of this package has been updated to include changes recommended by the reviewers, including microscopy data and further analysis for the HDX-MS section.
supp-files/
Within this folder are three supplemental files:
- suppfile1/ (master-orthologs.txt) - contains ortholog calls for all three species, including gene and ORF names from S. cerevisiae.
- suppfile2/ (230215_labeled_genes_scer_all.tsv) - gene annotations, derived from the following sources. The targets of HSF1 and Msn2/4 were curated from Pincus et al. 2018 and Solís et al. 2016. The genes for core ribosomal proteins, ribosome biogenesis factors, and glycolytic enzymes (superpathway of glucose fermentation) as well as transcription factor regulation assignments were derived from the Saccharomyces Genome Database (Cherry et al. 2012; Engel et al. 2014, https://www.yeastgenome.org/). Genes for translation factors were derived from the KEGG BRITE database (Kanehisa et al. 2016).
- suppfile3/ (20-04-21_sgd-all-regulators2.tsv) - transcription factor regulators, assigned according to Triandafillou et al. 2020.
fig-svgs/
Final Figures 1-5 as well as Supplemental Figures 1-5 (in *.svg or *.png format) can be found in this folder. Table 1 and Table 2 can also be found here as *.txt files.
dryad-upload/
Eight folders contain raw and processed data as well as R or Python scripts (open source versions) to produce the figures in the paper. All required packages and dependencies are listed at the top of each script. All scripts run on both Windows 11 and macOS. After the .zip file is downloaded and extracted, the data are organized so that scripts run in place within the directories in which they are found. Expected (actual) outputs are included for each script. Running all of the scripts and producing the figures should take less than a day. All input files needed to test the code are contained in this folder. Instructions for use are below. If there is a specific order in which to run the code, it is outlined below (e.g., condensation-ms/); in all other cases, scripts can be run in any order.
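After extraction, it can help to see which scripts live in which folder before running anything. The following is a minimal sketch, not part of the package: the function name list_scripts and the set of file extensions are assumptions chosen for illustration, based only on the statement above that the scripts are R or Python.

```python
from pathlib import Path

# Extensions treated as analysis scripts (an assumption based on the README text).
SCRIPT_SUFFIXES = {".r", ".rmd", ".py"}


def list_scripts(root):
    """Return {top-level folder name: sorted relative script paths} under root."""
    root = Path(root)
    found = {}
    if not root.is_dir():
        # Nothing to list if the archive has not been extracted here.
        return found
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SCRIPT_SUFFIXES:
            rel = path.relative_to(root)
            # Scripts sitting directly in root are grouped under ".".
            folder = rel.parts[0] if len(rel.parts) > 1 else "."
            found.setdefault(folder, []).append(rel.as_posix())
    return found
```

Calling list_scripts on the directory where the archive was extracted returns a dictionary keyed by experimental folder; the location of that directory depends on where the .zip file was unpacked.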
condensation-ms/ - Mass spectrometry was performed at different control and treatment temperatures for each species. Outputs from the Scaffold DIA 3.3.1 analysis performed by MS Bioworks are in the data/ folder (MSB-9658A U. Chicago Keyport 042622.txt - text file of S. kudriavzevii processed data from MS Bioworks; MSB-9658B U. Chicago Keyport 042722.txt - text file of S. cerevisiae processed data from MS Bioworks; MSB-10835 U. Chicago Keyport 022223.txt - text file of K. marxianus processed data from MS Bioworks). Other sample name and proteome information called by the scripts is contained in data/, including conditions.txt (contains sample conditions), sample_names.txt (experimental names and associated data), and kmarx-tsp-by-condition.txt (tidy version of raw intensity data for K. marxianus). All analyses are performed in RStudio (R version 4.2.2).

The scripts in this folder depend on one another and should be run in the following order. First, the analysis script process-raw-dia.Rmd processes raw data for S. cerevisiae and S. kudriavzevii; it uses the raw MS data, sample_names.txt, uniprot-gene-orf.txt (accession and gene information from UniProt), and skud_proteome_ygob.fasta (*.fasta file for S. kudriavzevii from YGOB). This script produces results/processed_data.tsv. Second, calculate-mixing-dia.Rmd takes results/processed_data.tsv as an input for mixing ratio calculation, along with data/kmarx-tsp-by-condition.txt and data/scer-kmarx-skud-orthologs.txt (curated ortholog list for each species); this script outputs two files (results/230501_mixing_ratios.tsv and results/psups-three-species.txt). psup-three-species.txt was transformed into psup_wide_data.txt. Third, conservation-of-condensation.Rmd uses psup_wide_data.txt in combination with data/scer-anno-proteins.txt (annotations of protein processes as described in the Methods section, "Gene Annotation") and 230501_mixing_ratios.tsv, and ultimately produces the figures (found in figures/). Final figures can be found in the fig-svg/ directory. A subset of custom functions called in scripts is defined in utilityFunctions.R.

dls/ - Raw measurements from DLS temperature ramp experiments are contained in directories with dates as names as
*.csv files, and associated sample information is also contained as *-samples.csv. All analyses are performed in RStudio (R version 4.2.2). The analysis script analyze-dls.Rmd reads in sample information and raw data to produce Figures 4a-d (output in the figs/ folder). Raw data from Riback & Katanski et al. 2017 can be found in the riback-2017/ directory (WT Pab1 in Pab1_15uM_DLSbuffpH6p4.csv, MV to A Pab1 in Pab1_MVtoA_15uM_pH6p4_4_14_15_1.csv, and MV to I Pab1 in Pab1_MVtoI_15uM_pH6p4.csv) and are also used in analyze-dls.Rmd. Growth data to produce Figure 4b are generated from scripts in growth-curves/ and output into this directory, including conf-int-topt.txt (confidence intervals from estimation of optimal temperature in Figure 1b), hsr-growth-max.txt (maximum estimated growth temperature and heat shock temperature for each species), and stop-growing.txt (estimates of the temperatures at which each species stops growing). Table S1 (table1.txt) and Table S2 (table2.txt) are output from the analyze-dls.Rmd script and contain baseline size calculations for each protein (S1) and temperature and doubling size of condensation for each protein (S2). Final figures can be found in the fig-svg/ directory.

flow-cytometry/ - Raw
*.fcs files of each species +/- heat shock at various temperatures with 180 min of recovery are found here, with condition descriptors in the file names, in directories with dates as names. All analyses are performed in RStudio (R version 4.2.2) and require custom but publicly available R packages. These custom packages can be found on GitHub (https://github.com/ctriandafillou/flownalysis; https://github.com/ctriandafillou/cat.extras). The script 20220414-skud-scer-manytemps-endpoint.Rmd reads in *.fcs files, processes flow data, and produces Figure S2a (found in output/results/). Another output file, max-fc-hsr.txt, which contains the fold change and statistics of maximum heat shock response for each species at each temperature, is produced here and also exported to growth-curves/data/, where it is used in part to produce Figures 1c-d (also found in output/results/). Final figures can be found in the fig-svg/ directory.

growth-curves/ - Raw OD600 data for each WT species is found in
20230311/20230311.txt and was sampled from log-phase growing yeast grown at different temperatures. All analyses are performed in RStudio (R version 4.2.2). Two R scripts are contained in this directory. First, growth-curve-estimation.R calculates the maximum specific growth rates by estimating the slope of the linear range of growth for each species and temperature. Resulting growth curves are then fit using the cardinal temperature model with inflection (Rosso, Lobry, and Flandrois 1993), and the temperature of maximum growth is estimated from these curves and output into data/dat-max.txt. Optimal temperature statistics are calculated as a confidence interval (results/conf-int-topt.txt, also output into ../dls/) and standard deviation (results/topt-sd.txt). This script is also used to estimate the temperature at which each species stops growing, which is output as data/stop-growing.txt (also output into ../dls/). growth-curve-estimation.R generates Figure 1b (found in results/). The second script, plot-growth-hsr-correlation.Rmd, takes the output files from the first script as well as data/max-fc-hsr.txt, produced by ../flow-cytometry/20220414-skud-scer-manytemps-endpoint.Rmd, to produce Figures 1c and 1d as well as results/hsr-growth-max.txt, which contains temperature optima for growth and heat shock responses for each species. Final figures can be found in the fig-svg/ directory.

hdx/ - HDX-MS data were collected for each species' Pab1 before and after condensation. Raw data for each species as
*_peptideUptake.csv, as well as manually curated domain boundaries (20221214_pab1DomainBoundaries) and secondary structure predictions (domain-ss.txt) from Schäfer et al. 2019, are found in this folder. Python3 scripts (extract-hdx.py, hdx.py, and hdx_test.py) produce %D values for each residue in aligned Pab1 using aligned-positions.txt, based on the alignment (align-pab1.txt); these values are used to compute means across the sequences. The outputs of the Python3 scripts are in output/*hdx.txt. Figure 5a-f generation (each found in output/) and other data processing are done in RStudio (R version 4.2.2) in 20230328_HDX_Pab1_SKK.Rmd. Final figures can be found in the fig-svg/ directory.

microscopy/ - Images are provided in
.tif format and in the .jpg format used for the final figure in the paper (in separate directories, selections_tif/ and selections_jpg/). Files are labeled by species abbreviation (Sk, Sc, Km) and by temperature (in degrees Celsius). The Km_55 (K. marxianus, 55 degrees) selection has a 5 micron scale bar added in ImageJ. The colorbar settings used for the .jpg images are as described in the final figure. There are 184 nm per pixel (5.43 pixels per micron). Final figures can be found in the fig-svg/ directory.

rna-seq/ - Each species was treated with a species-specific heat shock (or control temperature) and then submitted for RNA-Seq. Upstream pre-processing was performed with a custom Snakemake pipeline, which can be found on GitHub (https://github.com/skeyport/conservation-of-condensation-2023). Downstream processing and analyses as well as figure generation are performed in RStudio (R version 4.2.2). The script
230503_analyze_counts.Rmd aggregates count data (counts/*_counts.tsv) and sample information (20230309-sample-info.txt, also in data/) to produce output/20230309-counts.tsv. Gene lengths were extracted for each gene by first adding exon annotations to the GTF files (*_genomic.gtf or *_genomic.cleaned.gtf) using a custom script based on gffutils v0.11.1 (https://github.com/daler/gffutils, also in this directory as gffutils_fix_missing_exon.py). Gene lengths were then calculated using the GenomicFeatures package in R. These lengths (found in output/*_merged_exon_length.tsv) are then used to calculate transcripts per million values (TPMs, output/20230309_TPM.tsv, also in data/) using the counts output (output/20230309-counts.tsv). Fold changes in transcript abundance were calculated using DESeq2 v3.16 (output/20230309-deseq.tsv). We used pre-published genome annotations (src/label_Scer_genes/Saccharomyces-kudriavzevii-ZP591_genome.tab) from YGOB v8 (beta) to match RNA-Seq data to S. kudriavzevii strain ZP591. Figures 2a-2d and Figures S3a-c are produced using 230503_analyze_counts.Rmd.

Gene annotations were assigned in src/label_Scer_genes/230215-label-genes.Rmd, where targets of HSF1 and Msn2/4 were curated from Solis et al., 2016 and Pincus et al., 2018 (Solis_2016/mmc3.xlsx and 200325-scer-features.txt (also in data/)). The genes for core ribosomal proteins (scer-ribosomal-proteins.txt), ribosome biogenesis factors (ribosome_biogenesis_annotations_sgd.txt), and glycolytic enzymes (superpathway of glucose fermentation, SGD_superpathway_glycolysis_221025.txt) as well as transcription factor regulation assignments (20-04-21_sgd-all-regulators2.tsv and tfs_and_targets_heatshock.tsv) were derived from the Saccharomyces Genome Database (https://www.yeastgenome.org/). Genes for translation factors were derived from the KEGG BRITE database (translation_factors_kegg.tsv). The gene annotation file was output into 230215_labeled_genes_scer_all.tsv, which is subsequently used in 230503_analyze_counts.Rmd to assign gene annotations.

The script to produce the upset plots from Figures S2b-c is txn-comparison.Rmd, which also takes data from Brion et al. 2016 (data/brion2016-lkluyveri-stress-seq.txt), sample information (/data/20230309-sample-info.txt), orthologs (data/master-orthologs.txt), TPMs (data/20230309_TPM.tsv), and gene annotations (/data/230508_labeled_genes_scer.tsv). GEO upload TPMs and counts (Keyport-Kik_2023_TPMs.tsv and Keyport-Kik_2023_counts.tsv) are found in output/ as well, and are identical to output/20230309_*.txt. A subset of custom functions called in scripts is defined in utilityFunctions.R. A file containing orthologs for all species (also Supplemental File 1), master-orthologs.txt, is found in data/ and src/label_Scer_genes/ and is used as an input in the script 230503_analyze_counts.Rmd. Final figures can be found in the fig-svg/ directory.

spot-assays/ - Spot assays and plate growth assays were performed for each species at corresponding temperatures. Raw
*.tif images for Figure 4e and Figure S1a-c (organized as directories) are contained here. Naming conventions for images in fig4e/ and figs1e/ are [date][time][strain/species][duration][temperature]. Naming conventions for images in figs1a/ are [plate][duration][temp][species]. Naming conventions for images in figs1c/ are [date][time][temperature][duration]. Final figures can be found in the fig-svg/ directory.
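The rna-seq/ description above states that TPMs are calculated from the aggregated counts and the merged exon lengths. As an illustration of that step only, here is a minimal sketch of the standard TPM definition in Python. It is not the code in 230503_analyze_counts.Rmd, which is written in R and may handle details differently; the function name counts_to_tpm and the dictionary inputs are hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM) for one sample.

    counts: dict mapping gene name -> read count (hypothetical input layout)
    lengths_bp: dict mapping gene name -> gene length in base pairs
    """
    # Step 1: reads per kilobase, which normalizes each count by gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        # No reads at all: every gene gets a TPM of zero.
        return {gene: 0.0 for gene in counts}
    # Step 2: rescale so that the values in one sample sum to one million.
    return {gene: value / total * 1e6 for gene, value in rpk.items()}
```

Because the length normalization happens before the rescaling, two genes with the same number of reads per kilobase receive the same TPM regardless of their lengths, and the values within a sample always sum to one million.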
Usage notes
Eight folders contain raw and processed data as well as R or Python scripts to produce the figures in the paper.
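For readers unfamiliar with the cardinal temperature model with inflection named in the growth-curves/ description (Rosso, Lobry, and Flandrois 1993), the following is a minimal sketch of the model's functional form in Python. It is not the fitting code in growth-curve-estimation.R; the function name ctmi_growth_rate and its parameter names are chosen here for illustration, and any parameter values passed to it are the caller's own.

```python
def ctmi_growth_rate(temp, t_min, t_opt, t_max, mu_opt):
    """Growth rate at temperature temp under the cardinal temperature model.

    t_min, t_opt, t_max: minimum, optimal, and maximum growth temperatures
    mu_opt: growth rate at the optimal temperature
    """
    # The model predicts no growth at or beyond the cardinal limits.
    if temp <= t_min or temp >= t_max:
        return 0.0
    numerator = (temp - t_max) * (temp - t_min) ** 2
    denominator = (t_opt - t_min) * (
        (t_opt - t_min) * (temp - t_opt)
        - (t_opt - t_max) * (t_opt + t_min - 2.0 * temp)
    )
    return mu_opt * numerator / denominator
```

The curve rises from zero at t_min, reaches mu_opt exactly at t_opt, and falls back to zero at t_max, which is why fitting it to measured growth rates yields estimates of both the optimal temperature and the temperature at which growth stops.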