Data from: An adaptive biomolecular condensation response is conserved across environmentally divergent species

Keyport Kik, Samantha1 ; Christopher, Dana1 ; Glauninger, Hendrik1 ; Wong Hickernell, Caitlin1 ; Bard, Jared1 ; Lin, Kyle1 ; Ford, Michael2 ; Squires, Allison1 ; Sosnick, Tobin1 ; Drummond, D. Allan1

Published Aug 22, 2023; Updated Jan 11, 2024 on Dryad. https://doi.org/10.5061/dryad.w3r2280w6

Data files

Aug 22, 2023 version files 231.13 MB

keyportkik2023-conservation-condensation-dryad-package-20230817.zip

231.11 MB
README.md

13.69 KB

Jan 11, 2024 version files 241.83 MB

keyportkik2024-conservation-condensation-dryad-package-20240111.zip

241.82 MB
README.md

14.34 KB

Abstract

Cells must sense and respond to sudden maladaptive environmental changes—stresses—to survive and thrive. Across eukaryotes, stresses such as heat shock trigger conserved responses: growth arrest, a specific transcriptional response, and biomolecular condensation of protein and mRNA into structures known as stress granules under severe stress. The composition, formation mechanism, adaptive significance, and even evolutionary conservation of these condensed structures remain enigmatic. Here we provide an unprecedented view into stress-triggered condensation, its evolutionary conservation and tuning, and its integration into other well-studied aspects of the stress response. Using three morphologically near-identical budding yeast species adapted to different thermal environments and diverged by up to 100 million years, we show that proteome-scale biomolecular condensation is tuned to species-specific thermal niches, closely tracking corresponding growth and transcriptional responses. In each species, poly(A)-binding protein—a core marker of stress granules—condenses in isolation at species-specific temperatures, with conserved molecular features and conformational changes modulating condensation. From the ecological to the molecular scale, our results reveal previously unappreciated levels of evolutionary selection in the eukaryotic stress response, while establishing a rich, tractable system for further inquiry.

This README file is for the data published in "An adaptive biomolecular condensation response is conserved across environmentally divergent species" by Keyport Kik et al. (2023) (preprint: https://www.biorxiv.org/content/10.1101/2023.07.28.551061v1). These folders also contain custom R (version 4.2.2) scripts for data processing, analysis, and figure generation, organized into separate experimental folders. Some data have a separate R script for processing and exporting. The ordering of the figures may differ from what is shown in the paper. The code used to process raw RNA-seq data can be found on Github (https://github.com/skeyport/conservation-of-condensation-2023). The rest of the data and scripts can be found on Dryad (https://doi.org/10.5061/dryad.w3r2280w6). Below is a description of the contents of each folder. Raw RNA-seq data can be found under GEO accession code GSE234499.

Please note: The most recent version of this package has been updated to include changes recommended by the reviewers, including microscopy data and further analysis for the HDX-MS section.

supp-files/

Within this folder are three supplemental files:

suppfile1/ (master-orthologs.txt) - contains ortholog calls for all three species, including gene and ORF name from S. cerevisiae
suppfile2/ (230215_labeled_genes_scer_all.tsv) - contains gene annotations derived from the following sources. The targets of HSF1 and Msn2/4 were curated from Pincus et al. 2018 and Solís et al. 2016. The genes for core ribosomal proteins, ribosome biogenesis factors, and glycolytic enzymes (superpathway of glucose fermentation) as well as transcription factor regulation assignments were derived from the Saccharomyces Genome Database (Cherry et al. 2012; Engel et al. 2014, https://www.yeastgenome.org/). Genes for translation factors were derived from the KEGG BRITE database (Kanehisa et al. 2016).
suppfile3/ (20-04-21_sgd-all-regulators2.tsv) - contains transcription factor regulators, assigned according to Triandafillou et al. 2020.

fig-svgs/

Final Figures 1-5 as well as Supplemental Figures 1-5 (in *.svg or *.png format) can be found in this folder. Table 1 and Table 2 can also be found here as *.txt files.

dryad-upload/

Eight folders contain raw and processed data as well as R or Python scripts (open source versions) to produce the figures in the paper. All required packages and dependencies are listed at the top of each script. All scripts run on both Windows 11 and macOS. After the .zip file is downloaded and extracted, the data are already organized so that the scripts run within the directory structure in which they are found. Expected (actual) outputs are included for each script. Running all of the scripts and producing the figures should take less than a day. All input files needed to test the code are contained in this folder. Instructions for use are below. If the code must be run in a specific order, that order is outlined below (e.g., condensation-ms/); in all other cases, scripts can be run in any order.

condensation-ms/ - mass spectrometry was performed at different control and treatment temperatures for each species. Outputs from Scaffold DIA 3.3.1 analysis performed by MS Bioworks are in the data/ folder (MSB-9658A U. Chicago Keyport 042622.txt - text file of S. kudriavzevii processed data from MS Bioworks, MSB-9658B U. Chicago Keyport 042722.txt - text file of S. cerevisiae processed data from MS Bioworks, MSB-10835 U. Chicago Keyport 022223.txt - text file of K. marxianus processed data from MS Bioworks). Other sample name and proteome information called by the scripts is contained in data/, including conditions.txt (contains sample conditions), sample_names.txt (experimental names and associated data), and kmarx-tsp-by-condition.txt (tidy version of raw intensity data for K. marxianus). All analyses are performed in RStudio (R version 4.2.2). Analysis script process-raw-dia.Rmd processes raw data for S. cerevisiae and S. kudriavzevii and uses raw MS data, sample_names.txt, uniprot-gene-orf.txt (accession and gene information from Uniprot), and skud_proteome_ygob.fasta (*.fasta file for S. kudriavzevii from YGOB). This script produces results/processed_data.tsv, which is an input for mixing ratio calculation in calculate-mixing-dia.Rmd along with data/kmarx-tsp-by-condition.txt and data/scer-kmarx-skud-orthologs.txt (curated ortholog list for each species); this script outputs two files (results/230501_mixing_ratios.tsv and results/psups-three-species.txt). psup-three-species.txt was transformed into psup_wide_data.txt, which is then used in conservation-of-condensation.Rmd, in combination with data/scer-anno-proteins.txt (annotations of protein processes as described in the Methods section, "Gene Annotation") and 230501_mixing_ratios.tsv. This script, conservation-of-condensation.Rmd, ultimately produces the figures (found in figures/). Final figures can be found in the fig-svg/ directory.
A subset of custom functions called in scripts are defined in utilityFunctions.R.
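The run order described for condensation-ms/ can be sketched as a short Python helper that builds one `Rscript` command per R Markdown file. This is only an illustrative sketch, not part of the package: the helper names are hypothetical, it assumes `Rscript` and the `rmarkdown` R package are installed, that it is run from inside condensation-ms/, and that psup_wide_data.txt (the transformed file mentioned above) is already present. By default it only prints the commands.

```python
import shlex

# Order taken from the condensation-ms/ description in this README.
SCRIPTS_IN_ORDER = [
    "process-raw-dia.Rmd",
    "calculate-mixing-dia.Rmd",
    "conservation-of-condensation.Rmd",
]


def render_command(rmd_path):
    """Build an Rscript command line that knits one R Markdown file."""
    r_expr = 'rmarkdown::render("{}")'.format(rmd_path)
    return ["Rscript", "-e", r_expr]


def render_commands(scripts=SCRIPTS_IN_ORDER):
    """Return the commands in the order the scripts should be run."""
    return [render_command(script) for script in scripts]


if __name__ == "__main__":
    for cmd in render_commands():
        # Print only; to execute, pass cmd to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Opening each .Rmd file in RStudio and knitting it in the same order should be equivalent.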
dls/ - raw measurements from DLS temperature ramp experiments are contained in directories with dates as names as *.csv files, and associated sample information is also contained as *-samples.csv. All analyses are performed in RStudio (R version 4.2.2). Analysis script analyze-dls.Rmd reads in sample information and raw data to produce Figures 4a-d (output in the figs/ folder). Raw data from Riback & Katanski et al. 2017 can be found in the riback-2017/ directory (WT Pab1 in Pab1_15uM_DLSbuffpH6p4.csv, MV to A Pab1 in Pab1_MVtoA_15uM_pH6p4_4_14_15_1.csv, and MV to I Pab1 in Pab1_MVtoI_15uM_pH6p4.csv), and are also used in analyze-dls.Rmd. Growth data to produce Figure 4b are generated from scripts in growth-curves/ and output into this directory, including conf-int-topt.txt (confidence intervals from estimation of optimal temperature in Figure 1b), hsr-growth-max.txt (maximum estimated growth temperature and heat shock temperature for each species), and stop-growing.txt (estimates of the temperatures at which each species stops growing). Table S1 (table1.txt) and Table S2 (table2.txt) are output from the analyze-dls.Rmd script and contain baseline size calculations for each protein (S1) and temperature and doubling size of condensation for each protein (S2). Final figures can be found in the fig-svg/ directory.
flow-cytometry/ - raw *.fcs files of each species +/- heat shock at various temperatures with 180 min of recovery are found here, with condition descriptors in file names, and are contained in directories with dates as names. All analyses are performed in RStudio (R version 4.2.2) and require custom but publicly available R packages. These custom packages can be found on GitHub (https://github.com/ctriandafillou/flownalysis; https://github.com/ctriandafillou/cat.extras). The script 20220414-skud-scer-manytemps-endpoint.Rmd reads in *.fcs files, processes flow data, and produces Figure S2a (found in output/results/). Another output file, max-fc-hsr.txt, which contains the fold change and statistics of maximum heat shock response for each species at each temperature, is produced here and also exported to growth-curves/data/, where it is used in part to produce Figure 1c-d (also found in output/results/). Final figures can be found in the fig-svg/ directory.
growth-curves/ - raw OD600 data for each WT species are found in 20230311/20230311.txt and were sampled from log-phase growing yeast grown at different temperatures. All analyses are performed in RStudio (R version 4.2.2). Two R scripts are contained in this directory. First, growth-curve-estimation.R calculates the maximum specific growth rates by estimating the slope of the linear range of growth for each species and temperature. Resulting growth curves are then fit using the cardinal temperature model with inflection (Rosso, Lobry, and Flandrois 1993), and temperature of maximum growth is estimated from these curves and output into data/dat-max.txt. Optimal temperature statistics are calculated as confidence interval (results/conf-int-topt.txt, also output into ../dls/) and standard deviation (results/topt-sd.txt). This script is also used to estimate the temperature at which each species stops growing, which is output as data/stop-growing.txt (also output into ../dls/). growth-curve-estimation.R generates Figure 1b (found in results/). The second script, plot-growth-hsr-correlation.Rmd, takes the output files from the first script as well as data/max-fc-hsr.txt produced from ../flow-cytometry/20220414-skud-scer-manytemps-endpoint.Rmd to produce Figures 1c and 1d as well as results/hsr-growth-max.txt, which contains temperature optima for growth and heat shock responses for each species. Final figures can be found in the fig-svg/ directory.
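For orientation, the cardinal temperature model with inflection mentioned above can be written as a small Python function. This is a minimal sketch of the standard form of the model and is not taken from growth-curve-estimation.R; the function name, argument names, and the parameter values in the usage lines are illustrative assumptions, not fitted values from this dataset.

```python
def ctmi(temp, mu_opt, t_min, t_opt, t_max):
    """Growth rate at `temp` under the cardinal temperature model with inflection.

    mu_opt is the growth rate at the optimal temperature t_opt;
    t_min and t_max are the temperatures below and above which growth is zero.
    """
    if temp <= t_min or temp >= t_max:
        return 0.0
    numerator = (temp - t_max) * (temp - t_min) ** 2
    denominator = (t_opt - t_min) * (
        (t_opt - t_min) * (temp - t_opt)
        - (t_opt - t_max) * (t_opt + t_min - 2 * temp)
    )
    return mu_opt * numerator / denominator


# Illustrative (made-up) cardinal temperatures in degrees Celsius.
example_params = dict(mu_opt=0.5, t_min=5.0, t_opt=30.0, t_max=40.0)
rate_at_optimum = ctmi(30.0, **example_params)  # equals mu_opt by construction
rate_below_optimum = ctmi(20.0, **example_params)
```

In a fit, the four parameters would be estimated from the measured growth rates at each temperature; the fitted t_opt corresponds to the temperature of maximum growth described above.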
hdx/ - HDX-MS data were collected for each species' Pab1 before and after condensation. Raw data for each species (as *_peptideUptake.csv), as well as manually curated domain boundaries (20221214_pab1DomainBoundaries) and secondary structure predictions (domain-ss.txt) from Schäfer et al. 2019, are found in this folder. Python3 scripts (extract-hdx.py, hdx.py, and hdx_test.py) produce %D values for each residue in aligned Pab1 using aligned-positions.txt based on the alignment (align-pab1.txt); these values are used to compute means across the sequences. The outputs for Python3 scripts are in output/*hdx.txt. Figure 5a-f generation (each found in output/) and other data processing are done in RStudio (R version 4.2.2) in 20230328_HDX_Pab1_SKK.Rmd. Final figures can be found in the fig-svg/ directory.
microscopy/ - Images are provided in .tif format and in the .jpg format used for the final figure in the paper (in separate directories, selections_tif/ and selections_jpg/). Files are labeled by species abbreviation (Sk, Sc, Km) and by temperature (in degrees Celsius). The Km_55 (K. marxianus, 55 degrees) selection has a 5 micron scale bar added in ImageJ. The colorbar settings used for the .jpg images are as described in the final figure. There are 184 nm per pixel (5.43 pixels per micron). Final figures can be found in the fig-svg/ directory.
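The pixel calibration stated above can be applied with a short conversion helper, for example to add a scale bar to the other images. This is a minimal sketch; the function names are hypothetical and only the 184 nm per pixel value comes from this README.

```python
NM_PER_PIXEL = 184.0  # calibration stated in this README


def pixels_per_micron(nm_per_pixel=NM_PER_PIXEL):
    """Number of pixels spanning one micron (1000 nm)."""
    return 1000.0 / nm_per_pixel


def microns_to_pixels(microns, nm_per_pixel=NM_PER_PIXEL):
    """Length in pixels of a feature of the given size in microns."""
    return microns * 1000.0 / nm_per_pixel


# Length in pixels of a 5 micron scale bar such as the one on the Km_55 image.
scale_bar_pixels = microns_to_pixels(5.0)
```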
rna-seq/ - Each species was treated with a species-specific heat shock (or control temperature) and then submitted for RNA-Seq. Upstream pre-processing was performed with a custom Snakemake pipeline which can be found on Github (https://github.com/skeyport/conservation-of-condensation-2023). Downstream processing and analyses as well as figure generation are performed in RStudio (R version 4.2.2). The script 230503_analyze_counts.Rmd aggregates count data (counts/*_counts.tsv) and sample information (20230309-sample-info.txt, also in data/) to produce output/20230309-counts.tsv. Gene lengths were extracted for each gene by first adding exon annotations to the GTF files (*_genomic.gtf or *_genomic.cleaned.gtf) using a custom script based on gffutils v0.11.1 (https://github.com/daler/gffutils, also in this directory as gffutils_fix_missing_exon.py). Gene lengths were then calculated using the GenomicFeatures package in R. These lengths (found in output/*_merged_exon_length.tsv) are then used to calculate transcripts per million values (TPMs, output/20230309_TPM.tsv, also in data/) using the counts output (output/20230309-counts.tsv). Fold changes in transcript abundance were calculated using DESeq2 v3.16 (output/20230309-deseq.tsv). We used pre-published genome annotations src/label_Scer_genes/Saccharomyces-kudriavzevii-ZP591_genome.tab from YGOB v8 (beta) to match RNA-Seq data to S. kudriavzevii strain ZP591. Figures 2a-2d and Figures S3a-c are produced using 230503_analyze_counts.Rmd. Gene annotations were assigned in src/label_Scer_genes/230215-label-genes.Rmd, where targets of HSF1 and Msn2/4 were curated from Solis et al., 2016 and Pincus et al., 2018 (Solis_2016/mmc3.xlsx and 200325-scer-features.txt (also in data/)).
The genes for core ribosomal proteins (scer-ribosomal-proteins.txt), ribosome biogenesis factors (ribosome_biogenesis_annotations_sgd.txt), and glycolytic enzymes (superpathway of glucose fermentation, SGD_superpathway_glycolysis_221025.txt) as well as transcription factor regulation assignments (20-04-21_sgd-all-regulators2.tsv and tfs_and_targets_heatshock.tsv) were derived from the Saccharomyces Genome Database (https://www.yeastgenome.org/). Genes for translation factors were derived from the KEGG BRITE database (translation_factors_kegg.tsv). The gene annotation file was output into 230215_labeled_genes_scer_all.tsv, which is subsequently used in 230503_analyze_counts.Rmd to assign gene annotations. The script to produce the upset plots from Figures S2b-c is txn-comparison.Rmd, which also takes data from Brion et al. 2016 (data/brion2016-lkluyveri-stress-seq.txt), sample information (/data/20230309-sample-info.txt), orthologs (data/master-orthologs.txt), TPMs (data/20230309_TPM.tsv), and gene annotations (/data/230508_labeled_genes_scer.tsv). GEO upload TPMs and counts (Keyport-Kik_2023_TPMs.tsv and Keyport-Kik_2023_counts.tsv) are found in output/ as well, and are identical to output/20230309_*.txt. A subset of custom functions called in scripts are defined in utilityFunctions.R. A file containing orthologs for all species (also Supplemental File 1), master-orthologs.txt, is found in data/ and src/label_Scer_genes/ and is used as an input in the script 230503_analyze_counts.Rmd. Final figures can be found in the fig-svg/ directory.
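The TPM step described above, which combines per-gene counts with merged exon lengths, follows the usual transcripts-per-million definition, sketched here in Python. This is an illustration of the standard calculation under assumed inputs and is not the code in 230503_analyze_counts.Rmd; the function name, gene names, and numbers are made up, and any filtering the script may apply is not reproduced.

```python
def tpm(counts, lengths):
    """Transcripts per million for one sample.

    counts:  dict mapping gene -> read count
    lengths: dict mapping gene -> merged exon length in base pairs
    """
    # Length-normalize each gene's count, then rescale so values sum to 1e6.
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


# Made-up example: geneB has three times the reads but is three times longer,
# so both genes receive the same TPM.
example_counts = {"geneA": 100, "geneB": 300}
example_lengths = {"geneA": 1000, "geneB": 3000}
example_tpm = tpm(example_counts, example_lengths)
```

Because each sample is rescaled to the same total, TPM values are comparable across genes within a sample; fold changes between conditions are handled separately by DESeq2, as described above.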
spot-assays/ - Spot assays and plate growth assays were performed for each species at corresponding temperatures. Raw *.tif images for Figure 4e and Figure S1a-c (organized as directories) are contained here. Naming conventions for images in fig4e/ and figs1e/ are [date][time][strain/species][duration][temperature]. Naming conventions for images in figs1a/ are [plate][duration][temp][species]. Naming conventions for images in figs1c/ are [date][time][temperature][duration]. Final figures can be found in the fig-svg/ directory.
