A conserved function of corepressors is to nucleate assembly of the preinitiation complex

Leydon, Alexander¹; Downing, Benjamin¹; Solano Sanchez, Janet¹; Loll-Krippleber, Raphael²; Belliveau, Nathan¹; Rodriguez-Mias, Ricard¹; Bauer, Andrew¹; Watson, Isabella¹; Bae, Lena¹; Villén, Judit¹; Brown, Grant²; Nemhauser, Jennifer¹

Published Dec 03, 2024 on Dryad. https://doi.org/10.5061/dryad.x0k6djhst

Data files

Dec 03, 2024 version files (1.30 GB total):
README.md - 16.08 KB
Root_Data.zip - 2.48 MB
RSGA_analysis_Archive.zip - 1.30 GB

Abstract

The plant corepressor TPL is recruited to diverse chromatin contexts, yet its mechanism of repression remains unclear. Previously, we have leveraged the fact that TPL retains its function in a synthetic transcriptional circuit in the yeast model Saccharomyces cerevisiae to localize repressive function to two distinct domains. Here, we employed two unbiased whole genome approaches to map the physical and genetic interactions of TPL at a repressed locus. We identified SPT4, SPT5 and SPT6 as necessary for repression with the SPT4 subunit acting as a bridge connecting TPL to SPT5 and SPT6. We also discovered the association of multiple additional constituents of the transcriptional preinitiation complex at TPL-repressed promoters, specifically those involved in early transcription initiation events. These findings were validated in yeast and plants through multiple assays, including a novel method to analyze conditional loss of function of essential genes in plants. Our findings support a model where TPL nucleates preassembly of the transcription activation machinery to facilitate rapid onset of transcription once repression is relieved.


Description of the R-SGA data and file structure

Two independent crosses of YNL3669 (MATa SPARC^H1-H5[pTDH3-AtTPLH1-5-IAA14-ttACS, pRPS2-AtAFB2-ttCIT1, LEU2, pADH1-AtARF19-ttADH1, pP3(2x)-UbiVenus-ttCYC1] HO::ACT1pr-tdTomato::hphMX, can1∆::STE2pr-Sphis5 lyp1∆ his3∆1 leu2∆0 ura3∆0 met15∆0) and YNL3670 (MATa SPARC^TPLN188[pTDH3-AtTPLN188-IAA14-ttACS, pRPS2-AtAFB2-ttCIT1, LEU2, pADH1-AtARF19-ttADH1, pP3(2x)-UbiVenus-ttCYC1] HO::ACT1pr-tdTomato::hphMX, can1∆::STE2pr-Sphis5 lyp1∆ his3∆1 leu2∆0 ura3∆0 met15∆0) with the yeast nonessential deletion collection and a set of conditional temperature-sensitive alleles of essential genes were performed following standard SGA procedures.

Files are labelled with the shorthand signifier for the strain carrying the TPL truncation: H1-H5 = 3669, N188 = 3670. The data is submitted as a series of nested folders. For example, the main folder contains a series of folders named with either 3669 or 3670, plus the following descriptors:

DA (Deletion Array) or TS (Temperature Sensitive Array), and NAA (1-Naphthaleneacetic acid (a synthetic auxin)).

Within each folder is the code for that experiment (DATE_typhoon_SGAanalysis.R), along with the raw data: the background-subtracted yellow fluorescent protein (Venus) and tdTomato intensities computed for each colony from .GEL images using GenePixPro version 7.0 software. The text and graphical output of this pipeline is also in this folder. Comments in the code describe, as far as possible, the actions and calculations that produce the associated files and outputs.

Chapter 1. Aggregate Size and Intensity Information

1. Read colony size files and append rows/rbind. 2. Add conditions and replicate vectors. 3. Cbind colony sizes with replicate and condition. 4. Read 1536 FLEX map and cbind. 5. Name file and write to computer.

Each folder contains a nested folder with the colony size imaging data. These files have the following column structure:
row - Row number
col - Column number
size - Calculated colony size
circularity - Evenness of the circularity of the colony
flags - S: colony spill or edge interference; C: low colony circularity

The next step is to integrate the colony size data with the ORF map for the FLEX collection. This uses the "DAv2_quadruplicate.csv" file, which lists each row, position and gene identity in the R-SGA library. It has the following column structure:
gene,plate,r,c,TSQ.ID
gene - standard yeast format, e.g. the border gene YOR202W
plate - plate number
r - Row number
c - Column number
TSQ.ID - For DA this is the same as gene; for the TS array this is a unique identifier, and the associated data file "TS_quadruplicate.csv" has added information:
Name - Gene symbol
Allele - Mutant name

The output file "BioRep_1_aggregateSize.csv" aggregates some of this information and has the column names:
"gene","TSQ.ID","PlateNum","Row","Column","Size","Circularity","BioRep", inherited from the files above. BioRep is a column that can optionally be added in the code to record the biological replicate. In this study this column can be safely ignored.

The raw data from the GenePixPro version 7.0 software is in files with names such as "3669-1_2Day_data.csv". The columns within are:
Flags,Normalize,Autoflag,Block,Column,Row,Name,ID,X,Y,Dia.,F532 Median,F532 Mean,F532 SD,F532 CV,B532,B532 Median,B532 Mean,B532 SD,B532 CV,% > B532+1SD,% > B532+2SD,F532 % Sat.,F488 Median,F488 Mean,F488 SD,F488 CV,B488,B488 Median,B488 Mean,B488 SD,B488 CV,% > B488+1SD,% > B488+2SD,F488 % Sat.,F532 Total Intensity,F488 Total Intensity,SNR 532,SNR 488,Ratio of Medians (488/532),Ratio of Means (488/532),Median of Ratios (488/532),Mean of Ratios (488/532),Ratios SD (488/532),Rgn Ratio (488/532),Rgn R2 (488/532),Log Ratio (488/532),F532 Median - B532,F488 Median - B488,F532 Mean - B532,F488 Mean - B488,Sum of Medians (488/532),Sum of Means (488/532),F Pixels,B Pixels,Circularity,Index

The critical columns for the analysis are: "Block", "Row", "Column", "F488 Median - B488", "F488 Mean - B488", "F532 Median - B532", "F532 Mean - B532"
Please consult GenePixPro for the full breakdown on the export information, if needed.

This data is incorporated into the file "BioRep_1_aggregateSizeIntensity.csv", with the following column structure:
"PlateNum","Row","Column","gene","TSQ.ID","GFPmean","GFPmedian","RFPmean","RFPmedian","Size","Circularity"
Here the only new columns are: "GFPmean","GFPmedian","RFPmean","RFPmedian", which are pulled from the output of the GenePixPro version 7.0 software analysis of the colony fluorescence.
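
The map-integration step described in this chapter can be sketched in R as follows. This is a minimal illustration on made-up values, not the deposited script: the data frames `sizes` and `plate_map`, the second ORF name and all numbers are hypothetical stand-ins for the colony size files and "DAv2_quadruplicate.csv".

```r
# Hypothetical colony-size records for one plate (normally read from the size files).
sizes <- data.frame(
  row = c(1, 1, 2, 2),
  col = c(1, 2, 1, 2),
  size = c(850, 910, 400, 780),
  circularity = c(0.95, 0.97, 0.60, 0.93),
  stringsAsFactors = FALSE
)
sizes$PlateNum <- 1  # plate number added as a condition vector

# Hypothetical plate map with the same columns as "DAv2_quadruplicate.csv".
plate_map <- data.frame(
  gene = c("YOR202W", "YOR202W", "YAL002W", "YAL002W"),
  plate = 1,
  r = c(1, 1, 2, 2),
  c = c(1, 2, 1, 2),
  TSQ.ID = c("YOR202W", "YOR202W", "YAL002W", "YAL002W"),
  stringsAsFactors = FALSE
)

# Join on plate/row/column so that each colony receives its gene identity.
aggregateSize <- merge(
  plate_map, sizes,
  by.x = c("plate", "r", "c"),
  by.y = c("PlateNum", "row", "col")
)

# merge() returns the key columns first, then the remaining map and size columns.
names(aggregateSize) <- c("PlateNum", "Row", "Column", "gene", "TSQ.ID",
                          "Size", "Circularity")
```

Joining on the plate, row and column keys, rather than binding columns by position, keeps the gene labels correct even if the two files are sorted differently.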

Chapter 2. Filtering Low Quality Data

1. Filter border strains using Row/Column information 2. Filter By Size 3. Filter Undetectable
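
The three filters listed above can be sketched as follows. This is an illustration under stated assumptions rather than the deposited code: it assumes a 1536-format plate of 32 rows by 48 columns, treats only the outermost rows and columns as border, and treats "undetectable" as missing or non-positive fluorescence. The table contents and the `GENE_*` names are hypothetical; the 600-pixel size threshold is the one given in this chapter.

```r
# Hypothetical table with a subset of the columns of
# "BioRep_1_aggregateSizeIntensity.csv".
intensity <- data.frame(
  PlateNum = 1,
  Row     = c(1, 5, 6, 7, 32, 8),
  Column  = c(10, 1, 12, 20, 30, 15),
  gene    = c("YOR202W", "YOR202W", "GENE_A", "GENE_B", "YOR202W", "GENE_C"),
  GFPmean = c(500, 480, 900, 0, 510, 700),
  RFPmean = c(1000, 990, 1100, 1050, 1020, 1000),
  Size    = c(800, 820, 450, 700, 790, 750),
  stringsAsFactors = FALSE
)

n_rows <- 32  # assumed plate geometry (1536 format)
n_cols <- 48

# 1. Border strains: outermost rows and columns (assumption).
is_border <- with(intensity, Row %in% c(1, n_rows) | Column %in% c(1, n_cols))

# 2. Size filter: keep colonies larger than 600 pixels.
big_enough <- intensity$Size > 600

# 3. Undetectable colonies: missing or non-positive signal (assumption).
detectable <- with(intensity,
                   !is.na(GFPmean) & !is.na(RFPmean) & GFPmean > 0 & RFPmean > 0)

intensityFilt <- intensity[!is_border & big_enough & detectable, ]
```

In this toy table only `GENE_C` passes all three filters; the other rows each fail exactly one of them.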

The data is then filtered: only colonies with Size > 600 pixels are kept, border strains (used to buffer the edge effect) are removed, and undetectable colonies are removed. The result is exported to "BioRep_1_aggregateSizeIntensityFilter.csv", which therefore has the same column names and data structure as above:
"PlateNum","Row","Column","gene","TSQ.ID","GFPmean","GFPmedian","RFPmean","RFPmedian","Size","Circularity"

Chapter 3. Normalization

1. Compute log2 ratio (logRatio) and brightness (Brightness) for filtered data 2. Plot ratio vs. size and center such that mean is zero across all sizes - Compute log base 2 GFP/RFP ratios and brightness

The following calculation is applied:
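intensityFilt <- within(intensityFilt, {
  # log2 ratio of the GFP-channel (Venus) to the RFP-channel (tdTomato) signal
  meanlogRatio <- log2(GFPmean / RFPmean)
  # brightness: log2 of the product of the two channels
  meanBrightness <- log2(GFPmean * RFPmean)
  medlogRatio <- log2(GFPmedian / RFPmedian)
  medBrightness <- log2(GFPmedian * RFPmedian)
})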

These values are used to apply LOESS normalization. The LOESS normalization process normalizes data across arrays using a loess smoothing model to account for plate fluctuations. It is applied using the following code:
spanAnswer <- as.numeric(readline(prompt = "Desired LOESS span: ")) # typically = 1
span <- spanAnswer
intensityNorm <- within(intensityFilt, {
meanLOESS <- predict(loess(meanlogRatio ~ meanBrightness, span = span))
medLOESS <- predict(loess(medlogRatio ~ medBrightness, span = span))
meanlogRatioNorm <- meanlogRatio - meanLOESS
medlogRatioNorm <- medlogRatio - medLOESS
})
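
To illustrate what the normalization above does, here is a small sketch on synthetic data. It is not part of the deposited pipeline: the data frame `toy`, the simulated trend and the noise level are all made up, and only the `loess()`/`predict()` pattern mirrors the code above.

```r
set.seed(1)
n <- 200

# Synthetic colonies: the log ratio drifts with brightness, plus random noise.
meanBrightness <- runif(n, 18, 24)
meanlogRatio <- 0.3 * (meanBrightness - 21) + rnorm(n, sd = 0.2)
toy <- data.frame(meanBrightness, meanlogRatio)

# Fit the brightness-dependent trend and subtract it, as in the code above.
fit <- loess(meanlogRatio ~ meanBrightness, data = toy, span = 1)
toy$meanLOESS <- predict(fit)
toy$meanlogRatioNorm <- toy$meanlogRatio - toy$meanLOESS

# The dependence on brightness should largely disappear after normalization.
cor_before <- cor(toy$meanBrightness, toy$meanlogRatio)
cor_after  <- cor(toy$meanBrightness, toy$meanlogRatioNorm)
```

Comparing `cor_before` with `cor_after` is a quick check that the fitted trend, and not the biological signal of individual colonies, is what has been subtracted.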

The data is then ordered from largest to smallest on the "meanlogRatioNorm" value:
intensityNormord <- intensityNorm[with(intensityNorm, order(meanlogRatioNorm, decreasing = T)), ]
and exported to the "IntensityFilterLOESS_Gal.csv" file.

The columns in this file are as follows:
"PlateNum","Row","Column","gene","TSQ.ID","GFPmean","GFPmedian","RFPmean","RFPmedian","Size","Circularity","medBrightness","medlogRatio","meanBrightness","meanlogRatio","medlogRatioNorm","meanlogRatioNorm","medLOESS","meanLOESS"
The new columns, produced by the code above, are:
"medBrightness","medlogRatio","meanBrightness","meanlogRatio","medlogRatioNorm","meanlogRatioNorm","medLOESS","meanLOESS"

The associated LOESS calculation is graphed to demonstrate the LOESS effect: "BioRep_1_aggregateSizeIntensityFilterLOESS_Rs.pdf"

Calculate Z scores

Next, the z-score is calculated on the normalized ratio value. A z-score expresses how many standard deviations a data point lies from the mean of the distribution. It is calculated with the following code:

Zscore <- (intensityNormord$meanlogRatioNorm - mean(intensityNormord$meanlogRatioNorm))/sd(intensityNormord$meanlogRatioNorm)
intensityNormord <- cbind (intensityNormord, Zscore)

This is saved as "BioRep_1_aggregateSizeBavg_Z.csv", and the columns are as follows:
"PlateNum","Row","Column","gene","TSQ.ID","GFPmean","GFPmedian","RFPmean","RFPmedian","Size","Circularity","medBrightness","medlogRatio","meanBrightness","meanlogRatio","medlogRatioNorm","meanlogRatioNorm","medLOESS","meanLOESS","Zscore"

The newly added column "Zscore" is in the last column position.
Note: we experimented with applying the Z-score by plate, as opposed to across the entire set, and saw some improvement. The by-plate code was used going forward. The file "BioRep_1_aggregateSizeBavg_Z.csv" was used as a comparison file in later experiments, but it should not be used as the starting input of the code or analysis, as this loess is less efficient.

The LOESS normalized 'by-plate' data was then trimmed and the final file with the Z-score data was output with the genotype specific prefix as in "3669-1_DAbyplate_Z.csv". The columns are as follows: "gene","number","PlateNum","Row","Column","colony.mean.ratio","colony.sd.ratio","Zscore"
The new column names here are "colony.mean.ratio","colony.sd.ratio","Zscore". These values were calculated from the code as follows:

mean.ratio = aggregate(intensityNormord$meanlogRatioNorm, by=list(intensityNormord$gene), mean)
sd.ratio = aggregate(intensityNormord$meanlogRatioNorm, by=list(intensityNormord$gene), sd)
mean.sd.ratio = cbind(mean.ratio, sd.ratio[,2])
colnames(mean.sd.ratio) = c("gene", "colony.mean.ratio", "colony.sd.ratio")
df.info = intensityNormord[!duplicated(intensityNormord$gene),1:5]
mean.ratio.withInfo = merge(df.info, mean.sd.ratio, by.x = "gene", by.y = "gene", all.x=T)
Zscore.per.p = as.data.frame(do.call(rbind, lapply(1:14, function(i){
df = subset(mean.ratio.withInfo, PlateNum == i)
Zscore = scale(df$colony.mean.ratio)
df.Zscore = cbind(df[,1], Zscore)
})))
colnames(Zscore.per.p) = c("gene", "Zscore")
Zscore.per.p.all.dat = merge(mean.ratio.withInfo, Zscore.per.p, by = "gene", all.y=T)
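
As a compact illustration of the per-plate Z-score, the sketch below uses base R's `ave()` on a made-up table; it is an alternative formulation, not the deposited code, and the gene names and values are hypothetical. One point worth checking when reusing the code above: `cbind()` of a gene column with a numeric column can produce a non-numeric matrix, so the resulting Zscore column may need converting with `as.numeric()` before numeric comparisons.

```r
# Hypothetical per-gene table with the columns used above.
mean.ratio.withInfo <- data.frame(
  gene = paste0("GENE_", 1:6),
  PlateNum = c(1, 1, 1, 2, 2, 2),
  colony.mean.ratio = c(-1, 0, 1, 2, 4, 6),
  stringsAsFactors = FALSE
)

# ave() applies the function within each plate and returns the values in the
# original row order, so no merge step is needed and the column stays numeric.
mean.ratio.withInfo$Zscore <- ave(
  mean.ratio.withInfo$colony.mean.ratio,
  mean.ratio.withInfo$PlateNum,
  FUN = function(x) (x - mean(x)) / sd(x)
)
```

Although the two plates have different means and spreads, each yields Z-scores of -1, 0 and 1, which is the point of scoring by plate.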

Export the upregulated and downregulated genes as hitlists

For the upregulated genes, hits with a z-score of 2 or above were subset from the data using the following code:

hits_up = subset(Zscore.per.p.all.dat, Zscore >= 2)
hits_up = hits_up[order(hits_up$Zscore, decreasing=T),]
This was saved as file "BioRep_1_aggregateSizeUp_hits.csv".

For the downregulated genes, hits with a z-score below -2 were subset in the same way:

hits_Down = subset(Zscore.per.p.all.dat, Zscore < -2)
hits_Down = hits_Down[order(hits_Down$Zscore, decreasing=T),]
This was saved as file "BioRep_1_aggregateSizeDown_hits.csv".
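
The behaviour of the two thresholds at their boundaries can be seen in a toy example. The table `z` and its gene labels are hypothetical; the subsetting mirrors the code above.

```r
# Hypothetical per-gene Z-scores, including values exactly at the thresholds.
z <- data.frame(
  gene = c("A", "B", "C", "D", "E"),
  Zscore = c(2.5, -3.1, 0.4, 2.0, -2.0),
  stringsAsFactors = FALSE
)

# Upregulated hits: the comparison is inclusive, so a score of exactly 2 is kept.
hits_up <- subset(z, Zscore >= 2)
hits_up <- hits_up[order(hits_up$Zscore, decreasing = TRUE), ]

# Downregulated hits: the comparison is strict, so a score of exactly -2 is dropped.
hits_Down <- subset(z, Zscore < -2)
hits_Down <- hits_Down[order(hits_Down$Zscore, decreasing = TRUE), ]
```

Here genes A and D are returned as upregulated hits and only gene B as a downregulated hit.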

Both data files have the same structure as their parent file "3669-1_DAbyplate_Z.csv"; they are simply subset to contain only up or down hits:
"gene","number","PlateNum","Row","Column","colony.mean.ratio","colony.sd.ratio","Zscore"

This approach was repeated for GFP or RFP alone to allow independent graphing of these fluorescence values and to control for any normalization errors. An example is "BioRep_1_aggregateSizeGFP_Up_hits.csv"; users can run these additional analyses if desired.

The Z-scores were then plotted in the following file:
"BioRep_1_aggregateSizeZplot.pdf"

Additionally the Ratio Z-scores (Zscore.y), GFP Zscore (Zscore.GFP), RFP Zscore (Zscore.RFP), and Size Zscore (Zscore.Size) were plotted for the Upregulated hits in a heatmap and exported in "Rplot_ratio_up.pdf". This was performed to examine whether the Ratio Z-scores were an appropriate measurement and whether they were influenced by any other factors.

Description of the Arabidopsis root data and file structure

All experimental root data has been organized according to the figure panel number in Figure 6. Each subfolder contains the raw data file in .csv format and an R code file; some subfolders also contain output graphs and project files for ease of use.

Data files for Figure 6D,E,F,L may have the following column names:
Genotype - This is the Nemlab agro number assigned to the agro strain that was used to transform the plants
days - This refers to the number of days the seedlings were grown in sterile tissue culture format (days after germination)
plate - Plate number, used as technical replicate for differences in plate/batch
seedling - This is the individual seedling number
length_mm - Length of the primary root in millimeters
lrnum - Lateral root number total on primary root
LRD - Lateral root density (calculated as lateral root number / length of the primary root in millimeters)
construct - The construct that was expressed in the plant
gen - Generation (i.e. T1, T2, etc)
comment - Any comments about plant growth are listed here
plant_line - If T2s were analyzed, this column is included to indicate which family they were a part of. This is a plant number, and can be correlated back to a specific T1 founding line.
LRL - Lateral root length in millimeters
Lrcount - sequential count of lateral root number from oldest to youngest
group - grouping used to perform statistics (usually genotype)
date - Date of plating
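
As a worked example of the LRD definition above, the following sketch computes lateral root density per seedling and then a per-group mean. The seedling values and group labels are made up for illustration and do not come from the dataset.

```r
# Hypothetical seedlings using the column names described above.
roots <- data.frame(
  seedling = 1:4,
  group = c("control", "control", "line_A", "line_A"),
  length_mm = c(50, 40, 62.5, 50),
  lrnum = c(10, 6, 5, 5),
  stringsAsFactors = FALSE
)

# Lateral root density: lateral root number per millimeter of primary root.
roots$LRD <- roots$lrnum / roots$length_mm

# Mean density per statistical group.
group_means <- aggregate(LRD ~ group, data = roots, FUN = mean)
```

For these values the per-seedling densities are 0.2, 0.15, 0.08 and 0.1 lateral roots per millimeter.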

For integrase experiments in Figure 6N columns are as follows:
Day - This refers to the number of days the seedlings were grown in sterile tissue culture format (days after germination)
Plant_num - If T2s were analyzed, this column is included to indicate which family they were a part of. This is a plant number, and can be correlated back to a specific T1 founding line.
seedling_number - This is the individual seedling number
genotype - the spt6l mutant genotype either heterozygous (het) or homozygous (hm)
Prlength - Length of the primary root in millimeters
Lrnum - Lateral root number total on primary root
Initation_count - Number of visible initiating lateral roots that had not yet emerged
LRD - Lateral roots plus initiating lateral roots, divided by the total primary root length ((Lrnum + Initiation_count) / Prlength)
RFP - Yes or no (Y/N) response to the question: was nuclear RFP visible in the initiating lateral root?
GFP - Yes or no (Y/N) response to the question: was nuclear GFP visible in the primary root?
Specific - Yes or no (Y/N) response to the question: was RFP visible only in the initiating lateral root, and not anywhere in the primary root?
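
The LRD definition for this panel sums emerged and initiating lateral roots before dividing by root length, so the grouping of the terms matters. The short sketch below uses made-up values, and the column spellings in the actual file should be checked before reuse.

```r
# Hypothetical seedlings; column names follow the descriptions above.
init <- data.frame(
  Prlength = c(40, 50),
  Lrnum = c(6, 8),
  Initiation_count = c(2, 2)
)

# Density of emerged plus initiating lateral roots per millimeter.
init$LRD <- (init$Lrnum + init$Initiation_count) / init$Prlength

# Without the parentheses, only Initiation_count is divided by Prlength,
# which gives a different (and unintended) quantity.
unparenthesized <- init$Lrnum + init$Initiation_count / init$Prlength
```

With these numbers the intended density is 0.2 for both seedlings, whereas the unparenthesized expression gives 6.05 and 8.04.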

For integrase experiments in Figure 6O columns are as follows:
number - row number
Strain - This is the Nemlab agro number assigned to the agro strain that was used to transform the plants
Parent - If T2s were analyzed, this column is included to indicate which family they were a part of. This is a plant number, and can be correlated back to a specific T1 founding line.
Day - This refers to the number of days the seedlings were grown in sterile tissue culture format (days after germination)
Seedling - This is the individual seedling number
num_LR.init - This is the combined Lateral root number plus number of visible initiating lateral roots that had not yet emerged - total on primary root
switch - Yes or no (Specific/Not specific) response to the question was YFP visible only in the initiation lateral root, and not anywhere in the primary root.
Comments - Notation of whether there were any other phenotypes observed
Lrpheno - Whether a phenotype was observed (yes/no). All were subsequently PCR genotyped, and there is an absolute correlation between the homozygous taf5 mutant and Lrpheno being "yes".
PRL - Length of the primary root in millimeters
LRILRDens - Lateral root and initiating Lateral roots divided by the total primary root length (num_LR.init / PRL)

Sharing/Access information

Links to other publicly accessible locations of the data:

The code can be found on GitHub.

Code/Software

All quantification and statistical analyses were performed in R, and the corresponding code has been deposited into GitHub: https://github.com/achillobator/TPL-H1_Mechanism

Two independent crosses of YNL3669 (MATa SPARC^H1-H5[pTDH3-AtTPLH1-5-IAA14-ttACS, pRPS2-AtAFB2-ttCIT1, LEU2, pADH1-AtARF19-ttADH1, pP3(2x)-UbiVenus-ttCYC1] HO::ACT1pr-tdTomato::hphMX, can1∆::STE2pr-Sphis5 lyp1∆ his3∆1 leu2∆0 ura3∆0 met15∆0) and YNL3670 (MATa SPARC^TPLN188[pTDH3-AtTPLN188-IAA14-ttACS, pRPS2-AtAFB2-ttCIT1, LEU2, pADH1-AtARF19-ttADH1, pP3(2x)-UbiVenus-ttCYC1] HO::ACT1pr-tdTomato::hphMX, can1∆::STE2pr-Sphis5 lyp1∆ his3∆1 leu2∆0 ura3∆0 met15∆0) with the yeast nonessential deletion collection and a set of conditional temperature-sensitive alleles of essential genes were performed following standard SGA procedures¹⁰⁴. Final arrays were pinned in duplicate on either SD/MSG–his–leu + 200 mg/mL G418 (untreated) or YPD supplemented with 50 mM NAA and grown for 24 hr before fluorescence scanning. The Typhoon Trio Variable Mode Imager (GE Healthcare) was used to acquire Venus (488-nm laser, 520/40 BP emission filter) and tdTomato (532-nm laser, 610/30 BP emission filter) fluorescence values. For the essential temperature-sensitive mutants, all growth was conducted at 23°C until the final growth before imaging, where they were grown at 30°C. After fluorescence imaging, colony size data were acquired by individually photographing plates with a Canon PowerShot G2 4.0 megapixel digital camera using Remote Capture software. Data analysis followed essentially what is described in Kainth et al. (2009)⁴³, with small variations. To summarize, background-subtracted yellow fluorescent protein (Venus) and tdTomato intensities were computed for each colony from .GEL images using GenePixPro version 7.0 software. Colony size was imaged on an SPimager from S&P Robotics, Inc., and size information was calculated from individual photographs using SGAtools. Border colonies, small colonies (colony area < 500 pixels), and colonies with aberrantly low tdTomato values (bottom 0.05%) were removed before further analysis. log2(Venus/tdTomato) values were calculated and LOESS normalized for each plate. Using the log2(Venus/tdTomato) ratio as a metric for Venus abundance has the advantage that dividing by tdTomato corrects for any colony size dependent intensity effects. Finally, normalized log2(Venus/tdTomato) values were averaged across all replicate experiments and a Z-score calculated (see Supporting Information). All analyses were performed in R.
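
The removal of colonies with aberrantly low tdTomato values (bottom 0.05%) can be expressed with a quantile cutoff. The sketch below is one possible reading of that step on simulated intensities, not the deposited code; the distribution parameters and the single artificially low value are made up.

```r
set.seed(42)

# Simulated tdTomato intensities, plus one aberrantly low colony at the end.
tdTomato <- c(rlnorm(9999, meanlog = 7, sdlog = 0.3), 1)

# 0.05% corresponds to a probability of 0.0005.
cutoff <- quantile(tdTomato, probs = 0.0005)

# Keep colonies strictly above the cutoff.
keep <- tdTomato > cutoff
tdTomato_filtered <- tdTomato[keep]
```

With 10,000 simulated colonies this drops the five lowest values, including the artificial outlier.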

For Arabidopsis thaliana experiments using the GAL4-UAS system (Laplaze et al., 2005), J0121 was introgressed eight times into the Col-0 accession from the C24 accession and rigorously checked to ensure root growth was comparable to Col-0 before use. UAS-TPL-IAA14mED constructs were introduced into J0121 introgression lines by the floral dip method¹¹². T1 seedlings were selected on 0.5× LS (Caisson Laboratories, Smithfield, UT) + 25 μg/mL Hygromycin B (company) + 0.8% phytoagar (Plantmedia; Dublin, OH). Plates were stratified for 2 days, exposed to light for 6 hr, and then grown in the dark for 3 days following a modification of the method of Harrison et al., 2006. Hygromycin-resistant seedlings were identified by their long hypocotyl, enlarged green leaves, and long root. Transformants were transferred by hand to fresh 0.5× LS plates + 0.8% Bacto agar (Thermo Fisher Scientific) and grown vertically for 14 days at 22°C. Plates were scanned on a flatbed scanner (Epson America, Long Beach, CA) at day 14. slr seeds were obtained from the Arabidopsis Biological Resource Center (Columbus, OH). For integrase switch experiments, T2 plant lines harboring T-DNAs for MED21 (med21–1, WiscDsLox461–464K13), SPT6L (spt6l-7, SAIL_59_G06) or TAF5 (taf5–4, SAIL_274_A04) were transformed with the floral dip method to generate integrase target lines; each integrase construct was then introduced into these established target lines. For T1 selection, 120 mg of T1 seeds (~2000 seeds) were sterilized using 70% ethanol and 0.05% Triton-X-100 and then washed using 95% ethanol. Seeds were resuspended in 0.1% agarose and spread onto 0.5× LS Bacto selection plates, using 25 μg/mL kanamycin for target lines and 25 μg/mL kanamycin plus 25 μg/mL hygromycin for lines with both the integrase and the target. The plates were stratified at 4 °C for 48 h, then light pulsed for 6 h and covered for 48 h. They were then grown for 4–5 days. To select transformants, tall seedlings with long roots and a vibrant green color were picked from the selection plate with sterilized tweezers and transferred to a new 0.5× LS Phyto agar plate for characterization.