Chromatin conformation, gene transcription, and nucleosome remodeling as an emergent system
Data files
Jan 14, 2025 version files 292.84 MB
- Figure_1.zip (1.20 MB)
- Figure_2.zip (3.58 MB)
- Figure_3.zip (2.72 MB)
- Figure_4.zip (12.69 MB)
- Figure_5.zip (145.07 MB)
- Figure_6.zip (54.76 MB)
- Figure_7.zip (65.34 MB)
- Figure_8.zip (7.46 MB)
- README.md (17.98 KB)
Abstract
In single cells, variably sized nanoscale chromatin structures are observed, but it is unknown if these form a cohesive framework that regulates RNA transcription. Herein we demonstrate that the human genome is an emergent, self-assembling, reinforcement learning system. Conformationally defined, heterogeneous, nanoscopic packing domains form by the interplay of transcription, nucleosome remodeling, and loop extrusion. We show that packing domains are not topologically associated domains. Instead, packing domains exist across a structure-function life-cycle that couples heterochromatin and transcription in situ, explaining how heterochromatin enzyme inhibition can produce a paradoxical decrease in transcription by destabilizing domain cores. Applied to development and aging, we show the pairing of heterochromatin and transcription at myogenic genes that could be disrupted by nuclear swelling. In sum, packing domains represent a foundation to explore the interactions of chromatin and transcription at the single cell level in human health.
README: Chromatin conformation, gene transcription, and nucleosome remodeling as an emergent system
https://doi.org/10.5061/dryad.b8gtht7p0
Description of the data and file structure
Below is a brief summary of the included data:
Acronyms: ActD refers to Actinomycin D, RAD21 refers to the cohesin protein 'RAD21', GSK refers to the EZH2 inhibitor GSK343, and TSA refers to the histone deacetylase inhibitor trichostatin A. Unless otherwise specified, RNA Polymerase II in the single molecule localization microscopy images refers to the Serine-2 phosphorylated state of the protein.
All of the .tif and .jpg files can be accessed with freely available ImageJ or FIJI software packages. .xlsx files can be opened with Excel.
Figure 1.zip)
Image files of chromatin electron microscopy tomograms (images) from cells are included:
A549.tif represents the chromatin electron microscopy tomogram of an A549 cell
BJ.tif represents the tomogram of a BJ fibroblast cell
HCT116.tif represents the tomogram of an HCT-116 cell
A549 packing domain.jpg refers to a slice of a packing domain tomogram imaged in A549 cells
The file CVCvsR.xlsx can be accessed using the code LogLog Domains.nb. CVCvsR.xlsx contains the data representing the gray scale density of chromatin as a function of the radius (column 1) for 3 respective domains (r86CVC020, r68CVC021, and r72CVC018). LogLog Domains.nb is a Mathematica script containing the code to open this file and plot the log-log distance vs density for each domain.
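For readers without Mathematica, the log-log transform behind this plot can be sketched in Python. This is only an illustration and not the code in LogLog Domains.nb. The function name `loglog_slope` and the straight-line fit are assumptions added here, since the README states only that the script plots log-log distance vs density.

```python
import math

def loglog_slope(radius, density):
    """Least-squares slope and intercept of log(density) against log(radius).

    Hypothetical helper: the README only says the data are plotted on
    log-log axes; fitting a line to the transformed points is an assumption.
    """
    # Logarithms are undefined for non-positive values, so skip those pairs.
    pts = [(math.log(r), math.log(d))
           for r, d in zip(radius, density) if r > 0 and d > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive (radius, density) pairs")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

The radius column and one domain column from CVCvsR.xlsx would be passed in as two lists of equal length after exporting the sheet to a format Python can read.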
Figure 2.zip)
Enzyme sizes.xlsx is a multi-sheet excel file containing the properties of chromatin remodeling enzymes with respect to their calculated radius of gyration and molecular mass.
Analysis Sheet contains the average radius of gyration values, in angstroms, for heterochromatin enzymes compared to euchromatin enzymes, with a paired, two-tailed t-test calculating the difference in size. The Spherical approximation represents the assumed average volume of the enzymes, with the t-test as above. The mass in daltons is the molecular weight, with the t-test as above. Adjacent to these comparisons are the observed values for transcription factors (TFs) and polymerases (Pol), with their averages and standard deviations reported for radius of gyration, spherical approximate size, and observed mass.
Plotting Sheet contains a column for the described function of the protein (class), categorized as euchromatin (EU), heterochromatin (HC), transcription factor (TF), or polymerase (P). The name of the protein is listed under the column 'Protein'. The radius of gyration is given in angstroms (Mean Rg Angstrom) and in nanometers (Rg nm), along with the radius of gyration squared (R_2_g). The approximate volume is listed in 'Spherical Volume'. The mass is then listed in kilodaltons (Mass kDa) and daltons (Mass daltons). This sheet also includes the accompanying plot of mass vs the radius of gyration in nanometers for each class of enzyme.
The sheet AlphaFold Eu contains the information on the enzyme (protein), the organism it was generated from (organism), the link to the generated protein structure (Link), the protein database bank name (PDB Name), the observed radius of gyration in angstrom (Mean Rg Angstrom), the approximate spherical volume (Spherical Volume nm^3) and the mass (Mass Dalton).
The sheets AlphaFold Het, AlphaFold TFs, and Polymerases are structured in the same manner as AlphaFold Eu but correspond to the respective protein in each class (het - heterochromatin, TF - transcription factors, Polymerases - RNA polymerases).
The AlphaFold PDB folder has 3 subdirectories and a script, Calculate Rg Enzymes.py. The subdirectories hold the protein data bank structures of the enzymes reported in the Excel spreadsheet, one subdirectory per enzyme class, and are loaded by Calculate Rg Enzymes.py. The .pdb files in each folder correspond to the links and PDB names listed in the accompanying Excel spreadsheet. This script can be run with Python v3.0. The structure of the proteins in each .pdb file can be viewed with the open access visualization tool Visual Molecular Dynamics (VMD).
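As a rough illustration of what a radius of gyration calculation on a .pdb file involves, the sketch below uses only the Python standard library. It is not a copy of Calculate Rg Enzymes.py. Restricting to C-alpha atoms, leaving the atoms unweighted by mass, and taking the sphere radius equal to Rg are all assumptions made here for brevity, and the function names are hypothetical.

```python
import math

def read_ca_coords(pdb_text):
    """Collect (x, y, z) in angstroms for C-alpha ATOM records.

    Uses the fixed-column PDB layout: atom name in columns 13-16 and
    x, y, z in columns 31-38, 39-46, 47-54.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords

def radius_of_gyration(coords):
    """Unweighted radius of gyration (assumption: no mass weighting)."""
    n = len(coords)
    if n == 0:
        raise ValueError("no coordinates supplied")
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
             for x, y, z in coords)
    return math.sqrt(sq / n)

def spherical_volume_nm3(rg_angstrom):
    """Volume in nm^3 of a sphere whose radius equals Rg (assumption)."""
    r_nm = rg_angstrom / 10.0  # 10 angstroms per nanometer
    return 4.0 / 3.0 * math.pi * r_nm ** 3
```

Reading a structure with `open(path).read()` and passing the text to `read_ca_coords` would yield the input for `radius_of_gyration`. The values in the spreadsheet may differ if the authors' script uses all atoms or mass weighting.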
The file SR-EV Domain Penetration Results.nb is a Mathematica file that can be run using Mathematica v13.0. This file contains the observed penetration behavior of enzymes as a function of their size with the equations included as sig3, sig6, sig9 and sig12 corresponding to proteins with a diameter of 3nm, 6nm, 9nm, and 12nm respectively. This can alternatively be transformed into a script for R or Python software to generate the corresponding plots.
Figure 3.zip)
This folder contains the script PM_Figures_Vfinal.ipynb which loads the csv files within the zip folder Version1.5_ps0.1.
These .csv files are as follows and can be opened with any text editing tool but are designed as inputs for the python script above:
- Dpd_3hr, Dpd_8hr, Dpd_100hr are the chromatin scaling for a packing domain within the returns model at 3 hours, 8 hours and 100 hours respectively. Dpd_ActD_100hr is the behavior of the domains in the absence of transcription at 100 hours.
- EntropicReturns_3hr, EntropicReturns_8hr, EntropicReturns_100hr are the rates of entropic returns at the respective time points. EntropicReturns_ActD_100hr is the rate of entropic returns in the absence of transcription at 100 hours.
- TransReturns_3hr, TransReturns_8hr, TransReturns_100hr are the rates of transcriptionally generated returns at the respective time points. TransReturns_ActD_100hr is the rate of transcriptionally generated returns in the absence of transcription at 100 hours.
- TotalReturns_3hr, TotalReturns_8hr, TotalReturns_100hr are the rates of all generated returns at the respective time points. TotalReturns_ActD_100hr is the rate of all generated returns in the absence of transcription at 100 hours.
- PackingEff_3hr, PackingEff_8hr, PackingEff_100hr are the modeled packing efficiency for a packing domain within the returns model at 3 hours, 8 hours and 100 hours respectively. PackingEff_ActD_100hr is the packing efficiency behavior of the domains in the absence of transcription at 100 hours.
- EuchPhi_3hr, EuchPhi_8hr, EuchPhi_100hr are the modeled euchromatin density within the returns model at 3 hours, 8 hours and 100 hours respectively. EuchPhi_ActD_100hr is the euchromatin density of the domains in the absence of transcription at 100 hours.
- HChPhi_3hr, HChPhi_8hr, HChPhi_100hr are the modeled heterochromatin density within the returns model at 3 hours, 8 hours and 100 hours respectively. HChPhi_ActD_100hr is the heterochromatin density of the domains in the absence of transcription at 100 hours.
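The naming scheme of these files (quantity, optional ActD tag, time point) lends itself to programmatic grouping. The sketch below is a hypothetical helper and is not part of PM_Figures_Vfinal.ipynb. It assumes the file stems match the names listed above exactly, and the label "control" for files without the ActD tag is a convention chosen here.

```python
import re
from collections import defaultdict

# Pattern for stems such as "Dpd_3hr" or "TotalReturns_ActD_100hr".
NAME_RE = re.compile(
    r"^(?P<quantity>[A-Za-z]+)_(?:(?P<condition>ActD)_)?(?P<hours>\d+)hr$"
)

def parse_stem(stem):
    """Split a file stem into (quantity, condition, hours)."""
    match = NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unrecognized file stem: {stem!r}")
    # "control" is a label introduced for this sketch, not a name in the data.
    condition = match.group("condition") or "control"
    return match.group("quantity"), condition, int(match.group("hours"))

def group_by_quantity(stems):
    """Map each quantity to {(condition, hours): stem}."""
    groups = defaultdict(dict)
    for stem in stems:
        quantity, condition, hours = parse_stem(stem)
        groups[quantity][(condition, hours)] = stem
    return dict(groups)
```

Grouping this way makes it straightforward to iterate over the three time points of one quantity and compare the 100 hour result with its ActD counterpart.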
Figure 4)
Hi-C data is located at GEO GSE279166.
Data for ChromSTEM and PWS is uploaded in accompanying folders as described below.
PWS Data folder:
PWS tabulated data is in Prism format in the file Cody and Emily_Planetary Model_CD Edits.prism and can be opened with PRISM viewer software. The folder contains the scaling of chromatin packing, fractional moving mass, and ensemble diffusion values for cells obtained with dual-PWS microscopy. Within this folder are PWS images from control cells (control rad21.nuc.tif) and from RAD21 depleted cells (rad21 depleted.tif). These are pseudocolored PWS images scaled for D values between 2 and 3 and can be opened with any image viewing software. Also included is a folder, ActD, that holds pseudocolored PWS images for cells treated with Actinomycin D (ActD treated.tif) compared to controls (ActD Control.tif). The values are scaled as above, between 2 and 3.
ActD Folder:
This contains a domain slice, actD_treated.jpg, which is an image file that can be opened with any image viewer. ActD Domain Lifecycle.nb is a Mathematica script that can be run with Mathematica v13 to load the data stored in the Excel spreadsheet BJActD_Domains_2024.xlsx, which contains two sheets. The script contains the code to open this file and calculate the distribution of domain radius compared to packing efficiency as a reflection of the domain life cycle, adjusted for the imaged volume on chromatin electron microscopy because the imaged area differs between ActD cells and Rad21 cells. This code can be run with Mathematica v13 or, alternatively, opened with a text editor and transformed into comparable scripts in Python or R.
The BJActD_Domains_2024.xlsx file has two sheets, BJ_DMSO and BJ_ActD, which hold the values for domains in the control cells and the ActD treated cells, respectively. Domain properties include the average intensity within the chromatin volume concentration (binarized CVC), domain (grayscale_density), the scaling of chromatin packing values (D), the radius of the domain (domain radius), the packing efficiency, and the radius of the chain (Rmin). These data files can be opened with Microsoft Excel or other open access text editing tools (text editor) for viewing.
Rad21 Folder:
This contains a domain slice, rad21_depleted.tif, which is an image file that can be opened with any image viewer. Rad21 Domain Lifecycle.nb is a Mathematica script that can be run with Mathematica v13 to load the data within the folder Data for code. The script contains the code to open these files and calculate the distribution of domain radius compared to packing efficiency as a reflection of the domain life cycle, adjusted for the imaged volume on chromatin electron microscopy. This code can be run with Mathematica v13 or, alternatively, opened with a text editor and transformed into comparable scripts in Python or R.
The file control packing efficiency.xlsx holds the values for domains in the control cells, whereas RAD21 packing efficiency.xlsx holds the RAD21 depleted domain properties. Domain properties include the average intensity within the chromatin volume concentration (binarized CVC), domain (grayscale_density), the scaling of chromatin packing values (D), the radius of the domain (domain radius), and the packing efficiency. These data files can be opened with Microsoft Excel or other open access text editing tools (text editor) for viewing.
Figure 5.zip)
The ChipSeq folder contains a subfolder with the same name. Within this subfolder is ChiP_Pm.nb, a Mathematica script that can be run with Mathematica v13 to access the files in the folder labeled 'data'. Within the 'data' folder are subfolders corresponding to the chromatin markers h3k4me3, h3k9me3, h3k27ac, h3k27me3, and pol2rps5, each containing a .bed file with the locations of the peaks for that marker from chromatin immunoprecipitation assays. These files were originally obtained from ENCODE. ChiP_Pm.nb also requires ncbiRefSeq.xlsx, an Excel spreadsheet containing the coordinates of human genes from the RefSeq reference human genome. This file can be obtained from https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeqSelect.txt.gz and then converted into the correct format (.xlsx) if needed. The script measures the distance between RNA polymerase (pol2rps5) and the respective chromatin marks as a function of the polymerase density. It separately calculates the correlation between these respective marks on a per-chromosome basis.
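One way to approximate the distance measurement in Python is a nearest-peak search between two .bed files. This is a minimal sketch and not a port of ChiP_Pm.nb. Using peak midpoints, taking the nearest neighbor on the same chromosome as "the distance", and the function names are all assumptions, and the polymerase-density binning and per-chromosome correlation are not reproduced.

```python
import bisect
from collections import defaultdict

def parse_bed(text):
    """Return {chromosome: sorted peak midpoints} from BED-format text."""
    mids = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        mids[chrom].append((int(start) + int(end)) // 2)
    return {chrom: sorted(values) for chrom, values in mids.items()}

def nearest_distances(query, target):
    """For each query midpoint, distance in bp to the nearest target midpoint.

    Chromosomes with no target peaks are omitted from the result.
    """
    out = {}
    for chrom, query_mids in query.items():
        target_mids = target.get(chrom)
        if not target_mids:
            continue
        dists = []
        for q in query_mids:
            i = bisect.bisect_left(target_mids, q)
            # Only the neighbors on either side of the insertion point
            # can be the nearest target.
            candidates = target_mids[max(i - 1, 0):i + 1]
            dists.append(min(abs(q - t) for t in candidates))
        out[chrom] = dists
    return out
```

Calling `nearest_distances(parse_bed(pol2_text), parse_bed(mark_text))` gives one list of distances per chromosome, which could then be summarized or binned as needed.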
Figure 5 SMLM contains the single molecule localization microscopy imaging for control cells (dmso folder), ActD treated cells (act-d folder), GSK treated cells (gsk folder), and TSA treated cells (tsa folder). The ActD folder contains only .csv files. Within each folder are subfolders for the imaged cells, each containing at least the reconstruction files, which end in .csv with the header k9 referring to H3K9me3, Pol2 referring to RNA polymerase II, and K27ac referring to H3K27ac. These files can be visualized using FIJI with the ThunderSTORM plugin for reconstruction. In some folders, generated images are also present; Composite.png is a representative multicolor image. In addition, the first folder (Cell 1 folder) of GSK, TSA, and DMSO contains a subfolder called donut. Within this subfolder are the reconstruction parameters used to generate the annular (donut) analysis and the resulting outputs as .tif or .png files. The .csv files again refer to localization outputs that can be loaded in ImageJ or FIJI using ThunderSTORM. The .roi files represent the regions of interest and can be loaded into ImageJ. These .roi files were applied to the composite to make the inset visuals (labeled zoomedin). In some cases, there is another subfolder, donut2, which represents an alternative field of view from the image.
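The localization .csv files can also be read outside FIJI. The sketch below shows one generic way to count localizations in concentric rings around a chosen center, loosely in the spirit of an annular analysis. It does not reproduce the authors' donut analysis or its parameters. The column names "x [nm]" and "y [nm]" are assumed to follow the usual ThunderSTORM export and should be checked against the actual file headers, and the function names are hypothetical.

```python
import csv
import io
import math

def read_localizations(csv_text, x_col="x [nm]", y_col="y [nm]"):
    """Return a list of (x, y) positions from localization CSV text.

    The default column names are an assumption about the export format.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(float(row[x_col]), float(row[y_col])) for row in reader]

def annular_density(points, center, ring_width, n_rings):
    """Localizations per unit area in concentric rings around center.

    Ring k spans radii [k * ring_width, (k + 1) * ring_width); points
    beyond the outermost ring are ignored.
    """
    cx, cy = center
    counts = [0] * n_rings
    for x, y in points:
        k = int(math.hypot(x - cx, y - cy) // ring_width)
        if k < n_rings:
            counts[k] += 1
    # Area of ring k is pi * w^2 * ((k + 1)^2 - k^2).
    return [counts[k] / (math.pi * ring_width ** 2 * ((k + 1) ** 2 - k ** 2))
            for k in range(n_rings)]
```

Running this for the H3K9me3 and RNA polymerase II files of one cell with the same center would give two radial profiles that can be compared ring by ring.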
Figure 6.zip)
This contains a folder, called 'Data' that contains the following files.
PWS images from HCT116 control cells (PWS bapta control.tif) and for HCT116 magnesium chelated cells (PWS bapta treated.tif). These are pseudocolored PWS images scaled for D values between 2 and 3 and can be opened with any image viewing software.
Ion Data for Paper.xlsx is an Excel spreadsheet containing 3 data sheets. The sheet 'D' contains the PWS observed scaling values for control cells (Control) compared to BAPTA treated (10um BAPTA-AM) and APDAP treated cells (10um APDAD-AM) to chelate magnesium and calcium ions within the cell. The sheet 'FMM' contains the fractional moving mass values observed with PWS microscopy for the groups as above. The sheet 'Nuclear Volume' contains the observed nuclear volumes in cubic microns from confocal microscopy imaging of the respective cells, with the header as above.
Two single molecule localization microscopy images are within this folder, representing control cells (HCT116_CTRL_Comp_50laser_30expt_cell2_ThunderStorm_thre2.tif) and BAPTA treated cells (HCT116_BAPTA_Comp_50laser_30expt_cell4_ThunderSTORM_thre2.tif). These can be viewed using ImageJ or FIJI. The folder 'ctrl' contains the data for control cells, and the folder 'treated' contains the data for BAPTA treated cells. In each of these folders is a set of cell folders, each containing .csv files and image files. The .csv files are the output from the super resolution imaging and can be opened with FIJI using the ThunderSTORM plugin. K9 files refer to H3K9me3 staining and Pol2 refers to RNA polymerase II staining. The 'treated' folder follows the same syntax as the control folder; however, its images are .jpg files instead of .tif files.
Figure 7.zip)
This contains a folder, SMLM, which includes 3 sub folders referring to control cells (DMSO_EU_K9), GSK treated cells (GSK_EU_K9), and TSA treated cells (TSA_EU_K9). Here K9 refers to H3K9me3 staining and EU refers to 5-Ethynyl Uridine staining (nascent RNA synthesis).
Each folder contains subfolders for the imaged cells, with composite.tif representing the merged file for both markers, which can be viewed with FIJI or ImageJ. The respective .csv files are those used for the generation of reconstructions using the FIJI plugin ThunderSTORM, and the .txt files are the reconstruction parameters. In each of these cell folders is a subfolder called 'Non-lamin-nucleolus'. It contains reconstructions of the lamin (nuclear boundary), titled EU_lamin.tif, EU_lamin.csv and EU_lamin-protocol.txt, and of the nucleolus (nucleolus area), titled EU_nucleolus.tif, EU_nucleolus.csv and EU_lamin-nucleolus-protocol.txt. The .tif file is the resulting image, the .csv files are the output from the super resolution imaging, and the .txt files are the reconstruction protocol.
Figure 8.zip)
Density Map.png is an output file generated by the Stochastic Returns Excluded Volume model, where the intensity is calculated from the contact of a nucleosome bead with its neighbors. The intensity scales from zero neighbors (blue, CN = 0) to a maximum of 12 (red, CN = 12). The coordination number (CN) represents the local chromatin density. The file myh 1configuration introns.png is an output image from the Stochastic Returns Excluded Volume model demonstrating the local density of exons (scaled from blue to red as above) with the introns all colored in yellow. The file myh1 configuration exons.png is an output image from the Stochastic Returns Excluded Volume model demonstrating the local density of introns (scaled from blue to red as above) with the exons all colored in green. These .png files can be viewed with any image viewing tool.
The folder Panel J and K contains the file SREV_CN.nb, a Mathematica script that can be opened with Mathematica v13 to load the included data files and analyze the local density (coordination number) of the exons as a function of the excluded volume of the nucleus. The files exon-av_coord-12-115.dat are data files that can be viewed with any text editor. These files correspond to a volume fraction of 0.12 when they contain '012-115' and a volume fraction of 0.16 when they contain '016-115' in their file names.
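To make the coordination number concrete, the sketch below counts, for each bead, how many other beads lie within a contact distance. It is a generic illustration and is not taken from the Stochastic Returns Excluded Volume model or SREV_CN.nb. The contact distance, the brute-force pairwise search, and the function name are assumptions chosen here. `math.dist` requires Python 3.8 or later.

```python
import math

def coordination_numbers(beads, contact_distance):
    """Number of neighbors within contact_distance for each bead.

    beads is a list of (x, y, z) tuples. The pairwise loop is O(n^2),
    which is fine for a small illustration but slow for large models.
    """
    n = len(beads)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(beads[i], beads[j]) <= contact_distance:
                # A contact is mutual, so it counts for both beads.
                counts[i] += 1
                counts[j] += 1
    return counts
```

For equal-sized spherical beads in contact, the count cannot exceed 12, which matches the upper end of the color scale described above.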
Code/software
The included code and data are in Python v3.0, Mathematica v12, and Prism formats. The tools needed to view each file are detailed above.
Access information
Other publicly accessible locations of the data:
- GEO data for Hi-C as described within each figure in .readme above
Data was derived from the following sources:
- ENCODE data as described within the manuscript in supplemental tables
- Reference genome: https://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/ncbiRefSeqSelect.txt.gz
Methods
The data was collected across a range of techniques including chromatin scanning transmission electron microscopy (ChromSTEM), multiplexed single molecule localization microscopy (SMLM), partial wave spectroscopic (PWS) microscopy, Hi-C, and ChIP-Seq. Details of the collection and analysis are within the materials and methods of the manuscript for each method.