Increasingly efficient chromatin binding of cohesin and CTCF supports chromatin architecture formation during zebrafish embryogenesis
Data files
Jan 20, 2025 version files 24.35 GB
- README.md: 10.44 KB
- Simulation.zip: 24.19 GB
- Single_molecule_data.zip: 161.91 MB
Abstract
The three-dimensional folding of chromosomes is essential for nuclear functions such as DNA replication and gene regulation. The emergence of chromatin architecture is thus an important process during embryogenesis. To shed light on the molecular and kinetic underpinnings of chromatin architecture formation, we characterized biophysical properties of cohesin and CTCF binding to chromatin and their changes upon cofactor depletion using single-molecule imaging in live developing zebrafish embryos. We found that chromatin-bound fractions of both cohesin and CTCF increased significantly between the 1000-cell and shield stages, which we could explain through changes in both their association and dissociation rates. Moreover, increasing binding of cohesin restricted chromatin motion, potentially via loop extrusion, and showed distinct stage-dependent nuclear distribution. Polymer simulations with experimentally derived parameters recapitulated the experimentally observed gradual emergence of chromatin architecture. Our findings reveal molecular kinetics underlying chromatin architecture formation during zebrafish embryogenesis.
README: Increasingly efficient chromatin binding of cohesin and CTCF supports chromatin architecture formation during zebrafish embryogenesis
https://doi.org/10.5061/dryad.3bk3j9ks8
Single_molecule_data.zip
Description
The folder 'Single molecule data' contains mat files that include single-molecule tracking data used in the manuscript Cossmann et al., Increasingly efficient chromatin binding of cohesin and CTCF supports chromatin architecture formation during zebrafish embryogenesis.
It contains subfolders labeled with the respective illumination-scheme names or analysis names used in the publication:
CBD = Center-Border-Distance analysis for determining the relative positions of long-bound tracks
Continous = Continuous movie analysis for bound fractions and diffusion coefficients
ITM = Interlaced time-lapse microscopy for relative fractions in binding time classes
TACO = time-lapse alternated with continuous intervals for mobility of molecules in certain binding time classes
timelapse = time-lapse microscopy for residence times and bound fractions
The subfolders contain .mat files in Matlab format. Each file is labeled so that sorting by filename yields ascending developmental time:
The first filename part is either given as a stage name (s):
s0064 = 64-cell stage
s0128 = 128-cell stage
s0256 = 256-cell stage
s0512 = 512-cell stage
s1024 = 1k-cell stage
s2048 = high stage
s4096 = oblong stage
s8192 = sphere stage
s9999 = shield stage
z24hpf = 24 hpf
Or as pooled stages:
pre-ZGA = 64-, 128-, 256-, and 512-cell stages pooled
post-ZGA = high, oblong, sphere stages pooled
shield stage
The second filename part describes the observed molecules:
HTRad21 = HaloTag-Rad21
HTCTCF = HaloTag-CTCF
HTcontrol = HaloTag control
MOcontrol = Standard Control morpholino
The third filename part describes wild-type or additional treatments:
WT = Wild-type
3xMut = 3x mutant
ZF47Mut = ΔZF4-7 mutant
withAAmanitin = addition of α-Amanitin
withTriptolide = addition of triptolide
withCTCFMO = addition of ctcf-morpholino
withNIPBLMO = addition of nipbl-morpholino
withWAPLMO = addition of wapl-morpholino
The fourth filename part describes the illumination scheme, which matches the parent folder (see the folder list above):
ITM = Interlaced time-lapse microscopy
TL = time-lapse microscopy
CONT = Continuous
TACOshort = short scheme of time-lapse alternated with continuous intervals
TACOlong = long scheme of time-lapse alternated with continuous intervals
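The naming scheme above can be read programmatically. The following is a minimal sketch, not part of the deposited analysis code. It assumes that the four filename parts are separated by underscores, which this README does not state, and the example filenames are hypothetical. Only the stage codes are copied from the list above.

```python
from pathlib import Path

# Stage codes copied from the list above, mapped to readable stage names.
STAGES = {
    "s0064": "64-cell", "s0128": "128-cell", "s0256": "256-cell",
    "s0512": "512-cell", "s1024": "1k-cell", "s2048": "high",
    "s4096": "oblong", "s8192": "sphere", "s9999": "shield",
    "z24hpf": "24 hpf",
}


def parse_name(filename, sep="_"):
    """Split a file name into stage, molecule, treatment and scheme.

    ASSUMPTION: the separator between the four parts is "_".
    Adjust `sep` after inspecting the actual file names.
    """
    parts = Path(filename).stem.split(sep)
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}: {filename}")
    stage, molecule, treatment, scheme = parts
    return {
        # Pooled labels such as "pre-ZGA" are not in STAGES and pass through.
        "stage": STAGES.get(stage, stage),
        "molecule": molecule,
        "treatment": treatment,
        "scheme": scheme,
    }


# Hypothetical file names, used only to illustrate the scheme.
examples = ["s9999_HTRad21_WT_TL.mat", "s1024_HTRad21_WT_TL.mat"]

# Plain alphabetical sorting already gives ascending developmental time.
for name in sorted(examples):
    print(name, parse_name(name))
```

Because the stage codes are zero-padded, no custom sort key is needed; `sorted()` on the raw filenames is enough.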
Steps to Repeat the Analysis:
The .mat files contain the variable 'batch'. This Matlab 'struct' array contains the data of one single-molecule movie.
The .mat files were created with the open source single-molecule tracking software TrackIt (https://gitlab.com/GebhardtLab/TrackIt) using the 'File > Save batch file as...' function and merged using the 'File > Merge multiple batch files' function.
The workflow from tracking to quantitative analysis can be visualized and repeated by running the 'TrackIt_v1_5_1.m' file and then selecting 'File > Load batch file'. Alternatively, the batches can be loaded into TrackIt's data analysis tool by running the 'data_analysis_tool.m' file and then selecting 'Load batch .mat file(s)'.
Simulation.zip
The Simulation folder contains a comprehensive collection of input data, modeling configurations, and Jupyter notebooks necessary to reproduce the analysis and output plots presented in the study.
Folder Contents
data
This folder contains the following subdirectories:
- experiment:
- Publicly available datasets are used for model parametrization and comparison. Includes:
- ChIP-seq Data:
- BED files (.bed) for CTCF ChIP-seq at two developmental stages: 24 hours post-fertilization (hpf) and the shield stage.
- Example file:
- chr18_CTCF_motif_sizegiven_extend5kb_with_GSE133437_CTCF_24hpf_combined_reps_all_peaks_clean.bed is a subset of the broader file GSE133437_CTCF_24hpf_combined_reps_all_peaks_clean_wt_shield_macs2_filtered_peaks.narrowPeak_commonChIP.bed.
- These are used in the Jupyter Notebooks for model parametrization (to obtain CTCF sites)
- Hi-C Data:
- Hi-C contact matrices stored in the mcool format for three developmental stages.
- Files:
- Wike2021*: Hi-C contact matrices in mcool format.
- danrer11*: Chromosome and chromosome arm lengths.
- These are used in the Jupyter Notebooks to create the experimental Hi-C contact maps
- modeling:
- The modeling folder provides the output of our simulations: archives with the original trajectories and subdirectories with the contact matrices derived from them (as described in “Steps to Repeat the Polymer Simulation using LAMMPS_LE” below):
- Trajectories:
- trajectories_*.tar: Our simulated chromosome trajectories at 1kb and 10kb resolutions obtained from LAMMPS_LE.
- Contact Matrices:
- Subfolders 1_kb and 10_kb: Our contact matrices corresponding to specific resolutions and parameter sets.
- Parameter dictionary for folder naming i_j:
- i:value {2:0.6,1:3,0:15}, measured in 1/(Mb⋅min).
- j:value {0:500,1:100,2:50,3:10}, measured in seconds.
- Contact matrices were obtained from trajectories using the cmdm executable (as described in “Steps to Repeat the Polymer Simulation using LAMMPS_LE” below).
- initial_setups
- Input parameter configurations needed to reproduce the simulation trajectories for 1_kb and 10_kb (see “Steps to Repeat the Polymer Simulation using LAMMPS_LE” below).
- Parameter dictionary for folder naming i_j:
- i:value {2:0.6,1:3,0:15}, measured in 1/(Mb⋅min).
- j:value {0:500,1:100,2:50,3:10}, measured in seconds.
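The i_j folder naming can be decoded with two lookup tables. The sketch below is illustrative only: the function names are hypothetical, and the values and units are copied from the parameter dictionary above without interpreting which model parameters i and j stand for.

```python
# Lookup tables copied from the parameter dictionary above.
I_VALUES = {0: 15.0, 1: 3.0, 2: 0.6}        # units as stated in the list above
J_VALUES = {0: 500, 1: 100, 2: 50, 3: 10}   # seconds


def decode_folder(name):
    """Translate a folder name such as "2_0" into its (i value, j value)."""
    i_str, j_str = name.split("_")
    return I_VALUES[int(i_str)], J_VALUES[int(j_str)]


def all_folders():
    """List every i_j combination that the dictionary allows."""
    return [f"{i}_{j}" for i in sorted(I_VALUES) for j in sorted(J_VALUES)]


for folder in all_folders():
    i_value, j_value = decode_folder(folder)
    print(f"{folder}: i = {i_value}, j = {j_value} s")
```

Whether every one of the twelve combinations is present at both resolutions should be checked against the actual contents of the 1_kb and 10_kb subfolders.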
notebooks
Contains two Jupyter notebooks (one for the main figures, one for the supplementary figures) and a Python script, utils.py, which includes helper functions to reproduce all figure panels and their underlying data. The environment.yml is used for environment setup (as described in the “Steps to Repeat Data Plotting using Jupyter Notebooks” below)
results
This folder is divided into two subfolders:
- figures: Contains PDF files of the figures used in the manuscript, as produced by the Jupyter notebooks. The data underlying these PDF figures is provided in the dat folder.
- dat: Contains the data underlying the pdf figures in multiple file formats (.csv, .txt, .dat). Filenames are descriptive and correspond to the associated figures.
Steps to Repeat the Polymer Simulation using LAMMPS_LE
- Modeling Details:
- The modeling process was conducted using LAMMPS, a molecular dynamics simulation software (https://www.lammps.org).
- A custom-built Loop Extrusion (LE) module, integrated within LAMMPS, was employed. Detailed information about the LAMMPS version, the module's implementation, custom parameters and installation process can be found in the accompanying GitHub repository (https://github.com/polly-code/lammps_le).
- Initial Parameters:
- Archives located in the folder data/modeling/initial_setups contain the input parameter configurations needed to reproduce the simulation trajectories. These parameter sets are foundational to the modeling workflow. Each subfolder contains the following files:
- polymer.lam: The main configuration file where all simulation parameters are specified. For more details, refer to the official LAMMPS documentation (https://www.lammps.org) and the README for the version with loop extrusion available at polly-code/lammps_le (https://github.com/polly-code/lammps_le).
- run.sh: A bash script used to schedule the job on the UGE scheduler.
- zebrafish_chr18: The initial spatial arrangement of the system.
- To start a simulation, use the following command as an example:
- /path/to/lammps/lmp -in polymer.lam
- Replace /path/to/lammps/ with the actual path to your LAMMPS installation.
- Output files
- Because each simulation run uses different seeds for the random number generators and is subject to rounding errors, rerunning the simulations will not reproduce our trajectories exactly. We therefore provide the output trajectory files from our simulations in the data folder as trajectories_1kb.tar (for the 1 kb simulations) and trajectories_10kb.tar (for the 10 kb simulations).
- These trajectories were processed into contact matrices (located in the data/modeling/1_kb and data/modeling/10_kb folders) using the cm_dm.cpp utility located in the folder src/contact_distance_maps/.
- This source file must be compiled with an MPI C++ compiler wrapper (tested with OpenMPI-3.1.4 and GCC-8.3.0) by executing:
- mpicxx cm_dm.cpp -o cmdm
- After compiling, you can run the executable cmdm with the --help argument to see a list of all available options. Here is an example command to process trajectory files using 40 CPU cores:
- mpiexec -np 40 path/to/contact_distance_maps/cmdm -p path/to/mytraj.lammpstrj -e 1 -rc 5
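When several trajectory files need to be processed, the example command above can be generated per file. This is a minimal sketch under stated assumptions: the helper name and the trajectory filenames are hypothetical, and the flags -p, -e 1 and -rc 5 are copied unchanged from the example command. Their meaning is not documented in this README, so check `cmdm --help` before changing them.

```python
import shlex


def cmdm_command(trajectory,
                 cmdm="path/to/contact_distance_maps/cmdm",
                 n_procs=40):
    """Build the argument list for one cmdm run.

    Flags and values (-p, -e 1, -rc 5) mirror the example command above;
    they are copied as-is, not interpreted.
    """
    return [
        "mpiexec", "-np", str(n_procs),
        cmdm,
        "-p", str(trajectory),
        "-e", "1",
        "-rc", "5",
    ]


# Hypothetical trajectory names, for illustration only.
trajectories = ["path/to/run_a.lammpstrj", "path/to/run_b.lammpstrj"]

for traj in trajectories:
    # Print the shell-quoted command so it can be reviewed before running.
    print(shlex.join(cmdm_command(traj)))
```

To execute a command instead of printing it, the returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list avoids shell-quoting problems with paths that contain spaces.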
Steps to Repeat Data Plotting using Jupyter Notebooks
- Environment Setup:
- Set up a conda or mamba environment using the provided environment.yml file (in the notebooks folder). This file specifies all required dependencies and their versions, ensuring a consistent computational environment for the Jupyter notebooks.
- Start Jupyter Notebooks located in the notebooks folder.
- Use main_fig_modeling.ipynb to reproduce the main figure panels.
- Use Supplementary_modeling_1.ipynb to reproduce the supplementary figure panels.
- The Jupyter notebooks output PDF files (located in the figures folder) and their underlying plotting data (located in the dat folder, provided as .txt, .csv or .dat files).
- All file paths have been preset in the notebooks and the obtained PDF files are still visible as reference.
- Input files for notebooks
- Experimental Hi-C data from Wike2021 provided as .mcool files (in data/experiment)
- Experimental ChIP-seq data provided as .bed files (in data/experiment)
- Simulated contact maps (in data/modeling/1_kb or data/modeling/10_kb)