Data for: Irreversibility in bacterial regulatory networks
Data files
Jul 29, 2024 version files 808.40 MB
Abstract
Irreversibility, in which a transient perturbation leaves a system in a new state, is an emergent property in systems of interacting entities. This property has well-established implications in statistical physics but remains underexplored in biological networks, especially for bacteria and other prokaryotes whose regulation of gene expression occurs predominantly at the transcriptional level. Focusing on the reconstructed regulatory network of Escherichia coli, we examine network responses to transient single-gene perturbations. We predict irreversibility in numerous cases and find that the incidence of irreversibility increases with the proximity of the perturbed gene to positive circuits in the network. Comparison with experimental data suggests a connection between the predicted irreversibility to transient perturbations and the evolutionary response to permanent perturbations.
README: Data for: Irreversibility in bacterial regulatory networks
https://doi.org/10.5061/dryad.547d7wmhg
This dataset contains example results and transcriptional data supporting the results of the paper "Irreversibility in Bacterial Regulatory Networks". The code for processing the data can be accessed at Zenodo https://dx.doi.org/10.5281/zenodo.12775479.
Description of the data and file structure
The files below are designed to be processed with the software above. The working directory "./" is assumed to be "irr_grn/" in the file paths below.
Name | Description | Format | Target directory | Other details |
---|---|---|---|---|
att_txt_files.tar.gz | attractor text files | Tarball of plain text files | ./attfiles/ | Each file in the tarball contains the attractors for a realization of the Boolean rules |
1st_csv_files.tar.gz | attractor 1st state files | Tarball of CSV files | ./attfiles/ | Each file in the tarball contains the first state of each attractor for a realization of the Boolean rules |
example_netfiles.tar.gz | example Boolean network files | Tarball of plain text files | ./netfiles/ | Each file describes the Boolean rules for a realization of the ensemble |
result_attr_trans.tar.gz | attractor transitions results | Tarball of CSV files | ./results/ | Each file contains the attractor transitions for a given realization |
result_KO_twoparam.tar.gz | knockout perturbation results | Tarball of CSV files | ./results/ | Each file contains the irreversibility of each knockout for each attractor encoded as a 1 (irreversible) or 0 (reversible) |
result_OE_twoparam.tar.gz | overexpression perturbation results | Tarball of CSV files | ./results/ | Each file contains the irreversibility of each overexpression for each attractor encoded as a 1 (irreversible) or 0 (reversible) |
changed_pre_twoparam.tar.gz | changed genes results | Tarball of CSV files | ./results/ | Each file contains the number of irreversible response genes for each perturbation |
a1_irrgn_crp.tar.gz | Intermediate attractors reached after perturbation | Tarball of CSV files | ./results/crp/ | |
irrev_changed_crp.tar.gz | Irreversible response genes to crp knockout | Tarball of CSV files | ./results/crp/ | |
rev_changed_crp.tar.gz | Reversible response genes to crp knockout | Tarball of CSV files | ./results/crp/ | |
irrgn_crpKO.tar.gz | Irreversibility results for crp knockout | Tarball of CSV files | ./results/crp/ | |
hamming_summaries_median.pkl | Hamming weights for each transcriptional state averaged across attractors | Python pickle of a pandas dictionary | ./results/crp/ | Each dictionary entry has a key of the form (Bioproject, perturbation_number, state_label), and each column corresponds to the Hamming weight of a gene. The state label can be one of: 'irr' (irreversible), 'rev' (reversible), or 'unch' (unchanged). Weights are determined by calculating the distance of the attractors before and after crp knockout to the observed transcriptional states |
boltz_summaries_median.pkl | Boltzmann weights for each transcriptional state averaged across attractors | Python pickle of a pandas dictionary | ./results/crp/ | Each dictionary entry has a key of the form (Bioproject, perturbation_number, state_label), and each column corresponds to the Boltzmann weight of a gene. The state label can be one of: 'irr' (irreversible), 'rev' (reversible), or 'unch' (unchanged). Weights are determined by calculating the distance of the attractors before and after crp knockout to the observed transcriptional states |
attractor_args_d.pkl | Boltzmann and Hamming weights for each attractor | Python pickle of a pandas dictionary | ./results/crp/ | Contains the weights determined by calculating the distance of the pre-and-post crp knockout attractor to observed transcriptional states |
gene_status_df.pkl | Whether a gene is reversible, irreversible, or unchanged upon crp knockout | Python pickle of a pandas DataFrame | ./results/crp/ | |
all_logtpm_dat.pkl | Log transcript-per-million data | Python pickle of a pandas DataFrame | ./tmp/ | Transcriptional data where rows are labeled by SRA Run/Experiment accession and columns are labeled by gene symbols |
ecoli_metadata.pkl | Transcriptional metadata | Python pickle of a pandas DataFrame | ./tmp/ | Metadata associated with each row of "all_logtpm_dat.pkl" |
gsym_thr_ser.pkl | Thresholds for binarizing genes | Python pickle of a pandas Series | ./tmp/ | Thresholds to distinguish active and inactive genes, determined by looking at histograms of expression across the dataset |
gsym_bias_ser.pkl | Probability of genes being expressed above threshold | Python pickle of a pandas Series | ./tmp/ | List of gene symbols and the frequency with which they are expressed above the threshold in "gsym_thr_ser.pkl" |
gsym_biasz_ser.pkl | Probability of genes being expressed above zero | Python pickle of a pandas Series | ./tmp/ | List of gene symbols and the frequency with which they are expressed above zero in the transcriptional dataset |
overall_output_d.pkl | averages over different resampled network reconstructions | Python pickle of a dictionary | ./tmp/ | Contains the summary statistics of random realizations |
basinSizes.tar.gz | sizes of attractor basins | Tarball of TSV files | ./basins/ | Each file lists one attractor per row (identified by the row number where its first state is listed in the "1st" files) with its associated basin size |
basinFreqs.tar.gz | frequencies of basin sizes | Tarball of TSV files | ./basins/ | Each file lists one attractor per row (identified by the binary string of its first state as listed in the "1st" files) with its associated basin size |
Abbreviation: crp = "cyclic adenosine monophosphate receptor protein"
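The threshold, bias, and distance descriptions in the table above can be illustrated with a minimal sketch. It is not the code from the repository: the function names, the gene values, and the use of a strict "greater than" comparison are assumptions made for illustration, and plain dictionaries stand in for the pandas objects stored in the pickles. Only the Hamming distance is shown; how distances are converted into the Hamming or Boltzmann weights is not reproduced here.

```python
from typing import Dict, List


def binarize(log_tpm: Dict[str, float], thresholds: Dict[str, float]) -> Dict[str, int]:
    """Return 1 (active) for genes above their threshold, else 0 (inactive).

    Assumes every gene in `thresholds` is present in `log_tpm`, and that
    "above threshold" means strictly greater than (an assumption).
    """
    return {gene: int(log_tpm[gene] > thr) for gene, thr in thresholds.items()}


def bias(samples: List[Dict[str, float]], thresholds: Dict[str, float]) -> Dict[str, float]:
    """Fraction of samples in which each gene is expressed above its threshold."""
    states = [binarize(sample, thresholds) for sample in samples]
    return {gene: sum(s[gene] for s in states) / len(states) for gene in thresholds}


def hamming_distance(state_a: Dict[str, int], state_b: Dict[str, int]) -> int:
    """Number of shared genes whose binary values differ between two states."""
    shared = state_a.keys() & state_b.keys()
    return sum(state_a[gene] != state_b[gene] for gene in shared)


# Hypothetical values, for illustration only (not taken from the dataset).
thresholds = {"crp": 2.0, "lacZ": 1.0, "araC": 3.0}
samples = [
    {"crp": 3.1, "lacZ": 0.2, "araC": 3.5},
    {"crp": 1.0, "lacZ": 2.4, "araC": 4.0},
]
attractor = {"crp": 1, "lacZ": 1, "araC": 1}  # a made-up attractor state

observed = binarize(samples[0], thresholds)
gene_bias = bias(samples, thresholds)
distance = hamming_distance(attractor, observed)
```

With the actual files, the thresholds would come from "gsym_thr_ser.pkl" and the expression values from "all_logtpm_dat.pkl", both of which require pandas to load.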
Sharing/Access information
All raw sequencing data are publicly available. They may be downloaded from the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra) using the accession numbers in the E. coli metadata table.
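As a rough sketch of how the accession numbers might be collected before downloading, the snippet below separates run accessions from experiment accessions using the standard SRA prefixes. The function name and the example labels are hypothetical placeholders. With the real data, the labels would be read from the metadata table (for example, from the row labels of "ecoli_metadata.pkl" loaded with pandas), which is an assumption about how that table is indexed.

```python
import re
from typing import Iterable, List, Tuple

# Run accessions start with SRR/ERR/DRR; experiment accessions with SRX/ERX/DRX.
RUN_RE = re.compile(r"^[SED]RR\d+$")
EXPERIMENT_RE = re.compile(r"^[SED]RX\d+$")


def split_accessions(labels: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split labels into (run accessions, experiment accessions).

    Labels matching neither pattern are ignored.
    """
    labels = list(labels)
    runs = [label for label in labels if RUN_RE.match(label)]
    experiments = [label for label in labels if EXPERIMENT_RE.match(label)]
    return runs, experiments


# Placeholder labels, not real accessions from the dataset.
labels = ["SRR0000001", "ERX0000002", "DRR0000003", "not_an_accession"]
runs, experiments = split_accessions(labels)
print("\n".join(runs))
```

The resulting list of run accessions could then be passed to a download tool such as the SRA Toolkit.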
The remaining results files may be obtained independently by following the documentation in the GitHub repository (https://github.com/yizhao-nu/Irreversibility-in-GRN/) associated with the paper.
Code/Software
Software for processing the data can be found at the Zenodo and GitHub links above.
Methods
This dataset contains the results of the Boolean network simulation of irreversibility associated with Science Advances manuscript ado3232. All of the files can be regenerated from the repository stored at GitHub (https://github.com/yizhao-nu/Irreversibility-in-GRN/) and Zenodo (DOI: https://dx.doi.org/10.5281/zenodo.12775479), with the exception of the files "all_logtpm_dat.pkl" and "ecoli_metadata.pkl", which are tables of log-transformed transcriptional data and the associated metadata, respectively. These two files were derived by downloading and processing publicly available raw sequencing data (and associated metadata) from the National Center for Biotechnology Information's Sequence Read Archive. Raw reads were aligned to the E. coli K-12 MG1655 genome (NC_000913.3) using Rockhopper (https://cs.wellesley.edu/~btjaden/Rockhopper/). The purpose of providing these results files is to facilitate reproducibility of the paper's results for users who do not have access to a computer cluster to redo the simulations and/or reprocess the raw sequencing data.
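To use the provided results instead of rerunning the simulations, the tarballs need to be unpacked into the target directories listed in the table above. The following is a minimal sketch of that step, not part of the published software. The function name and the "downloads" folder are hypothetical, and it is an assumption that each archive unpacks directly into its target directory (an archive may contain its own top-level folder). The pickle files are not archives and would simply be copied to their target directories.

```python
import tarfile
from pathlib import Path

# Archive name -> target directory relative to the working directory,
# following the "Target directory" column of the table.
TARGETS = {
    "att_txt_files.tar.gz": "attfiles",
    "1st_csv_files.tar.gz": "attfiles",
    "example_netfiles.tar.gz": "netfiles",
    "result_attr_trans.tar.gz": "results",
    "result_KO_twoparam.tar.gz": "results",
    "result_OE_twoparam.tar.gz": "results",
    "changed_pre_twoparam.tar.gz": "results",
    "a1_irrgn_crp.tar.gz": "results/crp",
    "irrev_changed_crp.tar.gz": "results/crp",
    "rev_changed_crp.tar.gz": "results/crp",
    "irrgn_crpKO.tar.gz": "results/crp",
    "basinSizes.tar.gz": "basins",
    "basinFreqs.tar.gz": "basins",
}


def unpack_archives(download_dir, root="irr_grn"):
    """Extract every known archive found in download_dir; return the names unpacked."""
    download_dir, root = Path(download_dir), Path(root)
    # Use the safer "data" extraction filter where the interpreter supports it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    unpacked = []
    for name, subdir in TARGETS.items():
        archive = download_dir / name
        if not archive.exists():
            continue  # skip archives that were not downloaded
        target = root / subdir
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target, **extra)
        unpacked.append(name)
    return unpacked
```

A call such as `unpack_archives("downloads", root="irr_grn")` would then populate "irr_grn/attfiles/", "irr_grn/results/", and so on for whichever archives are present in the hypothetical "downloads" folder.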