Evaluating transcriptomic integration for cyanobacterial constraint-based metabolic modelling
Data files
Mar 26, 2025 version files 68.19 MB
- Baseline_import_sol_map.csv (1.79 MB)
- Baseline_sol_map.csv (30.86 MB)
- Baseline_threshold_sol_map.csv (1.41 MB)
- Lazy_import_MAX.csv (684 B)
- Lazy_import.csv (685 B)
- Lazy_MAX.csv (2.97 KB)
- Lazy_threshold_MAX.csv (2.31 KB)
- Lazy_threshold.csv (2.31 KB)
- Lazy.csv (3.12 KB)
- Linear_MAX.csv (1.40 KB)
- Linear.csv (1.40 KB)
- Max_import_sol_map.csv (1.93 MB)
- Max_sol_map.csv (30.77 MB)
- Max_threshold_sol_map.csv (1.41 MB)
- README.md (6.75 KB)
Abstract
Metabolic modelling has wide-ranging applications, including for the production of high-value compounds, understanding complex disease and analysing community interactions. Integrating transcriptomic data with genome-scale metabolic models is crucial for deepening our understanding of complex biological systems, as it enables the development of models tailored to specific conditions, such as particular tissues, environments, or experimental setups. Relatively little attention has been given to the assessment of such integration methods in predicting intracellular fluxes. While a few validation studies offer some insights, their scope remains limited, particularly for organisms like cyanobacteria, for which little metabolic flux data are available. Cyanobacteria hold significant biotechnological potential due to their ability to synthesize a wide range of high-value compounds with minimal resource inputs.
The impact of specific methodological decisions on integration, however, has scarcely been assessed beyond human models, with no thorough exploration of parameter choices in valve-based integration methods. By implementing a novel analysis pipeline, we evaluated these methodological decisions using the genome-scale model for Synechocystis sp. PCC 6803 (iSynCJ816 [Joshi et al., 2017 doi.org/10.1016/j.algal.2017.09.013]) with existing transcriptomic data in biomass-optimised scenarios. Our analyses indicate that selecting an appropriate integration method may not always be straightforward and depends on the initial model configuration, a factor which is often overlooked during integration. By evaluating sets of methods, we identified a trade-off between the buffering of light into the system and maintenance of flux near system boundaries. Our findings also highlighted how selection of an appropriate integration method likely depends on the choice of configuration, emphasising the need to consider both together.
Dataset DOI: 10.5061/dryad.n2z34tn7d
Description of the data and file structure
Overview
This dataset accompanies the study “Evaluating Transcriptomic Integration for Cyanobacterial Constraint-based Metabolic Modelling” where we computed time-series growth rates using Flux Balance Analysis (FBA) to validate the performance of methods for integrating transcriptomic data with a metabolic flux model, iSynCJ816 [Joshi et al., 2017].
Files and variables
The provided datasets include:
- Full flux distributions (mmol per gram dry cell weight per hour) computed for each integration method
- List of p-values for each integration method (including scaling strategy)
(Integration method labelling)
Name | Mapping Function | Thresholding | Scaling |
---|---|---|---|
Lazy | Lazy-step | No | One-size-fits-all |
Lazy threshold | Lazy-step | Yes | One-size-fits-all |
Lazy importance | Lazy-step | No | Reaction Specific |
Linear | Linear | No | One-size-fits-all |
Data Files and Descriptions
Flux Distribution Files (sol_map Files)
Each file contains full metabolic flux distributions predicted by FBA (maximising for biomass production) for different integration methods.
Filename | Integration method(s) |
---|---|
Baseline_sol_map.csv | Lazy, Linear |
Max_sol_map.csv | Max Lazy, Max Linear |
Baseline_threshold_sol_map.csv | Lazy Threshold |
Max_threshold_sol_map.csv | Max Lazy Threshold |
Baseline_import_sol_map.csv | Lazy Importance |
Max_import_sol_map.csv | Max Lazy Importance |
Variable Naming Format
CyanoExpress label | Description | Sample size | Reference |
---|---|---|---|
WT_Cd | Cadmium stress | 9 | Houot et al., 2007 |
WT_blue_red | Blue light growth | 6 | Singh et al., 2009 |
WT_HL | High light stress | 6 | Singh et al., 2008 |
crhR_low_temperature | CrhR-mutant; Low temperature | 3 | Prakash et al., 2010 |
WT_S_Starvation_HEPES | Sulphur starvation (no HEPES) | 3 | Zhang et al., 2008 |
WT_S_Starvation | Sulphur starvation | 7 | Zhang et al., 2008 |
WT_Fe_depletion | Iron stress | 5 | Hernandez-Prieto et al., 2012 |
Each variable follows the format: CONDITION_TIMEPOINT_METHOD_SCALING
- CONDITION: Experimental condition from which transcriptomics were derived – condition labels same as in CyanoExpress2.3 (http://cyanoexpress.sysbiolab.eu/)
- TIMEPOINT: Time at which cells were harvested
- METHOD: Integration method which was applied (if no method is labelled, it is assumed to be the Lazy-step mapping function)
- SCALING: Scaling strategy applied, given as one of:
  - gamma value (one-size-fits-all approaches)
  - max_importance value (importance-based methods)
All flux values are in mmol/gDCW/hr and reaction indexing is the same as in iSynCJ816: index 74 is the maximised autotrophic biomass reaction.
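The naming format above can be split back into its parts. The following is a minimal sketch, not the authors' code. It assumes that the parts are joined by underscores exactly as `CONDITION_TIMEPOINT_METHOD_SCALING`, that TIMEPOINT and SCALING contain no underscores themselves, and that METHOD may be absent. The function name `parse_variable` and the example names in the usage note are hypothetical. Because condition labels contain underscores, the condition is matched against the known CyanoExpress labels, longest first.

```python
# CyanoExpress condition labels listed in the table above.
CONDITIONS = [
    "WT_Cd",
    "WT_blue_red",
    "WT_HL",
    "crhR_low_temperature",
    "WT_S_Starvation_HEPES",
    "WT_S_Starvation",
    "WT_Fe_depletion",
]


def parse_variable(name, conditions=CONDITIONS):
    """Split a variable name into condition, timepoint, method and scaling.

    Assumption: TIMEPOINT and SCALING contain no underscores, so the first
    remaining token is the timepoint and the last one is the scaling value.
    """
    # Try longer labels first so that WT_S_Starvation_HEPES is not
    # mistaken for WT_S_Starvation.
    for condition in sorted(conditions, key=len, reverse=True):
        prefix = condition + "_"
        if name.startswith(prefix):
            rest = name[len(prefix):].split("_")
            break
    else:
        raise ValueError(f"no known condition label in {name!r}")

    if len(rest) < 2:
        raise ValueError(f"expected at least TIMEPOINT_SCALING in {name!r}")

    timepoint, scaling = rest[0], rest[-1]
    # An unlabelled method is taken to be the Lazy-step mapping function.
    method = "_".join(rest[1:-1]) or "Lazy"
    return {
        "condition": condition,
        "timepoint": timepoint,
        "method": method,
        "scaling": scaling,
    }
```

For example, `parse_variable("WT_HL_6h_Linear_2")` would report the condition `WT_HL` and the method `Linear`. Both the timepoint token `6h` and the scaling value `2` are made up for illustration, so check them against the actual column headers before relying on this.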
P-value list
Growth predictions extracted from the sol_map files were min-max normalised and compared to experimental data using Dynamic Time Warping (DTW). The statistical significance of each prediction was determined using dataset-specific null distributions (Supplementary figure 4), and the resulting p-values are stored in the following files. Each file name corresponds directly to the integration method being assessed. Conditions are labelled as listed in Table 2 of the associated manuscript.
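The normalisation and comparison steps described above can be illustrated as follows. This is a minimal sketch, not the pipeline used in the study. It implements textbook DTW with an absolute-difference local cost and no windowing, and the study's actual DTW settings may differ. The function names `min_max` and `dtw_distance` are hypothetical, and the construction of the null distributions is not reproduced here.

```python
def min_max(values):
    """Rescale a series to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two series.

    Assumption: absolute difference as the local cost and an unconstrained
    warping path (no window), which may not match the study's settings.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

A predicted and an experimental growth series would each be passed through `min_max` before calling `dtw_distance`, so that the comparison reflects the shape of the curves rather than their absolute scale.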
Lazy.csv
Linear.csv
Lazy_MAX.csv
Linear_MAX.csv
Lazy_threshold.csv
Lazy_threshold_MAX.csv
Lazy_import.csv
Lazy_import_MAX.csv
Code/software
No specialist software is required; all data files are provided in CSV format.
Access information
Data were derived from the following sources:
- CyanoExpress (http://cyanoexpress.sysbiolab.eu/), within which the data came from the following studies:
Chintan J Joshi, Christie AM Peebles, and Ashok Prasad. Modeling and analysis of flux distribution and bioproduct formation in Synechocystis sp. PCC 6803 using a new genome-scale metabolic reconstruction. Algal Research, 27:295–310, 2017.
Laetitia Houot, Martin Floutier, Benoit Marteyn, Magali Michaut, Antoine Picciocchi, Pierre Legrain, Jean-Christophe Aude, Corinne Cassier-Chauvat, and Franck Chauvat. Cadmium triggers an integrated reprogramming of the metabolism of Synechocystis PCC6803, under the control of the Slr1738 regulator. BMC Genomics, 8:1–16, 2007.
Abhay K Singh, Maitrayee Bhattacharyya-Pakrasi, Thanura Elvitigala, Bijoy Ghosh, Rajeev Aurora, and Himadri B Pakrasi. A systems-level analysis of the effects of light quality on the metabolism of a cyanobacterium. Plant Physiology, 151(3):1596–1608, 2009.
Abhay K Singh, Thanura Elvitigala, Maitrayee Bhattacharyya-Pakrasi, Rajeev Aurora, Bijoy Ghosh, and Himadri B Pakrasi. Integration of carbon and nitrogen metabolism with energy production is crucial to light acclimation in the cyanobacterium Synechocystis. Plant Physiology, 148(1):467–478, 2008.
Jogadhenu SS Prakash, Pilla Sankara Krishna, Kodru Sirisha, Yu Kanesaki, Iwane Suzuki, Sisinthy Shivaji, and Norio Murata. An RNA helicase, CrhR, regulates the low-temperature-inducible expression of heat-shock genes groES, groEL1 and groEL2 in Synechocystis sp. PCC 6803. Microbiology, 156(2):442–451, 2010.
Zhigang Zhang, Ninad D Pendse, Katherine N Phillips, James B Cotner, and Arkady Khodursky. Gene expression patterns of sulfur starvation in Synechocystis sp. PCC 6803. BMC Genomics, 9:1–14, 2008.
Miguel A Hernández-Prieto, Verena Schön, Jens Georg, Luísa Barreira, João Varela, Wolfgang R Hess, and Matthias E Futschik. Iron deprivation in Synechocystis: inference of pathways, non-coding RNAs, and regulatory elements from comprehensive expression profiling. G3: Genes|Genomes|Genetics, 2(12):1475–1495, 2012.
Expression profiles deposited on CyanoExpress 2.3 (pre-processing and normalisation details available from: http://cyanoexpress.sysbiolab.eu/) were screened for their suitability for downstream analysis. Each transcriptomic dataset was required to have expression profiles from a minimum of 3 independent timepoints and paired OD data, reported in its source paper, from which growth rates could be inferred. WebPlotDigitizer [Ankit Rohatgi, WebPlotDigitizer] was used to extract experimental data (Supplementary) from published OD plots. Transcriptomic log-fold change values were converted to fold-change (so that wild-type expression was equal to 1) before applying integration methods.
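The conversion from log-fold change to fold-change can be sketched as below. This is an illustration only: it assumes the log-fold change values are base-2 logarithms, which should be checked against the CyanoExpress documentation, and the function name `to_fold_change` is hypothetical.

```python
def to_fold_change(log_fold_change, base=2.0):
    """Convert a log-fold change to a fold-change.

    Assumption: values are log2 ratios relative to wild type, so a
    log-fold change of 0 maps to a fold-change of 1 (wild-type level).
    """
    return base ** log_fold_change
```

Under this assumption, a log-fold change of 1 corresponds to a doubling of expression relative to wild type, and a value of -1 to a halving.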