Metabolic modelling has wide-ranging applications, including the production of high-value compounds, the understanding of complex disease and the analysis of community interactions. Integrating transcriptomic data with genome-scale metabolic models is crucial for deepening our understanding of complex biological systems, as it enables the development of models tailored to specific conditions, such as particular tissues, environments, or experimental setups. Relatively little attention has been given to assessing how well such integration methods predict intracellular fluxes. While a few validation studies offer some insights, their scope remains limited, particularly for organisms like cyanobacteria, for which few metabolic flux data are available. Cyanobacteria hold significant biotechnological potential due to their ability to synthesize a wide range of high-value compounds with minimal resource inputs.

The impact of specific methodological decisions on integration, however, has scarcely been assessed beyond human models, with no thorough exploration of parameter choices in valve-based integration methods. By implementing a novel analysis pipeline, we evaluated these methodological decisions using the genome-scale model for Synechocystis sp. PCC 6803 (iSynCJ816 [Joshi et al., 2017; doi.org/10.1016/j.algal.2017.09.013]) with existing transcriptomic data in biomass-optimised scenarios. Our analyses indicate that selecting an appropriate integration method may not always be straightforward and depends on the initial model configuration, a factor which is often overlooked during integration. By evaluating sets of methods, we identified a trade-off between the buffering of light into the system and the maintenance of flux near system boundaries. Our findings also highlighted that the choice of integration method likely depends on the choice of model configuration, emphasising the need to consider both together.

Expression profiles deposited in CyanoExpress 2.3 (pre-processing and normalisation details available from: http://cyanoexpress.sysbiolab.eu/) were screened for their suitability for downstream analysis. Each transcriptomic dataset was required to contain expression profiles from at least 3 independent timepoints and to have paired OD data, reported in its associated source paper, from which growth rates could be inferred. WebPlotDigitizer [Rohatgi, WebPlotDigitizer] was used to extract experimental data (Supplementary) from published OD plots. Transcriptomic log-fold change values were converted to fold-change (so that wild-type expression was equal to 1) before applying integration methods.
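The two numerical steps described above (converting log-fold change to fold-change, and inferring a growth rate from OD readings) can be sketched as follows. This is a minimal illustration, not the pipeline used to produce the deposited files. It assumes that the log-fold change values are base 2 and that growth is exponential over the sampled window, so that the growth rate is the least-squares slope of ln(OD) against time. The function names and example values are hypothetical.

```python
import math


def log2fc_to_fc(log2_fc):
    """Convert a log2 fold change to a linear fold change.

    Assumption: the log base is 2. A log fold change of 0 (no change
    relative to wild type) maps to a fold change of 1.
    """
    return 2.0 ** log2_fc


def growth_rate(times_h, od_values):
    """Estimate a specific growth rate (per hour) from OD readings.

    Assumption: exponential growth, so ln(OD) is linear in time and the
    rate is the ordinary least-squares slope of ln(OD) against time.
    """
    if len(times_h) != len(od_values) or len(times_h) < 2:
        raise ValueError("need at least two paired (time, OD) points")
    logs = [math.log(od) for od in od_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx


# Hypothetical values, for illustration only.
fold_changes = [log2fc_to_fc(x) for x in (-1.0, 0.0, 1.0)]
mu = growth_rate([0.0, 24.0, 48.0], [0.1, 0.2, 0.4])
```

In this hypothetical example the OD doubles every 24 hours, so the fitted rate equals ln(2)/24 per hour.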

Name	Mapping Function	Thresholding	Scaling
Lazy	Lazy-step	No	One-size-fits-all
Lazy threshold	Lazy-step	Yes	One-size-fits-all
Lazy importance	Lazy-step	No	Reaction Specific
Linear	Linear	No	One-size-fits-all
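For orientation, valve-based integration in general restricts the flux a reaction may carry in proportion to the expression of its associated genes. The sketch below shows one generic linear version of that idea, in which a reaction's flux bounds are multiplied by its fold-change value. It is an assumption-laden illustration and does not reproduce the Lazy-step or Linear mapping functions listed above, whose exact definitions are not given in this section. The function name `scale_bounds` and its `cap` parameter are hypothetical.

```python
def scale_bounds(lower, upper, fold_change, cap=None):
    """Scale a reaction's flux bounds by an expression fold change.

    Assumptions (not taken from the source): fold_change is linear with
    wild type equal to 1, negative values are treated as 0, and an
    optional cap limits how far upregulation may widen the bounds.
    """
    factor = max(fold_change, 0.0)
    if cap is not None:
        factor = min(factor, cap)
    return lower * factor, upper * factor


# Hypothetical bounds and fold changes, for illustration only.
halved = scale_bounds(0.0, 10.0, 0.5)             # downregulated reaction
capped = scale_bounds(-10.0, 10.0, 2.0, cap=1.0)  # upregulation capped at wild type
```

With `cap=1.0`, expression can only tighten bounds relative to the baseline model; leaving `cap` unset lets upregulated reactions exceed their baseline bounds. Which behaviour is appropriate depends on the initial model configuration.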

Filename	Integration method(s)
Baseline_sol_map.csv	Lazy, Linear
Max_sol_map.csv	Max Lazy, Max Linear
Baseline_threshold_sol_map.csv	Lazy Threshold
Max_threshold_sol_map.csv	Max Lazy Threshold
Baseline_import_sol_map.csv	Lazy Importance
Max_import_sol_map.csv	Max Lazy Importance

CyanoExpress label	Description	Sample size	Reference
WT_Cd	Cadmium stress	9	Houot et al., 2007
WT_blue_red	Blue light growth	6	Singh et al., 2009
WT_HL	High light stress	6	Singh et al., 2008
crhR_low_temperature	CrhR-mutant; Low temperature	3	Prakash et al., 2010
WT_S_Starvation_HEPES	Sulphur starvation (no HEPES)	3	Zhang et al., 2008
WT_S_Starvation	Sulphur starvation	7	Zhang et al., 2008
WT_Fe_depletion	Iron stress	5	Hernandez-Prieto et al., 2012

Evaluating transcriptomic integration for cyanobacterial constraint-based metabolic modelling

Data files

Abstract

Description of the data and file structure

Files and variables

The provided datasets include:

Code/software

Access information