Compartmentalized sesquiterpenoid biosynthesis and functionalization in the Chlamydomonas reinhardtii plastid
Data files
Nov 21, 2024 version files 8.81 MB
- README.md (7.39 KB)
- Supplementary_File_01.gb (1.03 MB)
- Supplementary_File_02.csv (2.48 MB)
- Supplementary_File_03.cvs.csv (684.38 KB)
- Supplementary_File_04.csv (2.45 MB)
- Supplementary_File_05.csv (1.03 MB)
- Supplementary_File_06.csv (525 KB)
- Supplementary_File_07.csv (299.07 KB)
- Supplementary_File_08.csv (299.07 KB)
Abstract
Terpenoids play key roles in cellular metabolism, with some organisms having evolved expanded terpenoid profiles for specialized functions such as signaling and defense. While heterologous production in microbial hosts offers an alternative to natural extraction, the development of efficient biosynthetic platforms remains challenging. Here, we developed a subcellular engineering approach in the model green alga Chlamydomonas reinhardtii by targeting both sesquiterpenoid synthases and cytochrome P450s (CYPs) to the plastid, exploiting its photosynthetic electron transport chain to drive CYP-mediated oxidation without reductase partners. Nuclear-encoded sesquiterpenoid synthases were expressed with farnesyl pyrophosphate synthase fusions and targeted to the plastid, while CYPs were modified for soluble localization in the plastid stroma by removing transmembrane domains. The plastid environment supported hydroxylation, epoxidation, and oxidation reactions, with functionalization efficiencies reaching 80% of accumulated products. Carbon source availability influenced product ratios, revealing metabolic flexibility in the engineered pathways. Overall sesquiterpenoid yields ranged between 250-2500 µg L–1 under screening conditions, establishing proof-of-concept for using plastid biochemistry in complex terpenoid biosynthesis. Living two-phase terpenoid extractions with different perfluorinated solvents revealed variable performances based on sesquiterpenoid functionalization and solvent type. This work demonstrates that photosynthetic electron transport can drive CYP-mediated functionalization in engineered subcellular compartments. However, improvements in photobioreactor cultivation concepts will be required to facilitate the use of algal chassis for scaled production.
Generated on: 2024-11-21 by Sergio Gutiérrez
Date of Data Collection: 2023–2024
Contributors: Sergio Gutiérrez, Sebastian Overmans, Gordon B. Wellman, and Kyle J. Lauersen*
Bioengineering Program, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia.
*Corresponding Author: kyle.lauersen@kaust.edu.sa
Overview
This dataset documents the engineering and functionalization of the sesquiterpenoid biosynthesis pathway in Chlamydomonas reinhardtii. It includes the genetic constructs and chromatographic data generated during the study. The files describe the design of the engineered strains, their product profiles, and the experimental parameters, and are intended for researchers in synthetic biology, microbial metabolic engineering, and plastid biology.
Data Formats
The dataset is organized into two major file types:
- Genetic construct sequences: GenBank files (.gb). Contain complete plasmid sequences, annotations, and regulatory details.
- Chromatographic data: CSV files (.csv). Include raw data from GC-FID experiments and processed output, documenting sesquiterpenoid product profiles.
All files are provided in open formats to ensure compatibility with widely available tools and maximum usability.
Number Suffix (Enzyme)
The engineered strains are identified using a number suffix to specify the sesquiterpenoid synthase expressed:
- 01: Negative control (fluorescent protein only)
- 02: Aristolochene synthase
- 03: Valencene synthase
- 04: Selinene synthase
- 05: Vetispiradiene synthase
- 06: Santalene synthase
- 07: Bisabolol synthase
- 08: Cadinol synthase
- 09: Guaiene synthase
- 10: Valerianol synthase
- 11: Patchoulol synthase
These suffixes are used consistently across all files to identify the specific sesquiterpenoid synthase expressed in each strain.
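Because sample labels combine a series letter with the enzyme suffix (for example, A02 or B11), they can be decoded programmatically. The following is a minimal Python sketch, assuming labels follow the one-letter-plus-two-digit pattern used in this README. The names `ENZYME_BY_SUFFIX`, `SERIES_BY_LETTER`, and `parse_strain_id` are illustrative and are not part of the dataset.

```python
import re

# Enzyme suffixes as listed in this README.
ENZYME_BY_SUFFIX = {
    "01": "Negative control (fluorescent protein only)",
    "02": "Aristolochene synthase",
    "03": "Valencene synthase",
    "04": "Selinene synthase",
    "05": "Vetispiradiene synthase",
    "06": "Santalene synthase",
    "07": "Bisabolol synthase",
    "08": "Cadinol synthase",
    "09": "Guaiene synthase",
    "10": "Valerianol synthase",
    "11": "Patchoulol synthase",
}

# Series letters as used in the chromatographic data sections of this README.
SERIES_BY_LETTER = {
    "A": "Cytoplasmic expression",
    "B": "Chloroplast-targeted Erg20 fusion",
    "C": "Chloroplast-targeted ispA fusion",
}


def parse_strain_id(strain_id: str) -> tuple[str, str]:
    """Split a label such as 'A02' into (series description, enzyme)."""
    match = re.fullmatch(r"([ABC])(\d{2})", strain_id.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized strain label: {strain_id!r}")
    letter, suffix = match.groups()
    if suffix not in ENZYME_BY_SUFFIX:
        raise ValueError(f"Unknown enzyme suffix: {suffix!r}")
    return SERIES_BY_LETTER[letter], ENZYME_BY_SUFFIX[suffix]


print(parse_strain_id("A02"))
```

Labels that do not match the assumed pattern raise a `ValueError` rather than being silently mapped, which makes unexpected sample names in the CSV files easy to spot.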
Data Organization and Description
Genetic Constructs
- File Format: GenBank (.gb)
- Content Overview:
  - Plasmid sequences: Complete nucleotide sequences of the genetic constructs used in the study.
  - Gene annotations: Information on gene insertions, promoter regions, regulatory elements, and fluorescent protein sequences.
  - Regulatory details: Cloning sites, antibiotic resistance markers, and other functional sequences included in the constructs.
- Purpose: Enables replication of genetic engineering efforts, understanding of construct design, and further experimentation.
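For a quick inventory of the constructs without installing additional software, a GenBank flat file can be scanned with the Python standard library. The sketch below relies only on the general GenBank layout (a `LOCUS` line opens a record, feature keys start in column 6 of the `FEATURES` table, and `//` closes the record). The record shown is a synthetic placeholder, not a sequence from this dataset, and `summarize_genbank` is a hypothetical helper name. Whether Supplementary_File_01.gb holds one record or many is not stated here, so the function handles both.

```python
import io


def summarize_genbank(handle):
    """Return a list of (locus_name, feature_count) for a GenBank text stream."""
    records = []
    name, n_features, in_features = None, 0, False
    for line in handle:
        if line.startswith("LOCUS"):
            # Second whitespace-separated token on the LOCUS line is the name.
            name = line.split()[1]
            n_features, in_features = 0, False
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            in_features = False
        elif line.startswith("//"):
            if name is not None:
                records.append((name, n_features))
            name, in_features = None, False
        elif in_features and len(line) > 5 and line[:5] == "     " and line[5] != " ":
            # Feature keys are indented by exactly five spaces;
            # qualifier lines are indented further and are skipped.
            n_features += 1
    return records


# Synthetic placeholder record (assumption, for illustration only).
EXAMPLE = """\
LOCUS       pExample_construct        24 bp    DNA     circular SYN 21-NOV-2024
FEATURES             Location/Qualifiers
     promoter        1..10
                     /label="example promoter"
     CDS             11..24
                     /label="example CDS"
ORIGIN
        1 atgcatgcat gcatgcatgc atgc
//
"""

print(summarize_genbank(io.StringIO(EXAMPLE)))
```

To use it on the real file, pass an open file handle instead of the `io.StringIO` object. For anything beyond counting records and features, a dedicated parser such as Biopython's `SeqIO.parse(handle, "genbank")` is likely a better choice.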
Chromatographic Data
The chromatographic data are organized into separate CSV files based on experimental setups, including cytoplasmic expression, chloroplast targeting, and functionalization experiments.
Supplementary File_02: Cytoplasmic Expression (A-Series)
- File Format: CSV (.csv)
- Content:
  - GC-FID data for cytoplasmic expression of sesquiterpenoid synthases (A-series strains).
- Structure:
  - Columns:
    - Time (min): Retention time during gas chromatography.
    - Peak Area (response): Detector response corresponding to sesquiterpenoid product peaks.
  - Rows: Individual time points recorded during analysis for each replicate.
- Samples:
  - A01: Negative control (fluorescent protein only).
  - A02–A11: Strains expressing the following sesquiterpenoid synthases:
    - 02: Aristolochene synthase
    - 03: Valencene synthase
    - 04: Selinene synthase
    - 05: Vetispiradiene synthase
    - 06: Santalene synthase
    - 07: Bisabolol synthase
    - 08: Cadinol synthase
    - 09: Guaiene synthase
    - 10: Valerianol synthase
    - 11: Patchoulol synthase
- Experimental Conditions:
  - Temperature gradient: 60–300°C.
  - GC column: HP-5MS.
  - Carrier gas: Helium.
- Purpose: Provides raw data to analyze product formation in cytoplasmic strains.
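The two-column structure described above can be read with Python's built-in `csv` module. The sketch below assumes each file has a header row with exactly the column names `Time (min)` and `Peak Area (response)`. The actual files may arrange samples and replicates differently (for example, one column pair per sample), so the header should be checked first. The data shown are synthetic, and `load_trace` and `strongest_point` are hypothetical helper names.

```python
import csv
import io


def load_trace(handle, time_col="Time (min)", signal_col="Peak Area (response)"):
    """Read a chromatogram as a list of (retention_time_min, response) tuples."""
    reader = csv.DictReader(handle)
    return [(float(row[time_col]), float(row[signal_col])) for row in reader]


def strongest_point(trace, t_min=None, t_max=None):
    """Return the (time, response) pair with the highest response in a time window."""
    window = [
        (t, s)
        for t, s in trace
        if (t_min is None or t >= t_min) and (t_max is None or t <= t_max)
    ]
    if not window:
        raise ValueError("No data points in the requested time window")
    return max(window, key=lambda point: point[1])


# Synthetic trace (assumption, for illustration only).
EXAMPLE_CSV = (
    "Time (min),Peak Area (response)\n"
    "10.00,120.0\n"
    "10.05,950.0\n"
    "10.10,300.0\n"
    "15.00,80.0\n"
)

trace = load_trace(io.StringIO(EXAMPLE_CSV))
print(strongest_point(trace))
print(strongest_point(trace, t_min=12.0))
```

For a real file, replace the `io.StringIO` object with `open(path, newline="")`, as recommended for the `csv` module.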
Supplementary File_03: Chloroplast-Targeted Erg20 Fusion (B-Series)
- File Format: CSV (.csv)
- Content:
  - GC-FID data for chloroplast-targeted sesquiterpenoid synthase strains fused with Saccharomyces cerevisiae Erg20.
- Structure:
  - Columns:
    - Time (min): Retention time during gas chromatography.
    - Peak Area (response): Detector response corresponding to sesquiterpenoid product peaks.
  - Rows: Individual time points recorded during analysis for each replicate.
- Samples:
  - B01: Negative control (fluorescent protein only).
  - B02–B11: Strains expressing sesquiterpenoid synthases as detailed in the Number Suffix list above.
- Experimental Conditions:
  - Temperature gradient: 60–300°C.
  - GC column: HP-5MS.
  - Carrier gas: Helium.
- Purpose: Provides data on sesquiterpenoid biosynthesis using the chloroplast-targeted Erg20 fusion.
Supplementary File_04: Chloroplast-Targeted ispA Fusion (C-Series)
- File Format: CSV (.csv)
- Content:
  - GC-FID data for chloroplast-targeted sesquiterpenoid synthase strains fused with Escherichia coli ispA.
- Structure:
  - Columns:
    - Time (min): Retention time during gas chromatography.
    - Peak Area (response): Detector response corresponding to sesquiterpenoid product peaks.
  - Rows: Individual time points recorded during analysis for each replicate.
- Samples:
  - C01: Negative control (fluorescent protein only).
  - C02–C11: Strains expressing sesquiterpenoid synthases as detailed in the Number Suffix list above.
- Experimental Conditions:
  - Temperature gradient: 60–300°C.
  - GC column: HP-5MS.
  - Carrier gas: Helium.
- Purpose: Provides data on sesquiterpenoid biosynthesis using the chloroplast-targeted ispA fusion.
Supplementary File_05: CYP Co-Expression Analysis
- File Format: CSV (.csv)
- Content:
  - GC-FID data from co-expression of sesquiterpenoid synthases and cytochrome P450 enzymes (CYPs).
- Structure:
  - Columns:
    - Time (min): Retention time during gas chromatography.
    - Peak Area (response): Detector response for modified sesquiterpenoid products.
  - Rows: Individual time points recorded during analysis for each replicate.
- Samples:
  - Controls: B01–B09 strains without CYP enzymes.
  - Modified: Strains co-expressing CYPs with sesquiterpenoid synthases as detailed in the Number Suffix list above.
Supplementary Files_06–08: Carbon Source Effect Studies
- File Format: CSV (.csv)
- Content:
  - Data from experiments testing different carbon sources on sesquiterpenoid biosynthesis.
- Samples:
  - Strains expressing sesquiterpenoid synthases as detailed in the Number Suffix list.
- Carbon Sources: CO2, acetate, and CO2+acetate conditions.
Usage and Accessibility
- All files are in open, non-proprietary formats, ensuring accessibility without requiring specialized software.
- Biological triplicates are included for all strains to ensure statistical rigor.
Experimental section
Algae cultivation, plasmid design, transformation, and screening
Experiments used a C. reinhardtii strain derived from UPN22, modified for enhanced terpenoid biosynthesis through squalene synthase knockdown and β-carotene ketolase overexpression [1]. Cultures were maintained in TAPhi-NO3 medium under LED illumination (150 µmol m–2 s–1). We selected ten sesquiterpene synthases (STPSs) based on documented activity and complete sequence availability, including selinene synthase (UniProt: O64404) and vetispiradiene synthase (UniProt: A0A411G8M5) [2, 3]. All genes underwent codon optimization and intron spreading for nuclear genome expression [2]. We developed three pOpt3-based STPS construct designs: one for cytoplasmic expression with paromomycin selection (APHVIII), and two for chloroplast-targeted expression with hygromycin selection (APHVII) [4]. Chloroplast-targeted constructs used the PsaD promoter and chloroplast targeting peptide (CTP), fused to the mKOk fluorescent protein and to either S. cerevisiae (Erg20) or E. coli (ispA) farnesyl diphosphate synthase (FPPS). FPPS sequences included C-terminal stop codons to preserve activity [5]. CYPs were selected based on three criteria: (1) previous biochemical characterization, (2) documented activity on target sesquiterpenoid scaffolds, and (3) availability of complete sequence information. We modified CYPs for plastid targeting by removing transmembrane domains, which were identified through TMHMM-2.0 server analysis and AlphaFold structural modeling [6]. The modified sequences were subcloned into pOpt3-based constructs containing the PsaD promoter, CTP, mTFP1 reporter, and zeocin resistance marker (shBle). All constructs were synthesized by Genscript (Piscataway, NJ, USA) (Supplementary information, Table S1, File S1).
For transformation, we linearized plasmid DNA using XbaI and KpnI restriction enzymes and introduced 10 µg DNA per transformation via the glass-bead protocol [7]. Following an 8-hour recovery in liquid TAPhi-NO3 medium [8] under low light, cells were plated on selective media containing the appropriate antibiotics: spectinomycin (200 µg mL–1), paromomycin (10 µg mL–1), hygromycin B (15 µg mL–1), or zeocin (15 µg mL–1). After 7 days under continuous illumination, we used a PIXL colony-picking robot (Singer Instruments, UK) to transfer up to 384 colonies onto fresh TAPhi-NO3 plates. After 3 days, a ROTOR robot (Singer Instruments, UK) duplicated the colonies onto screening plates containing amido black (150 µg mL–1) [3]. We selected four independent transformants per construct based on fluorescent protein signal intensity (Supplementary information, Figure S1–S4). Each transformant was analyzed in biological triplicate (n=12 per construct) to account for expression variability due to random nuclear integration. Selected colonies were cultured in 12-well plates containing 2 mL TAPhi-NO3 medium at 160 rpm for subsequent two-phase cultivation and solvent analysis [2, 5].
Capture and analysis of algal-produced sesquiterpenoids by GC-MS/FID
We employed a two-phase cultivation system for sesquiterpenoid extraction and quantification [2, 9]. For each construct, four independent transformants showing strong fluorescence signals were analyzed in biological triplicate (n=12 total samples). Cultures were grown in 6-well microtiter plates containing 4.5 mL TAPhi-NO3 medium with a 500 µL dodecane overlay (10% total volume) for 7 days [2, 3]. To systematically evaluate extraction efficiency, we compared ten perfluorocarbon solvents (FCs) against dodecane: CFL7160, CXFL-68, CFL3000A, CXFL-3288, FC-770, FC-3284, FC-43, FC-72, FC-40, and FC-3283 (sourced from Sigma-Aldrich, Germany; Acros Organics, Belgium; Hunan Chemfish Pharmaceutical Co., Ltd, China). For these extractions, we used 1000 μL of solvent (20% of total volume) with 4 mL cultures (Supplementary information, Table S5). Dodecane formed an upper 'overlay', while FCs formed lower 'underlays'. We monitored culture volumes throughout cultivation to account for evaporation effects. Phases were separated by centrifugation (3500 × g, 5 min), and both solvent fractions were collected for GC analysis. Cell density was determined using flow cytometry (Supplementary information, File S7) [9].
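The solvent fractions quoted above follow directly from the stated volumes, and a small Python sketch makes the arithmetic explicit. The second function goes further than the text: it converts a concentration measured in the solvent phase into a titer per litre of culture under the assumption that the product partitions entirely into the solvent. That normalization is an illustrative assumption and is not stated to be the procedure used in the study. Both function names are hypothetical.

```python
def solvent_fraction(solvent_ul: float, culture_ul: float) -> float:
    """Fraction of the total well volume occupied by the solvent phase."""
    total = solvent_ul + culture_ul
    if total <= 0:
        raise ValueError("Total volume must be positive")
    return solvent_ul / total


def titer_per_culture_litre(conc_in_solvent_ug_per_l: float,
                            solvent_ul: float,
                            culture_ul: float) -> float:
    """Convert a solvent-phase concentration to micrograms per litre of culture.

    Assumption: all product is captured in the solvent phase.
    """
    if culture_ul <= 0:
        raise ValueError("Culture volume must be positive")
    return conc_in_solvent_ug_per_l * solvent_ul / culture_ul


# Volumes taken from the text: 500 uL dodecane on 4.5 mL culture,
# and 1000 uL perfluorocarbon solvent with 4 mL culture.
print(f"dodecane overlay: {solvent_fraction(500, 4500):.0%}")
print(f"FC underlay:      {solvent_fraction(1000, 4000):.0%}")
```

If evaporation changes the culture volume during cultivation, the measured final volume would be the appropriate `culture_ul` value.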
GC-MS/FID analysis followed established protocols [5], and chromatograms were processed using MassHunter software (Agilent, Germany, version B.08.00). Compound identification involved three complementary approaches: (1) comparison of mass spectra against the NIST Mass Spectral Library (National Institute of Standards and Technology, USA); (2) analysis of retention indices; and (3) matching against authenticated standards. For absolute quantification, we generated calibration curves (1–500 μM) using purified standards in both dodecane and FCs: δ-guaiene, patchoulol, α-santalene, valerianol, α-bisabolol, valencene, and cedrene (Toronto Research Chemicals, Canada) (Supplementary information, Figure S5). Compound identification reliability was evaluated using three metrics: (1) mass spectral match factor ("P%" in Supplementary information, Tables S7-S8), calculated as the mathematical similarity between sample and reference spectra on a scale of 0-100; (2) retention index comparison with literature values; and (3) comparison against a standard terpene mixture containing 98 compounds at 1 mM in methanol (MetaSci, Canada) (Figure 6, Supplementary information, Table S6). Mass spectral matches below 50% were considered tentative identifications. For functionalized products without available standards, identification relied on the presence of expected mass shifts corresponding to specific chemical modifications (e.g., +16 m/z for hydroxylation) and on fragmentation patterns characteristic of the predicted structures. These identifications are listed in the data tables (Supplementary information, Tables S7-S9).
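Quantification against a calibration curve and the 50% match-factor cutoff can both be expressed in a few lines of Python. The sketch below fits an ordinary least-squares line to standard concentrations and peak areas, inverts it to estimate a sample concentration, and flags identifications below the threshold as tentative. The standard values are synthetic, the assumption of a linear detector response over 1–500 μM is ours, and the actual fitting was performed in the software named above rather than with this code. All function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("Need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("All x values are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate concentration from peak area."""
    return (area - intercept) / slope


def is_tentative(match_factor, threshold=50.0):
    """Matches below the threshold (0-100 scale) are treated as tentative."""
    return match_factor < threshold


# Synthetic standards (assumption): concentrations in uM and peak areas
# generated from area = 2 * concentration + 5.
standards_um = [1.0, 10.0, 50.0, 100.0, 500.0]
areas = [7.0, 25.0, 105.0, 205.0, 1005.0]

slope, intercept = fit_line(standards_um, areas)
print(round(concentration_from_area(405.0, slope, intercept), 3))
print(is_tentative(42.0), is_tentative(87.5))
```

Sample areas that fall outside the range covered by the standards should be treated with caution, since the fit is only supported between the lowest and highest calibration points.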
Data analysis
The experimental design included four independent transformants per construct, each analyzed in biological triplicate (n=12 total samples per construct). We included two types of controls: the parental non-transformed C. reinhardtii strain and vector-only constructs with matched fluorescent reporters. GC-MS/FID measurements followed a rigorous quality control protocol including: (1) Manual review of all chromatograms for peak quality and integration accuracy; (2) Triplicate technical measurements of each biological sample; (3) Monitoring of internal standards for instrument performance; (4) Regular blank injections to detect potential carryover. Compound identification utilized three criteria: (1) Retention index comparison with reference standards; (2) Mass spectral match factors (reported as P% in Supplementary information, Tables S7-S8); (3) NIST library comparison with match quality threshold >50%. For quantitative analysis, we calculated means and standard deviations of sesquiterpenoid yields from biological replicates. Due to random nuclear transgene integration in C. reinhardtii, expression levels can vary substantially between transformants [2, 9, 10]. Our multi-transformant approach captures this inherent variability, providing a realistic assessment of production potential. High standard deviations in yield data reflect this biological variation rather than technical measurement uncertainty. Statistical analysis employed two-way ANOVA with Tukey's post-hoc test to evaluate differences between subcellular localization strategies (Supplementary information, Table S3). We used JMP v.16 (SAS Institute, NC) and R v.3.6.2 (R Foundation for Statistical Computing, Austria) for statistical computations. Data visualization utilized JMP v.16 and GraphPad Prism v.10.3 (GraphPad Software, USA). 
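The replicate design (four transformants, each in triplicate) can be summarized with Python's `statistics` module. The sketch below pools all twelve values for a construct-level mean and sample standard deviation, and also reports per-transformant means so that variation between transformants stays visible. Pooling all twelve samples is our reading of the description above and is an assumption, the yield values are synthetic, and `summarize_construct` is a hypothetical name. The ANOVA itself was run in JMP and R and is not reproduced here.

```python
from statistics import mean, stdev


def summarize_construct(yields_by_transformant):
    """Summarize yields given a dict of transformant name -> replicate values."""
    pooled = [v for reps in yields_by_transformant.values() for v in reps]
    if len(pooled) < 2:
        raise ValueError("Need at least two measurements")
    return {
        "n": len(pooled),
        "mean": mean(pooled),
        "sd": stdev(pooled),  # sample standard deviation (n - 1 denominator)
        "per_transformant_mean": {
            name: mean(reps) for name, reps in yields_by_transformant.items()
        },
    }


# Synthetic yields in ug per litre (assumption, for illustration only).
example = {
    "T1": [100.0, 110.0, 120.0],
    "T2": [200.0, 210.0, 220.0],
    "T3": [300.0, 310.0, 320.0],
    "T4": [400.0, 410.0, 420.0],
}

summary = summarize_construct(example)
print(summary["n"], summary["mean"], round(summary["sd"], 1))
print(summary["per_transformant_mean"])
```

In this synthetic example, most of the pooled standard deviation comes from differences between transformants rather than between replicates, which is the pattern the text attributes to random nuclear integration.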
Figures were prepared using Affinity Designer v.2.5.3 (Serif Ltd., UK) for diagrams, ChemDraw v.20.1 (PerkinElmer, MA, USA) for chemical structures, and Affinity Publisher v.2.5.3 (Serif Ltd., WB, UK) for layout integration.
