Data from: A synthetic biology and green bioprocess approach to recreate agarwood sesquiterpenoid mixtures
Data files
Jan 18, 2024 version files 14.50 GB
Abstract
Certain endangered Thymelaeaceous trees are major sources of the fragrant and highly valued resinous agarwood, comprised of hundreds of oxygenated sesquiterpenoids (STPs). Despite growing pressure on natural agarwood sources, the chemical complexity of STPs severely limits synthetic production. Here, we catalogued the chemical diversity in 58 agarwood samples by two-dimensional gas chromatography–mass spectrometry and partially recreated complex STP mixtures through synthetic biology. We improved STP yields in the unicellular alga Chlamydomonas reinhardtii by combinatorial engineering to biosynthesise nine macrocyclic STP backbones found in agarwood. A bioprocess following green-chemistry principles was developed that exploits ‘milking’ of STPs without cell lysis, solvent–solvent STP extraction, solvent–STP nanofiltration, and bulk STP oxy-functionalisation to obtain terpene mixtures like those of agarwood. This process occurs with total solvent recycling and enables continuous production. Our synthetic-biology approach offers a sustainable alternative to harvesting agarwood trees to obtain mixtures of complex, fragrant, oxygenated STPs.
README: General information
This README file was generated on 2024-01-12 by Sergio Gutiérrez
Dataset Title
A Synthetic Biology and Green Bioprocess Approach to Recreate Agarwood Sesquiterpenoid Mixtures
Date of Data Collection
2022-2023
Contributors
- Sergio Gutiérrez, Sebastian Overmans, Gordon B. Wellman: Bioengineering Program, BESE Division, KAUST, Thuwal, Saudi Arabia.
- Vasilios G. Samaras: Analytical Chemistry Core Lab (ACL), KAUST, Thuwal, Saudi Arabia.
- Claudia Oviedo: Advanced Membranes and Porous Materials Center (AMPM), PSE Division & Environmental Science and Engineering Program, BESE Division, KAUST, Thuwal, Saudi Arabia.
- Martin Gede: AMPM, PSE Division & Chemical Engineering Program, KAUST, Thuwal, Saudi Arabia.
- Gyorgy Szekely: AMPM, PSE Division, Environmental Science and Engineering Program, BESE Division & Chemical Engineering Program, KAUST, Thuwal, Saudi Arabia.
- Kyle J. Lauersen (Corresponding Author): Bioengineering Program, BESE Division, KAUST, Thuwal, Saudi Arabia.
Overview
This supplementary dataset, integral to our research paper, contains data crucial for understanding the key findings of our study. It includes multidimensional chromatographic and mass spectrometric analysis data, making it a valuable resource for metabolomics, phytochemistry, and related fields. The dataset presents diverse metabolic profiles and is ideal for comparative studies, compound identification, and bioinformatics analysis. The raw data from GCxGC-TOF/MS is in its original format, promoting transparency and allowing for varied data processing and analysis methods, thus facilitating diverse scientific interpretations and collaborative research.
Data Formats
Metadata and Binary Files:
- .nc (NetCDF):
- Contains detailed mass spectrometry data, including the mass-to-charge ratio (m/z) range from 50 to 600, crucial for understanding the metabolite profiles.
- Excel Files:
- .xlsx/.xls: Utilized for organizing and presenting data summaries and metadata, possibly including retention times, peak areas, and compound identifications.
- Text Files:
- .txt: May include methodological details, parameter settings for the GCxGC–TOF/MS system, or additional notes relevant to the experiments.
- GenBank Files:
- .gbk/.gb: Essential for detailing the genetic constructs used in the engineered algae strains, if relevant to the dataset.
Additional File Types:
- .xlm: Excel macro files, potentially containing scripts for automating data processing steps or analysis functions.
- .pei, .cpl, .dfa, .gci, .RBC, .RBS: Proprietary or instrument-specific formats related to the GCxGC-TOF/MS system, likely containing instrumental settings, method details, or other operational data. These file types reflect the specific configurations and methodologies employed, as detailed in the materials and methods section.
- .log: Log files documenting the operational details of the GCxGC-TOF/MS analyses, including temperature settings, flow rates, and modulation periods.
Recommended Software for Data Analysis:
GCxGC-TOF/MS Data Analysis:
- Primary Software: GC Image™ Version 2.9 - Ideal for processing and visualizing .raw, .d, and .cdf files from agarwood samples.
- Alternative Software: OpenChrom - An open-source alternative that can handle various chromatography data formats.
GC-MS/FID Data Analysis:
- Primary Software: ChromaTOF (LECO) - Suitable for LECO instrument data.
- Additional Options: Agilent MassHunter - For Agilent instrument data; also supports a range of GC/MS file formats.
Genomic Data Analysis:
- Software Options: NCBI's BLAST, Artemis, and GBRAP (GenBank Retrieving, Analyzing, and Parsing software) for handling .gbk/.gb files.
Additional File Types:
- .xlm Files: Microsoft Excel - For handling Excel macro files.
Usage, Compatibility, and Accessibility
This dataset encompasses a wide range of file formats, catering to diverse research needs in the fields of metabolomics and genomics. By providing data in its original formats, from raw spectral files to GenBank genetic sequences, we ensure comprehensive access and maintain the integrity of the information.
We acknowledge that the dataset includes certain proprietary file types, such as .pei files, which are specific to the instruments and software used in our study. While these files are not essential for the primary data analysis, they are included to offer complete transparency regarding all the data generated during the research. This approach ensures that researchers have access to the fullest possible data set, providing an in-depth view of the study's methodologies and findings.
Researchers are encouraged to use the original data formats for their analyses to ensure the accuracy and reliability of their results. However, we understand that some file formats may require specific software or tools for access. In such cases, or for any assistance with data conversion and analysis, researchers are welcome to contact the authors. We are committed to providing support and guidance to facilitate the effective use of this comprehensive dataset in further research and analysis.
Content: Supplementary files
Supplementary Information File and Data within the Paper
This file, along with the data presented in the main paper, provides comprehensive support for our research findings. It includes results from advanced analytical techniques:
- GCxGC-TOF/MS: Detailed metabolomic analysis of agarwood samples.
- GC-MS/FID: Analytical data from the study of genetically engineered algae strains.
These datasets serve to substantiate the conclusions drawn in our study and are valuable for further scientific exploration and verification.
1. GCxGC-TOF/MS Raw Data Analysis of Agarwood Samples
The dataset comprises extensive raw data files from GCxGC-TOF/MS analysis of agarwood samples. For optimal analysis and interpretation, we recommend using GC Image™ Version 2.9 software, a specialized tool for processing and visualizing GCxGC-MS data. Its tailored features ensure a precise and detailed examination of the complex metabolic profiles present in these samples.
2. GCxGC-TOF/MS Data Report for Agarwood Samples:
This report presents a synthesized overview of the GCxGC-TOF/MS data, providing insights into the metabolic complexities of the agarwood samples. The report includes interpreted results, key findings, and relevant analytical parameters, offering a comprehensive understanding of the data.
3. GC-MS/FID Raw Data of Engineered Algae Strains:
Raw data files from GC-MS/FID analysis are included, offering an in-depth look at the metabolic impacts of genetic engineering in algae strains. This data is crucial for understanding the metabolic pathways and the effects of genetic modifications.
4. Genetic Constructs Documentation:
Detailed information on the genetic constructs used in the engineering of the algae strains is provided. These documents (.gbk/.gb files) include genetic sequences, annotations, and experimental design parameters, essential for replicating the experiments and comprehending the genetic modifications involved.
Supplementary File 1.
File Format
Excel File (.xlsx)
Content Overview
This file presents a detailed report for 58 agarwood samples, including both distillate ("Oudh") and wood pieces ("Bahkour").
Structure of the File
Tabs in Excel File
Each tab corresponds to one sample, ensuring organized data presentation.
Content in Each Tab
Two-Dimensional Gas Chromatogram Image
- Displays the 2D gas chromatogram for each distillate sample, with an m/z range of 50-600.
- Includes blob detection visualizations from GCxGC-TOF/MS, highlighting significant features in the chromatograms.
Compound Detection Table
Detailed listing of the top compounds detected in each sample through GCxGC-TOF/MS analysis using GC Image™ Version 2.9 software.
Table Columns
- BlobID: Unique identifier for each detected compound or feature in the chromatogram.
- Compound Name: The probable identification of the detected compound.
- Hit Formula: Molecular formula of the probable compound.
- Hit CAS#: Chemical Abstracts Service number, a unique identifier for the chemical substance.
- Hit Base Peak: The most intense peak (mass-to-charge ratio) in the mass spectrum of the compound.
- Retention I (min): The first dimension retention time in minutes.
- Retention II (sec): The second dimension retention time in seconds.
- Peak Value: The intensity of the chromatographic peak.
- Volume: The integrated total volume of the peak, reflecting the compound's abundance.
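As a minimal sketch of how the table columns above might be used, the following Python snippet expresses each blob's Volume as a percentage of the summed Volume of the listed blobs. The `Blob` class, its field names, and the example values are hypothetical placeholders, not part of the dataset; because each tab lists only the top compounds, the percentages are relative to the listed blobs rather than to the whole chromatogram.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    """One row of the compound detection table (hypothetical field names)."""
    blob_id: int
    compound_name: str
    retention_i_min: float   # first-dimension retention time, minutes
    retention_ii_sec: float  # second-dimension retention time, seconds
    peak_value: float
    volume: float


def relative_volumes(blobs):
    """Return {blob_id: percent of the summed Volume} for one sample tab."""
    total = sum(b.volume for b in blobs)
    if total <= 0:
        raise ValueError("total blob volume must be positive")
    return {b.blob_id: 100.0 * b.volume / total for b in blobs}


# Illustrative placeholder values only, not taken from the dataset.
example = [
    Blob(1, "compound A", 40.2, 2.1, 1.0e5, 3.0e6),
    Blob(2, "compound B", 55.7, 3.4, 5.0e4, 1.0e6),
]
shares = relative_volumes(example)
```

Rows copied from a tab of the Excel file could be mapped onto `Blob` instances in the same way before comparing samples.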
Purpose and Utility
- Provides an in-depth chemical profile for each agarwood sample, useful for comparative analysis, quality control, or further research into the metabolomic properties of agarwood.
- The combination of visual chromatograms and detailed quantitative data offers a comprehensive tool for researchers in phytochemistry, fragrance industry, or related fields.
Supplementary File 2.
File Format
Zip File (.zip)
Content Overview
This extensive collection contains all raw data files from GCxGC-TOF/MS analysis of 58 agarwood samples, encompassing both distillates ("Oudh") and wood pieces ("Bahkour").
Sample Folders Overview
Each folder, labeled from 01 to 58, corresponds to a specific sample in the dataset. Within these sample folders, data is meticulously organized into several subfolders, each designated for a distinct type of data crucial for the comprehensive analysis of the samples:
AM Folder (Analytical Method)
Overview
This folder houses the files that detail the analytical methods and instrument settings used in the study, documenting the experimental approach and the configuration of the instrumentation.
File Types
- .pei: Proprietary information related to the specific instrumentation used in the study. These files may include detailed instrument settings or configurations.
- .xml: Methodological information. In this context, it could detail the parameters and settings used in the gas chromatography and mass spectrometry analyses.
- .pm: Scripts or data processing information. These files are instrumental in understanding the computational approaches used in analyzing the data.
Subfolder Structure
The AM Folder does not contain subfolders; all method and instrument-configuration files are stored directly in it for quick access.
CA Folder (Chromatographic Analysis)
Overview
This folder stores the chromatographic data used to analyse compound separation and identification in the samples. The data and scripts provided allow the chromatographic separation to be examined in detail, and the calibration and assignment files support consistent interpretation of the chromatographic profiles across samples.
File Types
- .xml: These files typically contain metadata or detailed information about the chromatographic analysis procedures, such as gradient profiles, temperature settings, and detector settings.
- .pl (Perl Scripts): Likely used for processing chromatographic data or automating certain aspects of data analysis, these scripts can be essential for custom analysis tasks or data manipulation.
Subfolder Structure
The CA Folder includes a single subfolder labeled as '0' (indicating the number zero), which corresponds to a specific set of chromatographic reads or analyses.
- Assign.xml: Contains assignment data for chromatographic peaks, linking them to specific compounds or compound classes.
- Calib.xml: Holds calibration data crucial for ensuring accurate and consistent chromatographic analyses across samples.
- Peak.pl: A Perl script file, which may be used for processing peak data, such as identifying, quantifying, or categorizing chromatographic peaks.
MC Folder (Mass Chromatography)
Overview
This folder contains the mass spectrometry data files used to characterise and analyse the compounds present in the samples.
File Types
- .xml: Contains metadata or methodological information related to mass spectrometry.
- .cpl: Likely includes information related to chromatographic peaks, which could be instrumental in identifying and quantifying compounds.
- .pm: May contain scripts or data processing information pertinent to mass spectrometry analysis.
- .dfa: Typically stores detailed mass spectrometric data, which could include mass-to-charge ratios, peak intensities, and spectral information vital for molecular analysis.
Subfolders Organization
The MC Folder is organized into numerical subfolders (numbered 0 to 5), each corresponding to a specific read or type of data collected during the analysis. Each subfolder contains the same set of files for its read:
- Meta.xml (in each subfolder): Stores metadata specific to each read, providing contextual information that aids in the interpretation of the data.
- Peak.cpl (in each subfolder): Contains data related to chromatographic peaks observed in each read, crucial for understanding the compound profiles.
- Proces.pm (in each subfolder): Likely includes scripts or processing parameters used in the analysis of the mass spectrometry data for that particular read.
- Profile.dfa (in each subfolder): Stores the mass spectrometry profile data for each read, including detailed information such as mass-to-charge ratios and peak intensities.
RD Folder (Raw Data)
Contains the unprocessed, raw data directly from the GCxGC-TOF/MS instrument.
File Types
- .cdf (netCDF, the ANDI/AIA chromatography data interchange format)
- .gci (GC Image Data)
- .RBC (Raw Binary Chromatogram)
- .RBS (Raw Binary Spectrometry)
- .log (Log Files)
Note
We acknowledge the presence of certain proprietary file types within the sample folders, such as .pei, .cpl, .dfa, .gci, .RBC, and .RBS. While these files may not be essential for standard data analysis, they have been included to uphold transparency and provide a complete record of all data generated during our study.
These proprietary files offer a comprehensive insight into the intricacies of our instrumentation and methodological configurations. They serve as a testament to the rigor and thoroughness of our research process. Researchers who do not require detailed knowledge of instrument settings or specific methodological configurations may prioritize their attention on the standard data files for their analyses.
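The folder layout described above can be checked after unzipping with a short script. The following is a minimal sketch, assuming the archive extracts to one directory per sample (01 to 58), each holding AM, CA, MC, and RD subfolders; the function names and the example path are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Subfolder names as described in this README.
EXPECTED_SUBFOLDERS = ("AM", "CA", "MC", "RD")


def inventory_sample(sample_dir):
    """Count file extensions under each expected subfolder of one sample."""
    sample_dir = Path(sample_dir)
    report = {}
    for name in EXPECTED_SUBFOLDERS:
        sub = sample_dir / name
        if not sub.is_dir():
            report[name] = None  # subfolder missing from this sample
            continue
        # rglob also descends into numbered subfolders such as CA/0 or MC/0.
        report[name] = Counter(
            p.suffix.lower() or "(none)" for p in sub.rglob("*") if p.is_file()
        )
    return report


def inventory_dataset(root):
    """Return {sample_folder_name: report} for every directory under root."""
    root = Path(root)
    return {
        d.name: inventory_sample(d) for d in sorted(root.iterdir()) if d.is_dir()
    }


# Hypothetical usage; replace the path with the actual extraction directory.
# for sample, report in inventory_dataset("Supplementary_File_2").items():
#     print(sample, report)
```

Extensions are lower-cased before counting, so .RBC and .RBS files appear as .rbc and .rbs in the report.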
Recommended Software for Data Analysis
Primary Recommendation
GC Image™ Version 2.9: This software is highly recommended for its exceptional capabilities in handling GCxGC-TOF/MS data. It excels in visualizing and interpreting complex chromatographic datasets, making it a crucial tool for thorough analysis of the data in this study.
Additional Software Options
- ChromaTOF (LECO): Particularly suited for processing data from LECO instruments. It is compatible with various GCxGC data formats, making it a versatile choice for data analysis.
- OpenChrom: A robust open-source alternative that supports a wide range of chromatography and mass spectrometry data formats. It is particularly useful for handling .cdf and .mzML files, offering a flexible solution for data analysis.
- Agilent MassHunter: An excellent option for analyzing data from Agilent instruments. It also supports a broad spectrum of GC/MS file formats, providing a comprehensive tool for data analysis.
- NIST AMDIS: A powerful software for deconvolution and compound identification in mass spectrometry. It is compatible with standard formats like .ms, making it invaluable for detailed molecular analysis.
- Thermo Xcalibur: Designed primarily for data from Thermo Fisher instruments. This software may also support additional formats like .raw, offering flexibility for data analysis across different instrument platforms.
- AnalyzerPro XD (SpectralWorks): A productivity software with support for two-dimensional chromatography and Direct MS, suitable for complex data sets.
- AnalyzerPro (SpectralWorks): Offers workflows for sample-to-sample comparison, target component analysis, quantitation, and library searching across LC-MS and GC-MS platforms.
- RemoteAnalyzer (SpectralWorks): An open access software solution optimized for managing multiple types of analyses and instrument types from various vendors, suitable for diverse laboratory settings.
Note on Proprietary File Types and Equipment Used
- The dataset includes proprietary file types such as .pei, .cpl, .dfa, .gci, .RBC, and .RBS. These formats are specific to the chromatography and mass spectrometry instruments used in this study. The primary equipment includes an Agilent 7890B gas-chromatography system, a Zoex ZX1 cryogenic thermal modulator, and a JEOL TOF MS (AccuTOF GCx-plus, JEOL, Japan). These files offer comprehensive details on the instrumental and methodological setup used for the GCxGC-TOF/MS analysis.
- While these proprietary files provide full transparency regarding data generation, they may require specific software for access, closely associated with the aforementioned equipment. For complete information on the analytical methods, instrument settings, and experimental procedures, please refer to the 'Materials and Methods' section of the dataset documentation. This section will provide additional context and guidance for understanding and utilizing these specialized file types effectively.
Purpose and Utility
- This file is a comprehensive resource for researchers aiming to conduct detailed analyses of agarwood samples.
- The structured and clearly labeled folders make navigating the dataset efficient, allowing for targeted analyses of specific samples or comparative studies across the collection.
- Offering raw data in this format empowers researchers to apply their analytical techniques, fostering diverse investigations in metabolomics, phytochemistry, and other relevant fields.
Supplementary File 3.
File Format
Zip File (.zip)
Content Overview
Supplementary File 3 contains one-dimensional chromatogram data (elution time versus TOF-MS response) obtained from the GCxGC-TOF/MS analysis of the 58 agarwood samples. This includes both agarwood distillate, known as “Oudh,” and agarwood wood pieces, referred to as “Bahkour.”
Structure and Data Organization
- Data Files: The folder contains 58 individual text files (.txt), each distinctly labeled to correspond with the sample number (e.g., 01, 02, 03, etc.).
File Contents:
Each file houses data organized into two primary columns:
- Time of Elution Column: Indicates the elution time in the Gas Chromatography (GC) process, measured in minutes.
- TOF-MS Response Column: Shows the response recorded by the Time-Of-Flight Mass Spectrometry (TOF-MS) detector at the specific elution times.
This structured data format allows users to effectively reconstruct one-dimensional chromatograms for each sample.
Purpose and Utility
- By providing the elution time and corresponding TOF-MS responses, researchers can visualize the 'Y' axis as the detector response and the 'X' axis as the time of elution, facilitating a clear understanding of the chromatographic behavior of each sample.
- This data set is particularly useful for studies focusing on the identification of specific compounds, time-resolved analysis, and quantitative assessments in the context of agarwood research.
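As a minimal sketch of the reconstruction described above, the following Python functions read two-column text and locate the highest point of the trace. The sketch assumes that columns are separated by whitespace or commas, that numbers use a decimal point, and that any header row can be skipped; the actual delimiter and header layout of the files should be checked before use. The function names and the example file name are hypothetical.

```python
def read_chromatogram(lines):
    """Parse 'time response' pairs, skipping blank or non-numeric lines."""
    times, responses = [], []
    for line in lines:
        # Treat commas as separators (assumes decimal points, not decimal commas).
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue
        try:
            t, r = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # e.g. a header row
        times.append(t)
        responses.append(r)
    return times, responses


def apex(times, responses):
    """Return (time, response) at the maximum detector response."""
    i = max(range(len(responses)), key=responses.__getitem__)
    return times[i], responses[i]


# Hypothetical usage with one of the sample files:
# with open("01.txt") as handle:
#     times, responses = read_chromatogram(handle)
#     print(apex(times, responses))
```

The two returned lists can be passed directly to any plotting library, with elution time on the X axis and TOF-MS response on the Y axis.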
Supplementary File 4.
File Format
Excel File (.xlsx)
Content Overview
Supplementary File 4 comprises raw data obtained from GC-MS/FID analysis of engineered algae strains, presented in a structured and accessible Excel format. This data is vital for understanding the metabolic alterations in these strains.
Structure and Data Organization
Columns in Excel File:
- The first column of the Excel file details the time of elution in the Gas Chromatography (GC) process, measured in minutes.
Subsequent columns are dedicated to the Mass Spectrometry Detector (MSD) response at each elution time for a series of samples, including:
- Blank (Dodecane)
- Santalene (A)
- Zizaene (B)
- Aristolochene (C)
- Valencene (D)
- Delta-guaiene (E)
- Cadinol (F)
- Valerianol (G)
- Patchoulol (H)
- Bisabolol (I)
Purpose and Utility
- This data is instrumental for researchers in reconstructing one-dimensional chromatograms for each of the specified samples.
- The chromatograms can be visualized with the 'Y' axis representing the MSD response and the 'X' axis representing the time of elution.
- Chromatographic representations are crucial for analyzing the presence and concentration of various compounds in the engineered algae strains.
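The column layout above (elution time first, then one response column per sample) can be handled with a short script. This is a minimal sketch that assumes the sheet has first been exported to CSV with a single header row; the function names and the header labels in the usage comment are hypothetical. Subtracting the dodecane blank is shown only as an optional illustration and is not a processing step prescribed by this dataset.

```python
import csv
import io


def load_traces(csv_text, time_column=0):
    """Return (times, {sample_name: [responses]}) from a wide CSV table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    times = [float(r[time_column]) for r in body]
    traces = {
        name: [float(r[j]) for r in body]
        for j, name in enumerate(header)
        if j != time_column
    }
    return times, traces


def blank_subtract(trace, blank):
    """Point-by-point difference between a sample trace and the blank trace."""
    return [s - b for s, b in zip(trace, blank)]


# Hypothetical usage after exporting the sheet to CSV:
# with open("supplementary_file_4.csv", newline="") as handle:
#     times, traces = load_traces(handle.read())
#     corrected = blank_subtract(traces["Patchoulol (H)"], traces["Blank (Dodecane)"])
```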
Supplementary File 5.
File Format
GenBank File (.gb or .gbk)
Content Overview
This file contains all the plasmid sequences used in genetically engineering algae strains in our study. It's essential for researchers interested in the genetic aspects of the algae modifications.
Structure and Annotation
- Plasmid Constructs: The file includes sequences of all plasmids used in the experiments.
- Annotations: Each sequence is annotated with details like gene names, promoter regions, coding sequences, and resistance genes. These provide clear information about each plasmid's function.
Purpose and Utility
- The file is useful for examining the specific genetic changes in the algae.
- Annotations help understand how each plasmid contributes to the algae’s genetic modification.
Software Compatibility and Accessibility
- Compatible with Genetic Software: The file can be opened using any genetic manipulation software that reads GenBank formats.
- Ease of Use: The standard GenBank format and clear annotations make the file accessible for those with a background in genetics.
Application in Research
- Researchers can use this file for replicating the study, conducting genetic analysis, or for further research in genetic engineering and synthetic biology.
- It serves as a reference for the plasmids used, allowing for further experimentation or design modifications in related fields.
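For a quick overview of the plasmid records without dedicated software, the GenBank flat-file layout can be scanned with a few lines of Python. This is a minimal sketch that relies only on the standard layout (a LOCUS line per record, feature keys indented by five spaces in the FEATURES table, records terminated by `//`); the function name is hypothetical. A full-featured parser such as Biopython's `SeqIO.parse(handle, "genbank")` is preferable for real analyses.

```python
def summarize_genbank(text):
    """Return [(locus_name, [feature_keys, ...]), ...], one tuple per record."""
    records = []
    name, features, in_features = None, [], False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            name = parts[1] if len(parts) > 1 else ""
            features, in_features = [], False
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            in_features = False
        elif line.startswith("//"):
            if name is not None:
                records.append((name, features))
            name, features, in_features = None, [], False
        elif (
            in_features
            and line.startswith("     ")
            and len(line) > 5
            and line[5] != " "
        ):
            # Feature keys start in column 6; qualifier lines are indented further.
            features.append(line.split()[0])
    if name is not None:
        records.append((name, features))  # tolerate a missing final '//'
    return records
```

Calling `summarize_genbank(open(path).read())` on the supplementary file would list each plasmid name with its annotated feature types, such as promoters and coding sequences.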
Methods
Agarwood sample collection and processing
Agarwood samples were procured from the old-town market "Al Balad" in Jeddah, Saudi Arabia (21.481° N, 39.187° E) in Winter 2023. The collection includes 36 different samples of agarwood chips (dried Aquilaria spp. wood, also known as "bahkour") and 22 agarwood steam-distillate oils, known as "oudh". The ages of the plants at harvest and the origins of these samples cannot be accurately traced; however, many were labelled with their country of origin. The samples varied in price at the time of purchase, reflecting their purported rarity, the density of fragrant compounds, and the complexity of aromatic notes (Supplementary Table 1). Preliminary analysis with different solvents indicated that acetone was the appropriate organic solvent for extraction, and each agarwood sample was then diluted to obtain chromatograms with clear peaks for product detection and identification. All samples were processed within 16 weeks of being procured. The wood samples were weighed (1 g) and ground into a fine powder using a combination of freeze–thaw cycles with liquid nitrogen and mechanical grinding. The homogenised samples were immersed in 5 mL of 1:1 hexane:acetone and subjected to ultrasonic agitation at 40°C for 8 h to facilitate terpenoid extraction. The samples were then passed through a 0.2 µm filter to obtain clear solvent extracts, which were evaporated under a nitrogen stream for 20 min to concentrate the terpenoids. Concentrated samples were resuspended in 500 µL acetone and stored at –20°C until analysis by gas chromatography with mass spectrometry and flame-ionisation detection (GC–MS/FID) and by two-dimensional gas chromatography with time-of-flight mass spectrometry (GCxGC–TOF/MS). All photographs were captured with a Canon EOS RP camera using a Canon RF 24–105 mm f/4-7.1 IS STM lens (Canon, Tokyo, Japan), and a ColorChecker Passport (CCPP2, Calibrite LLC, DE, USA) was used for colour calibration.
Algal strain cultivation
Chlamydomonas reinhardtii strain UPN22 was used for all experiments. C. reinhardtii UPN22 is a derivative of the UVM4 strain that has been genetically enhanced to use phosphite and nitrate as the sole sources of phosphorus and nitrogen, respectively, to minimise contamination and maximise cell densities in cultivation. Strains were cultured in Tris-acetate phosphite nitrate (TAPhi-NO3) liquid medium with shaking at 120 rpm or on solid agar, under 150 µmol m–2 s–1 light intensity from a combination of cool- and warm-white LED tubes (light spectra as previously reported). Algal cultures of 500 mL were agitated by stirring in Erlenmeyer flasks under the same growth conditions.
Plasmid construction, algal transformation, and screening for sesquiterpenoid synthase expression
Heterologous expression of sesquiterpenoid synthases (STPSs) in C. reinhardtii was achieved through synthetic transgene redesign based on the amino-acid sequences of each STPS, following previously established protocols. We selected 21 STPSs from various species encoding isoforms that yield nine different STP skeletons (Table 1). Targeted STPSs for aristolochene, δ-guaiene, santalene, valencene, valerianol, zizaene, τ-cadinol, bisabolol, and patchoulol were designed; all accession numbers are listed in Table 1. Their amino-acid sequences were used to generate algal-adapted nucleotide coding sequences using the Intronserter programme. This programme back-translates amino-acid sequences to frequently used codons, removes unwanted restriction sites, and systematically integrates the first intron of CrRBCS2i1 at a set distance to enable expression from the C. reinhardtii nuclear genome, as previously reported. Algal-adapted coding sequences were synthesised and sub-cloned into pOptimized_3 expression plasmids by Genscript (Piscataway, NJ, USA). Ketocarotenoid biosynthesis in the alga was achieved by transformation with the pOpt2_CrBKT_aadA plasmid, and knockdown of C. reinhardtii squalene synthase (SQS; Uniprot: A8IE29) was achieved using the previously reported luciferase–artificial microRNA expression plasmid pOpt2_cCA_gLuc-TAA_i3_ami_Spec. Both plasmids confer resistance to spectinomycin as a selectable marker. Combining both cassettes into a single plasmid was unsuccessful. Instead, co-transformation of both plasmids and selection on spectinomycin was combined with robotics-assisted colony picking and plate-level screening of 768 transformants to find those with luciferase activity (indicative of SQS knockdown) and a brown-colony phenotype from CrBKT-mediated ketocarotenoid biosynthesis. All plasmid constructs used are listed in Supplementary Table 6, and the complete annotated plasmid sequences are available in Supplementary File 5.
The glass-bead protocol was used to transform the nuclear genome of C. reinhardtii with plasmid DNA. Each plasmid was linearised using restriction enzymes (XbaI + KpnI, Thermo Scientific FastDigest), and 10 µg of DNA was used for each transformation. Following an ~8 h recovery period in liquid TAPhi-NO3 medium under low light, algal cells were plated on selective medium containing paromomycin (10 µg mL–1), spectinomycin (200 µg mL–1), or zeocin (15 µg mL–1), applied individually or in combination as appropriate for each target plasmid. Plates were illuminated continuously for ~7 d before colony picking. A PIXL robot (Singer Instruments, Watchet, UK) transferred up to 384 colonies per transformation event to TAPhi-NO3 agar plates. After an additional 3 d, a ROTOR robot (Singer Instruments) was used to replicate colonies onto new medium and onto plates containing amido black (150 µg mL–1) for fluorescence screening, as previously described.
All algal-optimised STPSs were expressed as fusions with mVenus (yellow fluorescent protein) or the monomeric teal (cyan) fluorescent protein 1 (mTFP1). Transgene expression of each STPS was determined by fluorescence imaging at the agar-plate level, as previously described. Chlorophyll fluorescence was observed with 2 sec of 475/20 nm excitation and 640/160 nm emission to show colony presence/absence on amido black-containing plates. Cyan-green fluorescence was captured with 420/20 nm excitation and 480/20 nm emission filters using 2.5 min exposure. Yellow fluorescent signals were captured with 504/10 nm excitation and 530/10 nm emission filters with 30-sec exposures. Transformants displaying strong fluorescent-protein signals were selected and inoculated into 12-well plates containing 2 mL of liquid TAPhi-NO3 medium and grown with shaking at 160 rpm. Predicted molecular masses of the expressed heterologous STPS–FP fusions were verified using sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS–PAGE) in-gel fluorescence against the fusion-protein fluorescent reporter.
Heterologous sesquiterpenoid biosynthesis analyses
For each transformant, biosynthesis of heterologous STPs was assessed by gas chromatography with mass spectrometry and flame-ionisation detection (GC–MS/FID). For each plasmid, the four transformants with the highest fluorescence signals were selected, and solvent–culture two-phase living extractions were performed using a 10% v/v dodecane–culture overlay in 6-well plates, as previously reported (refs. 17 and 53). Cultivations were performed in biological triplicate for 6 d in 4.5 mL TAPhi-NO3 media with a 500 µL dodecane overlay. The phases were then separated: culture samples were taken for cell-density analysis by flow cytometry as previously described, and the dodecane was centrifuged at 20,000 x g for 3 min, after which 150 µL of clarified solvent was transferred into amber GC vials in triplicate prior to analysis.
Samples were analysed as previously described using an Agilent 7890A gas chromatograph equipped with a mass spectrometer and a flame-ionisation detector (GC–MS/FID). The gas chromatograph comprises a 5975C inert MSD with a triple-axis detector and a DB-5MS column (30 m × 0.25 mm i.d., 0.25 μm film thickness). The injector, interface, and ion-source temperatures were set to 250°C, 250°C, and 220°C, respectively. In splitless mode, 1 μL of the sample was injected using an autosampler (G4513A, Agilent). The column flow was constant at 1 mL min−1, with helium as carrier gas. The GC oven temperature was initially held at 80°C for 1 min, then increased to 120°C at 10°C min−1, to 160°C at 3°C min−1, and to 240°C at 10°C min−1, with a final hold of 3 min. After a 12-min solvent delay, mass spectra were recorded using a scanning range of 50–750 m/z at 20 scans per second. Chromatograms were analysed with MassHunter Workstation software version B.08.00 (Agilent), and STPs were identified using the National Institute of Standards and Technology (NIST) library (Gaithersburg, MD, USA). Further identification was conducted with calibration curves of purified standards at concentrations of 1–1,200 μM in dodecane: δ-guaiene (CAT#B942760), patchoulol (CAT#P206200), santalene (CAT#S15065), valerianol (CAT#V914000, Toronto Research Chemicals, ON, Canada), bisabolol (CAT#95426), valencene (CAT#06808), and cedrene (CAT#22133, Sigma-Aldrich, MO, USA) (Supplementary Fig. 6). For compound identification, retention-time acquisition, internal digital-library calibration, and method development, we used a set of 12 microampules containing a standard terpene mixture, which covered 98 terpenes at 1 mM in methanol (CAT# MSITPN101, MetaSci, ON, Canada, Supplementary Table 4).
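When aligning the GC–MS/FID chromatograms with the method, it can help to know the nominal length of the oven programme. The following is a minimal sketch that sums the hold and ramp segments given above; the function name is hypothetical, and the calculation ignores any equilibration or post-run time.

```python
def oven_run_time(initial_temp, initial_hold, ramps, final_hold=0.0):
    """Total programme time in minutes.

    ramps is a list of (rate in degrees C per minute, target temperature in C).
    """
    total = initial_hold
    temp = initial_temp
    for rate, target in ramps:
        total += abs(target - temp) / rate  # time spent on this ramp
        temp = target
    return total + final_hold


# GC-MS/FID programme from the paragraph above:
# 80 C for 1 min, 10 C/min to 120 C, 3 C/min to 160 C, 10 C/min to 240 C, 3 min hold.
gcms_minutes = oven_run_time(80, 1, [(10, 120), (3, 160), (10, 240)], final_hold=3)
```

The segments add up to 1 + 4 + 13.33 + 8 + 3, or about 29.3 min, so with the 12-min solvent delay roughly 17.3 min of each run is recorded by the mass spectrometer.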
Two-dimensional gas chromatography time-of-flight mass spectrometry (GCxGC–TOF/MS) analysis
Comprehensive two-dimensional gas chromatography time-of-flight mass spectrometry (GCxGC–TOF/MS) analysis of agarwood and distillate extracts in acetone was performed using an Agilent 7890B gas-chromatography system equipped with a Zoex ZX1 cryogenic thermal modulator and a JEOL TOF MS (AccuTOF GCx-plus, JEOL, Japan). The GCxGC system featured a normal (non-polar x mid-polar) two-dimensional column configuration, comprising a 30 m non-polar HP-5MS UI capillary column (5%-phenyl-methylpolysiloxane) in the first dimension and a 2 m mid-polar BPX-50 capillary column (50% phenyl polysilphenylene-siloxane) in the second dimension. We used helium (99.999%) as the carrier gas at a constant flow rate of 0.8 mL min−1. The GCxGC–TOF/MS injector temperature was maintained at 300°C with a 10:1 split ratio. The oven temperature was initially held at 80°C for 1 min, then increased to 325°C at 2°C min−1. The modulation period was set at 6 s with a pulse time of 0.35 ms. The mass spectrometer operated in electron ionisation (EI+) mode at 70 eV. Both the transfer-line and ion-source temperatures were maintained at 250°C. The TOF detector voltage was set to 2,500 V, and data were acquired at a rate of 50 Hz. Mass spectra were obtained over a mass-to-charge (m/z) range of 50 to 600.
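The acquisition settings above determine how the raw spectrum stream maps onto the two retention-time axes. The sketch below works through that arithmetic, assuming no final oven hold (none is stated) and acquisition starting at time zero with the first modulation; the helper `fold_index` is hypothetical and does not reproduce how GC Image or the JEOL software handle the data.

```python
# Settings taken from the text.
START_C, END_C = 80.0, 325.0   # oven start and end temperatures (degC)
HOLD_MIN = 1.0                 # initial hold (min)
RAMP_C_PER_MIN = 2.0           # ramp rate (degC per min)
MODULATION_PERIOD_S = 6        # modulation period (s)
ACQ_RATE_HZ = 50               # spectra per second

# Assumption: the run ends when 325 degC is reached (no final hold stated).
run_time_min = HOLD_MIN + (END_C - START_C) / RAMP_C_PER_MIN
n_modulations = run_time_min * 60 / MODULATION_PERIOD_S
spectra_per_modulation = ACQ_RATE_HZ * MODULATION_PERIOD_S


def fold_index(i):
    """Map a 0-based spectrum index to (first-dimension time in min,
    second-dimension time in s), assuming acquisition starts at t = 0."""
    modulation, offset = divmod(i, spectra_per_modulation)
    return modulation * MODULATION_PERIOD_S / 60.0, offset / ACQ_RATE_HZ


print(f"Run time: {run_time_min} min, modulations: {n_modulations:.0f}")
print(f"Spectra per modulation: {spectra_per_modulation}")
print(fold_index(301))
```

With these values each 6 s modulation holds 300 spectra, and the 123.5 min run spans 1,235 modulations.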
Gas chromatography data analysis
The analysis of terpenoid extracts followed a procedure similar to previously reported methods, and qualitative analyses primarily relied on the retention index and match factor. The GC–MS/FID data were processed using the MassHunter Workstation software version B.08.00 (Agilent Technologies, USA). The identification of compounds was assisted by the NIST Mass Spectral Library Version 2.3 (National Institute of Standards and Technology, Gaithersburg, MD, USA). The mass-spectral data derived from the GCxGC–TOF/MS analysis were evaluated with GC Image™ Version 2.9 software (Lincoln, NE, USA) and referenced against the NIST2020/EPA/NIH EI Mass Spectral Library. The spectral data were cross-referenced with library spectra to identify potential chemical structures by calculating the match factor and probability, thereby generating a list of probable compound matches. The match factor directly compares the peaks of the unknown mass spectrum with those of the library spectra and indicates their similarity. In contrast, the probability expresses the relative likelihood that a hit in the list is correct, assuming that the unknown compound is represented in the library. Together, these metrics indicate how confidently a chemical structure can be assigned from the library spectra.
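To illustrate the idea behind a spectral match factor, the sketch below computes a plain cosine similarity between two centroided spectra. This is a simplified stand-in, not the weighted algorithm implemented in the NIST search software or GC Image, and the two example spectra are invented for illustration only.

```python
import math


def cosine_match(unknown, reference):
    """Cosine similarity (0 to 1) between two spectra given as
    {m/z: intensity} dictionaries. Simplified illustration only."""
    shared_axis = set(unknown) | set(reference)
    dot = sum(unknown.get(mz, 0.0) * reference.get(mz, 0.0) for mz in shared_axis)
    norm_u = math.sqrt(sum(v * v for v in unknown.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_u == 0.0 or norm_r == 0.0:
        return 0.0  # an empty spectrum cannot match anything
    return dot / (norm_u * norm_r)


# Hypothetical spectra (m/z: relative intensity); not measured data.
unknown_spectrum = {41: 55.0, 93: 100.0, 105: 60.0, 161: 80.0, 204: 35.0}
library_spectrum = {41: 50.0, 93: 100.0, 105: 65.0, 161: 75.0, 204: 40.0}

print(f"Similarity: {cosine_match(unknown_spectrum, library_spectrum):.3f}")
```

A value near 1 means the peak patterns are nearly identical, and a value of 0 means the spectra share no peaks.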
Sesquiterpenoid production and concentration
Transformants of the best-performing STPS isoforms, i.e. those that accumulated the most of each heterologous sesquiterpenoid under screening conditions, were subjected to solvent–culture two-phase cultivations at 300 mL scale in TAPhi-NO3 medium using FC-3283 as a solvent underlay phase, as previously described. FC-3283 is a perfluorinated amine that is inert and denser than water, so it forms an underlay to the culture and accumulates heterologous terpene products from the algae. After 6 days of cultivation, the fluorocarbon and culture were separated by gravity and gentle centrifugation. The fluorocarbon was clarified by centrifugation and then subjected to liquid–liquid extraction to partition the accumulated STPs into an equal volume of 96% ethanol. The mixture was shaken for 16 h at room temperature at 200 rpm. Next, samples were again centrifuged gently to further separate the phases. After ethanol extraction, FC-3283 can be reused on algal cultures and is thus effectively recycled in this process. Aliquots of 500 μL from each phase were sampled and stored in separate GC vials at –20°C for subsequent analysis and quantification of separation performance.
For each of the nine STPs biosynthesised by the algae, 20 mL of STP-containing 96% ethanol was generated. The ethanol fractions were pooled and subjected to organic solvent nanofiltration (OSN) in a dead-end cell to concentrate the terpenes in ethanol without evaporative losses. A Duramem solvent-resistant membrane (Evonik, Germany) with a nominal molecular weight cut-off of 300 g mol–1 was selected for its suitability for STP retention and its chemical compatibility with ethanol. The 200 mL ethanol–sesquiterpenoid mixture was loaded into the OSN chamber containing a 16 mm membrane disc and subjected to 20 bar pressure delivered by CO2 as an inert gas, which drove the ethanol phase through the nanofiltration membrane at a flux of 2.84 L m–2 h–1. While ethanol permeated the membrane, STPs were retained owing to their molecular weight and were consequently concentrated in the ethanol retentate. The permeate ethanol is suitable for recycling and can be reused in subsequent liquid–liquid extraction processes.
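The nanofiltration figures above can be related to each other with simple mass-balance arithmetic. The sketch below is illustrative only: it assumes the full 16 mm disc diameter is active membrane area (the effective area of the cell is not stated), uses a hypothetical final retentate volume of 20 mL, and applies the standard batch-concentration relation C/C0 = (V0/V)^R for a constant rejection R.

```python
import math


def membrane_area_m2(disc_diameter_mm):
    """Area of a circular disc in m^2, assuming the whole disc is active."""
    radius_m = disc_diameter_mm / 1000.0 / 2.0
    return math.pi * radius_m ** 2


def permeate_rate_ml_per_h(flux_l_m2_h, area_m2):
    """Volumetric permeate rate (mL per h) from flux (L m^-2 h^-1) and area."""
    return flux_l_m2_h * area_m2 * 1000.0


def concentration_factor(feed_ml, retentate_ml, rejection=1.0):
    """Retentate concentration relative to feed for batch dead-end
    filtration with constant rejection (1.0 = complete retention)."""
    return (feed_ml / retentate_ml) ** rejection


area = membrane_area_m2(16.0)                # disc diameter from the text
rate = permeate_rate_ml_per_h(2.84, area)    # flux from the text
cf = concentration_factor(200.0, 20.0)       # 20 mL retentate is hypothetical

print(f"Assumed active area: {area * 1e4:.2f} cm^2")
print(f"Permeate rate: {rate:.3f} mL per h")
print(f"Concentration factor at 20 mL retentate: {cf:.1f}x")
```

Passing a `rejection` below 1.0 shows how partial STP passage through the membrane would lower the achievable concentration factor.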
Bulk hydroxylation of sesquiterpenoid backbones
To selectively introduce hydroxyl groups at the double bonds present in the terpenes, hydroboration-oxidation reactions were performed using two distinct organoboron reagents: borane–tetrahydrofuran complex (BH3·THF) (CAT#176192, Sigma-Aldrich) and 9-borabicyclo[3.3.1]nonane (9-BBN; CAT#151076, Sigma-Aldrich). Three different stoichiometric ratios (0.5, 1.0, and 2.0) were explored for each reagent to investigate different degrees of hydroxylation within the STP mixture. The sterically hindered 9-BBN was anticipated to react selectively with the least sterically demanding sites of the terpenes, while the borane–tetrahydrofuran complex was expected to facilitate hydroxylation even at the more challenging endocyclic positions. To remove residual ethanol and water from the concentrated terpene mixture, a rotary evaporator was used under reduced pressure at 40°C, ensuring effective removal without compromising the integrity of the terpenes. The reactions were carried out in anhydrous tetrahydrofuran (THF, Sigma-Aldrich), which can be removed by evaporation after completion. To ensure the complete removal of water, molecular sieves (CAT#105734, Sigma-Aldrich) were added to the terpene–THF mixture. Hydrogen peroxide (H2O2, 30%, VWR Chemicals) and aqueous sodium hydroxide solution (NaOH, Sigma-Aldrich) were used to convert the organoboron intermediates into non-toxic, water-soluble boric acid after the reaction. All resulting terpene derivatives were extracted from the aqueous mixture with ethyl acetate.
Data analysis
To evaluate the impact of engineered modifications on STP biosynthesis and growth characteristics in C. reinhardtii strains, one-way analysis of variance (ANOVA) was performed to compare mean STP (patchoulol) titres and growth rates among different transformants and the parental control strain. This approach allows differences between multiple groups to be analysed simultaneously. ANOVAs were performed separately for STP titres and growth rates under mixotrophic and phototrophic conditions. Post hoc pairwise comparisons using Tukey's HSD test were conducted to identify which groups differed significantly. A one-way ANOVA was also used to assess differences in STP biosynthesis among the tested strategies (single or double transformation), followed by a post hoc Tukey's HSD test for specific pairwise data-set comparisons. Mean production values (in fg cell–1 and mg L–1) for each STP were compared, considering standard deviations to evaluate data variability. Mean values were considered significantly different at p < 0.05. For data analyses, JMP v.16 (SAS Institute Inc, NC, USA) and RStudio 3.6.2 (Posit Software, Boston, USA) were used. Data visualisation was done using JMP v.16 and GraphPad Prism v.10 (GraphPad Software, MA, USA). Image adjustments involved ColourChecker calibration (Calibrite LLC, DE, USA) paired with Adobe Lightroom (Adobe Inc., CA, USA) for colour accuracy. The Gardner Colour Scale was used for agarwood sample assessment. Images were cropped and organised in Affinity Photo v.1.10.6 (Serif Ltd., WB, UK). Analysis of images and assignment of colour values according to the Gardner scale was conducted using ImageJ software (NIH, USA).
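For readers who wish to re-run the group comparisons on the deposited data, the core of a one-way ANOVA can be sketched in a few lines. The example below computes only the F statistic and its degrees of freedom using the Python standard library; the titre values are invented placeholders, and the authors' actual analyses were run in JMP and R. Obtaining the p-value and Tukey's HSD comparisons requires an F distribution and studentised-range routines from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of replicate measurements per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical triplicate titres (mg per L) for a parental strain and two
# transformants; placeholder numbers, not values from this dataset.
titres = [
    [0.10, 0.12, 0.11],
    [0.45, 0.50, 0.48],
    [0.90, 0.85, 0.95],
]

f_stat, df_b, df_w = one_way_anova_f(titres)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

A large F value relative to the critical value at p < 0.05 indicates that at least one group mean differs, after which pairwise tests identify which ones.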
Diagrams and illustrations were made using Affinity Designer v.1.10.6 (Serif Ltd., WB, UK), chemical structures were drawn with ChemDraw v.20.1 (PerkinElmer, MA, USA), and all visual elements were harmonised in Affinity Publisher v.1.10.6 (Serif Ltd., WB, UK).