Biosensor-driven strain engineering reveals key cellular processes for maximizing isoprenol production in Pseudomonas putida
Data files
Sep 19, 2025 version files (79 MB):
- Data_Dryad_Supplementary_Data_Updated_2025-9-18_2.zip (78.98 MB)
- README.md (18.98 KB)
Abstract
Synthetic and systems biology now produces vast combinatorial designs, but high-throughput analytical methods are poorly matched to interrogate this search space. We addressed this challenge with a biosensor-driven strategy in Pseudomonas putida to enhance isoprenol production, a key precursor for an advanced aviation fuel. Our biosensor leverages the native response of P. putida to short-chain alcohols, enabling a conditional growth-based selection that identified competing cellular processes as targets to improve isoprenol production. An iterative and combinatorial strain engineering approach yielded a 36-fold increase in isoprenol production (~900 mg/L). Ensemble -omics analysis revealed key causal metabolic rewiring that enhanced production. Techno-economic analysis provided an economic viability context and confirmed that the benefits of adding amino acid supplements outweigh the additional costs. This study establishes a modular and broadly applicable biosensor-driven approach for optimizing heterologous pathways, advancing the science of microbial bioproduction, and driving sustainable bioproducts development for a resilient economy. This companion dataset contains several raw datasets generated in this study that were not deposited in dedicated repositories.
Dataset DOI: 10.5061/dryad.sbcc2frjq
Description of the data and file structure
These files include the raw data corresponding to the experiments described in Menasalvas et al., spanning proteomics, genomics, gRNA targeting sequence analysis, AlphaFold structure prediction, and flow cytometry kinetic timecourse data, in Data_Dryad_Supplementary_Data_Updated_2025-9-18_2.zip.
Files and variables
File: Menasalvas_et_al_Supplementary_Data.zip
Abbreviations used: HHK: hybrid histidine kinase; RB-TnSeq: Random Barcode Transposon Sequencing; RBS: Ribosome Binding Site; PCR: polymerase chain reaction; WT: wild type; CV: coefficient of variation; dCpf1/dCas12a: deactivated Cpf1/deactivated Cas12a (CRISPRi system); gRNA: guide RNA; ORF: open reading frame; LC-MS/MS: liquid chromatography tandem mass spectrometry; PHA: polyhydroxyalkanoate; SNP: Single Nucleotide Polymorphism; GO: Gene Ontology; ONT: Oxford Nanopore Technologies
The datasets arranged here in the corresponding zip file contain 2 Excel sheets and three folders of raw data:
Supplementary Data 1 (Excel sheet):
Sheet 1. Metabolite concentrations from selected isoprenol producer strains.
| Column Header | Notes |
|---|---|
| Metabolite | |
| Relative vs Absolute Concentration | Absolute concentrations for metabolites are indicated where the concentration was determined using a standard curve calculated with the peak area of the authentic analyte. Relative concentrations are displayed where the concentration indicated was determined using response from a single chemical standard. |
| Average Concentration | Average value from 3 biological replicates. Strain names are indicated with the TEAM-XXXX format. GP = growth phase samples. PP = production phase samples. Fold Change was calculated as the ratio of concentrations between the indicated strain IDs during growth phase (GP). |
| Specific Concentration | The average metabolite concentration was normalized against the OD600 at the time of sample harvest. |
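The calculations described in this table can be sketched as follows. This is a minimal illustration rather than the authors' analysis script: the function names and all numeric values are hypothetical and are not taken from the dataset.

```python
from statistics import mean


def average_concentration(replicates):
    """Mean metabolite concentration across biological replicates."""
    return mean(replicates)


def specific_concentration(avg_conc, od600):
    """Normalize an average concentration against the OD600 at harvest."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return avg_conc / od600


def fold_change(conc_strain, conc_reference):
    """Ratio of concentrations between two strains in the same phase."""
    return conc_strain / conc_reference


# Hypothetical values for illustration only.
avg = average_concentration([12.0, 15.0, 18.0])
print("average:", avg)
print("specific:", specific_concentration(avg, od600=3.0))
print("fold change:", fold_change(30.0, 15.0))
```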
Sheet 2. Pooled gRNA targeting sequences synthesized for the library.
| Column Header | Notes |
|---|---|
| gene ID | The E. coli common gene name is used, followed by the gRNA targeting sequence used with the _TX schema, where X is the unique number given to the targeting sequence. |
| gRNA Sequence | Supplied 5'-3' notation |
| PAM | |
| Start | P. putida genome coordinates |
| End | |
| Strand | + or - strand |
| Oligonucleotide sequences (protospacer flanked by AarI Golden Gate linkers) | |
Sheet 3. Distribution of gRNA sequences in the pooled library.
| Column Header | Notes |
|---|---|
| gRNA ID | refer to Sheet 2 gRNA ID to find corresponding gene |
| Number of Reads | Raw Illumina read count |
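One way to summarize how evenly the library is represented is to convert raw read counts to fractions and compute a coefficient of variation (CV). The sketch below assumes the sheet has been loaded into a dictionary mapping gRNA ID to read count. The gRNA IDs and counts shown are hypothetical, and the use of the population standard deviation is an assumption, not a statement of how the authors computed CV.

```python
from statistics import mean, pstdev


def read_fractions(read_counts):
    """Convert {gRNA ID: raw read count} into {gRNA ID: fraction of all reads}."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads in the library")
    return {grna_id: n / total for grna_id, n in read_counts.items()}


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Hypothetical counts for illustration only.
counts = {"geneA_T1": 10, "geneB_T1": 30}
print(read_fractions(counts))
print(coefficient_of_variation(list(counts.values())))
```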
Sheet 4. Lost gRNA sequences.
| Column Header | Notes |
|---|---|
| gene ID | The E. coli common gene name is used, followed by the gRNA targeting sequence used with the _TX schema, where X is the unique number given to the targeting sequence. |
| gRNA Sequence | Supplied 5'-3' notation |
| PAM | |
| Start | P. putida genome coordinates |
| End | |
| Strand | + or - strand |
Sheet 5. Illumina Genome Resequencing and Polymorphism Analysis of Selected Isoprenol Clones (analysis from breseq v0.38.1)
| Column Header | Notes |
|---|---|
| Sample | P. putida Strain Name |
| Evidence | RA = read alignment evidence. MJ = missing junction to reference sequence. JC = potential new junction. |
| Position | Genomic Coordinates in P. putida AE015451 |
| Mutation | |
| Annotation | |
| Gene | |
| Description | |
Sheet 6. ShinyGO Enrichment Analysis of High Producer Isoprenol Strains.
| Column | Notes |
|---|---|
| Sample | Refer to P. putida proteomics dataset in main manuscript |
| Enrichment FDR | FDR = False discovery rate |
| Negative Log10 FDR | |
| nGenes | Number of corresponding genes included in set |
| Pathway Genes | Total number of corresponding genes in annotated pathway |
| Fold Enrichment | |
| Include In Map (Refer to Supplemental Figure) | |
| Pathway | |
| URL | |
| Genes | |
Supplementary Data 2 (Excel sheet):
Proteomics samples. Each individual sheet in this file contains one separate experimental dataset.
Sheet 1. Proteomics Analysis of ∆yiaY ∆yiaZ Complementation by Varied Plasmid Constructs
| Column Header | Notes |
|---|---|
| Protein.Group | |
| Protein.Names | Alternate Protein.ID from UniPROT |
| Protein | Tertiary Protein.ID from UniPROT |
| Protein.Description | |
| Sample | Corresponding Strain Analyzed. Note: D= deletion ("∆") |
| Counts_Mean | Protein counts |
Sheet 2. Analysis of PJ23119-yiaY,yiaZ Constitutive Expression
| Column Header | Notes |
|---|---|
| Protein.Group | |
| Protein.Names | Alternate Protein.ID from UniPROT |
| Protein | Tertiary Protein.ID from UniPROT |
| Protein.Description | |
| Sample | Corresponding Strain Analyzed. |
| Replicate | |
| Value_Sum | Mean of protein counts across replicates |
Sheet 3. Proteomics Culture Format Evaluation of Isoprenol Producer Strains
| Column Header | Notes |
|---|---|
| Protein.Group | |
| Protein.Names | Alternate Protein.ID from UniPROT |
| Protein | Tertiary Protein.ID from UniPROT |
| Protein.Description | |
| Sample | Corresponding Strain Analyzed. |
| Replicate | |
| Counts_Sum | |
Sheet 4. Growth/Production phase Samples of High Isoprenol Producer Strains TEAM-3174 & 3185 Compared to TEAM-2595
| Column Header | Notes |
|---|---|
| Protein.Group | |
| Protein.Names | Alternate Protein.ID from UniPROT |
| Protein | Tertiary Protein.ID from UniPROT |
| Protein.Description | |
| Sample | Corresponding Strain Analyzed. |
| Replicate | |
| Counts_Sum | |
Sheet 5. Plasmid-borne augmentation of Isoprenol pathway overexpression in genomically integrated producer strains
| Column Header | Notes |
|---|---|
| Protein.Group | |
| Protein.Names | Alternate Protein.ID from UniPROT |
| Protein | Tertiary Protein.ID from UniPROT |
| Protein.Description | |
| Sample | Corresponding Strain Analyzed. |
| Replicate | |
| Counts_Sum | |
Supplementary Data 3 (Folder):
Important note: In keeping with DataDryad copyright requirements (relating to DeepMind licensing), AlphaFold3 raw PDB files have been replaced with corresponding 3D rotating movies generated with ChimeraX for hosting in this repository. To enable readers to generate the same predictions, we provide the protein sequences below for use at https://alphafoldserver.com. The corresponding PDB output and use licensing have been linked through to Zenodo.
>PP_2682/YiaY
MSQSFSPLRKFVSPEIIFGAGCRHNVANYAKTFGARKVLVVSDPGVIAAGWVADVEASLQAQGIDYCLYTAVSPNPRVEEVMLGAEIYRQNHCDVIVAVGGGSPMDCGKAIGIVVAHGRSILEFEGVDMIRVPSPPLILIPTTAGTSADVSQFVIISNQQERMKFSIVSKAVVPDVSLIDPQTTLSMDPFLSACTGIDALVHAIEAFVSTGHGPLTDPHALEAMRLINGNLVEMIANPTDIALREKIMLGSMQAGLAFSNAILGAVHAMSHSLGGFLDLPHGLCNAVLVEHVVAFNYSSAPERFKVIAEVFGIDCRGLNHRQICGRLVEHLIALKRAIGFHETLGLHGVRTSDIPFLSQHAMDDPCILTNPRASSQRDVEVVYGEAL
>PP_2683/YiaZ
MARPSDEQQRALAGLLGLGDHSARKSHYPELSARLDELEAERNRYKWLFENAVHGIFQASLQDGMRAANPALARMLGYDDPQAVLFSLTQLAANLFDGGAEELQAITAVLAREHSLHGYETRLRRKDGSHLDVLMNLLLKPGHEGLVEGFVADITERKLAQQRLQQLNDELEQRVAARTDELLEARDAAEAANRSKDKYLAAASHDLLQPLNAARLLISTLRERPLPEAEHVLVERTHQALEGAEDLLTDLLDISRLDQAAVKPDVAVYRLDELFAPLVSEFSPVAEAAGLKLHARIADYAISTDLRLLTRILRNFLSNACRYTEEGRILLGARRRGGHLRLEVWDTGRGIAQDRLQDIFLEFNQLDVGRAADRKGVGLGLAIVERIAKILGYRIEVRSWLGRGSVFSIEVPLGKEVPLAVHQAVPLPSVGDPLPGRRLLVLDNEVSILESMGALLGQWGCEVVTATDREGALLALQGRAPELILADYHLDHGVVGCEVVRYLREHFATAIPAVIITADRSDQCRRGLQKLGAPLLNKPVKPGKLRAVLSQLLLVH
>PP_2664
MPATGLLSVAELQAELTRLQHQNHKLQRINDALIERIESGVTRGNDPYAAFQHSVVLAEQ VRERTDALNQAMAELKAVNRLLSEARQRAETAHQHQIRLITDNVPALIAYLNADLVYEFT NKVYEEWYCWPHGVMLGQSLREAHSEQHYQRLEGYVARALAGESVTFEFAETNINGQERY MLRSYVPNRLASGEVVGIFVLIRDITERRNTAQALHQAYQHLEQRVRERTAELTSLNDQL LREIEERSQAESRLREAKREAEQANLSKTKFLAAVSHDLLQPLNAARLFTSALLERDEPQ NAAHLVRNVSNSLEDVENLLGTLVDISKLDAGVIKADVAPFALHELMDNLAAEYVQVARS EGLELHFVGCSAVVRSDIQLLARILRNLLSNAIRYTPSGRVVLGCRRLRGGVRIEVWDSG IGIAEEHLQDMFLEFKRGDVQRPDQDRGLGLGLAIVEKIAGILGHRIRVRSWLGKGSVFA VEVPLSTTAPKAQPSQVICEPMLERLRGARVWVLDNDAAICAGMRTLLEGWGCRVVTALS EEDLARQVDNYHADADLLIADYHLDNDCNGVDAVARINARRAQPLPALMITVNYSNDLKQ QIRELGHTLMHKPVRPMKLKTAMSHLLASGLA
Supplementary Data 4 (File):
fast.genomics analysis of yiaY and yiaZ homolog co-occurrence in microbial genomes.
| Column Header |
|---|
| locusTag |
| proteinId |
| assemblyId |
| scaffoldId |
| geneBegin |
| geneEnd |
| strand |
| gtdbDomain |
| gtdbPhylum |
| gtdbClass |
| gtdbOrder |
| gtdbFamily |
| gtdbGenus |
| gtdbSpecies |
| strain |
| identity |
| alnLength |
| nGapOpens |
| qBegin |
| qEnd |
| sBegin |
| sEnd |
| eValue |
| bits |
Supplementary Data 5 (Folder):
Flow cytometry raw data for mCherry timecourse analysis. Samples included in this folder have the following naming convention: [well position on the plate prepared for Accuri cytometer analysis] [media condition, with or without added isoprenol] [strain: WT, pJ23119-PP2682,3, pBAD-PP2682,3 promoter variants] [timepoint in hour increments measured]
For example: the file named "A01 M9 WT 1 hr.fcs" is from well position A01; WT P. putida KT2440 was grown (with the biosensor plasmid) in M9 media and sampled at the 1 hour timepoint post isoprenol (+/-) induction in the experimental timecourse.
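The naming convention can be parsed programmatically. The sketch below is a hedged illustration that assumes every file follows the pattern of the example above: a well position, a free-text condition (media and strain), an integer timepoint, and the literal suffix " hr.fcs". Because the exact media and strain labels used in the folder are not listed here, the parser keeps them together as one `condition` field rather than guessing where to split. `parse_fcs_name` and `FcsName` are hypothetical names.

```python
import re
from typing import NamedTuple


class FcsName(NamedTuple):
    well: str       # plate position, e.g. "A01"
    condition: str  # media condition and strain, kept as one string
    hours: int      # timepoint in hours


_PATTERN = re.compile(
    r"^(?P<well>[A-H]\d{2}) (?P<condition>.+) (?P<hours>\d+) hr\.fcs$"
)


def parse_fcs_name(filename):
    """Split a flow cytometry file name into well, condition, and timepoint."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return FcsName(match["well"], match["condition"], int(match["hours"]))


print(parse_fcs_name("A01 M9 WT 1 hr.fcs"))
```

Files that do not match the assumed pattern raise a `ValueError`, which makes deviations from the convention visible instead of silently mis-parsing them.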
Representative Plasmidsaurus ONT gRNA amplicon sequencing reads. These files are the raw FastQ reads from the Oxford Nanopore gRNA library sequencing post selection in M9 media. For the first round of enrichment, these are supplied in folder "gRNA Round 1DZHSX3_raw". FastQ file "PCR_829_gRNA_plusCVio_1" corresponds to PCR-amplified samples using plasmid pTE829 as the RBS variant, where crystal violet (CVio) was added to the samples to induce the isoprenol pathway. Similarly, "PCR_850_gRNA_plusCVio_4" uses plasmid pTE850 instead of pTE829 for a different RBS selection strength.
In the next iteration of gRNA selection, a larger number of samples were collected; these are provided in folder "pTE965__pTE964gRNAinTEAM-29968raw-reads". These files follow a more conventional naming structure: the QYGTT7_X_X prefixes are reference numbers from the commercial ONT provider, followed by the P. putida strain ID (e.g., TEAM__2998), the RBS plasmid variant (e.g., pTE965), and finally whether crystal violet (CVio) was added to the media. The E. coli control condition is also included ("QYGTT7_17_17_Ec_gRNA_control_20240624").
Code/software
FastQ ONT reads can be viewed with DNA sequence alignment programs, including Geneious, IGV, or others. Flow cytometry data were exported from an Accuri C6 Flow Cytometer and visualized with FlowJo (Tree Star). Excel sheets were generated with Excel for Mac (Microsoft Office 365, Excel Version 16.95.4 (25040241)). AlphaFold structure predictions were visualized with the UCSF ChimeraX software package.
Access information
Other publicly accessible locations of the data:
- PDB output from AF3 is linked from this repository to Zenodo.
Data was derived from the following sources:
- Research data was generated at the LBNL Biosciences ESE Campus facility.
Flow cytometry: High-throughput flow cytometry experiments were performed using the Accuri C6 flow cytometer equipped with a microtiter plate autosampler (BD). Cells were prepared for isoprenol induction assays and sampled at the indicated timepoints. Upon sampling, cells were diluted to OD600 0.1 in 500 μl of PBS medium. A total of 30,000 events were recorded at a flow rate of 66 μl/min and a core size of 22 μm. mCherry was excited at 552 nm at 70 mW and emission was detected at 610 nm with a 20 nm bandpass. Data acquisition was performed as described in the Accuri C6 Sampler User's Guide and analyzed with Tree Star FlowJo v10.1. No sample gates were applied during analysis.
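The dilution step follows the usual C1·V1 = C2·V2 relationship. The sketch below is a minimal illustration of that arithmetic. The function name is hypothetical, the culture OD600 in the example is invented, and treating 500 μl as the final volume is an assumption about how the dilution was prepared.

```python
def dilution_volumes(culture_od600, target_od600=0.1, final_volume_ul=500.0):
    """Return (culture volume, PBS volume) in microliters for the target OD600."""
    if culture_od600 < target_od600:
        raise ValueError("culture is already below the target OD600")
    culture_ul = target_od600 * final_volume_ul / culture_od600
    return culture_ul, final_volume_ul - culture_ul


# Hypothetical culture at OD600 2.0.
culture_ul, pbs_ul = dilution_volumes(2.0)
print(f"{culture_ul:.1f} ul culture + {pbs_ul:.1f} ul PBS")
```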
Enrichment of Guide RNAs from Library Under PpedF-pyrF Selection: P. putida ∆pyrF strains transformed with a PpedF-pyrF plasmid were subjected to triparental conjugation with the gRNA library harbored in E. coli DH10 and an E. coli pRK2013 tra+ helper strain. The three strains were spotted onto solid LB agar media and incubated overnight at 30 °C. The next day, a small amount of biomass from the conjugation was isolated with a sterile toothpick and used to inoculate 1.5 mL of M9 medium containing kanamycin, with or without 1 µM crystal violet (CV), in 24-deep-well plates with 4 replicates from each conjugation. Samples were grown for 24 hours, at which point we examined the cultures for growth. If the cultures showed turbidity or, in the best case, saturation 24 h post-inoculation, 30 µL of the culture was prepared for colony PCR and the gRNA sequences present on the dCpf1/CRISPRi plasmid were amplified by OneTaq PCR. gRNA amplicons were amplified using oligos TEAM-1174 (5’-gaccagttgcgcctgtcggtgttcagtg-3’) and TEAM-644 (5’-gatcttccccatcggtgatgtcg-3’). Biomass from the LB conjugation spots pre-selection and the E. coli DH10 strain harboring the library was also amplified and sequenced to verify the diversity of the initial distribution of gRNAs. gRNAs were sequenced using the Oxford Nanopore linear DNA amplicon service by Plasmidsaurus Inc (South San Francisco, CA). The rapid ONT sequencing service was chosen over other sequencing platforms and providers because it returned gRNA sequencing results within as little as 24 hours of sample submission, enabling rapid data analysis for future experimental planning. Raw reads were mapped to the pTE219 reference gRNA plasmid map using Geneious Prime, and the aligned gRNA sequences downstream of the 5’-TTTN-3’ PAM sequence were extracted as a CSV file. Targeting spacer sequences were filtered to remove sequences that were 19 bases or fewer.
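The length filter applied to the extracted spacers can be sketched as follows. This is an illustrative stand-in, not the authors' pipeline: it assumes the spacer sequences have already been exported from Geneious Prime and loaded as a list of strings, and the function names and example sequences are hypothetical.

```python
from collections import Counter

MIN_SPACER_LENGTH = 20  # sequences of 19 bases or fewer are removed


def filter_spacers(spacers, min_length=MIN_SPACER_LENGTH):
    """Keep spacer sequences that are at least `min_length` bases long."""
    cleaned = (s.strip().upper() for s in spacers)
    return [s for s in cleaned if len(s) >= min_length]


def count_spacers(spacers):
    """Count how often each retained spacer sequence was observed."""
    return Counter(filter_spacers(spacers))


# Hypothetical sequences: one 20-base spacer seen twice, one 19-base fragment.
reads = ["ACGT" * 5, "acgt" * 5, "ACGT" * 4 + "ACG"]
print(count_spacers(reads))
```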
Sequences were compared to the known gRNA targeting sequences (Supplementary Data 4) and implicated genes were selected based on the following criteria: (1) a particular gRNA was enriched (>5 reads in one biological replicate); (2) multiple gRNAs targeted the same gene; (3) gRNAs targeted genes that are functionally related (i.e., contributing to a specific process) or that lie in the same operon; (4) gRNAs or gene targets occurred repeatedly across multiple replicates. All selected targets from both rounds are described in Supplementary Tables 2 and 3. The effect of gRNA knockdown on isoprenol titers was verified in isogenic deletion strains, both to reveal a fully penetrant phenotype and to eliminate gene perturbations from potential off-target gRNA repression that would complicate interpretation of changes to isoprenol titers (97). Candidate genes from the gRNA enrichment were first grouped by function using HMMer and COG to identify non-redundant cellular processes. We then picked several at random from each category to design new gRNA plasmids and recombineering oligos, choosing 28 targets for the first enrichment screen and 30 for the second screen.
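Criteria (1), (2), and (4) are count-based and can be expressed in a few lines. The sketch below assumes reads have already been assigned to gRNA IDs of the form gene name plus `_TX` suffix, with one dictionary of counts per biological replicate. All names and numbers are hypothetical. Criterion (3) depends on functional and operon annotation and is not implemented here.

```python
from collections import defaultdict

READ_THRESHOLD = 5  # criterion 1: more than 5 reads in one biological replicate


def gene_of(grna_id):
    """Strip the _TX suffix: 'geneA_T2' -> 'geneA'."""
    return grna_id.rsplit("_T", 1)[0]


def summarize_enrichment(replicate_counts, threshold=READ_THRESHOLD):
    """Summarize enriched gRNAs per gene.

    replicate_counts: list of {gRNA ID: read count}, one dict per replicate.
    """
    enriched = defaultdict(set)    # gene -> enriched gRNA IDs
    replicates = defaultdict(set)  # gene -> replicates with an enriched gRNA
    for index, counts in enumerate(replicate_counts):
        for grna_id, n_reads in counts.items():
            if n_reads > threshold:
                gene = gene_of(grna_id)
                enriched[gene].add(grna_id)
                replicates[gene].add(index)
    return {
        gene: {
            "enriched_grnas": sorted(enriched[gene]),    # criterion 1
            "multiple_grnas": len(enriched[gene]) > 1,   # criterion 2
            "n_replicates": len(replicates[gene]),       # criterion 4
        }
        for gene in enriched
    }


# Hypothetical counts for two replicates.
reps = [
    {"geneA_T1": 12, "geneA_T2": 7, "geneB_T1": 3},
    {"geneA_T1": 9, "geneB_T1": 6},
]
for gene, summary in summarize_enrichment(reps).items():
    print(gene, summary)
```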
Computational Structure Predictions: To identify potential interaction domains between PP_2664, YiaY, and YiaZ, we used AlphaFold [DOI: 10.1038/s41586-021-03819-2], AlphaFill, and AlphaFold3 to model protein structures in the absence of evidence from protein crystallization studies. Protein sequences were identified from UniProt (PP_2664: Q88JI5; YiaY: Q88JG7; YiaZ: Q88JG6). AlphaFold was run on an LBNL server described in [DOI: 10.1371/journal.pcbi.1011171]. AlphaFill was performed using a public webserver as described in [DOI: 10.1038/s41592-022-01685-y]. All structures were reanalyzed in AlphaFold3 (at alphafoldserver.com) as described in [DOI: 10.1038/s41586-024-07487-w] and prepared for inclusion as figure panels using the ChimeraX software package [DOI: 10.1002/pro.4792]; a representation of this output is included in Supplementary Data 3.
