Biosensor-driven strain engineering reveals key cellular processes for maximizing isoprenol production in Pseudomonas putida

Menasalvas, Javier¹; Kulakowski, Shawn¹; Chen, Yan¹; Gin, Jennifer W.¹; Turumtay, Emine Akyuz¹; Baral, Nawa Raj¹; Apolonio, Morgan A.¹; Rivier, Alex¹; Yunus, Ian S.¹; Garber, Megan E.¹; Scown, Corinne D.¹; Adams, Paul D.¹; Lee, Taek Soon¹; Blaby, Ian K.¹; Baidoo, Edward E. K.¹; Petzold, Christopher J.¹; Eng, Thomas¹; Mukhopadhyay, Aindrila¹

Research facility: Lawrence Berkeley National Laboratory

Published Sep 19, 2025 on Dryad. https://doi.org/10.5061/dryad.sbcc2frjq

Data files

Sep 19, 2025 version files (79 MB total):

Data_Dryad_Supplementary_Data_Updated_2025-9-18_2.zip (78.98 MB)
README.md (18.98 KB)

Abstract

Synthetic and systems biology now produce vast combinatorial designs, but high-throughput analytical methods are poorly matched to interrogate this search space. We addressed this challenge with a biosensor-driven strategy in Pseudomonas putida to enhance isoprenol production, a key precursor for an advanced aviation fuel. Our biosensor leverages the native response of P. putida to short-chain alcohols, enabling a conditional growth-based selection that identified competing cellular processes as targets to improve isoprenol production. An iterative and combinatorial strain engineering approach yielded a 36-fold increase in isoprenol production (~900 mg/L). Ensemble -omics analysis revealed key causal metabolic rewiring that enhanced production. Techno-economic analysis provided an economic viability context and confirmed that the benefits of adding amino acid supplements outweigh the additional costs. This study establishes a modular and broadly applicable biosensor-driven approach for optimizing heterologous pathways, advancing the science of microbial bioproduction, and driving sustainable bioproducts development for a resilient economy. This companion dataset contains the raw datasets generated in this study that were not deposited in dedicated repositories.

Dataset DOI: 10.5061/dryad.sbcc2frjq

Description of the data and file structure

These files include the raw data for the experiments described in Menasalvas et al., spanning proteomics, genomics, gRNA targeting sequence analysis, AlphaFold structure prediction, and flow cytometry kinetic timecourse data. They are provided in Data_Dryad_Supplementary_Data_Updated_2025-9-18_2.zip.

Files and variables

File: Menasalvas_et_al_Supplementary_Data.zip

Abbreviations used: HHK: hybrid histidine kinase; RB-TnSeq: Random Barcode Transposon Sequencing; RBS: Ribosome Binding Site; PCR: polymerase chain reaction; WT: wild type; CV: coefficient of variation; dCpf1/dCas12a: deactivated Cpf1/deactivated Cas12a (CRISPRi system); gRNA: guide RNA; ORF: open reading frame; LC-MS/MS: liquid chromatography-tandem mass spectrometry; PHA: polyhydroxyalkanoate; SNP: Single Nucleotide Polymorphism; GO: Gene Ontology; ONT: Oxford Nanopore Technologies

The datasets in the corresponding zip file comprise two Excel workbooks and three folders of raw data:

Supplementary Data 1 (Excel sheet):

Sheet 1. Metabolite concentrations from selected isoprenol producer strains.

Column Header	Notes
Metabolite
Relative vs Absolute Concentration	Absolute concentrations for metabolites are indicated where the concentration was determined using a standard curve calculated with the peak area of the authentic analyte. Relative concentrations are displayed where the concentration indicated was determined using response from a single chemical standard.
Average Concentration	Average value from 3 biological replicates. Strain names are indicated with the TEAM-XXXX format. GP = growth phase samples. PP = production phase samples. Fold Change was calculated as the ratio of concentrations between the indicated strain IDs during growth phase (GP).
Specific Concentration	The average metabolite concentration was normalized against the OD₆₀₀ at the time of sample harvest.
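
The two derived quantities in this sheet (OD₆₀₀-normalized "Specific Concentration" and "Fold Change") can be sketched as below. This is only a minimal illustration under stated assumptions: the function names and all numeric values are hypothetical and are not taken from the workbook.

```python
def specific_concentration(avg_conc: float, od600: float) -> float:
    """Average metabolite concentration normalized by OD600 at harvest."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return avg_conc / od600


def fold_change(conc_strain: float, conc_reference: float) -> float:
    """Ratio of concentrations between a strain of interest and a reference strain."""
    if conc_reference == 0:
        raise ValueError("reference concentration is zero; ratio undefined")
    return conc_strain / conc_reference


# Hypothetical growth-phase (GP) values, for illustration only.
example_specific = specific_concentration(avg_conc=12.0, od600=3.0)
example_fold = fold_change(conc_strain=12.0, conc_reference=4.0)
```

Which strain serves as the reference in each Fold Change column is given by the strain IDs in that column's header.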

Sheet 2. Pooled gRNA targeting sequences synthesized for the library.

Column Header	Notes
gene ID	The E. coli common gene name is used, followed by the gRNA targeting sequence used with the _TX schema, where X is the unique number given to the targeting sequence.
gRNA Sequence	Supplied 5'-3' notation
PAM
Start	P. putida genome coordinates
End
Strand	+ or - strand
Oligonucleotide sequences	Protospacer flanked by AarI Golden Gate linkers

Sheet 3. Distribution of gRNA sequences in the pooled library.

Column Header	Notes
gRNA ID	Refer to the gene ID column in Sheet 2 to find the corresponding gene
Number of Reads	Raw Illumina read count
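
To relate the read counts in this sheet back to genes, the `_TX` suffix described for Sheet 2 can be split off and counts summed per gene. The sketch below assumes IDs of the form `<gene>_T<number>`; the helper names, IDs, and counts are hypothetical and serve only to illustrate the naming schema.

```python
import re
from collections import defaultdict

# Assumed ID layout: gene name, then "_T", then the targeting-sequence number.
ID_PATTERN = re.compile(r"^(?P<gene>.+)_T(?P<index>\d+)$")


def split_grna_id(grna_id: str) -> tuple[str, int]:
    """Split an ID such as 'geneA_T2' into ('geneA', 2)."""
    match = ID_PATTERN.match(grna_id.strip())
    if match is None:
        raise ValueError(f"unexpected gRNA ID format: {grna_id!r}")
    return match.group("gene"), int(match.group("index"))


def reads_per_gene(read_counts: dict[str, int]) -> dict[str, int]:
    """Sum raw read counts over all gRNAs that target the same gene."""
    totals: dict[str, int] = defaultdict(int)
    for grna_id, n_reads in read_counts.items():
        gene, _ = split_grna_id(grna_id)
        totals[gene] += n_reads
    return dict(totals)


# Hypothetical IDs and counts, for illustration only.
counts = {"geneA_T1": 120, "geneA_T2": 30, "geneB_T1": 7}
per_gene = reads_per_gene(counts)
```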

Sheet 4. Lost gRNA sequences.

Column Header	Notes
gene ID	The E. coli common gene name is used, followed by the gRNA targeting sequence used with the _TX schema, where X is the unique number given to the targeting sequence.
gRNA Sequence	Supplied 5'-3' notation
PAM
Start	P. putida genome coordinates
End
Strand	+ or - strand

Sheet 5. Illumina Genome Resequencing and Polymorphism Analysis of Selected Isoprenol Clones (breseq v0.38.1)

Column Header	Notes
Sample	P. putida Strain Name
Evidence	RA = read alignment evidence. MJ = missing junction to reference sequence. JC = potential new junction.
Position	Genomic Coordinates in P. putida AE015451
Mutation
Annotation
Gene
Description

Sheet 6. ShinyGO Enrichment Analysis of High Producer Isoprenol Strains.

Column	Notes
Sample	Refer to P. putida proteomics dataset in main manuscript
Enrichment FDR	FDR = False discovery rate
Negative Log10 FDR
nGenes	Number of corresponding genes included in set
Pathway Genes	Total number of corresponding genes in annotated pathway
Fold Enrichment
Include In Map	Refer to Supplemental Figure
Pathway
URL
Genes
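
The "Negative Log10 FDR" column is the usual −log₁₀ transform of "Enrichment FDR", so larger values indicate stronger enrichment. A minimal sketch of recomputing it and ranking pathways follows; the pathway names and FDR values are hypothetical.

```python
import math


def neg_log10_fdr(fdr: float) -> float:
    """Return -log10(FDR); larger values mean stronger statistical support."""
    if not 0.0 < fdr <= 1.0:
        raise ValueError("FDR must be in the interval (0, 1]")
    return -math.log10(fdr)


# Hypothetical rows, for illustration only.
example_rows = [
    {"Pathway": "pathway_A", "Enrichment FDR": 0.05},
    {"Pathway": "pathway_B", "Enrichment FDR": 0.001},
]

# Rank pathways from most to least significant.
ranked = sorted(
    example_rows,
    key=lambda row: neg_log10_fdr(row["Enrichment FDR"]),
    reverse=True,
)
```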

Supplementary Data 2 (Excel sheet):

Proteomics samples. Each individual sheet in this file contains one separate experimental dataset.

Sheet 1. Proteomics Analysis of ∆yiaY ∆yiaZ Complementation by Varied Plasmid Constructs

Column Header	Notes
Protein.Group
Protein.Names	Alternate Protein.ID from UniPROT
Protein	Tertiary Protein.ID from UniPROT
Protein.Description
Sample	Corresponding Strain Analyzed. Note: D= deletion ("∆")
Counts_Mean	Protein counts

Sheet 2. Analysis of PJ23119-yiaY,yiaZ Constitutive Expression

Column Header	Notes
Protein.Group
Protein.Names	Alternate Protein.ID from UniPROT
Protein	Tertiary Protein.ID from UniPROT
Protein.Description
Sample	Corresponding Strain Analyzed.
Replicate
Value_Sum	Mean of protein counts across replicates

Sheet 3. Proteomics Culture Format Evaluation of Isoprenol Producer Strains

Column Header	Notes
Protein.Group
Protein.Names	Alternate Protein.ID from UniPROT
Protein	Tertiary Protein.ID from UniPROT
Protein.Description
Sample	Corresponding Strain Analyzed.
Replicate
Counts_Sum

Sheet 4. Growth/Production phase Samples of High Isoprenol Producer Strains TEAM-3174 & 3185 Compared to TEAM-2595

Column Header	Notes
Protein.Group
Protein.Names	Alternate Protein.ID from UniPROT
Protein	Tertiary Protein.ID from UniPROT
Protein.Description
Sample	Corresponding Strain Analyzed.
Replicate
Counts_Sum

Sheet 5. Plasmid-borne augmentation of Isoprenol pathway overexpression in genomically integrated producer strains

Column Header	Notes
Protein.Group
Protein.Names	Alternate Protein.ID from UniPROT
Protein	Tertiary Protein.ID from UniPROT
Protein.Description
Sample	Corresponding Strain Analyzed.
Replicate
Counts_Sum
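
For the sheets that report one row per protein, sample, and replicate, a common first step is to average counts across replicates. The sketch below shows one way to do this with the standard library, assuming rows have been loaded as dictionaries keyed by the column headers above. The protein and sample identifiers and the count values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_counts(rows: list[dict]) -> dict[tuple[str, str], float]:
    """Average Counts_Sum over replicates for each (Protein.Group, Sample) pair."""
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for row in rows:
        key = (row["Protein.Group"], row["Sample"])
        grouped[key].append(float(row["Counts_Sum"]))
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical rows, for illustration only.
rows = [
    {"Protein.Group": "protein_X", "Sample": "strain_1", "Replicate": 1, "Counts_Sum": 100.0},
    {"Protein.Group": "protein_X", "Sample": "strain_1", "Replicate": 2, "Counts_Sum": 140.0},
    {"Protein.Group": "protein_X", "Sample": "strain_2", "Replicate": 1, "Counts_Sum": 50.0},
]
averaged = mean_counts(rows)
```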

Supplementary Data 3 (Folder):

Important note: In keeping with Dryad copyright requirements (relating to DeepMind licensing), the raw AlphaFold3 PDB files have been replaced with corresponding 3D rotating movies generated with ChimeraX for hosting in this repository. To enable readers to generate the same predictions, we provide the protein sequences below for use at https://alphafoldserver.com. The corresponding PDB output and its use license are linked from this repository to Zenodo.

>PP_2682/YiaY
MSQSFSPLRKFVSPEIIFGAGCRHNVANYAKTFGARKVLVVSDPGVIAAGWVADVEASLQAQGIDYCLYTAVSPNPRVEEVMLGAEIYRQNHCDVIVAVGGGSPMDCGKAIGIVVAHGRSILEFEGVDMIRVPSPPLILIPTTAGTSADVSQFVIISNQQERMKFSIVSKAVVPDVSLIDPQTTLSMDPFLSACTGIDALVHAIEAFVSTGHGPLTDPHALEAMRLINGNLVEMIANPTDIALREKIMLGSMQAGLAFSNAILGAVHAMSHSLGGFLDLPHGLCNAVLVEHVVAFNYSSAPERFKVIAEVFGIDCRGLNHRQICGRLVEHLIALKRAIGFHETLGLHGVRTSDIPFLSQHAMDDPCILTNPRASSQRDVEVVYGEAL

>PP_2683/YiaZ
MARPSDEQQRALAGLLGLGDHSARKSHYPELSARLDELEAERNRYKWLFENAVHGIFQASLQDGMRAANPALARMLGYDDPQAVLFSLTQLAANLFDGGAEELQAITAVLAREHSLHGYETRLRRKDGSHLDVLMNLLLKPGHEGLVEGFVADITERKLAQQRLQQLNDELEQRVAARTDELLEARDAAEAANRSKDKYLAAASHDLLQPLNAARLLISTLRERPLPEAEHVLVERTHQALEGAEDLLTDLLDISRLDQAAVKPDVAVYRLDELFAPLVSEFSPVAEAAGLKLHARIADYAISTDLRLLTRILRNFLSNACRYTEEGRILLGARRRGGHLRLEVWDTGRGIAQDRLQDIFLEFNQLDVGRAADRKGVGLGLAIVERIAKILGYRIEVRSWLGRGSVFSIEVPLGKEVPLAVHQAVPLPSVGDPLPGRRLLVLDNEVSILESMGALLGQWGCEVVTATDREGALLALQGRAPELILADYHLDHGVVGCEVVRYLREHFATAIPAVIITADRSDQCRRGLQKLGAPLLNKPVKPGKLRAVLSQLLLVH

>PP_2664
MPATGLLSVAELQAELTRLQHQNHKLQRINDALIERIESGVTRGNDPYAAFQHSVVLAEQ VRERTDALNQAMAELKAVNRLLSEARQRAETAHQHQIRLITDNVPALIAYLNADLVYEFT NKVYEEWYCWPHGVMLGQSLREAHSEQHYQRLEGYVARALAGESVTFEFAETNINGQERY MLRSYVPNRLASGEVVGIFVLIRDITERRNTAQALHQAYQHLEQRVRERTAELTSLNDQL LREIEERSQAESRLREAKREAEQANLSKTKFLAAVSHDLLQPLNAARLFTSALLERDEPQ NAAHLVRNVSNSLEDVENLLGTLVDISKLDAGVIKADVAPFALHELMDNLAAEYVQVARS EGLELHFVGCSAVVRSDIQLLARILRNLLSNAIRYTPSGRVVLGCRRLRGGVRIEVWDSG IGIAEEHLQDMFLEFKRGDVQRPDQDRGLGLGLAIVEKIAGILGHRIRVRSWLGKGSVFA VEVPLSTTAPKAQPSQVICEPMLERLRGARVWVLDNDAAICAGMRTLLEGWGCRVVTALS EEDLARQVDNYHADADLLIADYHLDNDCNGVDAVARINARRAQPLPALMITVNYSNDLKQ QIRELGHTLMHKPVRPMKLKTAMSHLLASGLA
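
Because the sequences above may contain spaces or line breaks after copying, it can help to normalize them before submitting them to a prediction server. The following is a minimal sketch of a FASTA parser that removes internal whitespace; the function name is hypothetical, and the demo record is a short fragment used only for illustration.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, dropping all whitespace in sequences."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between or inside records
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # "".join(line.split()) removes spaces and tabs inside the line
            records[header] += "".join(line.split())
    return records


# Short demo fragment with spaces and a blank line, for illustration only.
example = """>demo_fragment
MPATGLLSVA ELQAELTRLQ

HQNHK
"""
clean = parse_fasta(example)
```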

Supplementary Data 4 (File):

fast.genomics analysis of yiaY and yiaZ homolog co-occurrence in microbial genomes.

Column Header
locusTag
proteinId
assemblyId
scaffoldId
geneBegin
geneEnd
strand
gtdbDomain
gtdbPhylum
gtdbClass
gtdbOrder
gtdbFamily
gtdbGenus
gtdbSpecies
strain
identity
alnLength
nGapOpens
qBegin
qEnd
sBegin
sEnd
eValue
bits
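
One way to summarize co-occurrence from a table with these columns is to compare the sets of assemblies in which each homolog is found. The sketch below assumes the yiaY and yiaZ hits have already been separated into two lists of rows, since the column list above does not include a query column. The assembly identifiers are hypothetical.

```python
def cooccurrence(yiay_hits: list[dict], yiaz_hits: list[dict]) -> dict[str, int]:
    """Count assemblies containing both homologs, or only one of them."""
    yiay_assemblies = {hit["assemblyId"] for hit in yiay_hits}
    yiaz_assemblies = {hit["assemblyId"] for hit in yiaz_hits}
    return {
        "both": len(yiay_assemblies & yiaz_assemblies),
        "yiaY_only": len(yiay_assemblies - yiaz_assemblies),
        "yiaZ_only": len(yiaz_assemblies - yiay_assemblies),
    }


# Hypothetical hits, for illustration only.
yiay_hits = [{"assemblyId": "asm_1"}, {"assemblyId": "asm_2"}, {"assemblyId": "asm_3"}]
yiaz_hits = [{"assemblyId": "asm_2"}, {"assemblyId": "asm_3"}, {"assemblyId": "asm_4"}]
summary = cooccurrence(yiay_hits, yiaz_hits)
```

A stricter analysis could additionally require that both hits lie on the same scaffoldId with nearby geneBegin/geneEnd coordinates. That refinement is not shown here.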

Supplementary Data 5 (Folder):

Flow cytometry raw data for mCherry timecourse analysis. Samples included in this folder have the following naming convention: [well number of the plate prepared for Accuri cytometer analysis] [media condition, with or without added isoprenol] [strain: WT, pJ23119-PP2682,3, pBAD-PP2682,3 promoter variants] [timepoint measured, in hour increments]

For example: the file named "A01 M9 WT 1 hr.fcs" is from well position A01; WT P. putida KT2440 was grown (with the biosensor plasmid) in M9 media and sampled at the 1 hour timepoint post isoprenol (+/-) induction in the experimental timecourse.
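
File names following this convention can be split programmatically. The sketch below is based only on the single example above and makes assumptions beyond it: the well is taken to be one letter plus two digits, the timepoint is taken to end in "hr", and everything in between is kept as a list of condition tokens (media, isoprenol status, strain), because the exact tokens used for those fields are not specified here.

```python
import re

# Assumed layout: "<well> <condition tokens...> <hours> hr.fcs"
FCS_NAME = re.compile(
    r"^(?P<well>[A-H]\d{2})\s+(?P<condition>.+?)\s+(?P<hours>\d+)\s*hr\.fcs$"
)


def parse_fcs_name(name: str) -> dict:
    """Split an .fcs file name into well, condition tokens, and timepoint in hours."""
    match = FCS_NAME.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the assumed convention: {name!r}")
    return {
        "well": match.group("well"),
        "condition": match.group("condition").split(),
        "hours": int(match.group("hours")),
    }


parsed = parse_fcs_name("A01 M9 WT 1 hr.fcs")
```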

Representative Plasmidsaurus ONT gRNA amplicon sequencing reads. These files are the raw FASTQ reads from Oxford Nanopore sequencing of the gRNA library after selection in M9 media. Reads from the first round of enrichment are supplied in folder "gRNA Round 1DZHSX3_raw". FASTQ file "PCR_829_gRNA_plusCVio_1" corresponds to PCR-amplified samples using plasmid pTE829 as the RBS variant, where crystal violet (CVio) was added to the samples to induce the isoprenol pathway. Similarly, "PCR_850_gRNA_plusCVio_4" uses plasmid pTE850 instead of pTE829 for a different RBS selection strength.

In the next iteration of gRNA selection, a larger number of samples were collected; these are provided in folder "pTE965__pTE964gRNAinTEAM-29968raw-reads". These files follow a more conventional naming structure: the QYGTT7_X_X prefix is a reference number from the commercial ONT provider, followed by the P. putida strain ID (e.g., TEAM__2998), the RBS plasmid variant (e.g., pTE965), and finally whether crystal violet (CVio) was added to the media. The E. coli control condition is also included ("QYGTT7_17_17_Ec_gRNA_control_20240624").

Code/software

FASTQ ONT reads can be viewed with DNA sequence alignment and visualization programs such as Geneious or IGV. Flow cytometry data were exported from an Accuri C6 Flow Cytometer and visualized with FlowJo (Tree Star). Excel sheets were generated with Excel for Mac (Microsoft Office 365, Excel Version 16.95.4 (25040241)). AlphaFold structure predictions were visualized with the UCSF ChimeraX software package.

Access information

Other publicly accessible locations of the data:

PDB output from AF3 is linked from this repository to Zenodo.

Data was derived from the following sources:

Research data was generated at the LBNL Biosciences ESE Campus facility.

Flow cytometry: High-throughput flow cytometry experiments were performed using the Accuri C6 flow cytometer equipped with a microtiter plate autosampler (BD). Cells were prepared for isoprenol induction assays and sampled at the indicated timepoints. Upon sampling, cells were diluted to OD₆₀₀ 0.1 in 500 μl of PBS medium. A total of 30,000 events were recorded at a flow rate of 66 μl/min and a core size of 22 μm. mCherry was excited at 552 nm at 70 mW and emission was detected at 610 nm with a 20 nm bandpass. Data acquisition was performed as described in the Accuri C6 Sampler User's Guide, and data were analyzed with Tree Star FlowJo v10.1. No sample gates were applied during analysis.
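
The dilution to OD₆₀₀ 0.1 follows the usual C₁V₁ = C₂V₂ relationship. The sketch below works through that arithmetic, assuming that 500 μl is the final volume of the diluted sample (the text could also be read as 500 μl of PBS added). The culture OD used in the example is hypothetical.

```python
def dilution_volumes(culture_od: float, target_od: float = 0.1,
                     final_volume_ul: float = 500.0) -> tuple[float, float]:
    """Return (culture volume, PBS volume) in microliters to reach target_od."""
    if culture_od < target_od:
        raise ValueError("culture is already more dilute than the target OD")
    culture_ul = target_od * final_volume_ul / culture_od  # C1*V1 = C2*V2
    return culture_ul, final_volume_ul - culture_ul


# Hypothetical culture at OD600 = 2.0, for illustration only.
culture_ul, pbs_ul = dilution_volumes(culture_od=2.0)
```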

Enrichment of Guide RNAs from Library Under P_pedF-pyrF Selection: P. putida ∆pyrF strains transformed with a P_pedF-pyrF plasmid were subjected to triparental conjugation with the gRNA library harbored in E. coli DH10 and an E. coli pRK2013 tra+ helper strain. The three strains were spotted onto solid LB agar media and incubated overnight at 30 °C. The next day, a small amount of biomass from the conjugation was isolated with a sterile toothpick and used to inoculate 1.5 mL M9 medium with kanamycin, with or without 1 µM crystal violet (CVio), in 24-deep-well plates with 4 replicates from each conjugation. Samples were grown for 24 hours, at which point we examined the cultures for growth. If the cultures showed turbidity or, in the best case, saturation 24 h post-inoculation, 30 µL of the culture was prepared for colony PCR and the gRNA sequences present on the dCpf1/CRISPRi plasmid were amplified by OneTaq PCR using oligos TEAM-1174 (5’-gaccagttgcgcctgtcggtgttcagtg-3’) and TEAM-644 (5’-gatcttccccatcggtgatgtcg-3’). Biomass from the pre-selection LB conjugation spots and from the E. coli DH10 strain harboring the library was also amplified and sequenced to verify the diversity of the initial distribution of gRNAs. gRNAs were sequenced using the Oxford Nanopore linear DNA amplicon service by Plasmidsaurus Inc (South San Francisco, CA). The rapid ONT sequencing service was chosen over other sequencing platforms and providers because it returned gRNA sequencing results within as little as 24 hours of sample submission, enabling rapid data analysis for future experimental planning. Raw reads were mapped to the pTE219 reference gRNA plasmid map using Geneious Prime, and the aligned gRNA sequences downstream of the 5’-TTTN-3’ PAM sequence were extracted as a CSV file. Targeting spacer sequences of 19 bases or fewer were removed.
Sequences were compared to the known gRNA targeting sequences (Supplementary Data 4), and implicated genes were selected based on the following criteria: (1) a particular gRNA was enriched (>5 reads in one biological replicate); (2) multiple gRNAs targeted the same gene; (3) gRNAs targeted functionally related genes (i.e., genes contributing to a specific process) or targets in the same operon; and (4) gRNAs or gene targets occurred repeatedly across multiple replicates. All selected targets from both rounds are described in Supplementary Tables 2 and 3. The effect of gRNA knockdown on isoprenol titers was verified in isogenic deletion strains, both to reveal a fully penetrant phenotype and to eliminate gene perturbations from potential off-target gRNA repression that would complicate interpretation of changes to isoprenol titers (97). Candidate genes from the gRNA enrichment were first grouped by function using HMMER and COG to identify non-redundant cellular processes. We then randomly picked several candidates from each category to design new gRNA plasmids and recombineering oligos, choosing 28 targets for the first enrichment screen and 30 for the second screen.
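
The length filter and the first two selection criteria described above can be sketched as follows. This is not the authors' pipeline, which used Geneious Prime for read mapping. It is a simplified illustration that assumes spacers have already been extracted and that they match library sequences exactly. The spacer sequences, gRNA IDs, and helper names are hypothetical.

```python
from collections import Counter, defaultdict

MIN_SPACER_LEN = 20   # spacers of 19 bases or fewer are discarded
ENRICHED_READS = 5    # a gRNA counts as enriched with more than 5 reads


def count_spacers(spacers: list[str], library: dict[str, str]) -> Counter:
    """Count reads per gRNA ID for one replicate (exact matching is an assumption)."""
    counts: Counter = Counter()
    for spacer in spacers:
        spacer = spacer.strip().upper()
        if len(spacer) < MIN_SPACER_LEN:
            continue
        grna_id = library.get(spacer)
        if grna_id is not None:
            counts[grna_id] += 1
    return counts


def enriched_by_gene(counts: Counter) -> dict[str, list[str]]:
    """Group enriched gRNAs by gene, using the assumed '<gene>_T<number>' ID schema."""
    genes: dict[str, list[str]] = defaultdict(list)
    for grna_id, n_reads in counts.items():
        if n_reads > ENRICHED_READS:
            genes[grna_id.rsplit("_T", 1)[0]].append(grna_id)
    return {gene: sorted(ids) for gene, ids in genes.items()}


# Hypothetical library and reads, for illustration only.
library = {
    "ACGTACGTACGTACGTACGT": "geneA_T1",
    "TTGCATTGCATTGCATTGCA": "geneA_T2",
    "GGGGCCCCAAAATTTTGGCC": "geneB_T1",
}
spacers = (["ACGTACGTACGTACGTACGT"] * 6 + ["TTGCATTGCATTGCATTGCA"] * 7
           + ["GGGGCCCCAAAATTTTGGCC"] * 2 + ["ACGTACGT"] * 10)
counts = count_spacers(spacers, library)
enriched = enriched_by_gene(counts)
```

In this toy example, geneA is supported by two independently enriched gRNAs (criteria 1 and 2), while geneB falls below the read threshold. Criteria 3 and 4 require functional annotation and comparison across replicates and are not modeled here.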

Computational Structure Predictions: To identify potential interaction domains between PP_2664, YiaY, and YiaZ, we used AlphaFold [DOI: 10.1038/s41586-021-03819-2], AlphaFill, and AlphaFold3 to model protein structures in the absence of evidence from protein crystallization studies. Protein sequences were identified from UniProt (PP_2664: Q88JI5; YiaY: Q88JG7; YiaZ: Q88JG6). AlphaFold was run on an LBNL server described in [DOI: 10.1371/journal.pcbi.1011171]. AlphaFill was performed using a public webserver as described in [DOI: 10.1038/s41592-022-01685-y]. All structures were reanalyzed in AlphaFold3 (at alphafoldserver.com) as described in [DOI: 10.1038/s41586-024-07487-w] and prepared for inclusion as figure panels using the ChimeraX software package [DOI: 10.1002/pro.4792]. A representation of this output is included in Supplementary Data 3.