Ancient gene clusters govern the initiation of monoterpenoid indole alkaloid biosynthesis and C3 stereochemistry inversion
Data files
Sep 30, 2025 version files (10.70 MB)
- CrTHAS2_for_XYZ.zip (257.88 KB)
- HpHYC3O_for_XYZ.zip (2.97 MB)
- Rauvolfia_tetraphylla.gff3.zip (2.97 MB)
- README.md (3.85 KB)
- RsHYC3R(CAD7)_for_XYZ.zip (1.51 MB)
- RtHYC3O_for_XYZ.zip (2.98 MB)
Abstract
The inversion of C3 stereochemistry in monoterpenoid indole alkaloids (MIAs), derived from the central precursor strictosidine (3S), is a critical step for the biosynthesis of numerous 3R MIAs and spirooxindoles, including the antihypertensive drug reserpine. While early MIA biosynthesis preserves the 3S configuration, the mechanism underlying C3 inversion has remained unresolved. Here, we identify and biochemically characterize a conserved oxidase-reductase pair in the Gentianales order: the heteroyohimbine/yohimbine/corynanthe C3-oxidase (HYC3O) and C3-reductase (HYC3R), which together invert the 3S stereochemistry to 3R across diverse substrates. Notably, HYC3O and HYC3R reside in gene clusters in Rauvolfia tetraphylla and Catharanthus roseus, homologous to an elusive geissoschizine synthase (GS) cluster we also uncovered. In R. tetraphylla, these clusters are in tandem on a single chromosome, likely derived from segmental duplication, whereas in C. roseus they reside on separate chromosomes due to translocation. Comparative genomics indicate the GS cluster originated at the base of Gentianales (∼135 Mya), coinciding with the evolution of the strictosidine synthase cluster, while the reserpine cluster arose later in rauvolfioid Apocynaceae. Together, these findings uncover the genomic and biochemical basis for key events in MIA evolution and diversification, providing insights beyond the canonical vinblastine and ajmaline biosynthetic pathways.
This dataset contains supplementary files associated with the analyses presented in the manuscript. The data include genome annotations, molecular modeling coordinate files, and structural docking resources that support the identification and characterization of biosynthetic gene clusters and stereochemistry-inverting enzymes involved in monoterpenoid indole alkaloid (MIA) biosynthesis.
Description of the data and file structure
The dataset is organized into individual compressed archives. Each archive corresponds to a specific analysis described in the manuscript.
- Rauvolfia_tetraphylla.gff3.zip
Contains the genome annotation file (.gff3) for Rauvolfia tetraphylla, generated with GeMoMa (reference-based annotation modeler). The annotation is aligned to the R. tetraphylla genome assembly (NCBI accession: ASM3051222v1).
- CrTHAS2_for_XYZ.zip
Contains .xyz coordinate files for docking analyses of Catharanthus roseus tetrahydroalstonine synthase 2 (CrTHAS2) with ligands. Files were generated using Molecular Operating Environment (MOE).
- HpHYC3O_for_XYZ.zip
Contains .xyz coordinate files for the Hamelia patens HYC3O (HpHYC3O) enzyme with ligands, generated in MOE.
- RsHYC3R(CAD7)_for_XYZ.zip
Contains .xyz coordinate files for the Rauvolfia serpentina C3-reductase (RsHYC3R, also referred to as CAD7) with ligands, generated in MOE.
- RtHYC3O_for_XYZ.zip
Contains .xyz coordinate files for the Rauvolfia tetraphylla HYC3O (RtHYC3O) enzyme with ligands, generated in MOE.
All XYZ files contain Cartesian atomic coordinates for enzyme–ligand complexes used in substrate docking and structural analyses. These are intended for reuse in molecular modeling workflows.
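As a starting point for reuse, the following Python sketch reads Cartesian coordinates from XYZ text and computes the geometric centroid of a complex. It assumes the common XYZ layout (atom count on the first line, a comment line, then `element x y z` rows); the MOE exports in this dataset may deviate from that layout, so inspect one file before relying on it. The names `Atom`, `parse_xyz`, and `centroid` are hypothetical, and the water coordinates are illustrative and not taken from the dataset.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    element: str  # atomic symbol, e.g. C, N, O, H
    x: float      # Cartesian coordinates in Angstroms
    y: float
    z: float


def parse_xyz(text: str) -> List[Atom]:
    """Parse XYZ text, assuming the common layout:
    line 1 = atom count, line 2 = comment, then 'element x y z' rows."""
    lines = text.splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms = []
    for row in lines[2:2 + n_atoms]:
        fields = row.split()
        if len(fields) < 4:
            raise ValueError(f"malformed atom row: {row!r}")
        atoms.append(Atom(fields[0], float(fields[1]),
                          float(fields[2]), float(fields[3])))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return atoms


def centroid(atoms: List[Atom]) -> Tuple[float, float, float]:
    """Unweighted geometric center of all atoms."""
    n = len(atoms)
    return (sum(a.x for a in atoms) / n,
            sum(a.y for a in atoms) / n,
            sum(a.z for a in atoms) / n)


# Illustrative input only (a water molecule, not a file from this dataset).
example = """3
water, illustrative
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
"""
atoms = parse_xyz(example)
print(len(atoms), centroid(atoms))
```

To use it on a dataset file, pass the file contents, for example `parse_xyz(open(path).read())`, where `path` points to an extracted .xyz file.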
Variable and file definitions
- .gff3 annotation file
  - seqid: Chromosome or scaffold identifier
  - source: Annotation source (GeMoMa)
  - type: Feature type (e.g., gene, mRNA, exon, CDS)
  - start, end: Genomic coordinates (base pairs)
  - score: Alignment/annotation score (NA if not applicable)
  - strand: + / – strand
  - phase: CDS phase (0–2; NA if not CDS)
  - attributes: Gene ID, transcript ID, functional annotations
- .xyz coordinate files
  - ATOM: List of atoms in the modeled complex
  - X, Y, Z: Cartesian coordinates (Ångströms)
  - element: Atomic symbol (C, N, O, H, etc.)
  - Each file corresponds to a specific enzyme–ligand docking pose.
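The nine GFF3 columns defined above can be read with a few lines of Python. This is a minimal sketch, not the pipeline used in the manuscript: `Feature` and `parse_gff3` are hypothetical names, the sample records are invented for illustration, and missing values are assumed to appear either as `.` (the usual GFF3 placeholder) or as `NA` (as described in this README).

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional
from urllib.parse import unquote

MISSING = {".", "NA", ""}  # assumed placeholders for undefined values


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield one Feature per nine-column record; skip comments and other lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        attributes = {}
        for pair in attrs.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[key.strip()] = unquote(value.strip())
        yield Feature(
            seqid, source, ftype, int(start), int(end),
            None if score in MISSING else float(score),
            strand,
            None if phase in MISSING else int(phase),
            attributes,
        )


# Invented records for illustration; identifiers are not from the dataset.
sample = (
    "##gff-version 3\n"
    "chr1\tGeMoMa\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "chr1\tGeMoMa\tCDS\t100\t300\tNA\t+\t0\tID=cds1;Parent=gene1\n"
)
genes = [f for f in parse_gff3(sample.splitlines()) if f.type == "gene"]
print(genes)
```

For the real annotation, iterate over an open file handle instead of `sample.splitlines()`; the generator reads one line at a time, so the whole file does not need to fit in memory.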
Sharing/Access information
- Genome assembly reference: Rauvolfia tetraphylla genome (ASM3051222v1) is publicly available on NCBI.
- Experimental context and analyses are described in the associated publication.
Code/Software
- Genome annotation: Generated with GeMoMa (v1.9). Open-source software available at: https://github.com/JStrozzi/GeMoMa.
- Molecular modeling and docking: Performed using Molecular Operating Environment (MOE), 2022.02 (Chemical Computing Group Inc.).
- Viewing files:
  - .gff3: Open with any genome browser or text editor (e.g., IGV, Apollo, Artemis, or command-line parsing).
  - .xyz: Can be viewed with free/open-source molecular visualization tools, such as PyMOL (open-source edition), Avogadro, or VMD.
Notes for reuse
- The .gff3 annotation is specific to the R. tetraphylla genome assembly ASM3051222v1 and may not align properly to other assemblies.
- XYZ coordinate files represent modeled docking poses; they are not experimentally determined structures. They can be used as starting points for comparative docking, energy minimization, or structural refinement.
- No additional preprocessing is required to use these files.
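Because the files are distributed as compressed archives, the only step before use is unpacking them. The sketch below uses Python's standard `zipfile` module to extract every archive in a download folder into its own subfolder. The function name `extract_archives` and the folder names are hypothetical, and the internal layout of each archive is not assumed.

```python
import zipfile
from pathlib import Path
from typing import Dict, List


def extract_archives(download_dir: str, output_dir: str) -> Dict[str, List[str]]:
    """Extract each .zip in download_dir into output_dir/<archive name without .zip>.

    Returns a mapping of archive file name to the member names it contained.
    """
    extracted = {}
    for archive in sorted(Path(download_dir).glob("*.zip")):
        target = Path(output_dir) / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
            extracted[archive.name] = zf.namelist()
    return extracted


if __name__ == "__main__":
    # Hypothetical folder names; adjust to where the archives were downloaded.
    for name, members in extract_archives("downloads", "extracted").items():
        print(name, len(members), "files")
```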
