Dryad

Rule-based deconstruction and reconstruction of diterpene libraries: Categorizing foundational patterns & unravelling the structural landscape

Abstract

Terpenoids make up the largest class of specialized metabolites, with over 180,000 reports to date across all kingdoms of life. Their synthesis showcases one of nature's most choreographed enzymatic and non-reversible chemistries, leading to an extensive range of structural functionality and diversity. Current terpenoid repositories provide a seemingly endless playground of information regarding structure, sourcing, and synthesis. This work investigates entries for the 20-carbon diterpenoid variants and deconstructs their complex patterns into simple, categorical groups. This deconstruction approach reduces over 60,000 unique compound entries to fewer than 1,000 categorical structures. Furthermore, over 75% of all diversity can be represented by just 25 structures. Diterpenoid diversity was mapped at an atomic scale, across the total compound landscape, and across its distribution throughout the tree of life. Additionally, these core structures provide guidelines for predicting how this diversity first originates via the mechanisms catalyzed by diterpene synthases. Over 95% of diterpenoid structures rely on cyclization. Here, a reconstructive approach based on known biochemical rules is applied to model the birth of compound diversity. This computational synthesis validates previously identified reaction products and pathways, and enables prediction of trajectories for synthesizing real and theoretical compounds. Applied to the diterpene landscape, this deconstructive and reconstructive approach provides a modular, flexible, and easy-to-use toolset for categorically simplifying otherwise complex or hidden patterns.
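
The coverage figure quoted above (a small number of categorical structures accounting for most compound entries) can be illustrated with a minimal sketch. It assumes each compound entry has already been assigned a categorical structure label. The function name `top_k_coverage` and the toy labels are hypothetical and are not taken from the dataset or from the authors' toolset.

```python
from collections import Counter

def top_k_coverage(scaffold_labels, k):
    """Fraction of compound entries covered by the k most frequent labels.

    scaffold_labels: list with one categorical-structure label per compound
    (hypothetical input format, assumed for illustration only).
    """
    counts = Counter(scaffold_labels)
    if not counts:
        return 0.0
    # Sum the entry counts of the k most common categorical structures
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(scaffold_labels)

# Toy labels, NOT data from the study
labels = ["A"] * 6 + ["B"] * 3 + ["C"]
print(top_k_coverage(labels, 1))  # 0.6
print(top_k_coverage(labels, 2))  # 0.9
```

Applied to real labels, calling the function with `k=25` would give the kind of figure reported in the abstract, provided the labels come from the same deconstruction rules.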