Data from: Homospermidine synthase evolution and the origin(s) of pyrrolizidine alkaloids in Apocynaceae

Smith, Chelsea1; Kaltenegger, Elisabeth2; Teisher, Jordan3; Moore, Abigail4; Straub, Shannon5; Livshultz, Tatyana1

Research facility: Academy of Natural Sciences of Drexel University

Published Jan 30, 2026 on Dryad. https://doi.org/10.5061/dryad.1c59zw3z2

Abstract

Premise: Enzymes encoded by paralogous genes producing identical specialized metabolites in distantly related plant lineages are strong evidence of parallel phenotypic evolution. Inference of phenotypic homology for metabolites produced by orthologous genes is less straightforward, since orthologs may be recruited in parallel into novel pathways. In prior research on pyrrolizidine alkaloids (PAs), specialized metabolites of Apocynaceae, the evolution of homospermidine synthase (HSS), an enzyme of PA biosynthesis, was reconstructed and a single origin of PAs inferred because HSS enzymes of all known PA-producing Apocynaceae species are orthologous and descended from an ancestral enzyme with the motif (VXXXD) of an optimized HSS. The

Methods: We increased sampling, tested the effect of amino acid motif on HSS function, revisited motif evolution, and tested for selection to infer evolution of HSS function and its correlation with phenotype.

Results: Some evidence supports a single origin of PAs: an IXXXD HSS-like gene, similar in function to VXXXD HSS, evolved in the shared ancestor of all PA-producing species; loss of HSS function occurred multiple times via pseudogenization and perhaps via evolution of an IXXXN motif. Other evidence indicates multiple origins: the VXXXD motif, highly correlated with the PA phenotype, evolved two or four times independently; the ancestral IXXXD gene was not under positive selection, while some VXXXD genes were; and substitutions at sites experiencing positive selection occurred on multiple branches in the HSS-like gene tree.

Conclusions: The complexity of the genotype-function-phenotype map confounds the inference of PA homology from HSS-like gene evolution in Apocynaceae.


Description of the data and file structure

These datasets and results files are part of a project that studies the evolution of a gene paralog in order to test hypotheses about the evolution of pyrrolizidine alkaloid defense in Apocynaceae, adding both taxon sampling and sequence data to a previous study on a similar topic. The alignments are iterations (explained below) of alignments of the deoxyhypusine synthase (dhs) gene and its putatively neofunctionalized homospermidine synthase (hss) gene. The latter is the first gene in the pyrrolizidine alkaloid biosynthesis pathway and, interestingly, exists in plant species, including many in Apocynaceae, that do not produce pyrrolizidine alkaloids. An amino acid motif differs between DHS and HSS and has been connected to HSS function. It was hypothesized that a VXXXD amino acid motif in HSS, compared to the IXXXN motif in DHS, is indicative of the evolution of optimal HSS function (i.e., pyrrolizidine alkaloid biosynthesis). A previous study reconstructed a VXXXD motif in the ancestral Apocynaceae HSS; this reconstruction was updated here. Additionally, tests for selection were conducted using trees made from the final alignment in this dataset. Three tests from the HyPhy package (aBSREL, MEME, RELAX) looked for evidence of positive selection on branches where optimal HSS function (i.e., the amino acid motif) or PA biosynthesis is predicted to have evolved, as well as evidence of relaxation of selection in clades that had a more DHS-like motif and may have lost HSS-like function.

Files and variables

Alignment files:

1. Initial_alignment.fasta: this alignment contains all dhs-like and hss-like exon-only contigs assembled by HybPiper. A Psi in the sequence name means that there is an indel or point mutation that results in a stop codon within the in-frame sequence (i.e., a potential pseudogene).

2. Full_dataset_alignment.fasta: based on the maximum likelihood tree constructed using the Initial_alignment.fasta, partial gene contigs were combined. This alignment contains these combined contigs, as well as additional short contigs and potential pseudogenes. This alignment was used to make a "full dataset" tree, the topology of which was used as a constraint tree for the "full dataset topology tree" used in amino acid reconstructions and selection tests.

3. Reduced_dataset_alignment.fasta: short contigs (sequences that do not cover at least exons 2-6) and pseudogenes were removed from the Full_dataset_alignment.fasta and the resulting alignment was realigned to produce this alignment. This alignment is used to make both the "reduced dataset tree" and the "full dataset topology tree" used in amino acid reconstructions and selection tests.

4. Marsdenieae_GARD.fasta: this alignment contains all Marsdenieae sequences, as well as an outgroup, split at the recombination breakpoint suggested by the HyPhy GARD test.

5. Human_dhs_comparison_aa.fasta: this amino acid alignment contains the amino acid sequences of Human dhs, the Reduced_dataset_alignment.fasta from this dataset, and the angiosperm DHS/HSS alignment from previous analysis. The purpose of this alignment is to compare known functional areas in Human dhs to areas in angiosperm dhs and hss.
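The alignment files above are plain FASTA, so they can be screened with a few lines of Python. The following is a minimal sketch, not part of the deposited dataset: it assumes that the "Psi" marker appears as a literal substring of the sequence name (its exact position in the name is not specified here), and the sample names in the toy input are invented.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def split_pseudogenes(records, tag="Psi"):
    """Separate sequence names into (functional, pseudogene) lists.

    Assumption: a name containing `tag` marks a potential pseudogene.
    """
    functional, pseudo = [], []
    for name, _seq in records:
        (pseudo if tag in name else functional).append(name)
    return functional, pseudo

# Toy input with hypothetical names; replace with open("Initial_alignment.fasta").
example = StringIO(">Sample_A_hss\nATGGTT---GAT\n>Sample_B_hss_Psi\nATGTAAGTTGAT\n")
functional, pseudo = split_pseudogenes(read_fasta(example))
```

If the marker turns out to be formatted differently in the files (for example as a prefix with a separator), only the `tag` argument needs to change.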

Amino acid reconstruction files:

1. Reduced_dataset_topo_full_reconstruction.txt: This is the PAML CodeML output of the joint and marginal reconstruction of the amino acid sequence of each ancestral branch for the Reduced dataset topology. Input files are the Reduced_dataset_alignment.fasta and the associated, unconstrained, maximum likelihood tree.

2. Full_dataset_topo_full_reconstruction.txt: This is the PAML CodeML output of the joint and marginal reconstruction of the amino acid sequence of each ancestral branch for the Full dataset topology. Input files are the Reduced_dataset_alignment.fasta and the resulting maximum likelihood tree constrained to reflect the topology of the tree built from the Full_dataset_alignment.fasta.

Selection test output files:

1. reduced_dataset_topo_absrel.JSON: This is the output file for the aBSREL test for positive selection on a priori branches in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta. These branches were selected because they had I269V and/or N273D (i.e. evolution of more HSS-like motif from a more DHS-like motif) or they were an ancestor of a PA-producing clade. This file can be viewed at http://vision.hyphy.org/aBSREL.

2. full_dataset_topo_absrel.JSON: This is the output file for the aBSREL test for positive selection on a priori branches in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta, constrained to the topology of the tree made from the Full_dataset_alignment.fasta. These branches were selected because they had I269V and/or N273D (i.e. evolution of more HSS-like motif from a more DHS-like motif) or they were an ancestor of a PA-producing clade. This file can be viewed at http://vision.hyphy.org/aBSREL.

3. reduced_dataset_topo_meme.JSON: this is the output file for the MEME test for positive selection on sites among a priori branches in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta. These branches were selected because they had I269V and/or N273D (i.e. evolution of more HSS-like motif from a more DHS-like motif) or they were an ancestor of a PA-producing clade. This file can be viewed at http://vision.hyphy.org/MEME.

4. full_dataset_topo_meme.JSON: This is the output file for the MEME test for positive selection on sites among a priori branches in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta, constrained to the topology of the tree made from the Full_dataset_alignment.fasta. These branches were selected because they had I269V and/or N273D (i.e. evolution of more HSS-like motif from a more DHS-like motif) or they were an ancestor of a PA-producing clade. This file can be viewed at http://vision.hyphy.org/MEME.

5. reduced_dataset_topo_RELAX_dhs_hss.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clade is the hss-like clade and the reference clade is the corresponding dhs-like sequences in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

6. full_dataset_topo_RELAX_dhs_hss.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clade is the hss-like clade and the reference clade is the corresponding dhs-like sequences in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta, constrained to the topology of the tree made from the Full_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

7. reduced_dataset_topo_RELAX_dton.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clades include clades where an ancestral D273N occurred and the reference sequences are IXXXD sequences in the larger clade in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

8. full_dataset_topo_RELAX_dton.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clades include clades where an ancestral D273N occurred and the reference sequences are IXXXD sequences in the larger clade in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta, constrained to the topology of the tree made from the Full_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

9. reduced_dataset_topo_RELAX_dton2.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clades include clades where an ancestral D273N occurred and the reference sequences are IXXXD sequences in the larger clade in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

10. full_dataset_topo_RELAX_dton2.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clades include clades where an ancestral D273N occurred and the reference sequences are IXXXD sequences in the larger clade in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta, constrained to the topology of the tree made from the Full_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

11. reduced_dataset_topo_RELAX_in_idvd.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clades include VXXXD clades and the reference branches are the remainder of the HSS-like clade in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

12. full_dataset_topo_RELAX_in_idvd.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clades include VXXXD clades and the reference branches are the remainder of the HSS-like clade in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta, constrained to the topology of the tree made from the Full_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

13. reduced_dataset_topo_RELAX_vtoi.json: this is the output file for the RELAX test for relaxation of selection on branches within an a priori test clade relative to an a priori reference clade. Here, the test clades include sequences in a clade where a V269I substitution occurred in the ancestor and the reference clades are the VXXXD sequences in the larger clade in the maximum likelihood tree made from the Reduced_dataset_alignment.fasta. This file can be viewed at http://vision.hyphy.org/RELAX.

Code/software

.fasta and .txt files can be opened in any plain-text editor, such as Notepad.

.JSON files can be viewed online at http://vision.hyphy.org
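The .JSON files are ordinary JSON and can also be read with a script. The sketch below is illustrative only: the key names ("branch attributes", "Corrected P-value", "test results", "relaxation or intensification parameter", "p-value") follow the layout HyPhy generally uses for aBSREL and RELAX results, but they are assumptions here and should be checked against the deposited files. The mock values and node names are invented.

```python
import json

def absrel_selected_branches(results, alpha=0.05):
    """Return {branch: corrected p-value} for tested branches with p <= alpha.

    Assumes aBSREL results store per-branch values under
    results["branch attributes"]["0"][branch]["Corrected P-value"].
    """
    branches = results["branch attributes"]["0"]
    hits = {}
    for name, attrs in branches.items():
        p = attrs.get("Corrected P-value")  # absent or null for untested branches
        if p is not None and p <= alpha:
            hits[name] = p
    return hits

def relax_summary(results):
    """Return (k, p-value) from a RELAX result (assumed key names)."""
    test = results["test results"]
    return test["relaxation or intensification parameter"], test["p-value"]

# Tiny mock documents standing in for the deposited files (values are invented).
absrel_mock = json.loads(
    '{"branch attributes": {"0": {"Node12": {"Corrected P-value": 0.01},'
    ' "Node30": {"Corrected P-value": 0.6}, "Taxon_A": {}}}}'
)
relax_mock = json.loads(
    '{"test results": {"relaxation or intensification parameter": 0.42,'
    ' "p-value": 0.003}}'
)

selected = absrel_selected_branches(absrel_mock)
k, p_value = relax_summary(relax_mock)
```

To apply this to a real file, replace the mock with, for example, `json.load(open("reduced_dataset_topo_absrel.JSON"))`.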

Access information

Other publicly accessible locations of the data:

Data was derived from the following sources:

Taxon sampling for DHS/HSS sequencing

One hundred seventy (170) accessions of 159 Apocynaceae species, including 10 known PA-producing species, were sampled (Appendix S1: Table S2). The outgroup was Gelsemium sempervirens (Gelsemiaceae) (Antonelli et al., 2021). Previously published Apocynaceae DHS and HSS sequences from 25 species were also included (Livshultz et al., 2018).

Calotropis genome query

DHS- and HSS-like genes were extracted from the Calotropis gigantea genome (Hoopes et al., 2018) via tblastx searches with exon sequences of Parsonsia alboflavescens DHS and HSS (MG817648.1, MG817649.1) in Geneious Prime v.2020.0.3 (https://www.geneious.com).

DNA extraction, library preparation, targeted enrichment, sequencing

DNA extraction, library preparation, targeted enrichment, and paired-end sequencing were previously described by Straub et al. (2020). Probes were designed based on the Apocynaceae DHS and HSS sequences published by Livshultz et al. (2018), manually trimmed to not more than 200 bp beyond the 5′ and 3′ ends of the first and last exons. A total of 2707 probes targeted DHS and HSS (Straub et al., 2020). A few contaminated libraries (i.e., libraries containing DNA from more than one sample) were identified by mapping a sample’s reads to its own assembled plastome. Any samples with evidence of two divergent plastomes or divergent sequences across multiple single-copy nuclear loci were considered contaminated and excluded (unpublished data).

Contig assembly

Raw paired-end sequence reads were trimmed using Trimmomatic default settings (slidingwindow:10:20, minlen:40) (Bolger et al., 2014). Using the first stage of the MyBaits pipeline [BLASTN (Cameron and Williams, 2007), SPAdes (Bankevich et al., 2012)] with default options (SPAdes: k 21,33,55,77, cov-cutoff off, phred-offset 33) (Moore et al., 2018), trimmed sequence reads for each sample were binned as DHS/HSS by BLASTN using reference transcriptomic DHS-like sequences (Appendix S1: Table S2), and binned reads were assembled using SPAdes. Libraries that did not have an assembled contig with >20× coverage were removed from analysis (Bentley et al., 2008; Dohm et al., 2008; Harismendy et al., 2009; Whittall et al., 2010; Straub et al., 2012). Retained SPAdes contigs were extended and fused by afin, an assembly finishing program (-s 50, -l 100) (McKain and Wilson, 2017).

Exon annotation

Retained afin contigs were annotated for exonic regions in Geneious Prime v.2020.0.3 (https://www.geneious.com) using DHS and HSS exons from Parsonsia alboflavescens (GenBank accessions MG817648.1, MG817649.1) (50% identity threshold). Annotated contigs (exon and intron) were aligned within a sample to identify and manually annotate partial or divergent exons that did not meet the identity threshold.

Alignment construction

All alignments were constructed as nucleotide alignments using MAFFT v7.450 (default settings except gap penalty=3) (Katoh et al., 2002; Katoh and Standley, 2013) in Geneious Prime v.2020.0.3 (https://www.geneious.com).

Maximum likelihood tree construction

Maximum likelihood trees were constructed with rapid bootstrapping and partitioned by codon position in RAxML-HPC v.8 on XSEDE (v8.2.12) (GTR+GAMMA, 1000 bootstrap replicates) (Stamatakis, 2014) in CIPRES Science Gateway (v3.3) (Miller et al., 2010).

Gene assembly from contigs

Annotated exons were extracted and aligned with an existing Apocynaceae sequence alignment (Livshultz et al., 2018) and Calotropis gigantea sequences (Hoopes et al., 2018). A maximum likelihood tree was built from this “initial alignment” (Appendix S1: Tables S2, S3; Appendix S2). Contigs orthologous to Parsonsia alboflavescens HSS were considered HSS-like, those outside this clade, DHS-like. Contigs derived from the same library were then concatenated using the following algorithm. After excluding contigs with non-terminal stop codons, a strict consensus was calculated from overlapping partial contigs that were in a clade with other DHS-like or HSS-like sequences from the same tribe in the initial alignment gene tree (Appendix S2) from the same library. Amino acid similarity in the region of overlap of the combined contigs ranged from 89.6–100% (Appendix S1: Table S5). Then non-overlapping contigs were concatenated using the same grouping criteria. Polyphyletic contig pairs were grouped if there were no sequences from closely related taxa for them to cluster with, and there was no evidence of contamination in the source library.
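The consensus step for overlapping partial contigs can be illustrated as follows. This is a simplified sketch, not the procedure run in Geneious: it assumes the two contigs are already aligned to equal length, uses "-" for positions a contig does not cover, and writes any disagreement as "N" rather than a specific IUPAC ambiguity code.

```python
def strict_consensus(a, b, missing="-", ambiguous="N"):
    """Merge two aligned partial contigs column by column.

    Identical characters are kept, a missing position is filled from the
    other contig, and conflicting characters become `ambiguous`.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    merged = []
    for x, y in zip(a.upper(), b.upper()):
        if x == y:
            merged.append(x)
        elif x == missing:
            merged.append(y)
        elif y == missing:
            merged.append(x)
        else:
            merged.append(ambiguous)
    return "".join(merged)

# Two hypothetical partial contigs overlapping in the middle four columns.
left = "ATGGTTCA----"
right = "----TTCAGGAT"
merged = strict_consensus(left, right)  # "ATGGTTCAGGAT"
```

Non-overlapping contigs are handled by the same function, since every column then has at most one covered position.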

Validation of gene assemblies

Sequence assemblies for samples that had Sanger sequences of the same locus from Livshultz et al. (2018) were validated via pairwise alignment and calculation of divergence. Here and otherwise, divergence/pairwise amino acid sequence identity was calculated in Geneious Prime v.2020.0.3 (https://www.geneious.com) using in-frame nucleotide alignments; this calculation excludes missing data.
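For readers who want to reproduce an identity value outside Geneious, the calculation can be approximated as below. This is a sketch under stated assumptions: the two sequences are already aligned, and only "-" and "?" are treated as missing by default (for amino acid sequences, "X" can be added; "N" is deliberately not in the default set because it is asparagine in protein alignments). Geneious may treat ambiguity codes differently.

```python
def pairwise_identity(a, b, missing="-?"):
    """Percent identity over columns where neither sequence is missing."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in missing or y in missing:
            continue  # columns with missing data are excluded
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

# Hypothetical aligned amino acid fragments: 4 of 5 comparable columns match.
identity = pairwise_identity("MKV-LD", "MKIALD")
divergence = 100.0 - identity
```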

Construction of matrices

Sanger sequences >95% identical to new sequences from the same sample were removed from the alignment. The “full data set” consisted of potential pseudogene sequences, full (minimum exons 2–6) DHS-like and HSS-like sequences, short DHS-like and HSS-like contigs (from samples in which an entire DHS-like or HSS-like gene had also been assembled by SPAdes/afin), and non-redundant Sanger sequences from Livshultz et al. (2018) (Appendix S1: Tables S2, S3). This full data set alignment was trimmed to produce a “reduced data set” alignment used for ancestral state reconstruction and selection analyses. Potential pseudogenes and sequences that did not span at minimum exons 2–6 were removed from this data set, and the remaining sequences were realigned. Thirty-seven base pairs were trimmed from the 5′ end and 67 bp from the 3′ end of this reduced data set alignment because these areas were highly variable.
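The end-trimming step amounts to dropping a fixed number of alignment columns from each end. A minimal sketch follows, assuming the alignment is held as a dictionary of equal-length strings; the sample name and toy sequence are invented, and the deposited Reduced_dataset_alignment.fasta may already reflect this trimming.

```python
def trim_alignment(records, trim_5=37, trim_3=67):
    """Remove `trim_5` leading and `trim_3` trailing columns from every row."""
    trimmed = {}
    for name, seq in records.items():
        if len(seq) <= trim_5 + trim_3:
            raise ValueError(f"{name} is too short to trim")
        trimmed[name] = seq[trim_5:len(seq) - trim_3]
    return trimmed

# Toy row: 37 columns of 5' padding, a 9-column core, 67 columns of 3' padding.
toy = {"Sample_A": "A" * 37 + "ATGGTTGAT" + "C" * 67}
core = trim_alignment(toy)
```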

Gene tree construction

Three gene trees were produced. A full data set tree was constructed that contains all contigs (candidate pseudogenes, short contigs, and full/concatenated/consensus contigs) (Appendix S3). A reduced data set tree contained only contigs with at least exons 2-6 and no non-terminal stop codons (Figures 3A, 3B, 4A; Appendix S4). Lastly, a third tree was constructed using the reduced data set alignment 100% constrained to the full data set tree topology, referred to as the “full data set topology” (Figures 3C, 3D, B; Appendix S5).

Tests for recombination

Marsdenieae HSS-like paralogs were investigated further using GARD (Kosakovsky Pond et al., 2006). GARD searches an alignment for a maximum number of breakpoints, builds phylogenies for every non-recombinant contig, and assesses those phylogenies using the Akaike information criterion (AIC). Potentially recombinant contigs were split at potential recombination breakpoints indicated by GARD and a Marsdenieae HSS-like gene tree (outgroup: Tassadia propinqua HSS-like gene) was rebuilt using maximum likelihood tree construction criteria described above.

Shimodaira–Hasegawa tests

To test alternate topologies of Apocynaceae DHS-/HSS-like gene trees and the monophyly of IXXXN and IXXXD paralogs in Marsdenieae, Shimodaira–Hasegawa (SH) tests were performed in RAxML-HPC2 on XSEDE (v8.2.12) (Stamatakis, 2014) as implemented on the CIPRES Science Gateway (Miller et al., 2010). The SH test rejects or fails to reject a null hypothesis of equal support for two given topologies.

Ancestral sequence reconstruction

Ancestral sequences were reconstructed using codeml Model M0 (default options except: model=0, NSsites=0, RateAncestor=1, cleandata=0) in PAML v4.9j (Yang, 1997, 2007). codeML integrates the Goldman and Yang model of amino acid substitution and assumes that selection pressure on an individual site is the same for every branch, produces joint likelihood reconstructions (all ancestral nodes reconstructed), and uses empirical Bayes procedure (Yang and Wang, 1995) for sequence reconstruction. Additionally, codeml calculates a marginal reconstruction (single nodes reconstructed), which includes posterior probabilities for reconstructed amino acids.
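The non-default options listed above map directly onto entries in a codeml control file. The helper below is a hedged sketch that only writes out those entries: the file names are placeholders, `seqtype = 1` (codon sequences) is an assumption consistent with the nucleotide alignments described here, and every option not shown is left to the codeml default.

```python
def codeml_control(seqfile, treefile, outfile):
    """Return the text of a minimal codeml control file."""
    options = {
        "seqfile": seqfile,
        "treefile": treefile,
        "outfile": outfile,
        "seqtype": 1,       # assumption: codon sequences
        "model": 0,         # one omega ratio for all branches
        "NSsites": 0,       # one omega ratio for all sites (M0)
        "RateAncestor": 1,  # request ancestral sequence reconstruction
        "cleandata": 0,     # keep columns with gaps or ambiguity
    }
    return "\n".join(f"{key} = {value}" for key, value in options.items()) + "\n"

# Placeholder file names, not the names used in the original analysis.
control_text = codeml_control("alignment.phy", "tree.nwk", "reconstruction.txt")
```

Writing `control_text` to a file such as `codeml.ctl` and running codeml on it would be the next step; the ancestral sequences are then reported in the reconstruction output that codeml produces alongside the main output file.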

Tests for selection: hypotheses

If the origin of pyrrolizidine alkaloids was adaptive, and if selection for PAs caused adaptive evolution of HSS, we can make testable predictions about patterns of selection on HSS-like genes in Apocynaceae. If there was a single origin of PAs in the MRCA of all PA-producing taxa (Figure 3A, C), we predict positive selection on the HSS-like gene of this MRCA (Figure 3A, C, branch A, ω > 1), followed by purifying selection on branches that retained optimized HSS function (i.e., IXXXD motif in clade L or VXXXD motif in clade G) and relaxed selection on branches that lost optimized HSS function (i.e., evolution of IXXXN motif, Figure 3A, C, clades I, J, ω = 1, k < 1). We also tested whether the evolution of an IXXXD motif (Figure 3A, clade K, ω = 1, k < 1) in two PA-free Alafia species from an ancestral VXXXD motif (Figure 3A, clade G) is a result of loss of function. (While PAs have been reported from Alafia, the two sequenced species tested negative for PAs [Barny et al., 2021].)

In contrast, if parallel recruitment of the ancestral HSS-like paralog led to multiple origins of the PA biosynthetic pathway (Figure 3B, D), we predict relaxed selection (due to loss of DHS function) on the ancestral HSS-like gene (Figure 3B, D, branch A, ω = 1), followed by positive selection on the ancestral HSS of each lineage where the HSS VXXXD motif (present in all sequenced PA-producing Apocynaceae species) and/or PA-production evolved (Figure 3B, D, branches B, C, D, E, F, G, to Isonema, to Strophanthus, ω > 1). Under the multiple origin scenario, branches with IXXXN and IXXXD motifs should remain under relaxed selection (if they did not evolve some new function); we also predict relaxation of selection in the HSS clade (clade A, k < 1) relative to the DHS clade (clade H), since most HSS branches would be nonfunctional and under relaxed selection (Figure 3B, D), while DHS is an essential gene that should always be under strong purifying selection.

Tests for selection: analyses

aBSREL (adaptive branch-site random effects likelihood) (Smith et al., 2015) was used to test for positive selection on selected branches (Figure 3, ω > 1). It fits optimal ω (dN/dS ratio) distributions to each branch by assigning each site to one of up to three ω rate categories, which generates the optimal ω distribution for each branch. For branches with more than one ω rate, the larger one is mapped in Figure 4, Appendices S4 and S5. Positive selection is inferred on a priori selected branches by comparing, via likelihood ratio test (LRT), the optimized ω distribution to a null model with constraint ω < 1 for all sites.

MEME (Mixed Effects Model of Evolution) (Murrell et al., 2012) was used to identify sites under positive selection on pre-specified branches (Figure 3, ω > 1). It allows selection to vary both among branches and among sites. Each branch is assigned one of two ω rate classes at each amino acid site. A single α (synonymous substitution rate) is shared among all branches. First, the nonsynonymous substitution rate (β-) is estimated for each site; β- is constrained to be less than or equal to α (i.e., evolving neutrally). Second, the nonsynonymous substitution rate (β+) is unconstrained in the full model. Likelihood ratio tests are used to compare the full model with a null model where β+ is constrained to be less than α.

RELAX (Wertheim et al., 2015) was used to test for neutral evolution. It tests for relaxation and intensification of positive and purifying selection on pre-specified test branches (Figure 3A, C, lineages K, I, J) compared to a set of designated reference branches (Figure 3A, C, lineages G, L). The RELAX null model assigns all sites into one of three rate classes (ω1 = purifying, ω2 = neutral, ω3 = positive selection). The full (alternative) model introduces a selection intensity parameter, k, and raises each ω to the power k (ω^k) on the test branches. The null model constrains k = 1, which forces the same ω distribution in both the test and reference branch sets. If the likelihood ratio test finds that the alternative model is a significantly better fit, a value of k > 1 is considered evidence of intensified selection and k < 1 evidence of relaxed selection in the test branches relative to the designated reference branches. The RELAX general descriptive model was used to calculate the k values mapped in Figure 4, Appendices S4 and S5, from RELAX analyses comparing all HSS-like branches to DHS-like branches (Appendix S1: Table S12.4). Rather than using the a priori test and reference branch sets, the general descriptive model fits the three ω rates to all branches, and an individual k for each branch (Wertheim et al., 2015).
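The effect of the selection intensity parameter can be seen with a small numeric example. The ω values below are invented for illustration and are not taken from the results files; the point is only that raising ω to the power k pulls every rate class toward 1 when k < 1 and pushes it away from 1 when k > 1.

```python
def apply_intensity(omegas, k):
    """Return the test-branch rate classes omega**k for each reference omega."""
    return [w ** k for w in omegas]

# Hypothetical reference rate classes: purifying, neutral, positive.
reference = [0.1, 1.0, 4.0]

relaxed = apply_intensity(reference, 0.5)      # k < 1: classes move toward 1
intensified = apply_intensity(reference, 2.0)  # k > 1: classes move away from 1

for ref, rel, inten in zip(reference, relaxed, intensified):
    print(f"omega={ref:<4} k=0.5 -> {rel:.3f}   k=2 -> {inten:.3f}")
```

With k = 0.5 the purifying class rises and the positive class falls, which is the pattern expected for a gene drifting toward neutrality; the neutral class (ω = 1) is unchanged for any k.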

Comparison with human DHS

The reduced data set was aligned with the angiosperm DHS/HSS alignment from Livshultz et al. (2018) and human DHS (GenBank: P49366) to produce a “human and plant DHS/HSS alignment” (Appendix S1: Table S3). Human DHS amino acid site functions were described by Wator et al. (2020). The amino acid positions of DHS monomer interaction sites and functional sites (e.g., active site tunnel entrance) were taken from annotations on structure PDB ID 6XXM (the crystal structure of human DHS complexed with putrescine; Wator et al., 2020) using the NCBI Structure feature (Madej et al., 2014) in iCn3D v.2.24.4 (Wang et al., 2020). These sites were manually annotated on the human DHS sequence in the alignment to enable comparison between human and plant DHS/HSS amino acid positions.

Site-directed mutagenesis of Parsonsia alboflavescens HSS

The open reading frame of Parsonsia alboflavescens HSS (PaHSS), cloned in an expression vector (Novagen™ pET28a, Millipore Sigma, Billerica, MA, USA) with an artificial N-terminal hexahistidine (6xHis) tag extension, was used as template for site-directed mutagenesis guided by Liu and Naismith (2008). Primer pairs to introduce the single mutations V269 to I269 (numbering of the amino acids follows that of Kaltenegger et al., 2013) and D273 to N273, as well as the double mutation V269XXXD273 to I269XXXN273, are given in Appendix S1: Table S4. PCRs with 12 amplification cycles were performed in a 25-µL reaction mixture with Phusion High-Fidelity DNA Polymerase (ThermoFisher Scientific, Waltham, MA, USA) according to the manufacturer’s instructions; annealing temperatures are given in Appendix S1: Table S4. The PCR products were treated with restriction enzyme DpnI (ThermoFisher Scientific, Waltham, MA, USA) at 37°C for 1 h, diluted with water (1:10), subsequently propagated in Escherichia coli TOP10 (ThermoFisher Scientific), and sent out for Sanger sequencing (MWG Eurofins Genomics, Ebersberg, Germany) to identify successful mutants.

Heterologous expression, purification, and activity assays of P. alboflavescens HSS and mutants

The complete ORF of the PaHSS and the mutant variants were expressed in Escherichia coli BL21(DE3) and purified as described by Ober and Hartmann (1999a). Protein purification was monitored via SDS-PAGE analysis, and protein quantities were estimated based on UV absorption at 280 nm and the specific extinction of the respective protein, calculated with the PROTPARAM web tool in ExPASy (Gasteiger et al., 2005) and with the Bradford method (Bradford, 1976). The oligomerization state of the purified proteins was analyzed by size exclusion chromatography coupled to UV. Eight to 15 µg of affinity-purified DHS and HSS in borate buffer (~42 kDa) were analyzed on an analytical size-exclusion chromatography (SEC) column (MabPac Sec-1, 5 µm 300 Å, 4 × 150 mm) equilibrated with 50 mM phosphate buffer (pH 6.8) plus 0.3 M NaCl (0.2 mL/min flow), connected to an UltiMate 3000 system and a DAD-3000 diode array detector (ThermoFisher Scientific). Proteins were monitored at 280 nm. Cytochrome c (12 kDa) and BSA (monomer 66.5 kDa, dimer 132 kDa) were used as reference proteins.

For biochemical characterization, the purified proteins were concentrated and suspended in borate-based (50 mM borate-NaOH buffer, pH 9) assay buffer, which included the additives DTT (1 mM) and EDTA (0.1 mM). The in vitro assays were performed as described by Kaltenegger et al. (2021). In short, 5–40 µg purified recombinant protein were incubated with putrescine and spermidine (400 µM each) in the presence of NAD (2 mM) in borate-based assay buffer to determine the enzyme’s ability to produce homospermidine. Product formation was quantified by derivatizing the reaction mixture with 9‑fluorenylmethyl chloroformate (FMOC, Sigma) and subsequent analysis by HPLC coupled with UV detection. To detect the enzymes’ ability to utilize the eIF5A, assays were hydrolyzed as described by Kaltenegger et al. (2021), derivatized with FMOC, and analyzed by HPLC coupled with FLD to quantify deoxyhypusine, 1,3-diaminopropane, and canavalmine.
