
The evolution of multi-gene families and metabolic pathways in the evening primroses (Oenothera: Onagraceae): a comparative transcriptomics approach


Kariñho-Betancourt, Eunice et al. (2022), The evolution of multi-gene families and metabolic pathways in the evening primroses (Oenothera: Onagraceae): a comparative transcriptomics approach, Dryad, Dataset,


The plant genus Oenothera has played an important role in the study of genome evolution, plant defense, and reproduction. Here, we built on the 1kp transcriptomic dataset to develop a molecular resource of 63 transcriptomes and present a large-scale comparative study across 29 Oenothera species. We produced 2.3 million transcripts and 25.4 Mb of total assembly length per individual. We used this transcriptome resource to examine genome-wide evolutionary patterns and functional diversification by searching for orthologous genes and performing gene family evolution analysis. We found wide heterogeneity in gene family evolution across the genus, with section Oenothera exhibiting the most pronounced evolutionary changes. Overall, significant expansions outnumbered significant contractions. We also analyzed the molecular evolution of phenolic metabolism by retrieving proteins annotated for phenolic enzymatic complexes. We identified 1,568 phenolic genes arranged into 83 multigene families that varied widely across the genus. All taxa experienced rapid phenolic evolution involving 33 gene families, which exhibited large expansions, gaining about 2-fold more genes than they lost. The upstream enzymes phenylalanine ammonia-lyase (PAL) and 4-coumaroyl:CoA ligase (4CL) accounted for most of the significant expansions and contractions. Our results suggest that adaptive responses to environmental stress, coupled with non-adaptive evolutionary forces, have contributed to Oenothera diversification and rapid gene family evolution.


Total RNA was isolated from leaf tissue of 63 individuals from Oenothera spp. following the CTAB/Acid Phenol/Silica Membrane method. Sixty-three TruSeq libraries were prepared using total mRNA. Libraries were sequenced in paired-end mode (50 samples) using Illumina HiSeq and in single-end mode (13 samples) using GAIIx. RNA-seq reads (SRA project number: PRJEB4922) were trimmed and filtered for quality using fastp v.0.20.0. The “--cut_tail” flag was used to truncate reads from the 3' end wherever a 4 bp window fell below an average PHRED quality score of 20. Quality-controlled RNA-seq reads were then assembled into transcripts with Trinity v2.11.0 under default settings, except for CPU and memory allocation.
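The tail-trimming rule described above can be illustrated with a short Python sketch. This is a simplified illustration of sliding-window 3' trimming, not fastp's actual implementation: the function names `phred_scores` and `cut_tail` are hypothetical, the Phred+33 quality encoding is an assumption, and the handling of edge cases (reads shorter than the window, reads with no passing window) is a choice made for this example only.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer PHRED scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def cut_tail(sequence, quality_string, window=4, min_mean_quality=20):
    """Trim the 3' end of a read with a sliding window (simplified sketch).

    The window starts at the last `window` bases and moves one base at a
    time toward the 5' end. Each time the window's mean quality is below
    `min_mean_quality`, the final base is dropped. Trimming stops at the
    first window that passes.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")

    scores = phred_scores(quality_string)
    end = len(scores)

    # Assumption for this sketch: reads shorter than the window are left alone.
    if end < window:
        return sequence, quality_string

    while end >= window:
        mean_quality = sum(scores[end - window:end]) / window
        if mean_quality >= min_mean_quality:
            break  # first passing window: keep everything up to `end`
        end -= 1
    else:
        # No window passed the threshold: the whole read is discarded.
        end = 0

    return sequence[:end], quality_string[:end]


if __name__ == "__main__":
    # 'I' encodes PHRED 40 and '#' encodes PHRED 2 under Phred+33.
    trimmed_seq, trimmed_qual = cut_tail("ACGTACGTAC", "IIIIII####")
    print(trimmed_seq, trimmed_qual)
```

Because the decision is made on the window mean rather than on individual bases, a few low-quality bases can survive at the trimmed end when they share a passing window with high-quality neighbors.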


National Council of Science and Technology (CONACyT)

Max Planck Society

Natural Sciences and Engineering Research Council