Data from: A high-quality genome assembly of the tetraploid Teucrium chamaedrys unveils a recent whole genome duplication and a large biosynthetic gene cluster for diterpenoid metabolism
Data files
Jul 09, 2025 version files 3.34 GB
- README.md (1.66 KB)
- Teucrium_chamaedrys_final_annotation.gff3 (124.31 MB)
- Teucrium_chamaedrys_final_assembly.fasta (2.98 GB)
- Teucrium_chamaedrys_final_proteins.fasta (46.54 MB)
- Teucrium_chamaedrys_final_transcripts.fasta (137.02 MB)
- Teucrium_GC-MS.zip (58.74 MB)
- teucrium_sequences_for_tree.fasta (70.86 KB)
Abstract
Teucrium chamaedrys, also called wall germander, is a small woody shrub native to the Mediterranean region. Its name is derived from the Greek words meaning ‘ground oak’, since its tiny leaves resemble those of an oak tree. Teucrium species are prolific producers of diterpene skeletons and compounds, which afford them valuable properties widely co-opted in traditional and Western medicines. Specifically, Teucrium is well known for making clerodane-type diterpenoids that are produced from the backbone kolavanyl diphosphate. To begin to elucidate some of the complex biosynthetic pathways of these medicinal compounds, we identified and functionally characterized several kolavanyl diphosphate synthases from T. chamaedrys. Along the way, we discovered the genome of this species to be one of the largest genomes published from the Lamiaceae family, to which it belongs. This tetraploid, 3 Gbp genome is especially rich in diterpene synthase genes, with 74 putative sequences identified. The vast majority of these diterpene synthase genes belong to four genomic loci, representative of the four copies of the genome. Comparative genomics shows that this cluster is mirrored in the closely related species T. marum. Along with the presence of several cytochrome P450 sequences, this region is one of the largest biosynthetic gene clusters identified to date. Its remarkable chemistry and tetraploidy make T. chamaedrys an interesting model for studying genomic evolution and adaptation in plants.
Genome files:
Finalized files for the genome assembly and annotation of Teucrium chamaedrys. The assembly is derived from Nanopore and Illumina reads, assembled with Flye and corrected with BWA. The annotation was created using MAKER and BRAKER.
- Teucrium_chamaedrys_final_assembly.fasta
- Teucrium_chamaedrys_final_annotation.gff3
- Teucrium_chamaedrys_final_proteins.fasta
- Teucrium_chamaedrys_final_transcripts.fasta
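As a quick sanity check on the annotation file, the feature types it contains can be tallied with the Python standard library. This is a minimal sketch that assumes the file follows the standard nine-column, tab-separated GFF3 layout; the function name and the example records in the comments are illustrative and not part of the dataset.

```python
from collections import Counter
from typing import Iterable


def count_gff3_feature_types(lines: Iterable[str]) -> Counter:
    """Tally the feature type (column 3) of every record in a GFF3 stream."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # A "##FASTA" directive marks the end of the annotation records.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, comments, and other directives.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Standard GFF3 records have exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Usage with the annotation file from this dataset (path is an assumption):
# with open("Teucrium_chamaedrys_final_annotation.gff3") as handle:
#     for feature_type, n in count_gff3_feature_types(handle).most_common():
#         print(feature_type, n)
```

Because the function accepts any iterable of lines, it streams the 124 MB file without loading it into memory.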
Phylogeny files:
FASTA file containing the amino acid sequences of the predicted diterpene synthases (diTPSs) in Teucrium chamaedrys, Teucrium marum, and Teucrium canadense. Sequences were identified by BLAST searches of the protein annotation files. This file corresponds to Figure 3A.
- teucrium_sequences_for_tree.fasta
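The sequences in this file can be read without extra dependencies. The sketch below is a minimal FASTA reader that assumes conventional formatting (a `>` header line followed by one or more sequence lines); the function names are illustrative, and the header naming scheme used in the file is not assumed.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted stream."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each full header line to the length of its sequence."""
    return {header: len(seq) for header, seq in read_fasta(lines)}


# Usage with the phylogeny file from this dataset (path is an assumption):
# with open("teucrium_sequences_for_tree.fasta") as handle:
#     for name, length in sequence_lengths(handle).items():
#         print(name, length)
```

The same reader works for the protein and transcript FASTA files, since it processes one record at a time.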
GC-MS files:
The date in each file name indicates the date the sample was run on the instrument. All samples include DXS + GGPPS even if not explicitly stated in the file name. All samples were extracted in hexane and run on an Agilent 7890A. These files correspond to Figure 4.
- Teucrium_GC-MS.zip
  - 20240717_AEB282_DXS-GGPPS.CDF
  - 20240717_AEB284_ArTPS2.CDF
  - 20240717_AEB288_ArTPS2-SsSS.CDF
  - 20240717_AEB291_TchaTPS1.CDF
  - 20240717_AEB292_TchaTPS2.CDF
  - 20240717_AEB293_TchaTPS3.CDF
  - 20240717_AEB294_TcanTPS1.CDF
  - 20240717_AEB296_TchaTPS1-SsSS.CDF
  - 20240717_AEB297_TchaTPS2-SsSS.CDF
  - 20240717_AEB298_TchaTPS3-SsSS.CDF
  - 20240717_AEB299_TcanTPS1-SsSS.CDF
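The naming convention of the GC-MS files can be unpacked programmatically, for example when building a sample table. The sketch below assumes the pattern `<YYYYMMDD>_<sample ID>_<enzymes joined by hyphens>.CDF`; reading the middle field as a sample identifier and the hyphen as an enzyme separator is an interpretation of the names listed here, not something the README states. DXS and GGPPS are added to every sample, following the note that they are always present. The class and function names are illustrative.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List

# Per the README, every sample includes these even if the name omits them.
IMPLIED_ENZYMES = ("DXS", "GGPPS")


@dataclass
class GcmsRun:
    run_date: date
    sample_id: str
    enzymes: List[str]


def parse_gcms_filename(name: str) -> GcmsRun:
    """Split a GC-MS file name into run date, sample ID, and enzyme list."""
    stem = name.rsplit(".", 1)[0]  # drop the .CDF extension
    date_part, sample_id, construct = stem.split("_", 2)
    run_date = datetime.strptime(date_part, "%Y%m%d").date()
    enzymes = list(IMPLIED_ENZYMES)
    # Assumption: hyphens separate the enzymes combined in one sample.
    for part in construct.split("-"):
        if part not in enzymes:
            enzymes.append(part)
    return GcmsRun(run_date, sample_id, enzymes)


run = parse_gcms_filename("20240717_AEB288_ArTPS2-SsSS.CDF")
print(run.run_date.isoformat(), run.sample_id, run.enzymes)
```

A name that does not match the assumed pattern raises `ValueError`, which makes unexpected files easy to spot.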
Code/Software:
.CDF files can be viewed in OpenChrom, an open-source software package.