Organellar tRNAs in parasitic plant species
Data files
Aug 02, 2023 version files 26.77 KB
- 20230331_absencepresence_retarget_both.csv (8.67 KB)
- AA_codes.txt.xlsx (9.63 KB)
- gbk.zip (6.55 KB)
- README.md (1.92 KB)
Oct 18, 2024 version files 26.79 KB
Abstract
Eukaryotic nuclear genomes often encode distinct sets of protein translation machinery for function in the cytosol vs. organelles (mitochondria and plastids). This phenomenon raises questions about why multiple translation systems are maintained even though they are capable of comparable functions, and whether they evolve differently depending on the compartment where they operate. These questions are particularly interesting in land plants because translation machinery, including aminoacyl-tRNA synthetases (aaRS), is often dual-targeted to both the plastids and mitochondria. These two organelles have quite different metabolisms, with much higher rates of translation in plastids to supply the abundant, rapid-turnover proteins required for photosynthesis. Previous studies have indicated that plant organellar aaRS evolve more slowly compared to mitochondrial aaRS in other eukaryotes that lack plastids. Thus, we investigated the evolution of nuclear-encoded organellar and cytosolic translation machinery across a broad sampling of angiosperms, including non-photosynthetic (heterotrophic) plant species with reduced rates of plastid gene expression to test the hypothesis that translational demands associated with photosynthesis constrain the evolution of bacterial-like enzymes involved in organellar tRNA metabolism. Remarkably, heterotrophic plants exhibited wholesale loss of many organelle-targeted aaRS and other enzymes, even though translation still occurs in their mitochondria and plastids. These losses were often accompanied by apparent retargeting of cytosolic enzymes and tRNAs to the organelles, sometimes preserving aaRS-tRNA charging relationships but other times creating surprising mismatches between cytosolic aaRS and mitochondrial tRNA substrates. Our findings indicate that the presence of a photosynthetic plastid drives the retention of specialized systems for organellar tRNA metabolism.
README: Organellar tRNAs in Parasitic Plant Species
Here are input GenBank (.gbk) files for various mitochondrial and plastid genomes, with accompanying scripts for generating figures that plot the presence/absence of tRNAs in relation to aminoacyl-tRNA synthetases.
Description of the data and file structure
- Zipped data directory gbk/ contains the GenBank annotation files used as input for plotting absence/presence of organellar tRNAs and aminoacyl-tRNA synthetases (Fig. 4).
- trna_parse2.pl is a Perl script used for extracting tRNAs from .gbk files, originally written by Dan Sloan.
- The Jupyter notebook 20240820_parsetRNA.ipynb was used to generate the plot (Fig. 4).
- 20240820_absencepresence_retarget.csv is a metadata file containing a data matrix that records, for each aaRS, whether the organellar enzyme is absent/present and whether the cytosolic enzyme is retargeted/not retargeted/unknown. These data were used by 20240820_parsetRNA.ipynb to overlay the tRNA data. The ortholog presence/absence data come from https://doi.org/10.5061/dryad.0cfxpnw7p, and the targeting data from https://doi.org/10.5061/dryad.6hdr7sr5x. Also see the manuscript for more details.
- AA_codes.txt is a metadata file containing the 1-letter and 3-letter codes for each amino acid, used by 20240820_parsetRNA.ipynb for axis labels.
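For readers without Perl, the kind of extraction described above can be approximated in Python. The following is only a minimal sketch using the standard library: it is not a port of trna_parse2.pl, its output does not match that script's table, and the function name `trna_products` is hypothetical. It assumes standard GenBank flat-file layout, where feature keys start in column 6 and qualifiers such as /product are indented further.

```python
import re

# A feature line in the GenBank FEATURES table starts with exactly five
# spaces followed by the feature key (e.g. "     tRNA            10..80").
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+\S+")
PRODUCT_RE = re.compile(r'/product="([^"]+)"')


def trna_products(genbank_text):
    """Return the /product value of every tRNA feature, in file order."""
    products = []
    current_key = None
    in_sequence = False
    for line in genbank_text.splitlines():
        if line.startswith("ORIGIN"):
            # Sequence lines follow; skip them until the record ends.
            in_sequence = True
            current_key = None
            continue
        if line.startswith("//"):
            # End of record; a multi-record file may continue after this.
            in_sequence = False
            continue
        if in_sequence:
            continue
        feature = FEATURE_RE.match(line)
        if feature:
            current_key = feature.group(1)
            continue
        product = PRODUCT_RE.search(line)
        if product and current_key == "tRNA":
            products.append(product.group(1))
    return products
```

Calling `trna_products(open("genome.gbk").read())` on an annotation file (the file name here is a placeholder) would give a list such as product names beginning with "tRNA-", depending on how the record was annotated.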
Sharing/Access information
.gbk files were downloaded from NCBI. See manuscript for more details.
Code/Software
Requires BioPerl, Python 3, and Jupyter notebooks.
Extract tRNAs from GenBank files (.gbk):
for file in *.gbk; do trna_parse2.pl "$file" > "${file%.gbk}_tRNA.tab"; done
Then open 20240820_parsetRNA.ipynb to plot. Ensure that the output .csv file from the targeting analysis (i.e., 20240820_absencepresence_retarget_both.csv) is included with the input files.
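The extraction loop can also be driven from Python, for example from inside the notebook. This is a sketch under two assumptions that the README does not state: that trna_parse2.pl sits in the working directory (the shell loop instead expects it on the PATH), and that `perl` is available. The helper names `output_name` and `extract_all` are hypothetical.

```python
import subprocess
from pathlib import Path


def output_name(gbk_path):
    """Map genome.gbk to genome_tRNA.tab, mirroring ${file%.gbk}_tRNA.tab."""
    gbk_path = Path(gbk_path)
    return gbk_path.with_name(gbk_path.stem + "_tRNA.tab")


def extract_all(directory=".", script="trna_parse2.pl"):
    """Run the Perl script on every .gbk file in `directory`."""
    written = []
    for gbk in sorted(Path(directory).glob("*.gbk")):
        out = output_name(gbk)
        with open(out, "w") as handle:
            # The script prints its table to standard output, so redirect
            # that stream into the per-genome .tab file.
            subprocess.run(["perl", script, str(gbk)], stdout=handle, check=True)
        written.append(out)
    return written
```

With `check=True`, a failing Perl run raises `subprocess.CalledProcessError` rather than silently leaving an empty .tab file behind.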
Python packages required include:
pandas
numpy
seaborn
matplotlib
string
re
os
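Of these, string, re, and os ship with Python, so only the first four need installing. Before opening the notebook, a quick check along the following lines can confirm that everything is importable. This is a small sketch, and the names `REQUIRED` and `missing_modules` are hypothetical.

```python
from importlib.util import find_spec

# Packages listed in this README; string, re, and os are standard library.
REQUIRED = ["pandas", "numpy", "seaborn", "matplotlib", "string", "re", "os"]


def missing_modules(names=REQUIRED):
    """Return the names from `names` that cannot be imported here."""
    return [name for name in names if find_spec(name) is None]
```

An empty list from `missing_modules()` means the environment has every listed package; any names returned still need to be installed.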
Version Changes
October 2024: Uses an updated metadata file (20240820_absencepresence_retarget.csv) reflecting reanalysis with updated gene models; see the publication and associated datasets. 20240820_parsetRNA.ipynb was updated to plot plastid targeting predictions for Rafflesiaceae species, which were previously excluded.
Methods
Input GenBank (.gbk) files for various mitochondrial and plastid genomes, and accompanying scripts for presence/absence analysis of tRNAs.
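As an illustration of what a presence/absence analysis produces, the sketch below turns per-genome lists of tRNA amino acids into a 0/1 matrix. It is not the notebook's code: the input structure, the restriction to the 20 standard amino acids, and the names `AMINO_ACIDS` and `presence_matrix` are all assumptions made for the example.

```python
# Three-letter codes for the 20 standard amino acids (assumed column order).
AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]


def presence_matrix(trnas_by_genome, amino_acids=AMINO_ACIDS):
    """Return {genome: [1 if a tRNA for that amino acid is present, else 0]}."""
    matrix = {}
    for genome, found in trnas_by_genome.items():
        found = set(found)  # duplicate tRNA genes count once
        matrix[genome] = [int(aa in found) for aa in amino_acids]
    return matrix
```

Each row can then be passed to a heatmap function, with the amino acid codes as column labels.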
Usage notes
Jupyter Notebook with Python 3 is required to open and run the .ipynb notebooks. BioPerl is required to run the Perl script.