Massive RNA editing in ascetosporean mitochondria
Data files
Apr 16, 2025 version files (158.33 KB)
- ADAR-ADAT_allEuk_4_FC901-SRM_strict.trimal.fst (25.29 KB)
- dtst1_for_18S.fasta (132.31 KB)
- README.md (735 B)
Abstract
The adenosine-to-inosine (A-to-I) substitution type of RNA editing is mediated by adenosine deaminase acting on RNA (ADAR) and is involved in various essential cellular functions. As ADAR functions in the metazoan nucleus, it may have evolved from adenosine deaminase acting on tRNA in the metazoan ancestor. Although ADAR has not been detected in fungi and early-branching opisthokonts, it has not been explored in other eukaryotic lineages. Here, we detected and analyzed ADAR from two novel rhizarian amoebae and determined that other protists also possess ADAR. This finding indicates that ADAR may have originated in the last common ancestor of eukaryotes (LECA). Furthermore, the ADAR of the rhizarian amoeba is localized in its mitochondria and may be involved in the massive RNA editing in the mitochondria. Such complex RNA editing in the rhizarian amoeba may function to mask some lethal mutations in the mitochondrial genomes, which possibly contributes to the acceleration of diversification rather than to an increase in future extinction risk.
Dataset DOI: https://doi.org/10.5061/dryad.mcvdnck4z
Description of the data and file structure
The dataset for molecular phylogenetic analysis used in the paper “Massive RNA editing in ascetosporean mitochondria.”
Files and variables
File: dtst1_for_18S.fasta
Description: The 18S rRNA gene sequences of Paradinida sp. FC901 and SRM-001
File: ADAR-ADAT_allEuk_4_FC901-SRM_strict.trimal.fst
Description: Main dataset for molecular phylogenetic analysis of ADAR-ADAT
Code/software
The data files can be viewed with a text editor.
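For readers who prefer to inspect the files programmatically, the following is a minimal sketch of a FASTA reader. It assumes, based only on the file extensions (.fst and .fasta), that both data files are FASTA-formatted text; the function names `parse_fasta` and `alignment_length` are illustrative and not part of the dataset.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    Assumption: each record starts with a '>' header line and the sequence
    may be wrapped over several lines.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def alignment_length(records: Dict[str, str]) -> int:
    """Return the common sequence length, or raise if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; not an alignment")
    return lengths.pop()


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("<file>") for a real file.
    example = [">seq1", "AC-G", "TT", ">seq2", "ACGGTT"]
    recs = parse_fasta(example)
    print(len(recs), "sequences,", alignment_length(recs), "columns")
```

`alignment_length` is only meaningful for an aligned file; applied to the trimmed ADAR-ADAT alignment, it gives a quick way to compare the column count with the number of positions reported in the methods.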
Access information
Other publicly accessible locations of the data:
- NCBI
Sequencing analyses
Approximately 200 ml of mid-exponential phase cell cultures of Paradinida spp. FC901 and SRM-001 were centrifuged at 2,400 × g for 5 min. The cell pellets were frozen and sent to a sequencing company (Azenta, Tokyo, Japan), where library construction and sequencing were conducted using the default settings. The details of the analyses and the sequence outputs are summarized in Table S4.
The raw fastq data from DNA-seq were divided into 100 subsets using SeqKit, and three subsets from each species were subjected to contig assembly using SPAdes 3.13 with default settings. From each assembly, a single candidate mitochondrial genomic fragment was detected by BLASTN using the mitochondrial genome sequence of Ophirina amphinema (GenBank accession number: LC369600.1) as the query sequence. The detected sequences were identical among the three subset analyses of each species, although the starting position of each sequence differed. By comparing these sequences, a circular mitochondrial genome was reconstructed for each of FC901 and SRM-001. The same assembly analyses were also conducted using the RNA-seq data. The mitochondrial sequences reconstructed from the RNA-seq data were annotated using MFannot (https://megasun.bch.umontreal.ca/apps/mfannot/) and compared with those assembled from the DNA-seq data using Mesquite 3.10.
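The observation that contigs from independent subsets are identical except for their starting position is what one expects for a circular molecule. The sketch below illustrates one simple way to check this; it is not the authors' procedure. It assumes the contigs are on the same strand and carry no duplicated overlap at their ends, and the function names are hypothetical.

```python
def is_rotation(a: str, b: str) -> bool:
    """True if b is a cyclic rotation of a (same circle, different start)."""
    return len(a) == len(b) and b in a + a


def rotate_to_anchor(seq: str, anchor: str) -> str:
    """Rotate a circular sequence so that it starts at `anchor`.

    The search wraps around the end of `seq`, so an anchor that spans the
    linearisation point is still found.
    """
    if not anchor:
        raise ValueError("anchor must be non-empty")
    wrapped = seq + seq[: len(anchor) - 1]
    idx = wrapped.find(anchor)
    if idx < 0:
        raise ValueError("anchor not found in sequence")
    return seq[idx:] + seq[:idx]


if __name__ == "__main__":
    contig_1 = "ACGTT"  # toy sequences standing in for assembled contigs
    contig_2 = "GTTAC"
    print(is_rotation(contig_1, contig_2))
    print(rotate_to_anchor(contig_2, "ACG"))
```

Rotating every contig to a shared anchor (for example, the start of a chosen gene) puts them in the same frame of reference, after which they can be compared position by position.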
For analysing the transcriptome data, three RNA-seq datasets of Paradinida sp. FC901 were combined into a single dataset. The fastq data of each species were subjected to contig assembly using SPAdes 3.13 with the ‘--rna’ option. The reconstructed contigs were searched for ADAR and ADAT sequences by TBLASTN using the ADAR sequence of Symbiodinium microadriaticum (OLQ07757) as the query, with an E-value cut-off of 10^-10. We also searched publicly available sequencing data (Table S3) for ADAR and ADAT sequences of other protists using the same approach. The detected sequences (e.g., ADAR and ADAT of Phaeodactylum tricornutum) were in turn used as queries in further searches to identify more ADARs and ADATs. The obtained sequences were combined with the metazoan ADARs and ADATs and aligned using MAFFT v7.471 with the ‘L-INS-i’ option. The aligned sequences were masked for the phylogenetic analysis using trimAl v1.4 with the ‘strict’ option. This initial dataset contained all the detected sequences, including short partial and highly divergent sequences, and only 94 positions were retained for the phylogenetic analysis. The tree topology and branch lengths were inferred by the maximum-likelihood (ML) method using IQ-TREE 2.2.0 with the LG+F+I+G4 model. The robustness of the ML phylogenetic tree was evaluated using a non-parametric ML bootstrap analysis with the LG+F+I+G4 model (100 replicates). We also conducted Bayesian phylogenetic analysis with the CAT + GTR model using PhyloBayes MPI v. 1.8a. The analysis included two Markov chain Monte Carlo runs of 100,000 cycles with a ‘burn-in’ of 25,000 cycles. The consensus tree with branch lengths and Bayesian posterior probabilities was calculated from the remaining trees. Based on these results, we prepared the main dataset by excluding 14 partial and divergent ADAR sequences from the initial alignment.
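The search, alignment, masking, and ML steps described above could be scripted roughly as follows. This is a hedged sketch, not the authors' actual commands: all file names and the `build_steps` helper are hypothetical, the extraction and translation of TBLASTN hits into a protein FASTA file is not shown, and the PhyloBayes analysis is omitted. The MAFFT flags `--localpair --maxiterate 1000` are the documented equivalent of the L-INS-i strategy.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    cmd: List[str]                # program and arguments
    stdout: Optional[str] = None  # file to capture stdout in, if any


def build_steps(query: str, contigs: str, seqs: str, prefix: str) -> List[Step]:
    """Assemble command lines for search, alignment, masking and ML tree.

    All paths are placeholders chosen for illustration.
    """
    aligned = f"{prefix}.mafft.fst"
    trimmed = f"{prefix}.strict.trimal.fst"
    return [
        # Protein query against nucleotide contigs, E-value cut-off 1e-10.
        Step(["tblastn", "-query", query, "-subject", contigs,
              "-evalue", "1e-10", "-outfmt", "6",
              "-out", f"{prefix}.tblastn.tsv"]),
        # MAFFT writes the alignment to stdout, so capture it in a file.
        Step(["mafft", "--localpair", "--maxiterate", "1000", seqs],
             stdout=aligned),
        # Mask the alignment with trimAl's strict mode.
        Step(["trimal", "-in", aligned, "-out", trimmed, "-strict"]),
        # ML tree with 100 non-parametric bootstrap replicates.
        Step(["iqtree2", "-s", trimmed, "-m", "LG+F+I+G4",
              "-b", "100", "--prefix", prefix]),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order; requires the tools to be installed."""
    for step in steps:
        if step.stdout is None:
            subprocess.run(step.cmd, check=True)
        else:
            with open(step.stdout, "w") as out:
                subprocess.run(step.cmd, check=True, stdout=out)


if __name__ == "__main__":
    for s in build_steps("query.faa", "contigs.fasta", "hits.faa", "adar"):
        print(" ".join(s.cmd), f"> {s.stdout}" if s.stdout else "")
```

Keeping the command construction separate from execution makes it easy to print and review the exact invocations before running anything.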
The main dataset was prepared using the same method as for the initial dataset and comprised 209 positions. The same methods were used to infer the phylogenetic tree and statistical support. Of the newly detected ADAR sequences, 12 were retained in the main dataset; these, as well as the ADAR of Paradinida sp. FC901, were subjected to motif identification using HMMER 3.3 (http://hmmer.org) against the Pfam database.
The 18S rRNA gene sequences of Paradinida sp. FC901 and SRM-001 were determined using DNA extracted from 20 ml of culture with the Qiagen DNeasy Plant Mini Kit (Qiagen, Hilden, Germany). The primers used were Euk1A and EukB. The sequences were added to an alignment created following the method proposed by Ward et al. and aligned using MAFFT v7.471 with the default settings. The ML tree, with a non-parametric bootstrap analysis of 1,000 replicates, and the Bayesian tree were reconstructed using the same methods as described above.