Dataset from: The origin and fate of fungal mitochondrial horizontal gene transferred sequences in orchids (Orchidaceae)
Data files
Jun 12, 2023 version files 996.67 KB
- Folder_1_README.txt (568 B)
- Folder_1.zip (713 KB)
- Folder_2_README.txt (323 B)
- Folder_2.zip (12.26 KB)
- Folder_3_README.txt (406 B)
- Folder_3.zip (219.85 KB)
- Folder_4_README.txt (1.86 KB)
- Folder_4.zip (33.45 KB)
- ncbi_IDs_alldata.tsv (12.68 KB)
- README.md (2.26 KB)
Abstract
The transfer of DNA among distantly related organisms is relatively common in bacteria but less prevalent in eukaryotes. Among fungi and plants, few of these events have been reported. Two segments of fungal mitochondrial DNA have been discovered in the mitogenome of orchids. Here, we build on that discovery to understand the timing of those transfer events, which orchids retain the fungal DNA, and the fate of the foreign DNA during orchid evolution. We update the content of the large DNA fragment and establish that it was transferred to the most recent common ancestor of a highly diverse clade of epidendroid orchids that lived ~28–43 Mya. We also present hypotheses on the origin of the small transferred fragment. Our findings deepen the knowledge of these interesting DNA transfers among organelles, and we formulate a probable mechanism for these horizontal gene transfer events.
We sampled 61 taxa from across Orchidaceae of which two were taken from available data on GenBank (mitochondrial sequences of Erycina pusilla: KJ501971, KJ501972, KJ501991, KJ501994, and Gastrodia elata: MF070088 and MF070090). The HGT regions of Apostasia shenzhenica and Triphora wagneri were taken from Sinn & Barrett (2020).
DNA extractions, sequencing, and data assembly were performed as in Valencia-D. et al. (2021). In short, total DNA was extracted following a modified Doyle and Doyle (1987) procedure (Neubig et al., 2014). Sequencing of total DNA without enrichment was performed using Illumina HiSeq or NovaSeq technologies, generating paired-end reads of 100, 150, or 250 bp. The data were processed using Geneious Prime 2020.3 (Kearse et al., 2012).
To capture the HGT region, the sequences found in other orchids by Sinn and Barrett (2020) were used as references. Geneious mapper and Bowtie2 (Langmead & Salzberg, 2012) were used with low-sensitivity parameters to capture reads that were at least 75% similar to the references (Geneious mapper: maximum mismatches per read 25%, maximum gaps per read 5%, maximum gap size 5 nt, other parameters left at their default values; Bowtie2 mapper: alignment type local, Lowest Sensitivity/Very Fast). Then, the reads captured in the reference assemblies were de novo assembled (mapper: Geneious, maximum mismatches per read 5%, maximum gaps per read 5%, maximum gap size 2 nt). To account for possible spurious scaffolds (i.e., non-mitochondrial or low-quality), contigs with depth below 5× or over 200× were excluded, as were those with pairwise identity below 95% and those shorter than 500 nt. All contigs were inspected to identify possible erroneous assemblies of the reads and to ensure contiguity. The consensus sequences of these contigs were annotated to confirm the presence of the FMTs.
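The contig exclusion criteria above can be sketched as a simple filter. This is a minimal illustration only: the filtering in the study was done within Geneious, and the `Contig` record, its field names, and the `keep_contig` helper are hypothetical names introduced here. Only the four thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical summary record for one de novo contig."""
    name: str
    length: int               # contig length in nt
    mean_depth: float         # mean read depth (x)
    pairwise_identity: float  # mean pairwise identity of mapped reads (%)


def keep_contig(contig, min_depth=5.0, max_depth=200.0,
                min_identity=95.0, min_length=500):
    """Return True if the contig passes all thresholds stated in the text.

    Excluded: depth below 5x or over 200x, pairwise identity below 95%,
    or length shorter than 500 nt.
    """
    return (
        min_depth <= contig.mean_depth <= max_depth
        and contig.pairwise_identity >= min_identity
        and contig.length >= min_length
    )


# Illustrative values only, not data from the study.
contigs = [
    Contig("contig_a", length=8200, mean_depth=42.0, pairwise_identity=98.7),
    Contig("contig_b", length=450, mean_depth=30.0, pairwise_identity=99.0),
    Contig("contig_c", length=3000, mean_depth=950.0, pairwise_identity=97.0),
]
retained = [c.name for c in contigs if keep_contig(c)]
```

Contigs that pass such a filter would still need the manual inspection described above; the thresholds only remove the most obvious non-mitochondrial or low-quality candidates.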
Eleven native mitochondrial genes were recovered from orchids, i.e., atp6, ccmFC, ccmFN, cob, mttB, nad2, nad4, rps1, rps2, rps3, and rps4, including the introns.
We sampled fungal mitogenomes from 32 species in the subphylum Ustilaginomycotina. The mitogenomes of ten species were downloaded from GenBank and additional mitochondrial sequences for twenty-two taxa were obtained from the whole-genome shotgun contigs repository. Annotations of the sequences were made through the MITOS web server (Bernt et al., 2013) and then inspected in Geneious using Ustilago maydis (DQ157700) as a reference.
Three main datasets were generated with the following data: (1) native orchid mitochondrial data (eleven genes), (2) small-FMT data from orchids and the putative xenologous tRNAs from fungi, and (3) large-FMT data from orchids and the putative xenologues from the fungi (excluding the endonuclease gene). Alignments were performed in Geneious Prime. The datasets were analyzed as follows:
(1) native orchid mitochondrial data: genes with introns were aligned individually with MAFFT (Katoh & Standley, 2013), and genes without introns were aligned by translation. Then the genes were concatenated in one matrix. Phylogenetic trees were made using Maximum likelihood and Bayesian Inference methods.
(2) small-FMT data: the whole region from the tRNA(Thr) gene to the tRNA(Lys) gene was included for the orchid samples. The fungal tRNAs were annotated in each genome and extracted. Orchid and fungal samples that did not have the three tRNA genes were excluded from the analyses. Alignments with and without Dendrobium and Triphora were performed with the MAFFT plugin in Geneious. Phylogenetic trees were made using Maximum likelihood methods.
(3) large-FMT data: orchid and fungal sequences were aligned separately, and then the data were combined and realigned. Orchid data were aligned using MAFFT. Fungal genes were aligned gene by gene with MAFFT (for tRNA genes) or by translation (for protein-coding genes). For the combined analyses, fungal locus alignments were concatenated in the position and direction in which they appear on the large-FMT. Alignments with and without fungal data were made, and for the latter, with and without Triphora. Phylogenetic trees were made using Maximum likelihood methods. The dataset of fungal and orchid data including Triphora was also analyzed using Bayesian Inference methods.
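Concatenating per-gene alignments into one matrix, as described for datasets (1) and (3), can be sketched as follows. This is an illustration under stated assumptions, not the procedure run in Geneious: the function name `concatenate_alignments` is hypothetical, and padding taxa that lack a gene with gap characters is an assumption, since the text does not say how missing loci were handled.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments into a single supermatrix.

    gene_alignments maps gene name -> {taxon: aligned sequence}.
    Genes are joined in the order given. Taxa missing from a gene are
    padded with '-' (an assumption made for this sketch).
    Returns (supermatrix, partitions), where partitions lists
    (gene, start, end) with 1-based inclusive coordinates.
    """
    taxa = sorted({t for aln in gene_alignments.values() for t in aln})
    pieces = {t: [] for t in taxa}
    partitions = []
    start = 1
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        length = lengths.pop()
        for t in taxa:
            pieces[t].append(aln.get(t, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {t: "".join(parts) for t, parts in pieces.items()}
    return supermatrix, partitions


# Toy input; sequences and taxon labels are illustrative only.
toy = {
    "atp6": {"taxon_A": "ATG", "taxon_B": "ATA"},
    "cob": {"taxon_A": "CC"},
}
matrix, parts = concatenate_alignments(toy)
```

For the large-FMT dataset, the order of the input dictionary would stand in for the position in which each locus appears on the fragment; reverse-complementing loci to match their direction is not shown here.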
The regions adjacent to the large-FMT were also inspected. Alignments of homologous regions on the 5′ and 3′ flanks are included in these files.
Maximum Likelihood analyses of the datasets were performed with IQTree2 (Nguyen et al., 2015) using ModelFinder (Kalyaanamoorthy et al., 2017) to identify the best model in each case. Branch supports were calculated using 100 nonparametric bootstrap replicates.
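A run of this kind might be launched roughly as sketched below. The exact command line used in the study is not given, so this is an assumption: the helper `iqtree_command`, the executable name, and the file names are hypothetical. The flags are standard IQ-TREE 2 options: `-s` names the alignment, `-m MFP` runs ModelFinder and then infers the tree under the selected model, `-b` sets the number of nonparametric bootstrap replicates, and `--prefix` sets the output file prefix.

```python
import shlex


def iqtree_command(alignment, prefix, bootstraps=100, executable="iqtree2"):
    """Build an IQ-TREE 2 argument list for ModelFinder plus bootstrapping.

    Returns a list suitable for subprocess.run(); nothing is executed here.
    """
    return [
        executable,
        "-s", alignment,        # input alignment
        "-m", "MFP",            # ModelFinder, then tree search
        "-b", str(bootstraps),  # nonparametric bootstrap replicates
        "--prefix", prefix,     # prefix for all output files
    ]


# Hypothetical file names, for illustration only.
cmd = iqtree_command("native_mt_concat.fasta", "native_mt_ml")
print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would start the analysis, provided IQ-TREE 2 is installed and on the PATH.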
We evaluated the divergence times for the native mitochondrial genes and the large-FMT plus fungal xenologous genes matrices using BEAST 2.6.3 (Bouckaert et al., 2019) implemented on CIPRES Science Gateway (Miller, Pfeiffer, & Schwartz, 2010). We used the substitution rates obtained for each dataset selected as most appropriate in ModelFinder according to the Bayesian Information Criterion (Table 4). An uncorrelated relaxed clock model was used with a log-normal distribution (Drummond et al., 2006). The tree was produced using a birth-death model (Gernhard, 2008) under a topology with constrained root or ingroup. The native mitochondrial genes analysis was rooted with Apostasioideae. For the analysis of the third dataset (large-FMT plus the corresponding putative xenologues from the fungi combined), Malassezia was designated as the root, following Kijpornyongpan et al. (2018).
A log-normal distribution was applied to set the priors. For the native mt-orchid analysis, the calibration date used was the age of the branching point of the subfamily Apostasioideae with the rest of the orchids, estimated by Givnish et al. (2015) at 83.6 Mya. To account for the uncertainty in the estimation, the standard deviation (s) was set at 0.07 to obtain a 95% confidence interval of 74.3–93.6 Mya. We performed four replicate BEAST analyses with 10 million MCMC iterations, sampled once every 1000 generations. For the third dataset, a calibration point was set at the root of all Ustilaginomycotina with a mean of 270 Mya (s=0.05), following the estimations of Kijpornyongpan et al. (2018). Three replicates were run in BEAST with 200 million generations, sampled every 5000 generations. The log files of the MCMC runs were examined using Tracer 1.7.1 (Rambaut et al., 2018), and trees sampled before the likelihood reached stationarity were discarded as burn-in. The log files and the resulting trees from the runs were combined using LogCombiner v2.6.3 (Drummond & Rambaut, 2007). Maximum clade credibility trees were calculated using TreeAnnotator v2.6.3 (Drummond & Rambaut, 2007) and visualized in FigTree v1.4.4 (https://github.com/rambaut/figtree/releases, retrieved on Dec 6, 2019).
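The relationship between the calibration mean, the standard deviation s, and the resulting interval can be checked with a short calculation. This sketch assumes the prior mean was specified in real space (an option offered when setting log-normal priors in BEAUti), so the log-scale location is mu = ln(M) - s^2/2; the text does not state this setting. The function name `lognormal_quantiles` is hypothetical. Under that assumption, the 5% and 95% quantiles for M = 83.6 and s = 0.07 come out close to the quoted bounds of 74.3 and 93.6 Mya.

```python
import math
from statistics import NormalDist


def lognormal_quantiles(mean_real, s, lower=0.05, upper=0.95):
    """Quantiles of a log-normal prior parameterised by its real-space mean.

    mean_real: prior mean in real space (Mya); assumed parameterisation.
    s: standard deviation on the log scale.
    Returns (lower quantile, upper quantile) in the same units as mean_real.
    """
    mu = math.log(mean_real) - s ** 2 / 2  # log-scale location
    std_normal = NormalDist()
    return tuple(
        math.exp(mu + s * std_normal.inv_cdf(p)) for p in (lower, upper)
    )


# Orchid calibration: Apostasioideae split, mean 83.6 Mya, s = 0.07.
orchid_lo, orchid_hi = lognormal_quantiles(83.6, 0.07)

# Fungal calibration: root of Ustilaginomycotina, mean 270 Mya, s = 0.05.
fungal_lo, fungal_hi = lognormal_quantiles(270.0, 0.05)
```

The same function applied with `lower=0.025, upper=0.975` gives the wider central 95% range, which is a convenient way to compare the two conventions for reporting the prior interval.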