Multiple alignment data for diatom plastomes and mitogenomes
Data files
Jun 30, 2025 version files 9.01 MB
- CP125Align.fasta (7.75 MB)
- MT75Align.fasta (1.26 MB)
- README.md (1.36 KB)
Abstract
Diatoms are pivotal in global oxygen, carbon dioxide, and silica cycling, contributing significantly to photosynthesis and serving as fundamental components of aquatic ecosystems. Recent advances in genomic sequencing have shed light on their evolutionary dynamics, revealing evolutionarily complex genomes influenced by symbiotic relationships and horizontal gene transfer events. By analyzing 120 plastome and 70 mitogenome sequences that are publicly available, this paper aims to elucidate the evolutionary dynamics of diatoms across diverse lineages. In comparing genomic events between plastomes and mitogenomes, gene losses and pseudogenes were observed more frequently in plastomes than in mitogenomes. Overall, gene losses were abundant in the plastomes of Astrosyne radiata, Toxarium undulatum, and Proboscia sp. Frequently lost and pseudogenized genes were acpP, ilv, serC, tsf, tyrC, ycf42 and bas1. In mitogenomes, the mttB, secY and tatA genes were lost repeatedly across several diatom taxa. Analysis of nucleotide substitution rates indicated that, in general, mitogenomes were evolving more rapidly than plastomes. This contrasts with the synteny analyses, in which plastomes exhibited greater structural rearrangement than mitogenomes, with the exception of the genus Coscinodiscus and one group of species within Thalassiosira.
https://doi.org/10.5061/dryad.76hdr7t5j
Description of the data and file structure
A set of 78 chloroplast and 25 mitochondrial protein-coding genes, common to 120 plastomes and 70 mitogenomes along with five outgroups, was identified and extracted using Geneious Prime v2023.2.1. Each gene was aligned individually using the MAFFT plug-in in Geneious Prime. The alignments were refined using trimAl to ensure that nucleotide bases corresponding to amino acids were aligned and to remove large gaps and ambiguous regions with low nucleotide similarity. The aligned and trimmed genes were then concatenated, resulting in a final alignment length of 61,931 bp for chloroplasts and 16,826 bp for mitochondria.
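The concatenation step can be illustrated with a minimal sketch. This is not the Geneious Prime workflow used to build the dataset; it is an assumed pure-Python equivalent. The function name `concatenate_alignments` and the toy gene alignments are hypothetical, and the sketch assumes every per-gene alignment contains exactly the same taxa.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments end to end, one concatenated sequence per taxon.

    Each alignment is a dict mapping taxon name -> aligned sequence.
    Hypothetical helper for illustration only.
    """
    if not gene_alignments:
        return {}
    taxa = list(gene_alignments[0])
    for aln in gene_alignments:
        # Assumption: no missing taxa; every gene is present for every taxon.
        if set(aln) != set(taxa):
            raise ValueError("every gene alignment must contain the same taxa")
        # Within one gene, all aligned sequences must share the same length.
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within a gene alignment must have equal length")
    return {t: "".join(aln[t] for aln in gene_alignments) for t in taxa}


# Toy data (not taken from the dataset).
gene_a = {"taxon1": "ATG---", "taxon2": "ATGAAA"}
gene_b = {"taxon1": "TTT", "taxon2": "T-T"}
combined = concatenate_alignments([gene_a, gene_b])
```

Because each gene is aligned and trimmed before concatenation, the final alignment length is the sum of the per-gene lengths.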
Files and variables
File: MT75Align.fasta
Description: 25 genes, 70 taxa, 5 outgroups (alignment length: 16,826 bp)
File: CP125Align.fasta
Description: 78 genes, 120 taxa, 5 outgroups (alignment length: 61,931 bp)
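As a quick sanity check after download, the dimensions of each alignment can be compared with the descriptions above. The following is a minimal sketch using only the Python standard library. The helper names `read_fasta` and `alignment_shape` are hypothetical, and the sketch assumes the files are plain FASTA with one header line per sequence.

```python
def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into a dict of name -> sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_shape(records):
    """Return (number of sequences, alignment length); fail if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        raise ValueError("sequences are not all the same length")
    return len(records), (lengths.pop() if lengths else 0)


# Toy input (not taken from the dataset): two sequences wrapped over two lines.
toy = [">a", "ATG-", "CC", ">b", "ATGA", "CC"]
shape = alignment_shape(read_fasta(toy))
```

For the real files, calling `alignment_shape(read_fasta(open("MT75Align.fasta")))` would be expected to report 75 sequences and 16,826 columns, assuming the 70 taxa and 5 outgroups are all included as separate records; the corresponding expectation for CP125Align.fasta is 125 sequences and 61,931 columns.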
Code/software
Geneious Prime v2023.2.1 (https://www.geneious.com)
Access information
Other publicly accessible locations of the data:
- N/A
Data was derived from the following sources:
- NCBI GenBank
- GeSeq in CHLOROBOX
