Data from: Where the minor things are: a pan-eukaryotic survey suggests neutral processes may dominate minor spliceosomal intron evolution
Data files
Sep 23, 2023 version files: 15.59 GB
- README.md (6.55 KB)
- WtMTA.db.gz (15.59 GB)
Abstract
Spliceosomal introns are gene segments removed ("spliced") from RNA transcripts by large ribonucleoprotein machineries called spliceosomes. In some eukaryotes a second spliceosome (the minor/U12-type) is responsible for processing a tiny minority of introns. Despite its seemingly modest role, minor splicing has persisted for roughly 1.5 billion years of eukaryotic evolution. Identifying and cataloging minor introns in >3000 eukaryotic genomes, we report diverse evolutionary histories including surprisingly high numbers of minor introns in some fungi and green algae, repeated massive loss, as well as several general biases in the positional and genic distributions of minor introns. We estimate that ancestral minor intron densities were comparable to those of the most minor intron-rich species, suggesting a trend of long-term stasis. Finally, three findings suggest a major role for neutral processes in minor intron evolution. First, we find highly similar patterns of minor and major intron evolution, in contrast to the predictions of both functionalist and deleterious models. Second, we find that observed functional biases among minor intron-containing genes are largely explained by these genes' greater ages. Third, we find no association of intron splicing with cell proliferation in a minor intron-rich fungus, suggesting that regulatory roles are lineage-specific and thus cannot offer a general explanation for minor splicing's persistence. These data constitute the most comprehensive view to date of modern minor introns, their evolutionary history, and the forces shaping minor splicing, and provide a foundation for future studies of these remarkable genomic elements.
These data combine publicly available annotated genome data from resources such as NCBI, Ensembl and JGI with novel information classifying intron sequences as either major (U2)- or minor (U12)-type. The classification data were obtained using intronIC (https://github.com/glarue/intronIC). The archive is a single SQLite database file that includes metadata about introns, transcripts (mRNAs) and genomes from minor-intron-containing species.
The database was formatted using the open-source tool Datasette (https://datasette.io/, https://github.com/simonw/datasette). Any tool that can read SQLite data should be able to interface with the database. The simplest method, however, is to explore the data locally with Datasette itself, which must first be installed (see the README file for an overview). An online version of the same data is available at https://www.introns.info.
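For readers who prefer a scripting route over Datasette, the following is a minimal sketch of how the archive might be inspected with Python's standard library alone. It assumes only what is stated above: that WtMTA.db.gz is a gzip-compressed SQLite file. The helper names (`decompress_gz`, `list_tables`) and the output filename are illustrative, and no table or column names are assumed; the sketch simply asks SQLite which tables exist. Note that the decompressed file will be larger than the 15.59 GB archive.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def decompress_gz(src, dest):
    """Stream-decompress a .gz file to dest without reading it all into memory."""
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return Path(dest)


def list_tables(db_path):
    """Return the sorted table names in an SQLite file, opened read-only."""
    # A file: URI with mode=ro prevents accidental writes to the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example usage (hypothetical local paths; uncomment after downloading):
# db_file = decompress_gz("WtMTA.db.gz", "WtMTA.db")
# print(list_tables(db_file))
```

Listing the tables first avoids guessing at the schema; the README file and the Datasette interface remain the authoritative guides to what each table contains.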