Dryad

Data from: The evolution of protein-coding gene structure in eukaryotes

Data files

Apr 02, 2024 version files 15.28 GB

Abstract

Introns are highly prevalent in most eukaryotic genomes. Despite accumulating evidence for the benefits that introns confer, their specific roles and functions, as well as the processes shaping their evolution, are still only partially understood. Here we explore the evolution of eukaryotic intron-exon gene structure by focusing on several key features of protein-coding genes: intron length, the number of introns, and the intron-to-exon ratio. We use whole-genome data from 590 species covering the main eukaryotic taxonomic groups and analyze them within a statistical phylogenetic framework. We find that the basic gene structure differs markedly among the main eukaryotic phyla, with animals, and particularly chordates, displaying intron-rich genes compared with plants and fungi. Reconstruction of gene structure evolution suggests that these differences evolved prior to the divergence of the phyla and have remained mostly conserved within groups. We revisit the previously reported association between genome size and mean intron length, and report that the correlation patterns differ considerably among phyla. Our findings suggest that the evolution of introns may be affected by different processes across the eukaryotic tree. The substantial diversity in gene structures may indicate that introns play different molecular and evolutionary roles in different organisms.