Data from: Exploring the selective constraint on the sizes of insertions and deletions in 5’ untranslated regions in mammals
Data files
May 31, 2011 version files 8.51 MB
-
alignment_human_mouse.rar
Abstract
Background: Small insertions and deletions ("indels" with size ≦ 100 bp) whose lengths are not multiples of three (non-3n) are strongly constrained and depleted in protein-coding sequences. Such a constraint has never been reported in noncoding genomic regions. In 5'untranslated regions (5'UTRs) in mammalian genomes, upstream start codons (uAUGs) and upstream open reading frames (uORFs) can regulate protein translation. The presence of non-3n indels in uORFs can potentially disrupt the functions of these regulatory elements. We thus hypothesize that natural selection disfavors non-3n indels in 5'UTRs when these regulatory elements are present. Results: We design the Indel Selection Index to measure the selective constraint on non-3n indels in 5'UTRs. The index controls for the genomic compositions of the analyzed 5'UTRs and measures the probability of non-3n indel depletion downstream of uAUGs. By comparing the experimentally supported transcripts of human-mouse orthologous genes, we demonstrate that non-3n indels downstream of two types of uAUGs (alternative translation initiation sites and the uAUGs of coding sequence-overlapping uORFs) are underrepresented. The results hold well regardless of differences in alignment tool, gene structures between human and mouse, or the criteria in selecting alternatively spliced isoforms used for the analysis. Conclusions: To our knowledge, this is the first study to demonstrate selective constraints on non-3n indels in 5'UTRs. Such constraints may be associated with the regulatory functions of uAUGs/uORFs in translational regulation or the generation of protein isoforms. Our study thus brings a new perspective to the evolution of 5'UTRs in mammals.