Transcriptome Analysis Reveals Extensive Alternative Splicing-Coupled Nonsense-Mediated mRNA Decay in a Human Cell Line
Data files
Sep 25, 2015 version files 240.59 MB
Abstract
To further explore the regulatory potential of nonsense-mediated mRNA decay (NMD) in human cells, we globally surveyed the transcripts targeted by this pathway via RNA-Seq analysis of HeLa cells in which NMD had been inhibited. We first identified those transcripts with both a premature termination codon more than 50 nucleotides upstream of an exon-exon junction (50nt rule) and a significant increase in abundance upon NMD inhibition. Remarkably, at least 2,793 transcripts derived from 2,116 genes are physiological NMD targets (9.2% of expressed transcripts and >20% of alternatively spliced genes). Our analysis identifies previously inferred unproductive isoforms and numerous previously uncharacterized ones. NMD-targeted transcripts were derived from genes involved in many functional categories, and are particularly enriched for RNA splicing genes and ultraconserved elements. By investigating the features of all transcripts impacted by NMD, we find that the 50nt rule is a strong predictor of NMD degradation, while 3’ UTR length generally has only a small effect in human cells. Additionally, thousands more transcripts without a premature termination codon upstream of an exon-exon junction in the main coding sequence contain a uORF and display significantly increased abundance upon NMD inhibition, indicating potentially widespread regulation through decay coupled with uORF translation. Our results support the hypothesis that alternative splicing coupled with NMD is a prevalent post-transcriptional mechanism in human cells with broad potential for biological regulation.
Methods
Paired-end reads for each library were aligned to the NCBI human RefSeq transcriptome (Pruitt et al., 2009) with Bowtie (Langmead et al., 2009) to determine the average insert size and standard deviation, which TopHat (Trapnell et al., 2009) requires as parameters. The reads of each library were then aligned to the human genome (hg19 assembly, Feb. 2009; downloaded from the UCSC Genome Browser (Fujita et al., 2011)) using TopHat v1.2.0 with default parameters plus the following: --coverage-search, --allow-indels, --microexon-search, and --butterfly-search.
Cufflinks 1.0.1 (Roberts et al., 2011; Trapnell et al., 2010) was used to assemble each set of aligned reads into transcripts, with the UCSC known transcript set (Fujita et al., 2011) as the reference guide and the following parameters: --frag-bias-correct and --multi-read-correct. Cuffcompare (a sub-tool of Cufflinks) was used to merge the resulting sets of assembled transcripts. Each junction was assigned a Shannon entropy score based on the offsets of spliced reads across all four libraries. Transcripts containing a junction that had an entropy score <1 and was not present in the reference annotation were filtered out.
Cuffdiff (a sub-tool of Cufflinks) was used to quantify and compare transcript abundance (measured by FPKM, Fragments Per Kilobase per Million reads) between the UPF1 knockdown and control samples. For each sample, the reads from two biological replicates were provided. The following parameters were used: --frag-bias-correct and --multi-read-correct. Only transcripts with FPKM>1 in either the control or the UPF1 knockdown sample were used for further analysis. A transcript was called significantly more abundant in the UPF1 knockdown sample if Cuffdiff called it significantly changing and the fold change was greater than 1.5. Significantly decreased transcript abundances were determined in the same way.
For each transcript, the coding sequence (CDS) was determined as described in the Supplementary Methods.
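The junction entropy filter described above can be illustrated with a minimal sketch. The text does not give the exact formula, so this assumes the score is the Shannon entropy, in bits, of the distribution of spliced-read offsets pooled over all libraries. The function names, the input layout (a mapping from junction id to a list of offsets) and the use of log base 2 are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def junction_entropy(offsets):
    """Shannon entropy (bits) of the spliced-read offsets at one junction.

    `offsets` is a list with one entry per spliced read, pooled across
    libraries (assumed layout; the source does not specify it).
    """
    counts = Counter(offsets)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


def keep_transcript(junctions, annotated, min_entropy=1.0):
    """Return False if any unannotated junction has entropy below the cutoff.

    `junctions` maps a junction id to its list of read offsets;
    `annotated` is the set of junction ids found in the reference annotation.
    """
    for junction_id, offsets in junctions.items():
        if junction_id not in annotated and junction_entropy(offsets) < min_entropy:
            return False
    return True
```

Under this reading, a novel junction supported only by reads that all start at the same offset scores 0 and causes its transcript to be dropped, while an annotated junction is kept regardless of its score.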
A coding sequence was defined to terminate in a premature stop codon (PTC50nt) if it stops at least 50 nucleotides upstream of the last exon-exon junction (50nt rule in mammals (Nagy and Maquat, 1998)). NMD targets were defined as those transcripts with both a PTC50nt and significantly increased abundance in NMD-inhibited (UPF1 knockdown) cells. The transcripts must also increase in each biological replicate when analyzed independently and come from a gene with a non-PTC50nt-containing isoform with FPKM>0. To obtain a more reliable list of NMD-targeted transcripts, only those transcripts that met at least one of the following criteria were kept: 1) no non-PTC50nt-containing isoform from the gene was more than 1.2-fold higher in the NMD-inhibited sample, or 2) the PTC50nt-containing isoform increased at least 2x more than the sum of all non-PTC50nt-containing isoform FPKMs from the gene in NMD-inhibited cells.
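The classification rules above can be sketched in a few lines. This is an illustrative reading rather than the authors' code, and it makes several assumptions the text leaves open: positions are in spliced-transcript coordinates, the distance is measured from the end of the stop codon, "FPKM>0" is checked in either sample, and criterion 2 is read as comparing the fold change of the PTC50nt isoform with the fold change of the summed non-PTC50nt FPKMs. The `Isoform` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


def has_ptc50nt(exon_lengths, stop_end, min_distance=50):
    """True if the stop codon ends >= min_distance nt upstream of the last junction.

    `exon_lengths` are in 5'->3' transcript order; `stop_end` is the 0-based
    transcript coordinate just past the stop codon (assumed convention).
    """
    if len(exon_lengths) < 2:
        return False  # single-exon transcript: no exon-exon junction
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_end >= min_distance


@dataclass
class Isoform:
    ptc50nt: bool
    fpkm_control: float
    fpkm_knockdown: float
    significant: bool = False           # Cuffdiff significance call
    up_in_each_replicate: bool = False  # increase seen in both replicates


def fold_change(control, knockdown):
    """Knockdown/control ratio; 0/0 is treated as no change (assumption)."""
    if control == 0:
        return math.inf if knockdown > 0 else 1.0
    return knockdown / control


def is_nmd_target(iso, gene_isoforms):
    """Apply the NMD-target criteria to `iso` given all isoforms of its gene."""
    if not (iso.ptc50nt and iso.significant and iso.up_in_each_replicate):
        return False
    fc = fold_change(iso.fpkm_control, iso.fpkm_knockdown)
    if fc <= 1.5:
        return False
    others = [x for x in gene_isoforms if not x.ptc50nt]
    # The gene must express at least one non-PTC50nt isoform.
    if not any(max(x.fpkm_control, x.fpkm_knockdown) > 0 for x in others):
        return False
    # Criterion 1: no non-PTC50nt isoform rises more than 1.2-fold.
    rule1 = all(fold_change(x.fpkm_control, x.fpkm_knockdown) <= 1.2 for x in others)
    # Criterion 2 (one possible reading): at least twice the summed fold change.
    summed_fc = fold_change(sum(x.fpkm_control for x in others),
                            sum(x.fpkm_knockdown for x in others))
    rule2 = fc >= 2 * summed_fc
    return rule1 or rule2
```

For example, with exons of 100, 100 and 100 nt, the last junction sits at transcript position 200, so a stop codon ending at position 150 satisfies the rule while one ending at position 151 does not.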