Skip to main content

Transcriptome Analysis Reveals Extensive Alternative Splicing-Coupled Nonsense-Mediated mRNA Decay in a Human Cell Line

Cite this dataset

French, Courtney; Wei, Gang; Brenner, Steven (2015). Transcriptome Analysis Reveals Extensive Alternative Splicing-Coupled Nonsense-Mediated mRNA Decay in a Human Cell Line [Dataset]. Dryad.


To further explore the regulatory potential of nonsense-mediated mRNA decay (NMD) in human cells, we globally surveyed the transcripts targeted by this pathway via RNA-Seq analysis of HeLa cells in which NMD had been inhibited. We first identified those transcripts with both a premature termination codon more than 50 nucleotides upstream of an exon-exon junction (50nt rule) and a significant increase in abundance upon NMD inhibition. Remarkably, at least 2,793 transcripts derived from 2,116 genes are physiological NMD targets (9.2% of expressed transcripts and >20% of alternatively spliced genes). Our analysis identifies previously inferred unproductive isoforms and numerous previously uncharacterized ones. NMD-targeted transcripts were derived from genes involved in many functional categories, and are particularly enriched for RNA splicing genes and ultraconserved elements. By investigating the features of all transcripts impacted by NMD, we find that the 50nt rule is a strong predictor of NMD degradation while 3’ UTR length generally has only a small effect in human cells. Additionally, thousands more transcripts without a premature termination codon upstream of an exon-exon junction in the main coding sequence contain a uORF and display significantly increased abundance upon NMD inhibition indicating potentially widespread regulation through decay coupled with uORF translation. Our results support the hypothesis that alternative splicing coupled with NMD is a prevalent post-transcriptional mechanism in human cells with broad potential for biological regulation.


HeLa cells were inoculated on plates in Dulbecco’s MEM medium with 0.1 mM non-essential amino acids and 10% fetal bovine serum, incubated at 37°C and 5% CO2. Plates at 80% cell confluency were transfected with plasmids pSUPERpuro-hUpf1/II and pSUPERpuro-Scramble, whose functions were to knockdown UPF1 and act as a mock control, respectively. Transfections were performed using Lipofectamine™ LTX and PLUS™ Reagents (Invitrogen) according to the manufacturer’s protocol, and the following culture according to the published method (Paillusson et al., 2005). The whole cell lysates were prepared in 1% sodium dodecyl sulfate and incubated at 100 °C for 5 min, then centrifuged at 12,000 rpm for 10 min. Total RNA was extracted using the QIAGEN RNeasy® Mini kit according to the manufacturer’s manual. Directional and paired-end RNA-seq libraries were constructed according to the protocol published on the Illumina website (, with a few changes: The adapters were prepared according to the reported methods (Vigneault et al., 2008). The PCR process to prepare the library was divided in two steps. In the first step, 3 cycles of PCR were performed (according to the protocol) to prepare the library template, and then the library was run on a 2% agarose gel, fragments of desired size were cut out and isolated by QIAquick® Gel Extraction Kit. A second round of PCR (12 cycles) was performed to enrich the library and then it was purified twice with Agencourt AMPure XP kit as suggested by the Illumina protocol. The libraries were then assayed by Agilent 2100 BioAnalyzer. These RNA-Seq libraries were prepared from cells with inhibited NMD and control cells for two biological replicates. One biological replicate was sequenced on an Illumina GAIIx machine and the other on HiSeq 2000.
Paired end reads for each library were aligned to the NCBI human RefSeq transcriptome (Pruitt et al., 2009) with Bowtie (Langmead et al., 2009) to determine the average insert size and standard deviation, required as a parameter by TopHat (Trapnell et al., 2009). The reads of each library were then aligned to the human genome (hg19 assembly, Feb. 2009; downloaded from UCSC genome browser (Fujita et al., 2011)) using TopHat v1.2.0 with default parameters plus the following: --coverage search, --allow indels, --microexon search, and --butterfly search. Cufflinks 1.0.1 (Roberts et al., 2011; Trapnell et al., 2010) was used to assemble each set of aligned reads into transcripts with the UCSC known transcript set (Fujita et al., 2011) as the reference guide, along with the following parameters: --frag-bias-correct, and --multi-read-correct. Cuffcompare (a sub-tool of Cufflinks) was used to merge the resulting sets of assembled transcripts. Each junction was assigned a Shannon entropy score based on offset of spliced reads across all four libraries. Transcripts with a junction that had an entropy score <1 and was not present in the reference annotation were filtered out. Cuffdiff (a sub-tool of Cufflinks) was used to quantify and compare transcript abundance (measured by FPKM, Fragments Per Kilobase per Million reads) between the UPF1 knockdown and control samples. For each sample, the reads from two biological replicates were provided. The following parameters were used: --frag-bias-correct and --multi-read-correct. Only transcripts with FPKM>1 in either the control or UPF1 knockdown sample were used for further analysis. A transcript was called significantly more abundant in the UPF1 knockdown sample if Cuffdiff called it significantly changing and the fold change was greater than 1.5x. Significantly decreased transcript abundances were determined in the same way. For each transcript, the coding sequence (CDS) was determined as described in the Supplementary Methods. A coding sequence was defined to terminate in a premature stop codon (PTC50nt) if it stops at least 50 nucleotides upstream of the last exon-exon junction (50nt rule in mammals (Nagy and Maquat, 1998)). NMD targets were defined as those transcripts with both a PTC50nt and significantly increased expression abundance in NMD inhibited (UPF1 knockdown) cells. The transcripts must also increase in each biological replicate when analyzed independently and come from a gene with a non-PTC50nt-containing isoform with FPKM>0. To obtain a more reliable list of NMD-targeted transcripts, only those transcripts that adhered to either of the following criteria were kept: 1) No non-PTC50nt-containing isoform from the gene was more than 1.2-fold higher in the NMD inhibited sample, or 2) the PTC50nt-containing isoform increased at least 2x more than the sum of all non-PTC50nt-containing isoform FPKMs from the gene in NMD inhibited cells.