--- title: "Supplement for: Molecular mechanisms of postmating prezygotic reproductive isolation uncovered by transcriptome analysis" author: "James B. Pease, Rafael F. Guerrero, Natasha A. Sherman, Matthew W. Hahn, Leonie C. Moyle" output: pdf_document: toc: true fig_caption: yes header-includes: - \usepackage[labelfont=bf,labelsep=colon]{caption} - \usepackage[section]{placeins} --- \renewcommand{\thefigure}{S\arabic{figure}} \renewcommand{\thetable}{S\arabic{table}} \captionsetup{width=0.75\linewidth} --- ```{r external, include=FALSE, cache=FALSE} knitr::read_chunk('solDX_source.R') ``` ```{r load_packages, include=FALSE, cache=FALSE} ``` ```{r global_opts, include=FALSE} knitr::opts_chunk$set(tidy=TRUE, cache=FALSE, echo=FALSE, warning=FALSE, message=FALSE, comment="" ) roundDigits <- 2 #Digits to keep in prettyNum ``` ```{r define_constants, include=FALSE} ``` ```{r functions, include=FALSE} ``` ```{r read_raw_data, include=FALSE} ``` ```{r filter_low_expr} ``` ```{r lm1_prep} ``` ```{r proc_lm1} ``` \pagebreak #Differential expression analysis The goal is to produce a list of candidates for reproductive incompatibilities between *Solanum lycopersicum* and *S. pennellii*, based on the differential expression of genes in reproductive tissues. We analyze reads from `r length(y)` RNA-seq libraries, mapped to the *S. lycopersicum* genome (SL2.50). We remove `r dim(y)[1]-sum(keep)` transcripts with very low expression (`r prettyNum( 100*(dim(y)[1]-sum(keep))/dim(y)[1], digits=roundDigits)`% of all transcripts present in the samples). Included in the analysis are genes with minimum `r MIN_CPM` count per million in at least `r MIN_LIBS` libraries. The genome-wide patterns can be described by the pairwise correlations between mean expression levels. \footnotesize ```{r spearman, cache=FALSE} ``` \normalsize We analyze the data using the `limma` package, carrying out a `voom` transformation and empirical Bayes (`eBayes`) correction. The linear model includes the UI factor and controls for individual effects. Here is the main code used. ```{r main_lm1, echo=TRUE, tidy=FALSE} ``` The model allows us to compare non-UI vs. UI-competent tissues:[*lyc*^pen^, *lyc*^self^, *lyc*^S-^, *lyc*^S+^] versus [*pen*^S-^, *pen*^S+^, *pen*^self^, *pen*^lyc^], while controlling for expression in *pen*^S-5^ -- immature styles in the UI-competent species. We consider genes that fulfill the following criteria: * Difference between UI and non-UI tissues is significant at p-value < `r prettyNum(Pval_lmer, format='e', digits=roundDigits)`, and absolute fold change > log2(`r LOGFC_lmer`). * Expression in *pen*^S-5^ is significantly down with respect to the most up-regulated tissue (at FDR = 1%, and fold-change > 2). Specifically, + If UI > non-UI, then UI > *pen*^S-5^. + If non-UI > UI, then non-UI > *pen*^S-5^. * Mean expression in pollen < `r MAX_POLLEN_CPM`cpm, and ratio of mean pollen to style expression < 1. * Expression level > `r CPM_THRES`cpm in at least one sample. We find a total of `r dim(lm1.screen )[1]` candidates with these criteria. Of those, `r sum(lm1.screen$logFC>1)` are up-regulated, while `r sum(lm1.screen$logFC<1)` are down-regulated in UI tissues. \pagebreak \small ```{r lm1_toptable} lm1_out <- lm1.screen %>% transmute(gene, P.Value = p.UI, logFC, Annotation=unname(annot$annShort[match(gene, annot$gene)]) )%>% arrange(desc(logFC)) lm1_out$P.Value <-prettyNum(lm1_out$P.Value, format='e', digits=roundDigits) lm1_out$P.Value[lm1_out$P.Value=="0"] <- "<2.0e-16" lm1_out_up <- lm1_out %>%filter(row_number() < 51) lm1_out_down <- lm1_out %>%filter(row_number() > n()-50)%>%arrange(logFC) kable(lm1_out_up, align='l', digits=roundDigits, caption="Top-50 upregulated in UI-competent tissues") ``` \pagebreak ```{r lm1_downtable} kable(lm1_out_down, align='l', digits=roundDigits, caption="Top-50 downregulated in UI-competent tissues") ``` \normalsize \pagebreak ```{r lm1_topPlot, fig.height= 8, fig.width=7.5, fig.cap="Expression in the Top-20 UI upregulated genes by tissue. Red squares: S. lycopersicum, green circles: S. pennellii"} topCheckAll(log2(cpm(y0)+1), match(lm1_out_up$gene,row.names(y0$counts)), s, 20, FALSE) ``` \pagebreak ```{r lm1_bottomPlot, fig.height= 8, fig.width=7.5, fig.cap="Expression in the Top-20 UI downregulated genes by tissue. Red squares: S. lycopersicum, green circles: S. pennellii"} topCheckAll(log2(cpm(y0)+1), match(lm1_out_down$gene,row.names(y0$counts)), s, 20, FALSE) ``` ```{r model_lmUS} ``` \pagebreak #Unpollinated styles The full model includes the comparison between groups that include both unpollinated and pollinated styles, which could lead to some undesired background noise. To verify our results, we carry out an analysis including only unpollinated styles (*i.e.*, [*pen*^S-^, *pen*^S+^] versus [*lyc*^S-^, *lyc*^S+^]). Using the same filtering criteria as above, this second model yields a total of `r dim(lmUS.screen )[1]` significant genes. Of those, `r dim(lmUS.newgenes)[1]` are not present in our original list of candidates, and there is a good correlation between rank in the full model and rank when comparing unpollinated styles only (Figure S3). ```{r UP_plot, fig.cap="Correlation in rank order of genes between full model and one comparing only UI-nonUI in unpollinated styles."} lmUS_plot ``` \pagebreak \normalsize #UI cross-activated genes We also search for genes that are activated by the UI interaction itself by comparing UI-competent unpollinated and pollinated styles (*i.e.*, *pen*^lyc^ vs. [*pen*^self^, *pen*^S-^, *pen*^S+^]). To avoid having highly expressed *S. lycopersicum* pollen genes affect the results, we include only genes for which the ratio of expression in *S. lycopersicum* pollen and *pen*^lyc^ samples is less than 1. ```{r proc_lmC1} ``` ```{r main_lmC1} ``` ```{r filter_lmC1} ``` ```{r model_lmPS} ``` These are the differentially expressed genes for this comparison (`r dim(lmC1.screen)[1]` significant genes at FDR = 10%): \small ```{r ui_out_toptable} UIcross_out <- lmC1.screen %>% transmute(gene, P.Value=p.val, logFC, Annotation=unname(annot$annShort[match(gene, annot$gene)]) )%>% arrange(desc(abs(logFC)))%>% filter(row_number()<21) UIcross_out$P.Value <-prettyNum(UIcross_out$P.Value, format='e', digits=roundDigits) UIcross_out$P.Value[UIcross_out$P.Value=="0"] <- "< 2.0e-16" kable(UIcross_out, align='l', digits=roundDigits, caption="Genes differentially expressed in *pen*^lyc^ samples vs. other UI-competent tissues") ``` \normalsize An alternative to search for UI cross-activated genes is to compare *pen*^lyc^ to non-UI pollinated styles (*i.e.*, *pen*^self^, *lyc*^self^, *lyc*^pen^). However, this model yields only `r dim(lmPS.screen)[1]` significant genes (FDR = 10%), of which `r lmPS.newgenes$gene` are not included in the previous model: \small ```{r ps_out_toptable} lmPS_out <- lmPS.screen %>% transmute(gene, P.Value=p.val, logFC, Annotation=unname(annot$annShort[match(gene, annot$gene)]) )%>% arrange(desc(abs(logFC)))%>% filter(row_number()<21) lmPS_out$P.Value <-prettyNum(lmPS_out$P.Value, format='e', digits=roundDigits) lmPS_out$P.Value[lmPS_out$P.Value=="0"] <- "< 2.0e-16" kable(lmPS_out, align='l', digits=roundDigits, caption="Genes differentially expressed in *pen*^lyc^ vs. non-UI pollinated styles") ``` \pagebreak #Allele-specific expression ```{r read_allelespec_data} ``` ```{r proc_allelespec} ``` ```{r lm_allelespec} ``` ```{r allelespec_filters} ``` Counts of allele-specific reads were possible to infer for `r dim(pollen.lm)[1]` genes. Mean expression level in these genes is `r prettyNum( mean(rowMeans(cpm(pollen.y))), digits=roundDigits)` cpm for pollen, and `r prettyNum( mean(rowMeans(cpm(style.y))), digits=roundDigits)` cpm in styles. These allele-specific reads allow us to search for genes that change in styles and pollen in response to the UI cross. For the style-specific response we require two comparisons to be significant: expression in *lyc*^pen^sty vs. *pen*^lyc^sty, and [*pen*^S+^,*pen*^S-^] vs. *pen*^lyc^sty. We take genes with significance at FDR=10%, at least 2-Fold change, and include only those with a maximum expression of at least 5 cpm. For the pollen-specific response we compare *lyc*^pen^pol vs. *pen*^lyc^pol, at FDR=10% and at least 2-Fold change. Additionally, we consider only genes that do not show significant (FDR=10%) differences in [*lyc*^germ^, *lyc*^dry^] vs. [*pen*^germ^, *pen*^dry^]. This last filter allows us to focus on genes that have changed only as a result of the cross. A comparison between [*lyc*^germ^, *lyc*^dry^] vs. *pen*^lyc^pol is not straightforward, since it involves comparing read counts in libraries extracted from very different tissues (pollen and pollinated styles). After all filters, we find `r dim(pollen.screen)[1]` significant genes in pollen and `r dim(style.screen)[1]` significant genes in styles. `r allele.overlap.print` of these genes appear in the main analysis. We find `r length(allele.interList)` genes that are significant in styles and pollen. \small ```{r poll_allele_toptable} pollen_allele_out <- pollen.screen %>% transmute(gene, P.Value=p.val, logFC, Annotation=unname(annot$annShort[match(gene, annot$gene)]) )%>% arrange(desc(abs(logFC)))%>% filter(!gene%in% allele.interList)%>% filter(row_number()<21) pollen_allele_out$P.Value <-prettyNum(pollen_allele_out$P.Value, format='e', digits=roundDigits) pollen_allele_out$P.Value[pollen_allele_out$P.Value=="0"] <- "<2.0e-16" kable(pollen_allele_out, align='l', digits=roundDigits, caption="Top-20 UI vs. non-UI, for pollen-specific read counts") ``` \pagebreak ```{r sty_allele_toptable} style_allele_out <- style.screen %>% transmute(gene, P.Value=p.val, logFC, Annotation=unname(annot$annShort[match(gene, annot$gene)]) )%>% arrange(desc(abs(logFC)))%>% filter(!gene%in% allele.interList)%>% filter(row_number()<21) style_allele_out$P.Value <-prettyNum(style_allele_out$P.Value, format='e', digits=roundDigits) style_allele_out$P.Value[style_allele_out$P.Value=="0"] <- "<2.0e-16" kable(style_allele_out,align='l', digits=roundDigits, caption="Top-20 UI vs. non-UI, for style-specific read counts") interList_out <-style.screen %>% arrange(desc(abs(logFC)))%>% transmute(gene, Annotation=unname(annot$annShort[match(gene, annot$gene)]) )%>% filter(gene%in% allele.interList) kable(interList_out,align='l', digits=roundDigits, caption="Genes that show a pollen and style response in UI cross") ``` \pagebreak #Summary of expression patterns in *a priori* candidates ```{r the45_table} ``` ```{r the45table_out} the45_out <- the45%>%mutate(P.Value =prettyNum(P.Value, format='e', digits=roundDigits)) the45_out$P.Value[the45_out$P.Value=="0"] <- "<2.0e-16" kable(the45_out, align='l', digits=roundDigits, caption="Expression in 45 *a priori* candidates. UI: mean expression in UI-competent tissues, in cpm. FC: Fold change with respect to non-UI tissues. P.Value: significance of a simple t-test for UI vs. non-UI tissues.") ``` \pagebreak #Species differences in pollen expression ```{r pollen_only} ``` \normalsize Finally, we searched for genes that show significant differences in their expression levels in pollen samples. We fit a linear model with two effects (species: *lyc* or *pen*, and tissue: *germ* or *dry*), and control for individual effects (as a random effect). Including the "tissue" variable allows us to gain power by reducing variance within species samples. We take genes with statistically significant differences (at FDR=1%) and fold change greater than 4. Using these criteria, we find `r dim(lmPolCheck.screen)[1]` genes differentially expressed between species, and `r prettyNum(pollen_up_pct, digits=roundDigits)`% of these are upregulated. There are `r dim(lmpollen_lm1_overlap)[1]` genes that also appear in our list of UI candidates (from the full model above). From those overlaping genes, `r prettyNum(overlap_up_pct, digits=roundDigits)`% of those are up-regulated in S. pennellii. \small ```{r pollen_only_toptable} lmPolCheck_out <- lmPolCheck.screen %>% transmute(gene, P.Value = p.val, logFC, Annotation=unname(annot$annShort[match(gene, annot$gene)]) ) lmPolCheck_out$P.Value <-prettyNum(lmPolCheck_out$P.Value, format='e', digits=roundDigits) lmPolCheck_out$P.Value[lmPolCheck_out$P.Value=="0"] <- "<2.0e-16" lmPolCheck_up <- lmPolCheck_out %>% arrange(desc(logFC))%>% filter(row_number() < 51) lmPolCheck_dn <- lmPolCheck_out %>% arrange(logFC)%>% filter(row_number() <51) kable(lmPolCheck_up, align='l', digits=roundDigits, caption="Top-50 upregulated in pollen of *S. pennellii* with respect to *S. lycopersicum*") ``` \pagebreak ```{r pollen_only_down} kable(lmPolCheck_dn, align='l', digits=roundDigits, caption="Top-50 downregulated in pollen of *S. pennellii* with respect to S. *lycopersicum*") ``` \pagebreak ```{r write_tables} ``` #R session info ```{r sess} sessionInfo() ```