Costanzo, Michael1; VanderSluis, Benjamin2; Koch, Elizabeth N.2; Baryshnikova, Anastasia3; Pons, Carles2; Tan, Guihong1; Wang, Wen2; Usaj, Matej1; Hanchard, Julia1; Lee, Susan D.4; Pelechano, Vicent5; Styles, Erin B.1; Billmann, Maximilian6; Van Leeuwen, Jolanda1; Van Dyk, Nydia1; Lin, Zhen-Yuan7; Kuzmin, Elena1; Nelson, Justin2; Piotrowski, Jeff S.1; Srikumar, Tharan8; Bahr, Sondra1; Chen, Yiqun1; Deshpande, Raamesh2; Kurat, Christoph F.1; Li, Sheena C.1; Li, Zhijian1; Mattiazzi Usaj, Mojca1; Okada, Hiroki9; Pascoe, Natasha1; San Luis, Bryan-Joseph1; Sharifpoor, Sara1; Shuteriqi, Emira1; Simpkins, Scott W.2; Snider, Jamie1; Garadi Suresh, Harsha1; Tan, Yizhao1; Zhu, Hongwei1; Malod-Dognin, Noel10; Janjic, Vuk11; Przulj, Natasa10; Troyanskaya, Olga G.12; Stagljar, Igor1; Xia, Tian2; Ohya, Yoshikazu9; Gingras, Anne-Claude1; Raught, Brian8; Boutros, Michael6; Steinmetz, Lars M.5; Moore, Claire L.4; Rosebrock, Adam P.1; Caudy, Amy A.1; Myers, Chad L.2; Andrews, Brenda1; Boone, Charles1
Published Aug 27, 2017 on Dryad.
https://doi.org/10.5061/dryad.4291s
INTRODUCTION: Genetic interactions occur when mutations in two or more genes combine to generate an unexpected phenotype. An extreme negative or synthetic lethal genetic interaction occurs when two mutations, neither lethal individually, combine to cause cell death. Conversely, positive genetic interactions occur when two mutations produce a phenotype that is less severe than expected. Genetic interactions identify functional relationships between genes and can be harnessed for biological discovery and therapeutic target identification. They may also explain a considerable component of the undiscovered genetics associated with human diseases. Here, we describe construction and analysis of a comprehensive genetic interaction network for a eukaryotic cell.
RATIONALE: Genome sequencing projects are providing an unprecedented view of genetic variation. However, our ability to interpret genetic information to predict inherited phenotypes remains limited, in large part due to the extensive buffering of genomes, making most individual eukaryotic genes dispensable for life. To explore the extent to which genetic interactions reveal cellular function and contribute to complex phenotypes, and to discover the general principles of genetic networks, we used automated yeast genetics to construct a global genetic interaction network.
RESULTS: We tested most of the ~6000 genes in the yeast Saccharomyces cerevisiae for all possible pairwise genetic interactions, identifying nearly 1 million interactions, including ~550,000 negative and ~350,000 positive interactions, spanning ~90% of all yeast genes. Essential genes were network hubs, displaying five times as many interactions as nonessential genes. The set of genetic interactions or the genetic interaction profile for a gene provides a quantitative measure of function, and a global network based on genetic interaction profile similarity revealed a hierarchy of modules reflecting the functional architecture of a cell. Negative interactions connected functionally related genes, mapped core bioprocesses, and identified pleiotropic genes, whereas positive interactions often mapped general regulatory connections associated with defects in cell cycle progression or cellular proteostasis. Importantly, the global network illustrates how coherent sets of negative or positive genetic interactions connect protein complexes and pathways to map a functional wiring diagram of the cell.
CONCLUSION: A global genetic interaction network highlights the functional organization of a cell and provides a resource for predicting gene and pathway function. This network emphasizes the prevalence of genetic interactions and their potential to compound phenotypes associated with single mutations. Negative genetic interactions tend to connect functionally related genes and thus may be predicted using alternative functional information. Although less functionally informative, positive interactions may provide insights into general mechanisms of genetic suppression or resiliency. We anticipate that the ordered topology of the global genetic network, in which genetic interactions connect coherently within and between protein complexes and pathways, may be exploited to decipher genotype-to-phenotype relationships.
Data File S1. Raw genetic interaction datasets - Pair-wise interaction format
This folder contains complete SGA genetic interaction data for the following:
• A list of all query and array mutant strains represented in the genetic interaction network along with their corresponding fitness estimates
• Nonessential x Nonessential network (NxN)
• Essential x Essential network (ExE)
• Essential x Nonessential network (ExN)
• Genetic interactions involving DAmP alleles of essential genes
• Genetic interactions for HSP90 and corresponding controls
The interaction datasets are provided in a tab-delimited format with 11 columns:
• Query ORF
• Query gene name
• Array ORF
• Array gene name
• Array Type (DMA or TSA) and Temperature (26°C or 30°C)
• Genetic interaction score (ε)
• P-value
• Query single mutant fitness (SMF)
• Array SMF
• Double mutant fitness
• Double mutant fitness standard deviation
Data File S1_Raw genetic interaction datasets - Pair-wise interaction format.zip
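The 11-column layout listed above can be read with a few lines of Python. This is a minimal sketch, not the authors' code: the field names, the presence of a header row, and the sample values (including the ORF names and the "DMA30"/"TSA26" labels) are assumptions made up for illustration, and real files may encode missing values differently.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Interaction(NamedTuple):
    """One row of a pair-wise interaction file (field names are illustrative)."""
    query_orf: str
    query_gene: str
    array_orf: str
    array_gene: str
    array_type_temp: str   # array type (DMA or TSA) and temperature
    score: float           # genetic interaction score (epsilon)
    p_value: float
    query_smf: float       # query single mutant fitness
    array_smf: float       # array single mutant fitness
    dmf: float             # double mutant fitness
    dmf_std: float         # double mutant fitness standard deviation


N_COLUMNS = 11


def read_interactions(handle, has_header: bool = True) -> Iterator[Interaction]:
    """Yield one Interaction per tab-delimited row; reject malformed rows."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row (assumed to be present)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != N_COLUMNS:
            raise ValueError(f"expected {N_COLUMNS} columns, got {len(row)}")
        text, numbers = row[:5], row[5:]
        yield Interaction(*text, *(float(v) for v in numbers))


# Made-up sample data, only to show the expected shape of the input.
SAMPLE = (
    "Query ORF\tQuery gene name\tArray ORF\tArray gene name\tArray type/temp\t"
    "Score\tP-value\tQuery SMF\tArray SMF\tDMF\tDMF std\n"
    "ORF_Q1\tgeneQ1\tORF_A1\tgeneA1\tDMA30\t-0.25\t0.001\t0.9\t0.8\t0.47\t0.02\n"
    "ORF_Q1\tgeneQ1\tORF_A2\tgeneA2\tTSA26\t0.10\t0.2\t0.9\t1.0\t1.0\t0.05\n"
)

rows = list(read_interactions(io.StringIO(SAMPLE)))
negative = [r for r in rows if r.score < 0]  # rows with a negative score
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`, where `path` points at one of the unzipped tab-delimited files.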
Data File S2. Raw genetic interaction datasets - Matrix format
This file contains complete SGA genetic interaction data matrix for the following:
• Nonessential x Nonessential network (NxN)
• Essential x Essential network (ExE)
• Combined essential and nonessential network (ExN)
Data File S2_Raw genetic interaction datasets - Matrix format.zip
Data File S3. Genetic interaction profile similarity matrices
Matrix files containing genetic interaction profile similarity values (as measured by Pearson correlation) for every pair of mutant strains in the dataset. Similarity values were computed for essential (ExE), non-essential (NxN) and the global similarity network derived from a combined set of all genetic interactions (ExE, NxN, ExN) as described above (see "Constructing genetic interaction profile similarity networks"). Each matrix contains 2 sets of row and column headers, providing a unique allele name for every mutant strain (row & column header #1) as well as a systematic ORF name (row & column header #2).
Data File S3_Genetic interaction profile similarity matrices.zip
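As a rough illustration of what a profile similarity value is, the sketch below computes the Pearson correlation between two genetic interaction profiles and builds a small similarity table. It assumes each profile is a list of interaction scores over the same ordered set of partners with no missing values; the strain names and numbers are hypothetical, and any filtering or preprocessing applied to the published matrices is not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(profiles: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson similarity for every pair of named profiles."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names} for a in names}


# Hypothetical profiles: scores of three strains against the same four partners.
profiles = {
    "strainA": [-0.30, 0.05, -0.10, 0.20],
    "strainB": [-0.60, 0.10, -0.20, 0.40],   # strainA scaled by 2
    "strainC": [0.30, -0.05, 0.10, -0.20],   # strainA with the sign flipped
}
sim = similarity_matrix(profiles)
```

In this toy example `strainA` and `strainB` have a similarity close to 1 and `strainA` and `strainC` close to -1, because Pearson correlation ignores scale but not sign.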
Data File S4. GO bioprocess functions predicted by the nonessential and essential similarity networks using a K-nearest neighbor approach
This file reports the performance of gene function prediction for non-essential or essential genes based on genetic interaction profiles. For both classes of genes (either nonessential or essential), the performance of a KNN classifier is reported as the Precision at 25% Recall based on interactions derived from TS queries (PR_TSQ) or nonessential deletion queries (PR25_SN). Although analyses were performed using complete genetic interaction profiles (i.e. both negative and positive genetic interactions), similar prediction performance was obtained using genetic interaction profiles based on negative interactions alone.
Data File S4_GO bioprocess functions predicted by the nonessential and essential similarity networks using a K-nearest neighbor approach.xlsx
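The "Precision at 25% Recall" metric can be illustrated with a short sketch. It assumes predictions are ranked by a classifier score and that precision is read off at the first rank where recall reaches the target; the exact procedure used for the published values (tie handling, interpolation, how KNN scores were derived) is not specified here, so the function and the example data are hypothetical.

```python
from typing import Iterable, Tuple


def precision_at_recall(scored: Iterable[Tuple[float, bool]],
                        target_recall: float = 0.25) -> float:
    """Precision at the first rank where recall reaches target_recall.

    `scored` holds (prediction score, is_true_annotation) pairs.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    items = list(scored)
    total_positives = sum(1 for _, is_true in items if is_true)
    if total_positives == 0:
        raise ValueError("no true annotations to recall")
    # Rank predictions from the highest to the lowest score.
    ranked = sorted(items, key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for rank, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            true_positives += 1
        if true_positives / total_positives >= target_recall:
            return true_positives / rank
    raise AssertionError("unreachable: recall reaches 1.0 at the last rank")


# Hypothetical predictions: 4 true annotations among 6 ranked predictions.
predictions = [(0.9, False), (0.8, True), (0.7, True),
               (0.6, False), (0.5, True), (0.4, True)]
pr25 = precision_at_recall(predictions, 0.25)
```

Here recall first reaches 25% (1 of 4 true annotations) at rank 2, so the precision at that point is 1/2.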
Data File S5. SAFE analysis_Gene cluster identity and functional enrichments
This file lists the results from SAFE analysis of the global genetic profile similarity network (Fig. 1 and Fig. 2). Functional terms enriched within specific network clusters are listed for GO biological processes (14) and/or protein complexes (Data File S12). A list of genes comprising each bioprocess-enriched cluster shown on the global similarity network is also provided. Functional terms enriched within specific network clusters associated with cell compartments (17, 119) are all shown in Fig. 2B.
Data File S5_SAFE analysis_Gene cluster identity and functional enrichments.xlsx
Data File S6. Genetic profile similarity-based hierarchy analysis
The first tab (“Gene to hierarchy cluster mapping”) lists the clusters identified at each level of the genetic interaction-based hierarchy and the deletion and TS allele array mutants assigned to each cluster. Examples of clusters described in the main text are highlighted. The subsequent 9 tabs indicate enrichment of clusters resolved at the specified profile similarity range for specific cell compartments (Cyclops_enrich), biological processes (GO BP_enrich), protein complexes (complex_enrich) and KEGG pathways (KEGG_enrich). The final tab in the file indicates the clusters used to map the functional distribution of negative and positive interactions shown in Fig. 5D.
Data File S6_Genetic profile similarity-based hierarchy analysis.xlsx
Data File S7. Pleiotropic gene analysis
This file lists nonessential and essential query genes associated with high confidence pleiotropy scores based on their genetic interactions derived from the TSA (Essential derived pleiotropy) and DMA (Nonessential derived pleiotropy). The file also contains a second list of nonessential and essential query genes that participate in many genetic interactions but exhibited low pleiotropy scores indicating that these genes are more functionally specific.
Data File S7_Pleiotropic gene analysis.xlsx
Data File S8. Mass spectrometric evidence for Ipa1 interactions
This file lists proteins identified with high confidence as specific physical interactors with strains expressing Ipa1-GFP from its endogenous locus or Ipa1-HA from a galactose-inducible plasmid.
Data File S8_Mass spectrometric evidence for Ipa1 interactions.xlsx
Data File S9. High and low interaction degree genes
This file lists the negative and positive interaction degree associated with every nonessential deletion (sn#), essential TS (tsq#), and DAmP (damp#) query mutant strain screened against the DMA (“query degree X DMA” tab) and/or TSA (“query degree X TSA” tab). A subset of strains was found to carry a second, spontaneous suppressor mutation that affected fitness of the query mutant strain. Strains carrying a suppressor mutation mapped through SGA analysis are indicated (“-supp”). Query mutants comprising the 20% highest and lowest degree groups of strains are indicated. Furthermore, a “Co-batch signal” rank is provided for every query (see “Co-batch filtering of query mutant strains”). Low ranks correspond to evidence for lingering batch effects. Another column, “Gene with correlated GI profiles that are co-annotated with the query gene (%)”, provides the percentage of correlated gene pairs that are co-annotated with the particular query. A low negative interaction degree (e.g. 20% lowest negative interaction degree) coupled with a low co-batch rank (e.g. < ~0.2) and a low fraction of correlated pairs that share a similar functional annotation with a given query strain (e.g. < ~0.15) may be indicative of a low confidence screen. However, these criteria should be considered loose indicators rather than definitive metrics of screen quality, and thus should not be used as strict filters on the global interaction dataset. Another list (“Queries removed - batch effects” tab) indicates ~300 query strains that exhibited severe systematic batch effects and thus were removed from the indicated data set. Finally, two additional tabs provide the negative and positive interaction degree associated with every nonessential (“nonessential array degree” tab) and essential (“essential array degree” tab) array mutant, respectively.
Data File S9_High and low interaction degree genes.xls
Data File S10. Correlation analysis of query strain GI degree
As a complement to analysis of array strains (fig. S11-S12), GI degrees were calculated for query strains by counting negative interactions (tab 1, interactions with DMA strains; tab 2, interactions with TSA strains) and by counting positive interactions (tab 3, interactions with DMA strains; tab 4, interactions with TSA strains). Essential and nonessential queries were analyzed separately and results are labeled by grouped column headers. Wilcoxon rank-sum tests compared the GI degree in paired gene sets defined by absence and presence of each binary feature tested (top table). If the P-value is significant (< 0.05), the “Test result” column describes the degree of the set of genes for which the listed binary feature is true (compared to the set for which the feature is false). Tests were not performed, indicated by “N/A”, if data were present for fewer than 50 strains; strains with missing data were excluded from the tests. Pearson’s correlation (column labeled “r”) was used to measure associations between GI degree and features that are continuous or counts (bottom table). Uncorrected P-values are shown. The features examined in this analysis are described above (see Methods section entitled “Genetic interaction degree and frequency analysis”). Given that analysis of different features required different statistical tests and some features are not expected to be independent of each other, no multiple-hypothesis correction procedures were used. We do note that 31 gene features were tested.
Data File S10_Correlation analysis of query strain GI degree.xlsx
Data File S11. Nonessential and essential GI hub functional enrichment analysis
This file lists GO biological process, molecular function and cellular component terms that are enriched among the 10% of array strains with the most negative interactions and the 10% of array strains with the most positive interactions identified in the ExE network. Enrichments are also included for the 5% of array strains with the most negative interactions and the 5% of array strains with the most positive interactions in the NxN genetic interaction network.
Data File S11_Nonessential and essential GI hub functional enrichment analysis.xlsx
Data File S12. Protein complex standard
This file provides a list of protein complexes compiled from two sources: Baryshnikova 2010 (5) and Benschop 2010 (117).
Data File S12_Protein complex standard.xlsx
Data File S13. Genetic interaction enrichment among protein complexes
This file lists all possible pairs of protein complexes tested in the ExE, NxN and ExN networks. Enrichment for negative and positive interactions between genes in the same complex or between genes in different complexes is indicated. In addition, enrichment for genetic interactions in general, regardless of type (Combined interaction enrichment), is also indicated. Finally, Interaction Sign Bias indicates the distribution of interactions between genes within the same complex or between genes in different complexes. The interaction sign bias is computed as the mean over all interactions within a given complex or between a pair of complexes. For example, an interaction sign bias of -1 indicates that all interactions identified between a set of genes encoding complex members are negative, whereas a score of 1 indicates that only positive interactions were identified between a particular set of protein complex-encoding genes. Rows highlighted in blue indicate complex-complex pairs enriched for negative interactions where greater than 75% of all interactions detected were negative. Rows highlighted in yellow indicate complex-complex pairs enriched for positive interactions where greater than 75% of all interactions detected were positive. The analysis shown in Fig. 7 is based on a subset of complexes composed of at least 75% essential genes (i.e. considered essential complexes) or at least 75% nonessential genes (i.e. considered nonessential complexes). The complexes used for this analysis and their enrichment results are listed in the tabs labeled “_filtered”. The tabs named “all” list within- and between-complex enrichment for all protein complexes, without prior filtering of complexes composed of less than 75% essential or 75% nonessential genes.
Data File S13_Genetic interaction enrichment among protein complexes.xlsx
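The interaction sign bias described above can be sketched as the mean sign of a set of interaction scores. This is one reading of "the mean over all interactions", chosen because it gives -1 when all interactions are negative and 1 when all are positive; the function names and the example scores are hypothetical, and the code assumes the input already contains only interactions that passed whatever significance filter was applied.

```python
from typing import Optional, Sequence


def interaction_sign_bias(scores: Sequence[float]) -> Optional[float]:
    """Mean sign (+1 or -1) of the interaction scores; None if there are none."""
    signs = [1 if s > 0 else -1 for s in scores if s != 0]
    if not signs:
        return None  # no interactions detected, so the bias is undefined
    return sum(signs) / len(signs)


def fraction_negative(scores: Sequence[float]) -> Optional[float]:
    """Fraction of detected interactions that are negative."""
    nonzero = [s for s in scores if s != 0]
    if not nonzero:
        return None
    return sum(1 for s in nonzero if s < 0) / len(nonzero)


# Hypothetical scores for interactions between two complexes.
mostly_negative = [-0.30, -0.12, -0.25, 0.10]
bias = interaction_sign_bias(mostly_negative)      # (-3 + 1) / 4
share_negative = fraction_negative(mostly_negative)
```

Under this reading, a bias of -0.5 corresponds to 75% negative interactions, which is the boundary used for the row highlighting described above.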
Data File S14. D. melanogaster S2 cell fitness
This file provides D. melanogaster S2 cell fitness upon RNAi-mediated 26S proteasome depletion and Bortezomib treatment.
Data File S14_D. melanogaster S2 cell fitness.xlsx
Data File S15. Protein complex interaction enrichment and bias
This file indicates fold enrichment and biases in positive vs. negative interaction frequency for protein complexes and is described in detail above (see “Analysis of protein complexes exhibiting a positive interaction enrichment bias”). Rows highlighted in yellow indicate protein complexes that show > 1.5X enrichment for positive interactions (“E_fold_pos”) and stronger enrichment for positive versus negative interactions when screened against the essential TSA. The file consists of the following columns:
(A) Protein complex name
(B) Number of complex member-encoding query genes screened against the DMA (“queries_vs_DMA”).
(C) Number of complex member-encoding query genes screened against the TSA (“queries_vs_TSA”).
(D) Nonessential-negative GI fold enrichment (“N_fold_neg”): negative interaction fold enrichment for a complex of interest with nonessential genes not in the complex.
(E) Essential-negative GI fold enrichment (“E_fold_neg”): negative interaction fold enrichment for a complex of interest with essential genes not in the complex.
(F) Nonessential-positive GI fold enrichment (“N_fold_pos”): positive interaction fold enrichment for a complex of interest with nonessential genes not in the complex.
(G) Essential-positive GI fold enrichment (“E_fold_pos”): positive interaction fold enrichment for a complex of interest with essential genes not in the complex. Complexes with a positive GI enrichment > 1.5X are highlighted in yellow. These values were used to generate Fig. 8C.
(H) Positive GI bias with essential genes (“posGI_bias_with_E”): the relative positive:negative enrichment ratio of essential to nonessential genes for the complex of interest (calculated as [D/E]/[F/G]). Complexes with a positive GI enrichment > 1 and a positive GI bias > 1.5 are highlighted in yellow. These values were used to generate Fig. 8D.
(I) Positive GI bias with nonessential genes (“posGI_bias_with_N”): the relative positive:negative enrichment ratio of nonessential to essential genes for the complex of interest (calculated as [F/G]/[D/E]).
Data File S15_Protein complex interaction enrichment and bias.xlsx
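The two bias columns (H) and (I) follow directly from columns (D) through (G), so they can be recomputed as a consistency check. The sketch below uses the formulas exactly as given in the column descriptions; the function name and the example values are hypothetical, and the handling of zero enrichment values is an assumption, since the file description does not say how such cases were treated.

```python
from typing import Tuple


def positive_gi_bias(n_fold_neg: float, e_fold_neg: float,
                     n_fold_pos: float, e_fold_pos: float) -> Tuple[float, float]:
    """Return (posGI_bias_with_E, posGI_bias_with_N) from columns D, E, F, G.

    Column H = [D/E] / [F/G] and column I = [F/G] / [D/E], as stated above.
    """
    values = (n_fold_neg, e_fold_neg, n_fold_pos, e_fold_pos)
    if any(v <= 0 for v in values):
        # Assumption: the ratios are only meaningful for positive enrichments.
        raise ValueError("all fold enrichments must be greater than zero")
    neg_ratio = n_fold_neg / e_fold_neg   # D / E
    pos_ratio = n_fold_pos / e_fold_pos   # F / G
    bias_with_e = neg_ratio / pos_ratio   # column H
    bias_with_n = pos_ratio / neg_ratio   # column I
    return bias_with_e, bias_with_n


# Hypothetical complex: D = 2.0, E = 1.0, F = 1.0, G = 2.0.
bias_e, bias_n = positive_gi_bias(2.0, 1.0, 1.0, 2.0)
```

Algebraically, [D/E]/[F/G] equals (G/E)/(F/D), that is, the positive:negative enrichment ratio with essential genes divided by the same ratio with nonessential genes, and column (I) is its reciprocal.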
Data File S16. Genetic suppression analysis
This file includes raw data from spot dilution growth assays to identify positive interactions that can be classified as genetic suppression. The suppression score is based on visual assessment of double mutant strain growth relative to wild-type and single mutant control strains. The score reflects the strength of suppression: a score of 4 indicates a strong suppression interaction where double mutant growth exceeded growth of the sickest single mutant, and a score of 0 indicates failure to confirm a suppression interaction.
Data File S16_Genetic suppression analysis.xlsx
Data File S17. YNL181W chemical genetics data
Relative growth of a YNL181W-DAmP strain (CG score) measured in the presence of 92 different compounds.
Data File S17_YNL181W chemical genetics data.xlsx