Dataset for article: Co-evolutionary landscape at the interface and non-interface regions of protein-protein interaction complexes

Cite this dataset

Mukherjee, Ishita; Chakrabarti, Saikat (2021). Dataset for article: Co-evolutionary landscape at the interface and non-interface regions of protein-protein interaction complexes [Dataset]. Dryad.


Proteins involved in interactions throughout the course of evolution tend to co-evolve, and compensatory changes may occur in interacting proteins to maintain or refine such interactions. However, certain residue pair alterations may prove detrimental to functional interactions. Hence, it is important to determine co-evolutionary pairings that could be structurally or functionally relevant for conserving an inter-protein interaction. Inter-protein co-evolution analysis of several complexes using multiple existing methodologies suggested that co-evolutionary pairings can occur in both spatially proximal and distant regions in inter-protein interactions. Subsequently, the Co-Var (Correlated Variation) method, based on mutual information and the Bhattacharyya coefficient, was developed, validated, and found to perform relatively better than CAPS and EV-complex. Interestingly, when the Co-Var measure and the EV-complex program were applied to a set of protein-protein interaction complexes, co-evolutionary pairings were obtained in both interface and non-interface regions of the complexes. The Co-Var approach involves determining high-degree co-evolutionary pairings, in which particular co-evolved residue positions in one protein form multiple co-evolutionary connections with residue positions in the binding partner. Detailed analyses of high-degree co-evolutionary pairings in protein-protein complexes involved in cancer metastasis suggested that most of the residue positions forming such co-evolutionary connections occurred within functional domains of the constituent proteins, and that substitution mutations were also common among these positions. The physiological relevance of these predictions suggests that Co-Var can predict residues that could be crucial for preserving functional protein-protein interactions.
Finally, the Co-Var web server that implements this methodology identifies co-evolutionary pairings in intra- and inter-protein interactions.
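
For readers unfamiliar with the two quantities named above, the following is a minimal sketch of mutual information between two alignment columns and of the Bhattacharyya coefficient between two discrete distributions. It shows only the generic textbook definitions; how Co-Var combines or normalizes them is not described on this page, so this should not be read as the Co-Var scoring function. The function names are hypothetical, and the sketch assumes that row i of both columns comes from the same species.

```python
from collections import Counter
from math import log2, sqrt


def column_distribution(column):
    """Relative frequency of each symbol in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return {symbol: n / total for symbol, n in counts.items()}


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equally long columns.

    Assumption: position i in col_a and col_b comes from the same species.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    p_a = column_distribution(col_a)
    p_b = column_distribution(col_b)
    p_ab = column_distribution(list(zip(col_a, col_b)))
    return sum(
        p * log2(p / (p_a[a] * p_b[b])) for (a, b), p in p_ab.items()
    )


def bhattacharyya_coefficient(p, q):
    """Sum over shared symbols of sqrt(p(x) * q(x)); ranges from 0 to 1."""
    return sum(sqrt(p[x] * q[x]) for x in p.keys() & q.keys())
```

Two columns that always change together (for example `"AAGG"` and `"LLVV"`) give the maximum mutual information for their entropy, while columns that vary independently give a value near zero. The Bhattacharyya coefficient is 1 for identical distributions and 0 for distributions with no symbols in common.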


A set of protein-protein interaction complexes (n = 100) was identified from previously published data (1-3), and complexes involving proteins with a sufficient number of homologs and an available crystal structure were selected. Around 50 protein complexes were considered as the “positive set”. Additionally, non-interacting proteins from the Negatome database (4) were considered as the “negative set”. Close orthologs or similar sequences were determined using DELTA-BLAST (Domain enhanced lookup time accelerated BLAST) (5). Taxonomy-filtered, non-redundant sequences with E-value <= 1E-04, query coverage >= 70% and sequence identity >= 45% were used to prepare multiple sequence alignments (MSA) representative of each sequence family in MAFFT (6). Alignments for homologous sequences of the representative interacting and non-interacting proteins in the “positive set” and the “negative set” were prepared in this manner.
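
The homolog-selection thresholds above can be sketched as a simple filter over search hits. This is a minimal illustration, not the authors' pipeline: the `Hit` record and its field names are hypothetical and do not correspond to any particular DELTA-BLAST output format, and the taxonomy and redundancy filtering mentioned in the text is not covered here.

```python
from dataclasses import dataclass

# Thresholds as stated in the methods text.
MAX_EVALUE = 1e-4
MIN_QUERY_COVERAGE = 70.0  # percent
MIN_IDENTITY = 45.0        # percent


@dataclass(frozen=True)
class Hit:
    """One search hit; the field names are assumptions for illustration."""
    subject_id: str
    evalue: float
    query_coverage: float  # percent of the query covered by the hit
    identity: float        # percent sequence identity


def passes_thresholds(hit):
    """True if the hit satisfies all three cut-offs (bounds inclusive)."""
    return (
        hit.evalue <= MAX_EVALUE
        and hit.query_coverage >= MIN_QUERY_COVERAGE
        and hit.identity >= MIN_IDENTITY
    )


def filter_hits(hits):
    """Keep only the hits that pass, preserving their input order."""
    return [hit for hit in hits if passes_thresholds(hit)]
```

Sequences retained by such a filter would then be aligned per family, as the text describes for MAFFT.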


  1. Mintseris, J., & Weng, Z. (2003). Atomic contact vectors in protein-protein recognition. Proteins, 53, 629-639. doi:10.1002/prot.10432
  2. Sowmya, G., Breen, E. J., & Ranganathan, S. (2015). Linking structural features of protein complexes and biological function. Protein Science, 24(9), 1486-1494.
  3. Rodriguez-Rivas, J., Marsili, S., Juan, D., & Valencia, A. (2016). Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone. Proceedings of the National Academy of Sciences of the United States of America, 113(52), 15018–1502 doi:10.1073/pnas.1611861114
  4. Smialowski, P., Pagel, P., Wong, P., Brauner, B., Dunger, I., Fobo, G., Frishman, G., Montrone, C., Rattei, T., Frishman, D., et al. (2009). The Negatome database: a reference set of non-interacting protein pairs. Nucleic Acids Research, 38(Database issue), D540-D544.
  5. Boratyn, G. M., Schäffer, A. A., Agarwala, R., Altschul, S. F., Lipman, D. J., & Madden, T. L. (2012). Domain enhanced lookup time accelerated BLAST. Biology Direct, 7, 12. doi:10.1186/1745-6150-7-12
  6. Katoh, K., Misawa, K., Kuma, K., & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 30(14), 3059–3066.

Usage notes

Alignments utilised for validating the Co-Var methodology (“positive set” and “negative set”) are provided in the 'Validation Set' folder. Predicted co-evolutionary pairings and the distances among them for the 'Validation Set' are included in the 'Program Outputs' folder. 

Data pertaining to the application case studies are included in the 'Case studies' folder.

Processed data used to generate each of the final figures are provided in the 'Processed data' folder.

Availability: The Co-Var web server, which implements this methodology to identify co-evolutionary pairings in intra-protein, inter-protein, protein-DNA or protein-RNA complexes, is available online, and the Co-Var package is available for download.


Department of Science and Technology, Award: GAP362