CAIRA: Catalytic associated irregular residue analyser
Data files
Oct 29, 2024 version files (1.17 MB)
- logo.ico (270.40 KB)
- logo.jpg (645.25 KB)
- merops_dictionary.json (252.21 KB)
- README.md (2.99 KB)
Abstract
We developed CAIRA (Catalytic Associated Irregular Residue Analyser), a program for the semi-automated identification of single-nucleotide variants (SNVs) that may modulate protease cleavage specificity. Given the UniProt ID of a protease, CAIRA retrieves the predicted AlphaFold structure, defines a sphere of user-definable radius (e.g. 20 Å) around the active site, and filters the COSMIC database for cancer-associated SNVs within this sphere, which might modulate cleavage specificity. CAIRA can be used with all types of proteases (metallo, serine, aspartyl and cysteine proteases). To help assess their potential impact, SNV-affected amino acids are labelled in the protein structure in a downloadable PDB file.
https://doi.org/10.5061/dryad.c59zw3rj5
Description of the data and file structure
Files and variables
File: logo.ico
Description: Logo of CAIRA 3.0 (included in CAIRA3.0.exe). The logo was designed with the help of Adobe Firefly.
File: merops_dictionary.json
Description: Cleavage specificity logos within CAIRA3.0.exe are generated using cleavage site specificity data from MEROPS, the Peptidase Database (Rawlings, N.D., Waller, M., Barrett, A.J. & Bateman, A. (2014) MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res 42, D503-D509). The cleavage specificity data obtained from the MEROPS database for this purpose are included in merops_dictionary.json.
File: logo.jpg
Description: Logo of CAIRA 3.0 (included in CAIRA3.0.exe). The logo was designed with the help of Adobe Firefly.
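The internal layout of merops_dictionary.json is not described here. As a minimal sketch, assuming only that the file is valid JSON, the following Python snippet reports the top-level type, the number of entries and the first few keys without presupposing any schema. The function name `summarise_json` is hypothetical and is not part of CAIRA.

```python
import json
from pathlib import Path


def summarise_json(path):
    """Load a JSON file and describe its top level without assuming a schema."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)

    summary = {
        "type": type(data).__name__,
        # Count entries for containers; treat a bare value as a single entry.
        "entries": len(data) if isinstance(data, (dict, list)) else 1,
    }
    if isinstance(data, dict):
        # Show at most five keys so large dictionaries stay readable.
        summary["first_keys"] = list(data)[:5]
    return summary


# Example call (requires the file in the working directory):
# print(summarise_json("merops_dictionary.json"))
```

Inspecting the keys first is a cautious way to find out how the cleavage specificity data are organised before writing code that depends on them.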
Code/software
The CAIRA program can be run from CAIRA3.0.exe (a single-file executable). The underlying Python script is CAIRA3.0.py. AAS_in_tube2.py contains functions that are imported by CAIRA 3.0.
Catalytic associated irregular residue analyser (CAIRA) was programmed in Python 3.12.5 using the IDLE environment.
Packages used:
requests
csv
pandas
subprocess
os
matplotlib.pyplot
numpy
matplotlib.backends.backend_tkagg
Bio
io
json
shutil
PyQt5
PyQt5.QtWidgets
PyQt5.QtCore
PyQt5.QtGui
OpenGL.GL
OpenGL.GLU
OpenGL.GLUT
matplotlib.backends.backend_qt5agg (FigureCanvasQTAgg)
re
math
sys
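Several of the modules listed above belong to the Python standard library, while others must be installed separately. As a minimal sketch for anyone running CAIRA3.0.py directly, the snippet below checks which third-party packages are importable. The mapping from import names to PyPI distribution names is an assumption on our part (the source does not give distribution names or versions), and `missing_packages` is a hypothetical helper, not part of CAIRA.

```python
import importlib.util

# Import name -> PyPI distribution name (assumed mapping; versions are not
# specified in the source).
THIRD_PARTY = {
    "requests": "requests",
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "Bio": "biopython",
    "PyQt5": "PyQt5",
    "OpenGL": "PyOpenGL",
}


def missing_packages(modules=THIRD_PARTY):
    """Return the distribution names whose top-level module cannot be found."""
    return [
        dist
        for name, dist in modules.items()
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed third-party packages are importable.")
```

Any names reported as missing can then be installed with the package manager of your choice.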
Access information
Data was derived from the following sources:
- merops_dictionary.json includes cleavage specificity data obtained from the MEROPS database (Rawlings, N.D., Waller, M., Barrett, A.J. & Bateman, A. (2014) MEROPS: the database of proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res 42, D503-D509). http://merops.sanger.ac.uk/. Licence: The MEROPS database is provided under the terms of the GNU Library General Public License, and the complete content of the database is referred to as the ‘Library’.
CAIRA was programmed in Python. Protein structures were obtained from AlphaFold. Cleavage specificity logos were generated as described above using cleavage site specificity data from MEROPS, the Peptidase Database. Cancer-associated SNV data were taken from the COSMIC database (cancer.sanger.ac.uk). Catalytically important amino acid residue(s) within the active site were extracted from the UniProt annotation and are used as the centre of the sphere. For proteases with more than one catalytically important amino acid residue within the active site (aspartyl, serine and cysteine proteases), CAIRA uses the midpoint between the relevant residues as the centre of the sphere. In the beta version, the ‘use binding cavity (beta)’ button switches to a mode in which the binding cavity is approximated and SNVs located within it are reported. For this purpose, cylinders of adjustable size are placed around the chemical bonds of the backbone of the protease's propeptide; amino acids of the protease that lie with at least one atom within these cylinders are selected and checked for annotated SNVs. The logo of CAIRA was designed with the help of Adobe Firefly.
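The two geometric filters described above can be illustrated with a short Python sketch. It is not the CAIRA implementation: the function names, the use of one representative coordinate per residue for the sphere filter, the interpretation of the "middle" as the arithmetic mean of the catalytic residue coordinates, and the toy coordinates and variant labels are all assumptions made for illustration.

```python
import numpy as np


def sphere_centre(catalytic_coords):
    """Mean position of the catalytic residue coordinates (assumed 'middle')."""
    return np.asarray(catalytic_coords, dtype=float).mean(axis=0)


def residues_within_radius(residue_coords, centre, radius):
    """Residue numbers whose representative atom lies within `radius` (Å)."""
    return sorted(
        resnum
        for resnum, xyz in residue_coords.items()
        if np.linalg.norm(np.asarray(xyz, dtype=float) - centre) <= radius
    )


def point_in_cylinder(point, start, end, radius):
    """True if `point` lies in the finite cylinder around the bond start-end."""
    p, a, b = (np.asarray(v, dtype=float) for v in (point, start, end))
    axis = b - a
    length_sq = float(axis @ axis)
    if length_sq == 0.0:
        # Degenerate bond: fall back to a plain distance check.
        return bool(np.linalg.norm(p - a) <= radius)
    # Fractional position of the projection of p onto the bond axis.
    t = float((p - a) @ axis) / length_sq
    if t < 0.0 or t > 1.0:
        return False
    return bool(np.linalg.norm(p - (a + t * axis)) <= radius)


# Toy data (hypothetical coordinates in Å and placeholder variant labels).
toy_coords = {
    10: (0.0, 0.0, 0.0),
    57: (4.0, 0.0, 0.0),
    102: (0.0, 4.0, 0.0),
    195: (30.0, 0.0, 0.0),
}
toy_snvs = {10: ["variant_a"], 195: ["variant_b"]}

centre = sphere_centre([toy_coords[57], toy_coords[102]])
nearby = residues_within_radius(toy_coords, centre, radius=20.0)
hits = {res: toy_snvs[res] for res in nearby if res in toy_snvs}
print(nearby, hits)
```

In this toy case, residue 195 lies about 28 Å from the centre and is therefore excluded at a 20 Å radius, so only the variant annotated at residue 10 is retained. For the cavity mode, a residue would be selected if `point_in_cylinder` returns true for at least one of its atoms and at least one backbone bond of the propeptide.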