Data from: Compromise docking power evaluation of liganded crystal structures of Mpro SARS-CoV-2
Data files
Jan 07, 2024 version files 59.11 MB
- Compounds_structures.zip
- README.md
Abstract
A set of 406 liganded SARS-CoV-2 Mpro crystal structures, originally downloaded from the RCSB PDB database, is provided. Ligand and protein files have been processed, corrected for various types of structural errors, and are provided in pdbqt and mol2 formats for immediate use in the molecular docking programs AutoDock, AutoDock Vina, and PLANTS. The data are used to calculate the newly defined compromise docking power, which monitors the performance of the above-mentioned software. The dataset can also be used to benchmark other software and molecular docking protocols on liganded SARS-CoV-2 Mpro systems.
README: Data from: Compromise docking power evaluation of liganded crystal structures of Mpro SARS-CoV-2
https://doi.org/10.5061/dryad.5hqbzkhc6
Description of the data and file structure
The data are provided as a single zip archive containing a covalent and a noncovalent directory. Each of these contains subdirectories (247 in the covalent and 169 in the noncovalent directory) named by a four-character code that corresponds to the PDB ID in the RCSB PDB database. Each subdirectory contains the protein and ligand structures in pdbqt and/or mol2 format, ready for use in AutoDock and AutoDock Vina (pdbqt) and in PLANTS (mol2).
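The layout described above can be inspected with a short script. The following is a minimal sketch, assuming the archive has been extracted so that the covalent and noncovalent directories sit directly under a root folder. The function name summarize_dataset and the idea of summarizing each entry by its file extensions are illustrative choices, not part of the dataset.

```python
from pathlib import Path


def summarize_dataset(root):
    """Map each category to {PDB ID: sorted file extensions found in that directory}.

    Assumes the extracted layout <root>/covalent/<PDB ID>/ and
    <root>/noncovalent/<PDB ID>/; adjust if the archive unpacks differently.
    """
    root = Path(root)
    summary = {}
    for category in ("covalent", "noncovalent"):
        entries = {}
        category_dir = root / category
        if category_dir.is_dir():
            for entry in sorted(category_dir.iterdir()):
                # PDB IDs are four characters long; skip anything else.
                if entry.is_dir() and len(entry.name) == 4:
                    entries[entry.name] = sorted(
                        {f.suffix.lower() for f in entry.iterdir() if f.is_file()}
                    )
        summary[category] = entries
    return summary
```

Calling summarize_dataset on the extraction folder and taking len() of each category gives the number of PDB entries per directory, and the extension lists show which entries carry pdbqt files, mol2 files, or both.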
Methods
The provided dataset comprises 406 SARS-CoV-2 Mpro crystal structures (169 noncovalent and 247 covalent, in pdbqt and mol2 formats) ready for use in the molecular docking programs AutoDock, AutoDock Vina, and PLANTS.
The initial dataset of 671 SARS-CoV-2 Mpro crystal structures was downloaded from the RCSB PDB database on 10 February 2022. We discarded 161 unliganded structures and two structures whose ligands contained unparametrized atoms (Se, Zn). The remaining 508 crystal structures were stripped of disordered atoms, crystal waters, ions, and cosolvents, and aligned to a reference structure (PDB ID: 6wqf). Each crystal structure was then split into separate files, one per monomer, and the first chain containing a ligand-protein pair was selected for further processing. The integrity of each ligand structure was validated by generating its InChIKey with OpenBabel 2.3.2 and comparing it with that of its RCSB entry. Discrepancies were recorded and resolved manually: the ligand pdb structure from the respective monomer unit was converted to mol2 format with OpenBabel, and individual atoms and bond orders were edited to produce the desired ligand structure. Ligands missing a fragment, i.e., lacking more than just hydrogen atoms (102 in total), were omitted. OpenBabel was then used to produce the final file formats for both ligand and protein structures: pdbqt for AutoDock and AutoDock Vina, with only polar hydrogens present, and mol2 for PLANTS, with added Gasteiger charges.
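The filtering counts reported above can be checked with simple bookkeeping. The sketch below only reproduces the arithmetic from the numbers stated in this section. The function name filtering_counts is hypothetical, and no structure files are read.

```python
def filtering_counts(initial, unliganded, unparametrized, missing_fragment):
    """Return (structures left after the first discard, structures in the final set)."""
    after_discard = initial - unliganded - unparametrized
    final = after_discard - missing_fragment
    return after_discard, final


# Numbers taken from the Methods text above.
after_discard, final = filtering_counts(
    initial=671,           # structures downloaded from the RCSB PDB
    unliganded=161,        # no ligand present
    unparametrized=2,      # ligands containing Se or Zn
    missing_fragment=102,  # ligands missing more than hydrogen atoms
)
print(after_discard, final)  # prints "508 406"
```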