Dryad

Curated in vivo subset of ZINC15: Correction of formal charge and molecular structure in mol2 format

Data files

May 14, 2026 version files 20.05 MB
May 19, 2026 version files 20.05 MB

Abstract

The in vivo subset of molecules was downloaded from ZINC15 (https://zinc15.docking.org/) on 10/12/2021. It contained 60,411 molecules, all available in mol2 format. Comparing each molecule's mol2 file with its InChI code on the ZINC15 website revealed errors in structure (number of atoms) and formal charge. The reference number of atoms was obtained from the InChI code as the total number of atoms in the formula (the main layer), adjusted by the number of protons added or removed according to the protonation sublayer (/p), and was compared with the number of atoms in the mol2 file. The reference formal charge was obtained as the sum of the charges given in the charge sublayer (/q) and the protonation sublayer (/p) of the InChI code, and was compared with the formal charge obtained by summing the partial charges in the mol2 file, with 0.1 e as the acceptable deviation. In total, 1,115 corrected molecules (curated with Open Babel 3.1.1) are provided in mol2 format. A bash script for downloading the in vivo database is also available, along with detailed instructions for reproducing the described workflow.
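The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the deposited workflow (which is distributed as a bash script). The function names are hypothetical, and the sketch assumes a standard InChI (the first /q and /p layers are used), a single-molecule mol2 file whose @<TRIPOS>ATOM records carry the partial charge in the ninth column, and a deviation of exactly 0.1 e still counting as acceptable.

```python
import re


def inchi_layers(inchi):
    """Return the formula layer and a dict of the first occurrence of each prefixed layer."""
    parts = inchi.split("/")
    formula = parts[1] if len(parts) > 1 else ""
    layers = {}
    for part in parts[2:]:
        if part:
            layers.setdefault(part[0], part[1:])
    return formula, layers


def count_atoms(formula):
    """Count atoms in an InChI formula such as 'C2H4O2' or '2ClH.Ca'."""
    total = 0
    for component in formula.split("."):
        # A leading integer is a multiplier for the whole component.
        match = re.match(r"(\d*)(.*)", component)
        multiplier = int(match.group(1) or 1)
        pairs = re.findall(r"([A-Z][a-z]?)(\d*)", match.group(2))
        total += multiplier * sum(int(n or 1) for _, n in pairs)
    return total


def sum_charge_layer(q):
    """Sum a /q layer such as '+1', ';+1' or '2*-1;+2' (empty entries are neutral)."""
    total = 0
    for item in q.split(";"):
        if item:
            multiplier, _, value = item.rpartition("*")
            total += int(multiplier or 1) * int(value)
    return total


def reference_from_inchi(inchi):
    """Return (reference atom count, reference formal charge) from an InChI."""
    formula, layers = inchi_layers(inchi)
    p = int(layers.get("p", "0"))  # protons added (+) or removed (-)
    q = sum_charge_layer(layers.get("q", ""))
    return count_atoms(formula) + p, q + p


def mol2_atoms_and_charge(text):
    """Return (atom count, summed partial charge) from the ATOM section of a mol2 file."""
    in_atoms = False
    n_atoms, charge = 0, 0.0
    for line in text.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 6:
            n_atoms += 1
            if len(fields) >= 9:  # assumed position of the partial charge
                charge += float(fields[8])
    return n_atoms, charge


def is_consistent(inchi, mol2_text, tol=0.1):
    """True if the mol2 atom count and total charge match the InChI reference."""
    ref_atoms, ref_charge = reference_from_inchi(inchi)
    n_atoms, charge = mol2_atoms_and_charge(mol2_text)
    return n_atoms == ref_atoms and abs(charge - ref_charge) <= tol
```

For acetate, InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)/p-1, the formula gives 8 atoms and the /p-1 sublayer removes one proton, so the reference is 7 atoms with a formal charge of -1. A mol2 file for this molecule passes the check only if it lists 7 atoms whose partial charges sum to within 0.1 e of -1.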