Supporting Information: Toward learned chemical perception of force field typing rules
Data files
May 08, 2018 version files (6.47 GB total)
- molecules_and_forcefields.tar.gz (7.51 MB)
- README.md (3.86 KB)
- SMARTY.tar.gz (746.43 MB)
- SMIRKY_100k_iterations.tar.gz (3.67 GB)
- SMIRKY_10k_iterations.tar.gz (1.52 GB)
- SMIRKY_50k_iterations.tar.gz (499.71 MB)
- SMIRKY_analysis.tar.gz (24.60 MB)
Abstract
The Open Force Field Initiative seeks to automate force field development in order to advance force fields and improve accuracy (openforcefield.org). An important part of this effort is automating the determination of chemical perception, that is, the way force field parameters are assigned to a molecule based on chemical environment. We developed a novel technology for this purpose, termed SMARTY. It generalizes atom typing by using direct chemical perception with SMARTS strings and adopts a hierarchical approach to type assignment. The SMARTY technology enables creation of a move set in atom-typing space that can be used in a Monte Carlo optimization. We demonstrate the power of this approach with a fully automated procedure that is able to re-discover human-defined atom types in the traditional small-molecule force field parm99/parm@Frosst. We furthermore extend this tool to direct chemical perception of valence types (bonds, angles, and torsions) via SMIRKS strings to create SMIRKY and, again, assess how well the automated Monte Carlo scheme can discover the valence parameters in smirnoff99Frosst without human intervention.
This repository contains input files, output files, analysis results, and the scripts required to perform that analysis for a series of tests with SMARTY and SMIRKY. All tests were performed on three molecule sets: AlkEthOH is a simple set of 42 molecules with only alkanes, ethers, and alcohols; PhEthOH was created as an extension of this set to include aromatic carbons in addition to the alkanes, for a total of 200 molecules; and lastly, MiniDrugBank was created to mimic the complexity of the DrugBank database with a minimal number of molecules, with 371 druglike molecules in the final set (github.com/openforcefield/MiniDrugBank).
For more information on SMARTY and SMIRKY, including the source code for these tools, instructions on their use, and detailed examples for implementation, check out the relevant GitHub repository: github.com/openforcefield/smarty
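The hierarchical approach to type assignment mentioned in the abstract can be illustrated with a small sketch. This is not the SMARTY implementation: it assumes a "last matching rule wins" convention, represents atoms as plain dictionaries, and replaces real SMARTS matching with hand-written predicates. The names `TYPE_RULES` and `assign_type`, and the particular SMARTS labels, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Toy atom representation (an assumption for this sketch):
# atomic number plus the number of attached hydrogens.
Atom = Dict[str, int]
TypeRule = Tuple[str, Callable[[Atom], bool]]

# Rules are ordered from general to specific. Each SMARTS label is paired
# with a hand-written predicate that stands in for a real SMARTS matcher.
TYPE_RULES: List[TypeRule] = [
    ("[*]", lambda a: True),
    ("[#1]", lambda a: a["element"] == 1),
    ("[#6]", lambda a: a["element"] == 6),
    ("[#8]", lambda a: a["element"] == 8),
    ("[#8;H1]", lambda a: a["element"] == 8 and a["h_count"] == 1),
]


def assign_type(atom: Atom, rules: List[TypeRule] = TYPE_RULES) -> str:
    """Return the label of the last rule that matches the atom."""
    assigned = None
    for label, matches in rules:
        if matches(atom):
            # Later (more specific) rules override earlier ones.
            assigned = label
    if assigned is None:
        raise ValueError("no rule matched the atom")
    return assigned
```

Under these assumptions, a hydroxyl oxygen `{"element": 8, "h_count": 1}` matches three rules and receives the most specific one, while an ether oxygen falls back to the generic oxygen rule. Reordering, adding, or removing entries in such a list is the kind of change a move set in typing space would propose.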
Methods
These results were collected from SMARTY and SMIRKY simulations on three different molecule sets.
For details on these tools, see our GitHub repository: github.com/openforcefield/smarty
Usage notes
This repository includes input files, output files, and score vs. iteration plots for all SMARTY and SMIRKY simulations with every molecule set.
It also includes molecule files that are both human- and machine-readable.
Heat map results are shared both as images and as machine-readable CSV files.
All of the Python scripts used to analyze these results and to create the figures in the paper are included as well.
For more details on organization and contents, see the internal README files.
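Because several of the archives are large, it can be useful to inspect their contents before extracting them. The following is a minimal sketch using only the Python standard library; the helper name `list_archive` is hypothetical, and the internal layout of each archive is not assumed here.

```python
import tarfile
from typing import List, Tuple


def list_archive(path: str) -> List[Tuple[str, int]]:
    """Return (member name, size in bytes) for each regular file in a .tar.gz archive."""
    with tarfile.open(path, mode="r:gz") as archive:
        # Directories and links are skipped; only regular files are reported.
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


# Example usage, assuming the archive has been downloaded to the working directory:
#
#     for name, size in list_archive("SMIRKY_analysis.tar.gz"):
#         print(f"{size:>12d}  {name}")
```

Listing a gzip-compressed archive still requires reading through it, so this may take a while for the multi-gigabyte files, but it avoids writing anything to disk.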