Dryad

Data from: Machine learning-based discovery of molecular descriptors that control polymer gas permeation

Data files

Feb 29, 2024 version: files, 1.16 MB

Abstract

While machine learning is increasingly used to predict the properties of polymeric materials from knowledge of chain architecture alone, determining the molecular factors that underpin those properties ("interpretable AI") remains less well explored. We show that encoding chain chemistry in commonly employed formats, e.g., binary-valued fingerprints, leads to uniqueness issues, because the hashing step used to save storage space can map several chemical moieties onto the same bit. These issues carry over into the ML algorithms, especially for "inverse" design and interpretable AI, and cannot be avoided by changing the length of the fingerprint. Using MACCS key featurizations of monomer repeats resolves some of these issues, and we show that a few substructures consistently appear among the top features for maximizing permeability across several gases and ML models. These are carbon-carbon double bonds (as in polyacetylenes), especially when they are associated with methyl groups (found in branching architectures). These results, derived from a limited data set of ~500 polymers with experimental gas permeation data, agree with physical insight and thus provide a robust foundation that could enable further study of these material classes through detailed experiments and simulations.
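The bit-collision mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' code and not the hashing scheme of any particular cheminformatics library: the fragment list, the use of SHA-256, and the helper names `bit_index`, `fold`, and `collisions` are all assumptions chosen for illustration. The sketch only shows how a hashed, fixed-length fingerprint can place distinct substructures on the same bit, so that a model's "important bit" no longer points to a single chemical moiety.

```python
import hashlib
from collections import defaultdict

# Hypothetical substructure labels (SMILES-like strings), for illustration only.
FRAGMENTS = [
    "C=C", "C(C)(C)C", "c1ccccc1", "C(F)(F)F", "C(=O)O",
    "C(=O)N", "COC", "[Si](C)(C)C", "S(=O)(=O)",
]


def bit_index(fragment: str, n_bits: int) -> int:
    """Map a fragment label to a bit position by hashing and folding (modulo)."""
    digest = hashlib.sha256(fragment.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def fold(fragments, n_bits):
    """Return {bit position: set of fragments that set this bit}."""
    bits = defaultdict(set)
    for frag in fragments:
        bits[bit_index(frag, n_bits)].add(frag)
    return dict(bits)


def collisions(fragments, n_bits):
    """Keep only the bits that are shared by more than one distinct fragment."""
    return {
        bit: sorted(frags)
        for bit, frags in fold(fragments, n_bits).items()
        if len(frags) > 1
    }


if __name__ == "__main__":
    for n_bits in (4, 8, 2048):
        clashes = collisions(FRAGMENTS, n_bits)
        print(f"{n_bits} bits: {len(clashes)} shared bit(s) -> {clashes}")
```

With more distinct fragments than bits, at least one shared bit is guaranteed. Lengthening the fingerprint makes collisions rarer in this toy, but the many-to-one mapping remains because the space of possible substructures is far larger than any practical bit length. A fixed key set such as MACCS avoids this particular ambiguity by assigning each bit to one predefined substructure pattern.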