Stable and accurate orbital-free density functional theory powered by machine learning
Data files
Version of Aug 06, 2025. Total size of files: 248.68 GB
- assemble_and_extract.sh: 1.52 KB
- models.tar: 641.88 MB
- QM9_perturbed_fock.tar.part_aa: 9.66 GB
- QM9_perturbed_fock.tar.part_ab: 9.66 GB
- QM9_perturbed_fock.tar.part_ac: 9.66 GB
- QM9_perturbed_fock.tar.part_ad: 9.66 GB
- QM9_perturbed_fock.tar.part_ae: 9.66 GB
- QM9_perturbed_fock.tar.part_af: 9.66 GB
- QM9_perturbed_fock.tar.part_ag: 9.66 GB
- QM9_perturbed_fock.tar.part_ah: 9.66 GB
- QM9_perturbed_fock.tar.part_ai: 9.66 GB
- QM9_perturbed_fock.tar.part_aj: 9.66 GB
- QM9_perturbed_fock.tar.part_ak: 9.66 GB
- QM9_perturbed_fock.tar.part_al: 9.66 GB
- QM9_perturbed_fock.tar.part_am: 9.66 GB
- QM9_perturbed_fock.tar.part_an: 9.66 GB
- QM9_perturbed_fock.tar.part_ao: 9.66 GB
- QM9_perturbed_fock.tar.part_ap: 9.66 GB
- QM9_perturbed_fock.tar.part_aq: 9.66 GB
- QM9_perturbed_fock.tar.part_ar: 9.66 GB
- QM9_perturbed_fock.tar.part_as: 9.66 GB
- QM9_perturbed_fock.tar.part_at: 9.66 GB
- QM9_perturbed_fock.tar.part_au: 5.62 GB
- QMUGS_perturbed_fock.tar.part_aa: 9.66 GB
- QMUGS_perturbed_fock.tar.part_ab: 9.66 GB
- QMUGS_perturbed_fock.tar.part_ac: 9.66 GB
- QMUGS_perturbed_fock.tar.part_ad: 9.66 GB
- QMUGS_perturbed_fock.tar.part_ae: 8 GB
- QMUGS.tar: 2.47 GB
- QMUGSBin0_perturbed_fock.tar: 5.18 MB
- QMUGSBin0QM9_perturbed_fock.tar: 14.38 MB
- README.md: 2.97 KB
Abstract
Hohenberg and Kohn have proven that the electronic energy and the one-particle electron density can, in principle, be obtained by minimizing an energy functional with respect to the density. While decades of theoretical work have produced increasingly faithful approximations to this elusive exact energy functional, their accuracy is still insufficient for many applications, making it reasonable to try and learn it empirically. Using rotationally equivariant atomistic machine learning, we obtain for the first time a density functional that, when applied to the organic molecules in QM9, yields energies with chemical accuracy relative to the Kohn-Sham reference while also converging to meaningful electron densities. Augmenting the training data with densities obtained from perturbed potentials proved key to these advances. This work demonstrates that machine learning can play a crucial role in narrowing the gap between theory and the practical realization of Hohenberg and Kohn’s vision, paving the way for more efficient calculations in large molecular systems.
This directory contains the data and models of the paper "Stable and Accurate Orbital-Free Density Functional Theory Powered by Machine Learning".
It contains results from Kohn-Sham DFT calculations of molecules from the QM9 and QMUGS datasets, conducted with PySCF at the PBE/6-31G(2df,p) level of theory. Part of the calculations use perturbed external potentials, which yields a more varied data distribution, as detailed in the associated paper. Furthermore, the results of density-fitting the Kohn-Sham data are included. They provide labels in a linear combination of atomic basis functions (LCAB) Ansatz: energies, electron density coefficients, and gradients of the energies w.r.t. the density. Finally, we provide checkpoints of models trained on these labels, which can be used to run orbital-free density optimization with STRUCTURES25.
Data extraction
The data is stored in tar files for smaller directories and split tar files for the larger directories. The data can be extracted by running the following line.
Set the environment variable DFT_DATA to the created data folder to run training or density optimization.
To delete the downloaded tar files after successful extraction, you can run
in this directory.
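As a rough illustration of what assembling and extracting a split tar file involves, the following Python sketch concatenates the `.part_*` pieces in lexical order (`part_aa`, `part_ab`, ...) and unpacks the result. It is not the provided assemble_and_extract.sh script; the helper name `assemble_and_extract` and the choice of output directory are assumptions made for illustration.

```python
import shutil
import tarfile
from pathlib import Path


def assemble_and_extract(prefix: str, src_dir: Path, out_dir: Path) -> Path:
    """Concatenate <prefix>.part_* in lexical order and extract the tar.

    Hypothetical helper, not part of the published code. Returns the path
    of the reassembled tar file so the caller can delete it afterwards.
    """
    parts = sorted(src_dir.glob(f"{prefix}.part_*"))
    if not parts:
        raise FileNotFoundError(f"no parts found for {prefix} in {src_dir}")

    assembled = src_dir / prefix
    with open(assembled, "wb") as out:
        for part in parts:
            # Stream each part so that no part is loaded fully into memory.
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, out)

    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(assembled) as tar:
        tar.extractall(out_dir)
    return assembled
```

A call such as `assemble_and_extract("QM9_perturbed_fock.tar", Path("."), Path("data"))` would then unpack that archive into `data`. Note that this approach temporarily needs disk space for the parts, the reassembled tar, and the extracted files at the same time.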
Data structure
After extraction, you will have a data folder which contains the following directories:
.
├── QM9_perturbed_fock
│   ├── dataset_statistics
│   ├── labels
│   ├── split.pkl
│   └── split.yaml
├── QMUGS
│   ├── QMUGSBin0.csv
│   ├── QMUGSLargeBins.csv
│   └── labels
├── QMUGSBin0QM9_perturbed_fock
│   ├── dataset_statistics
│   ├── split.pkl
│   └── split.yaml
├── QMUGSBin0_perturbed_fock
│   ├── dataset_statistics
│   ├── split.pkl
│   └── split.yaml
└── QMUGS_perturbed_fock
    └── labels
The labels directories contain the actual labels in the LCAB Ansatz, i.e., density coefficients, energies, and gradients of these energies w.r.t. the density. The structure is designed such that QM9_perturbed_fock, QMUGS, and QMUGS_perturbed_fock store all data, from raw to Kohn-Sham to labels. Other directories like QMUGSBin0QM9_perturbed_fock exist to create a split file and the corresponding dataset statistics.
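To check that extraction produced the layout described above, a small Python sketch like the following can compare the data folder against the listed entries. The dictionary mirrors the directory tree in this section; the function name `missing_entries` is hypothetical, and the script assumes DFT_DATA points at the extracted data folder.

```python
import os
from pathlib import Path

# Expected entries, copied from the directory tree in this section.
EXPECTED = {
    "QM9_perturbed_fock": ["dataset_statistics", "labels", "split.pkl", "split.yaml"],
    "QMUGS": ["QMUGSBin0.csv", "QMUGSLargeBins.csv", "labels"],
    "QMUGSBin0QM9_perturbed_fock": ["dataset_statistics", "split.pkl", "split.yaml"],
    "QMUGSBin0_perturbed_fock": ["dataset_statistics", "split.pkl", "split.yaml"],
    "QMUGS_perturbed_fock": ["labels"],
}


def missing_entries(data_root: Path) -> list[str]:
    """Return relative paths from EXPECTED that do not exist under data_root."""
    return [
        f"{directory}/{entry}"
        for directory, entries in EXPECTED.items()
        for entry in entries
        if not (data_root / directory / entry).exists()
    ]


if __name__ == "__main__":
    root = os.environ.get("DFT_DATA")  # only runs the check if the variable is set
    if root:
        for path in missing_entries(Path(root)):
            print(f"missing: {path}")
```

An empty result means every directory and file named in the tree is present; it says nothing about whether the contents of the labels directories are complete.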
Checkpoints
The checkpoints can be found in models/train/runs. To use them for density optimizations or fine-tunings, set DFT_MODELS to the models folder.
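For orientation, the available runs can be listed with a short Python sketch such as the one below. It assumes that DFT_MODELS points at the extracted models folder and that each run is stored as a subdirectory of train/runs; the latter is an assumption about the layout, and `list_runs` is a hypothetical helper, not part of the published code.

```python
import os
from pathlib import Path


def list_runs(models_root: Path) -> list[str]:
    """Return the sorted names of subdirectories of <models_root>/train/runs."""
    runs_dir = models_root / "train" / "runs"
    if not runs_dir.is_dir():
        return []
    # Assumption: one subdirectory per training run.
    return sorted(p.name for p in runs_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    root = os.environ.get("DFT_MODELS")  # only lists runs if the variable is set
    if root:
        for name in list_runs(Path(root)):
            print(name)
```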
Code
The code can be accessed via Zenodo (https://zenodo.org/records/14940433). It includes REPLICATION_GUIDE.md, which describes how to replicate our results. The latest version of our functional will be provided at https://github.com/sciai-lab/structures25.
