Data for: Foundations of a fast, data-driven, machine-learned simulator

Cite this dataset

Howard, Jessica N.; Mandt, Stephan; Whiteson, Daniel; Yang, Yibo (2021). Data for: Foundations of a fast, data-driven, machine-learned simulator [Dataset]. Dryad.


We introduce Optimal-Transport-based Unfolding and Simulation (OTUS), a novel, fast simulator based on unsupervised machine learning that is capable of predicting experimental data from theoretical models. Simulations are crucial in science because they map from theoretical models to experimental data, allowing scientists to test the predictions of theoretical models against the reality of experiments. Experimental data is often reconstructed from indirect measurements, so the aggregate transformation from theoretical models to experimental data is poorly described by analytical methods. Scientists instead rely on ad hoc numerical simulations at great computational cost. Capable of learning directly from data, OTUS trains a probabilistic autoencoder to transform directly between theoretical models and experimental data. This is achieved by identifying the probabilistic autoencoder's latent space with the space of theoretical models, so that the decoder network becomes a fast, predictive simulator with the potential to replace current, computationally costly simulators. Using particle physics as an illustrative example, we provide proof-of-principle results for Z-boson and top-quark decays, but stress that OTUS can be widely applied to other fields.


The data is divided into two sets corresponding to the experiments in the publication (preprint): FinalData_ppzee.hdf5 and FinalData_ppttbar.hdf5.

The data was generated using MadGraph5 v. [1], Pythia v.8.240 [2], Delphes v.3.4.1 [3], and ROOT v.6.08/00 [4]. The relevant run cards can be found in the code repository linked with this dataset.

[1] Johan Alwall et al. “MadGraph 5: Going Beyond”. 2011. arXiv: 1106.0522.

[2] Torbjorn Sjostrand, Stephen Mrenna, and Peter Z. Skands. “PYTHIA 6.4 Physics and Manual”. In: JHEP 0605 (2006), p. 026. DOI: 10.1088/1126-6708/2006/05/026. arXiv: hep-ph/0603175 [hep-ph].

[3] J. de Favereau et al. “DELPHES 3, A modular framework for fast simulation of a generic collider experiment”. In: JHEP 02 (2014), p. 057. DOI: 10.1007/JHEP02(2014)057. arXiv: 1307.6346 [hep-ex].

[4] R. Brun and F. Rademakers. “ROOT: An object oriented data analysis framework”. In: Nucl. Instrum. Meth. A 389 (1997). Ed. by M. Werlen and D. Perret-Gallix, pp. 81–86. DOI: 10.1016/S0168-9002(97)00048-X.

Usage notes

Further details can be found in the attached readme file (readme_data.txt), the publication (preprint), and the code repository linked with this dataset.
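
As a starting point for exploring the two HDF5 files, the minimal sketch below lists every dataset in a file along with its shape and dtype. It assumes the third-party h5py package is installed. The helper name list_datasets is hypothetical, and nothing is assumed about the internal layout of the files, which is documented in readme_data.txt.

```python
import os

import h5py  # third-party package, assumed to be installed


def list_datasets(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in an HDF5 file.

    Hypothetical helper for illustration; it makes no assumption about
    the group or dataset names stored in the file.
    """
    summary = {}

    def visit(name, obj):
        # visititems walks both groups and datasets; record datasets only.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


if __name__ == "__main__":
    # File names as given on this page; files not found locally are skipped.
    for filename in ("FinalData_ppzee.hdf5", "FinalData_ppttbar.hdf5"):
        if not os.path.exists(filename):
            print(f"{filename}: not found, skipping")
            continue
        for name, (shape, dtype) in list_datasets(filename).items():
            print(f"{filename}: {name} shape={shape} dtype={dtype}")
```

Running the script in the directory containing the downloaded files prints one line per dataset, which can then be matched against the descriptions in the readme before loading any arrays.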


National Science Foundation, Award: DGE-1633631

National Science Foundation, Award: DGE-1839285

Office of Science, Award: DE-SC0009920

Hasso Plattner Foundation

National Science Foundation, Award: 1928718

National Science Foundation, Award: 2003237

National Science Foundation, Award: 2007719

Intel (United States)

Qualcomm (United States)

Defense Advanced Research Projects Agency, Award: HR001120C0021