We introduce Optimal-Transport-based Unfolding and Simulation (OTUS), a novel, fast simulator based on unsupervised machine learning that can predict experimental data from theoretical models. Simulations are crucial in science because they map from theoretical models to experimental data, allowing scientists to test the predictions of theoretical models against the reality of experiments. Experimental data is often reconstructed from indirect measurements, so the aggregate transformation from theoretical models to experimental data is poorly described by analytical methods. Scientists instead rely on ad hoc numerical simulations at great computational cost. Because it learns directly from data, OTUS trains a probabilistic autoencoder to transform directly between theoretical models and experimental data. This is achieved by identifying the probabilistic autoencoder's latent space with the space of theoretical models, so that the decoder network becomes a fast, predictive simulator with the potential to replace current, computationally costly simulators. Using particle physics as an illustrative example, we provide proof-of-principle results for Z-boson and top-quark decays, but stress that OTUS can be widely applied to other fields.

The data is divided into two sets corresponding to the experiments in the publication (preprint: https://arxiv.org/abs/2101.08944): FinalData_ppzee.hdf5 and FinalData_ppttbar.hdf5.
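The internal layout of the two files is not described on this page, so a reasonable first step is to list what each file contains before loading anything. The following is a minimal sketch, assuming the h5py package is installed; the helper name list_datasets is hypothetical and not part of the dataset or its code repository, and the readme remains the authoritative description of the contents.

```python
import h5py


def list_datasets(path):
    """Return (name, shape, dtype) for every dataset in an HDF5 file.

    Hypothetical helper for illustration; it makes no assumption about
    how the file is organised internally.
    """
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the file;
        # only datasets carry a shape and dtype worth reporting.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))
        return None  # returning None keeps the traversal going

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


if __name__ == "__main__":
    import sys

    # Pass one or more file paths on the command line,
    # e.g. FinalData_ppzee.hdf5 FinalData_ppttbar.hdf5
    for path in sys.argv[1:]:
        print(path)
        for name, shape, dtype in list_datasets(path):
            print(f"  {name}: shape={shape}, dtype={dtype}")
```

Listing names, shapes, and dtypes first avoids guessing at dataset keys; the printed names can then be compared with the descriptions in readme_data.txt.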

The data was generated using MadGraph5 v2.6.3.2 [1], Pythia v8.240 [2], Delphes v3.4.1 [3], and ROOT v6.08/00 [4]. The relevant run cards can be found in the code repository linked with this dataset.

[1] Johan Alwall et al. “MadGraph 5: Going Beyond”. 2011. arXiv: 1106.0522 [hep-ph]. URL: http://arxiv.org/abs/1106.0522.

[2] Torbjorn Sjostrand, Stephen Mrenna, and Peter Z. Skands. “PYTHIA 6.4 Physics and Manual”. In: JHEP 0605 (2006), p. 026. DOI: 10.1088/1126-6708/2006/05/026. arXiv: hep-ph/0603175 [hep-ph].

[3] J. de Favereau et al. “DELPHES 3, A modular framework for fast simulation of a generic collider experiment”. In: JHEP 02 (2014), p. 057. DOI: 10.1007/JHEP02(2014)057. arXiv: 1307.6346 [hep-ex].

[4] R. Brun and F. Rademakers. “ROOT: An object oriented data analysis framework”. In: Nucl. Instrum. Meth. A 389 (1997). Ed. by M. Werlen and D. Perret-Gallix, pp. 81–86. DOI: 10.1016/S0168-9002(97)00048-X.

Further details can be found in the attached readme file (readme_data.txt), the publication (preprint: https://arxiv.org/abs/2101.08944), and the code repository linked with this dataset.

Data for: Foundations of a fast, data-driven, machine-learned simulator
