Benchmarking speech-to-text robustness in noisy emergency medical dialogues: An evaluation of stt-models under realistic acoustic conditions
Data files
Dec 03, 2025 version files (1.33 MB)
- README.md (3.36 KB)
- synthetic_dialogs.json (1.32 MB)
Abstract
This dataset provides a representative JSON file containing 99 fully synthetic German emergency medical service (EMS) dialogue texts used as ground-truth material in our benchmarking study on speech-to-text (STT) robustness. Each dialogue represents a prehospital scenario with conversational exchanges between EMS personnel and patients, and includes synthetic clinical information (e.g., diagnoses, medications, and vital signs).
In the associated study, these texts served as the basis for generating synthetic audio samples with text-to-speech systems and for evaluating multiple STT models under controlled noise conditions. Only the underlying dialogue texts are included here, as they form the necessary foundation for reproducing the audio corpus or for conducting related benchmarking tasks.
All contents are fully synthetic, contain no personal or sensitive information, and may be used freely for research and methodological development.
Dataset DOI: 10.5061/dryad.gtht76j1b
Description of the data and file structure
1. Overview
This dataset contains 99 fully synthetic German emergency medical service (EMS) dialogues, provided in:
synthetic_dialogs.json
Each entry represents one complete text-based EMS dialogue simulating a prehospital scenario involving interactions between EMS personnel and patients. These dialogues served as the ground-truth basis for audio synthesis and speech-to-text (STT) evaluation in our study.
2. How the dialogues were generated
All dialogues were originally created for research purposes as part of an earlier study
(Moser, Bender & Sariyar, 2025; DOI: https://doi.org/10.1080/08839514.2025.2519169).
The generation process consisted of the following steps:
Step 1 — Synthetic EMS case construction
- Structured EMS attributes (e.g., diagnosis, medications, vital signs) were derived from MIMIC-IV EMS data.
- Using an adapted Post-Randomization Method (PRAM), all structured fields were randomized or swapped to generate fully synthetic case descriptions, ensuring that no original MIMIC-IV patient values were retained.
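The perturbation idea behind Step 1 can be illustrated with a generic PRAM sketch for one categorical field. This is not the adapted variant used in the study, whose details are given in the cited paper. The function name `pram`, the `keep_prob` parameter, and the frequency-weighted replacement rule are assumptions made for illustration only.

```python
import random
from collections import Counter


def pram(values, keep_prob, rng):
    """Perturb one categorical column with a simple PRAM-style rule.

    Each value is kept with probability `keep_prob`; otherwise it is
    replaced by a *different* category drawn in proportion to the
    observed category frequencies (an illustrative transition rule).
    """
    counts = Counter(values)
    perturbed = []
    for value in values:
        others = [c for c in counts if c != value]
        # Keep the value if there is no alternative category or the draw says so.
        if not others or rng.random() < keep_prob:
            perturbed.append(value)
            continue
        weights = [counts[c] for c in others]
        perturbed.append(rng.choices(others, weights=weights, k=1)[0])
    return perturbed
```

Calling `pram(column, 0.0, random.Random(42))` forces every value to change whenever at least two categories exist, which loosely mirrors the stated goal that no original value is retained. Passing a seeded `random.Random` keeps the perturbation reproducible.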
Step 2 — Prompt-based dialogue generation
- The synthetic case attributes were inserted into prompt templates.
- Large language models (local and OpenAI API-based variants) were instructed to generate coherent German EMS operation dialogues based on these synthetic case descriptions.
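Inserting case attributes into a prompt template, as in Step 2, might look like the minimal sketch below. The template wording, the `build_prompt` helper, and the field names `diagnosis`, `medications`, and `vital_signs` are hypothetical; the study's actual prompts are not reproduced in this README.

```python
from string import Template

# Hypothetical template text; the real prompt wording is not part of this dataset.
PROMPT_TEMPLATE = Template(
    "Write a coherent German EMS operation dialogue between EMS personnel "
    "and a patient.\n"
    "Diagnosis: $diagnosis\n"
    "Medications: $medications\n"
    "Vital signs: $vital_signs\n"
)


def build_prompt(case):
    """Fill the template from one synthetic case description (a dict)."""
    vital_signs = ", ".join(
        f"{name} {value}" for name, value in case["vital_signs"].items()
    )
    return PROMPT_TEMPLATE.substitute(
        diagnosis=case["diagnosis"],
        medications=", ".join(case["medications"]),
        vital_signs=vital_signs,
    )
```

`Template.substitute` raises `KeyError` when a placeholder has no value, so an incomplete case description fails early instead of producing a prompt with gaps.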
3. Dialogue structure
Each dialogue represents a continuous EMS operation narrative covering the following phases:
- Arrival on scene and initial patient contact
- Triage and early vital-sign assessment
- Medication administration and follow-up assessment
Within these dialogues, elements such as diagnoses, vital-sign measurements, and administered medications are included as part of the synthetic clinical context.
In the JSON file, these phases are combined into a single continuous text for each dialogue.
4. Data format
The dataset is provided as a single JSON array with 99 entries.
Each entry has the following structure:
{"convoID": "30394981",
"srcText": "Michael Bauer: \"Julia, wir haben einen 45-jährigen Mann mit Übelkeit,..."}
- convoID uniquely identifies the dialogue.
- srcText contains the full, continuous synthetic EMS dialogue.
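Given this structure, the file can be read with the standard library alone. The sketch below assumes the file sits in the working directory under its published name; `load_dialogues` is a hypothetical helper, not part of the dataset or the linked repository.

```python
import json
from pathlib import Path


def load_dialogues(path):
    """Load the JSON array and return a dict mapping convoID -> srcText."""
    # The dialogues are German text, so read the file explicitly as UTF-8.
    with Path(path).open(encoding="utf-8") as fh:
        entries = json.load(fh)
    if not isinstance(entries, list):
        raise ValueError("expected a top-level JSON array")
    dialogues = {}
    for entry in entries:
        convo_id = entry["convoID"]
        if convo_id in dialogues:
            raise ValueError(f"duplicate convoID: {convo_id}")
        dialogues[convo_id] = entry["srcText"]
    return dialogues
```

A call such as `load_dialogues("synthetic_dialogs.json")` should yield 99 entries if the file matches the description above. The duplicate check is a cheap way to confirm that `convoID` is unique, as stated.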
5. Relation to earlier work
The methodological foundations of the dialogue-generation pipeline are described in:
Moser D, Bender M, Sariyar M. (2025).
A pipeline for automating emergency medicine documentation using LLMs with retrieval-augmented text generation.
Applied Artificial Intelligence.
DOI: https://doi.org/10.1080/08839514.2025.2519169
6. Reproducibility
Audio synthesis and STT evaluation can be reproduced using the generation code available at:
https://github.com/denMo24/stt-emergency-benchmark
(This external repository is not part of the Dryad submission.)
