Benchmarking speech-to-text robustness in noisy emergency medical dialogues: An evaluation of stt-models under realistic acoustic conditions

Moser, Denis; Stanic, Nikola; Sariyar, Murat

Published Dec 03, 2025 on Dryad. https://doi.org/10.5061/dryad.gtht76j1b

Data files

Dec 03, 2025 version files (1.33 MB total):

- README.md (3.36 KB)
- synthetic_dialogs.json (1.32 MB)

Abstract

This dataset provides a representative JSON file containing 99 fully synthetic German emergency medical service (EMS) dialogue texts used as ground-truth material in our benchmarking study on speech-to-text (STT) robustness. Each dialogue represents a prehospital scenario with conversational exchanges between EMS personnel and patients, and includes synthetic clinical information (e.g., diagnoses, medications, and vital signs).

In the associated study, these texts served as the basis for generating synthetic audio samples with text-to-speech systems and for evaluating multiple STT models under controlled noise conditions. Only the underlying dialogue texts are included here, as they form the necessary foundation for reproducing the audio corpus or for conducting related benchmarking tasks.

All contents are fully synthetic, contain no personal or sensitive information, and may be used freely for research and methodological development.

Dataset DOI: 10.5061/dryad.gtht76j1b

Description of the data and file structure

1. Overview

This dataset contains 99 fully synthetic German emergency medical service (EMS) dialogues, provided in:

synthetic_dialogs.json

Each entry represents one complete text-based EMS dialogue simulating a prehospital scenario involving interactions between EMS personnel and patients. These dialogues served as the ground-truth basis for audio synthesis and speech-to-text (STT) evaluation in our study.

2. How the dialogues were generated

All dialogues were originally created for research purposes as part of an earlier study (Moser, Bender & Sariyar, 2025; DOI: https://doi.org/10.1080/08839514.2025.2519169).

The generation process consisted of the following steps:

Step 1 — Synthetic EMS case construction

Structured EMS attributes (e.g., diagnosis, medications, vital signs) were derived from MIMIC-IV EMS data.
Using an adapted Post-Randomization Method (PRAM), all structured fields were randomized or swapped to generate fully synthetic case descriptions, ensuring that no original MIMIC-IV patient values were retained.
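
The core idea of post-randomization can be illustrated with a minimal sketch. This is a textbook-style PRAM on a single categorical column, not the adapted variant used in the study, whose details are described in the earlier publication rather than here. The function name `pram_column`, the parameter `p_keep`, and the example values are all hypothetical.

```python
import random


def pram_column(values, p_keep, rng):
    """Perturb one categorical column: keep each value with probability p_keep,
    otherwise replace it with a different category drawn uniformly at random."""
    categories = sorted(set(values))
    if len(categories) < 2:
        return list(values)  # only one category, so there is nothing to swap to
    perturbed = []
    for value in values:
        if rng.random() < p_keep:
            perturbed.append(value)
        else:
            alternatives = [c for c in categories if c != value]
            perturbed.append(rng.choice(alternatives))
    return perturbed


rng = random.Random(0)  # fixed seed so the perturbation is repeatable
diagnoses = ["nausea", "chest pain", "syncope", "nausea"]  # illustrative values only
perturbed = pram_column(diagnoses, p_keep=0.0, rng=rng)
```

With `p_keep=0.0`, every value is replaced by a different category, which mirrors the stated goal that no original value is retained. A full PRAM implementation would typically use a per-category transition matrix instead of a uniform draw.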

Step 2 — Prompt-based dialogue generation

The synthetic case attributes were inserted into prompt templates.
Large language models (local and OpenAI API-based variants) were instructed to generate coherent German EMS operation dialogues based on these synthetic case descriptions.
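
The template-filling step can be sketched as follows. The template wording, the field names (`diagnosis`, `medications`, `vital_signs`), and the function `build_prompt` are hypothetical; the prompts actually used in the study are not part of this dataset, and the call to a language model is deliberately left out.

```python
from string import Template

# Hypothetical prompt template; the study's real prompt wording is not included here.
PROMPT_TEMPLATE = Template(
    "Write a coherent German EMS operation dialogue between EMS personnel "
    "and a patient.\n"
    "Diagnosis: $diagnosis\n"
    "Medications: $medications\n"
    "Vital signs: $vital_signs\n"
)


def build_prompt(case):
    """Fill the template with one synthetic case; raises KeyError if a field is missing."""
    return PROMPT_TEMPLATE.substitute(
        diagnosis=case["diagnosis"],
        medications=", ".join(case["medications"]),
        vital_signs=", ".join(f"{k} {v}" for k, v in case["vital_signs"].items()),
    )


# Illustrative values only, not taken from the dataset.
example_case = {
    "diagnosis": "nausea",
    "medications": ["ondansetron"],
    "vital_signs": {"heart rate": "88/min", "SpO2": "97%"},
}
prompt = build_prompt(example_case)
```

The resulting `prompt` string would then be sent to whichever model is being used.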

3. Dialogue structure

Each dialogue represents a continuous EMS operation narrative covering the following phases:

- Arrival on scene and initial patient contact
- Triage and early vital-sign assessment
- Medication administration and follow-up assessment

Within these dialogues, elements such as diagnoses, vital-sign measurements, and administered medications are included as part of the synthetic clinical context.

In the JSON file, these phases are combined into a single continuous text for each dialogue.

4. Data format

The dataset is provided as a single JSON array with 99 entries.

Each entry has the following structure:

{"convoID": "30394981", "srcText": "Michael Bauer: \"Julia, wir haben einen 45-jährigen Mann mit Übelkeit,..."}

- convoID uniquely identifies the dialogue.

- srcText contains the full, continuous synthetic EMS dialogue.
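
A minimal sketch of loading and checking the file, assuming the structure described above (a top-level JSON array of objects with the string fields `convoID` and `srcText`). The function name `load_dialogs` is hypothetical and not part of the dataset or the associated repository.

```python
import json
from pathlib import Path


def load_dialogs(path):
    """Load the dialogue array and return a dict mapping convoID to srcText."""
    # The dialogues are German text, so read the file explicitly as UTF-8.
    with Path(path).open(encoding="utf-8") as fh:
        entries = json.load(fh)
    if not isinstance(entries, list):
        raise ValueError("expected a top-level JSON array")
    dialogs = {}
    for entry in entries:
        convo_id = entry["convoID"]
        if convo_id in dialogs:
            raise ValueError(f"duplicate convoID: {convo_id}")
        dialogs[convo_id] = entry["srcText"]
    return dialogs
```

Calling `load_dialogs("synthetic_dialogs.json")` should return 99 entries if the file matches the description above.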

5. Relation to earlier work

The methodological foundations of the dialogue-generation pipeline are described in:

Moser D, Bender M, Sariyar M. (2025). A pipeline for automating emergency medicine documentation using LLMs with retrieval-augmented text generation. Applied Artificial Intelligence. DOI: https://doi.org/10.1080/08839514.2025.2519169

6. Reproducibility

Audio synthesis and STT evaluation can be reproduced using the generation code available at:

https://github.com/denMo24/stt-emergency-benchmark

(This external repository is not part of the Dryad submission.)
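
For related benchmarking tasks, an STT transcript can be scored against the corresponding `srcText`. The sketch below computes word error rate (WER) with a word-level edit distance. WER is an assumption here: this description does not state which metric the study reports, and the function `word_error_rate` and its simple lowercase-and-split normalization are hypothetical, not taken from the repository.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

In practice, punctuation and speaker labels in `srcText` would need consistent handling before scoring, since they affect the word count.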