EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents

🏛️ Findings of ACL 2026

🌐 Project · 📄 Paper · 🤗 Dataset

Contributions

We propose a scalable, EHR-grounded, multi-agent pipeline for synthetic multi-party dialogue generation, ensuring realism and factuality via independent rule-based concept and topic-flow checkers and an iterative critique-and-refine loop.
We introduce EMSDialog, an EMS-specific synthetic dataset of 4,414 realistic multi-party conversations, generated from a real-world ePCR dataset and annotated with 43 diagnoses, turn-level speaker roles, and topics. Human expert and LLM-based evaluations show strong quality at both the utterance level (realism, safety, role accuracy, groundedness) and the conversation level (logical flow, factuality, diversity).
We demonstrate the downstream utility of EMSDialog by training models of different sizes for conversational diagnosis prediction and evaluating them on real-world EMS conversations. Experiments show that EMSDialog-augmented training improves prediction accuracy, timeliness, and stability, and combining synthetic with real data yields the strongest overall performance.
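
The first contribution describes rule-based checkers feeding an iterative critique-and-refine loop. The Python sketch below illustrates that control flow only. Every name in it (`concept_check`, `topicflow_check`, `critique_and_refine`, `stub_refine`), the topic order, and the stub that stands in for the LLM rewrite are hypothetical and are not taken from this repository's code.

```python
from typing import Callable, Dict, List, Tuple

Turn = Dict[str, str]  # hypothetical turn format: {"role", "topic", "text"}


def concept_check(dialogue: List[Turn], required: List[str]) -> List[str]:
    """Rule-based check: report required concepts never mentioned in the dialogue."""
    spoken = " ".join(t["text"].lower() for t in dialogue)
    return [f"missing concept: {c}" for c in required if c.lower() not in spoken]


def topicflow_check(dialogue: List[Turn], order: List[str]) -> List[str]:
    """Rule-based check: topics must not move backwards in the assumed order."""
    issues, last = [], -1
    for i, turn in enumerate(dialogue):
        if turn["topic"] not in order:
            issues.append(f"turn {i}: unknown topic '{turn['topic']}'")
            continue
        pos = order.index(turn["topic"])
        if pos < last:
            issues.append(f"turn {i}: topic '{turn['topic']}' is out of order")
        last = max(last, pos)
    return issues


def critique_and_refine(
    draft: List[Turn],
    required: List[str],
    order: List[str],
    refine: Callable[[List[Turn], List[str]], List[Turn]],
    max_rounds: int = 3,
) -> Tuple[List[Turn], List[str]]:
    """Re-run the checkers and ask `refine` for a rewrite until no issues remain."""
    def run_checks(d: List[Turn]) -> List[str]:
        return concept_check(d, required) + topicflow_check(d, order)

    dialogue, issues, rounds = draft, run_checks(draft), 0
    while issues and rounds < max_rounds:
        dialogue = refine(dialogue, issues)  # in practice this would be an LLM call
        issues = run_checks(dialogue)
        rounds += 1
    return dialogue, issues


def stub_refine(dialogue: List[Turn], issues: List[str]) -> List[Turn]:
    """Stand-in for an LLM rewrite: append one EMT turn per missing concept."""
    prefix = "missing concept: "
    fixed = list(dialogue)
    for issue in issues:
        if issue.startswith(prefix):
            concept = issue[len(prefix):]
            fixed.append({"role": "EMT", "topic": "treatment", "text": f"Noting {concept}."})
    return fixed


# Toy inputs, invented for illustration.
draft = [
    {"role": "EMT", "topic": "assessment", "text": "Where does it hurt?"},
    {"role": "Patient", "topic": "assessment", "text": "I have chest pain."},
]
required = ["chest pain", "aspirin"]
order = ["assessment", "treatment", "transport"]

final, remaining = critique_and_refine(draft, required, order, stub_refine)
print(len(final), remaining)
```

Keeping the checkers as plain functions that return a list of issue strings means the same messages can serve both as a pass/fail signal and as feedback text for the rewrite step.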

Dataset

🤗 Synthetic EMSDialog Dataset: Link
🩺 The dataset is also available in the data folder of this GitHub repository.

How to Run the Code

Synthetic Data Generation

python generate.py --model_name_or_path=Qwen/Qwen3-32B --enable_concept_check --enable_topicflow_check --enable_style_check
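
For ablations it can be handy to assemble this command programmatically. The sketch below builds the argument list using only the flags shown above. The helper name `build_generate_command` is hypothetical, and it assumes (unverified) that omitting an `--enable_*` flag turns the corresponding checker off.

```python
import shlex
from typing import List


def build_generate_command(
    model: str = "Qwen/Qwen3-32B",
    concept_check: bool = True,
    topicflow_check: bool = True,
    style_check: bool = True,
) -> List[str]:
    """Build the generate.py argument list from the flags shown in this README."""
    cmd = ["python", "generate.py", f"--model_name_or_path={model}"]
    if concept_check:
        cmd.append("--enable_concept_check")
    if topicflow_check:
        cmd.append("--enable_topicflow_check")
    if style_check:
        cmd.append("--enable_style_check")
    return cmd


full = build_generate_command()
no_style = build_generate_command(style_check=False)

# Print shell-ready strings; nothing is executed here.
print(shlex.join(full))
print(shlex.join(no_style))
```

To launch a run, the returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `generate.py`.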

Conversational Diagnosis Prediction

Static Training

cd ./code/bash
./static_train_4b_ours.sh

Dynamic Training

cd ./code/bash
./dynamic_train_4b_ours.sh
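
The contributions report accuracy, timeliness, and stability for conversational diagnosis prediction, but this README does not define those metrics. As a rough illustration only, the sketch below scores a sequence of per-turn predictions under assumed definitions: accuracy as correctness of the final prediction, timeliness as the share of turns covered by the final uninterrupted run of correct predictions, and stability as the share of consecutive turns where the prediction does not change. The function name, these definitions, and the example labels are assumptions, not the paper's metrics or the repository's evaluation code.

```python
from typing import Dict, List


def score_turnwise_predictions(preds: List[str], gold: str) -> Dict[str, float]:
    """Score one conversation's per-turn diagnosis predictions (assumed metrics)."""
    if not preds:
        raise ValueError("need at least one per-turn prediction")
    n = len(preds)

    # Accuracy (assumed): the prediction after the last turn matches the gold label.
    accuracy = 1.0 if preds[-1] == gold else 0.0

    # Timeliness (assumed): walk backwards to find where the final run of
    # correct predictions starts, then report that run's share of all turns.
    first_stable = n
    for i in range(n - 1, -1, -1):
        if preds[i] != gold:
            break
        first_stable = i
    timeliness = (n - first_stable) / n

    # Stability (assumed): 1 minus the fraction of adjacent turns that flip.
    flips = sum(1 for a, b in zip(preds, preds[1:]) if a != b)
    stability = 1.0 - flips / (n - 1) if n > 1 else 1.0

    return {"accuracy": accuracy, "timeliness": timeliness, "stability": stability}


# Invented example: the model switches to the gold label at the second turn.
scores = score_turnwise_predictions(["syncope", "stroke", "stroke", "stroke"], gold="stroke")
print(scores)
```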

Citation

If you find this work useful, please consider citing our paper.

@misc{ge2026emsdialogsyntheticmultipersonemergency,
      title={EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents}, 
      author={Xueren Ge and Sahil Murtaza and Anthony Cortez and Homa Alemzadeh},
      year={2026},
      eprint={2604.07549},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.07549}, 
}

