Dryad

Annotated dataset of clinical notes for predicting social determinants of mental health in opioid use disorder using a Human-in-the-Loop Large Language Model Interaction for Annotation (HLLIA) framework

Data files

Mar 06, 2026 version files 163.76 KB


Abstract

This dataset comprises 2,636 deidentified discharge summaries from the MIMIC-IV-Note database, annotated for 13 Social Determinants of Mental Health (SDOMH) relevant to Opioid Use Disorder (OUD). It was created to support natural language processing (NLP) and machine learning research aimed at identifying social factors that influence OUD outcomes. Using a Human-in-the-Loop Large Language Model Interaction for Annotation (HLLIA) framework, initial SDOMH labels were generated by GPT-3.5/4 and then refined through expert review, partial-correlation–based validation, and iterative consensus refinement to ensure label consistency and reliability. Each record includes: (1) a subject ID; (2) binary indicators for OUD presence (Hierarchy 1) and SDOMH presence (Hierarchy 2); and (3) thirteen binary columns (Hierarchy 3) representing specific determinants such as Social Detachment, Financial Uncertainty, Housing Instability, Substance Misuse, Violence, and Suicide Mortality.
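
As a minimal sketch of how a record with this layout might be read and sanity-checked, the Python below parses CSV rows into a small data class and flags rows whose levels disagree. The column names (`subject_id`, `oud`, `sdomh`, and the determinant names) are hypothetical, as is the rule that a positive Hierarchy 3 label implies a positive Hierarchy 2 flag. Only the six determinants named in the abstract appear here, and the sample rows are made up for illustration. The released files may use different headers.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical column names; check the headers of the downloaded file.
# Only the six determinants named in the abstract are listed here.
DETERMINANTS = [
    "social_detachment",
    "financial_uncertainty",
    "housing_instability",
    "substance_misuse",
    "violence",
    "suicide_mortality",
]


@dataclass
class Record:
    subject_id: str
    oud: int            # Hierarchy 1: OUD presence
    sdomh: int          # Hierarchy 2: any SDOMH present
    determinants: dict  # Hierarchy 3: one 0/1 flag per determinant


def parse_records(handle):
    """Read CSV rows into Record objects, casting label columns to int."""
    records = []
    for row in csv.DictReader(handle):
        records.append(Record(
            subject_id=row["subject_id"],
            oud=int(row["oud"]),
            sdomh=int(row["sdomh"]),
            determinants={name: int(row[name]) for name in DETERMINANTS},
        ))
    return records


def inconsistent(record):
    """Assumed rule: a positive determinant requires sdomh == 1."""
    return any(record.determinants.values()) and record.sdomh == 0


# Made-up rows, not taken from the dataset.
SAMPLE = """subject_id,oud,sdomh,social_detachment,financial_uncertainty,housing_instability,substance_misuse,violence,suicide_mortality
A,1,1,0,1,1,0,0,0
B,1,0,0,0,0,1,0,0
C,0,0,0,0,0,0,0,0
"""

records = parse_records(io.StringIO(SAMPLE))
flagged = [r.subject_id for r in records if inconsistent(r)]
print(flagged)  # ['B']
```

Row B is flagged because it carries a determinant label while its Hierarchy 2 flag is 0. Whether the released labels obey this rule is worth checking before training on them.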

The dataset enables hierarchical, multi-label classification of SDOMHs and serves as training data for transformer-based models such as the Multilevel Hierarchical Clinical-Longformer Embeddings (MHCLE) algorithm. Potential reuse includes applications in social and behavioral health informatics, causal inference, clinical decision support, and bias-aware LLM annotation studies.
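
One generic way to use three-level labels in hierarchical multi-label classification is top-down gating, where a lower-level prediction is kept only if its parent levels are predicted positive. The sketch below illustrates that general idea and is not the MHCLE algorithm. The function name, the 0.5 threshold, the example probabilities, and the assumption that Hierarchy 2 is gated by Hierarchy 1 are all hypothetical.

```python
def gate_predictions(p_oud, p_sdomh, p_determinants, threshold=0.5):
    """Top-down gating: zero out child labels when a parent is negative.

    p_oud and p_sdomh are predicted probabilities for Hierarchy 1 and 2;
    p_determinants maps each Hierarchy 3 label name to its probability.
    """
    oud = int(p_oud >= threshold)
    # Hierarchy 2 is kept only if Hierarchy 1 is positive (assumed rule).
    sdomh = int(oud and p_sdomh >= threshold)
    # Hierarchy 3 labels are kept only if Hierarchy 2 is positive.
    determinants = {
        name: int(sdomh and p >= threshold)
        for name, p in p_determinants.items()
    }
    return oud, sdomh, determinants


# Made-up probabilities for illustration.
kept = gate_predictions(0.9, 0.8, {"housing_instability": 0.7, "violence": 0.2})
gated = gate_predictions(0.9, 0.3, {"housing_instability": 0.7})
print(kept)   # (1, 1, {'housing_instability': 1, 'violence': 0})
print(gated)  # (1, 0, {'housing_instability': 0})
```

In the second call the determinant probability exceeds the threshold, but the label is suppressed because the SDOMH level is predicted negative.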