Effectiveness of screening and ultra-brief intervention for hazardous drinking in primary care: pragmatic cluster randomised controlled trial
Data files
Sep 26, 2025 version files 616.30 KB
- Baseline.xlsx (81.81 KB)
- CONSORT_flowchart.r (1.33 KB)
- data_EASY_cluster_RCT_preprocessed.csv (248.34 KB)
- data_EASY_cluster_RCT_preprocessed.RData (15.38 KB)
- RawData.csv (193.63 KB)
- README.md (13.48 KB)
- Scr_Select.csv (13.29 KB)
- stable1.r (2.24 KB)
- table1.r (1.54 KB)
- table2_and_4.sas (33.52 KB)
- table2_sensitivity_analyses.r (7.42 KB)
- table3.r (2.42 KB)
- table5.r (1.88 KB)
Abstract
Context
Hazardous drinking affects about one in five primary‑care patients. The EASY study compared a ≤1‑minute ultra‑brief intervention (Ultra‑BI) with simplified assessment only (SAO) across 40 primary care clinics in Japan.
Objective
To provide open access to the datasets and SAS / R scripts needed to reproduce the published analyses and to support future studies, including individual participant data meta-analyses.
Datasets and scripts description
- RawData.csv – unprocessed participant-level data at baseline, 12 weeks, and 24 weeks
- Baseline.xlsx – unprocessed participant-level data at baseline for SAS
- data_EASY_cluster_RCT_preprocessed.csv / .RData – processed versions of RawData.csv
- Scr_Select.csv – flag indicating participants recruited at clinics that restricted screening to patients suspected of hazardous drinking
- Six analysis scripts and one CONSORT-diagram script
Key variables
Participant ID, clinic ID, allocation group, AUDIT-C score (0–12), total alcohol consumption (grams per 4 weeks), WHO drinking risk level (DRL), readiness-to-change score, sex, age band, comorbidities, smoking status, visit type, and participant-reported recognition of receiving advice.
Reuse potential
The dataset and scripts enable full replication of the published analyses and support the pooling of effect estimates in individual participant data meta-analyses.
Ethical consideration
Public data sharing was conducted under IRB-approved written informed consent and public disclosure, with participants given the opportunity to decline data sharing. All data were de-identified prior to release, including the removal of direct identifiers and the aggregation or masking of potentially re-identifiable information.
Dataset DOI: 10.5061/dryad.866t1g22m
Dataset abstract
This dataset contains de-identified, participant-level records from 1,136 adults (20–74 years) screened for hazardous drinking across 40 primary-care clinics in Okayama, Hyogo, Osaka, and Hiroshima prefectures, Japan, between 29 June and 7 August 2023. Data are provided as rectangular tables in CSV, XLSX, and RData formats capturing baseline demographics, medical histories, readiness-to-change scales (1–4 ordinal items), AUDIT-C sub-scores (0–4) and total scores (0–12), and ethanol consumption amounts expressed in grams per 4-week period at baseline, 12 weeks, and 24 weeks. The deposit also includes a preprocessed analysis-ready table with derived indicators (e.g., per-protocol flag, intervention receipt percentage, patient-reported advice) and R scripts/SAS programs used to reproduce the trial tables.
The files support reuse for evaluating alcohol screening and ultra-brief interventions, examining behavioural readiness trajectories, benchmarking cluster-randomised trial analyses, and developing replication or secondary analyses. All direct identifiers have been removed, participant and facility IDs are pseudonymised, and age is supplied as bands or integers with no dates, aligning with Dryad's human-subject data guidelines and the informed-consent provisions for public data sharing. Users should note that the encoding of missing values differs by file (see "Dataset descriptions" below; import missing values as NA in R) and that binary indicators use 0 = No / 1 = Yes unless otherwise specified.
Description of the data and file structure
We collected these data during a two-arm cluster randomised controlled trial that evaluated an ultra-brief alcohol intervention (Ultra-BI) versus simplified assessment only (SAO) in routine primary-care settings. Forty outpatient clinics from urban, suburban, and rural areas of Okayama, Hyogo, Osaka, and Hiroshima prefectures were invited and randomly allocated (block design, computer-generated sequence) before recruitment began. Between 29 June and 7 August 2023, reception staff consecutively screened all attending patients aged 20–74 years for basic eligibility, obtained written informed consent, and administered a baseline questionnaire that included the AUDIT-C. Patients meeting hazardous-drinking thresholds received the Ultra-BI immediately or usual care, according to cluster assignment. Follow-up questionnaires capturing alcohol consumption and readiness to change lifestyle behaviours were distributed at 12 and 24 weeks by post (paper or QR-linked web form) with SMS reminders; non-responders were contacted by an independent survey company, which also double-entered and validated all screening and follow-up data.
Dataset descriptions
Unless noted otherwise, missing values in the CSV files are encoded as . (period). When importing to analysis software, treat . as NA and cast numeric fields accordingly. Blank cells in the XLSX workbook represent missing data. All participant and clinic identifiers are pseudonymised and consistent across files where the column exists.
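The missing-value convention above can be applied at import time. The following is a minimal Python sketch, not part of the deposited R/SAS scripts; the three sample rows are invented for illustration, and the helper names `to_number` and `read_rows` are hypothetical.

```python
import csv
import io

# Invented excerpt in the style of RawData.csv; the values are not real trial data.
SAMPLE = """id,w0_AUDIT2,w0_DrinkingAmountPer4weeks
1,2,1120
2,.,.
3,0,336.5
"""


def to_number(cell):
    """Return None for '.' or blank cells, otherwise the cell as a float."""
    cell = cell.strip()
    if cell in (".", ""):
        return None
    return float(cell)


def read_rows(handle, numeric_columns):
    """Read a CSV and cast the named columns, mapping '.' to None."""
    rows = []
    for row in csv.DictReader(handle):
        for name in numeric_columns:
            row[name] = to_number(row[name])
        rows.append(row)
    return rows


rows = read_rows(io.StringIO(SAMPLE), ["w0_AUDIT2", "w0_DrinkingAmountPer4weeks"])
```

To read the real file, replace `io.StringIO(SAMPLE)` with an open file handle for RawData.csv and list the columns you want cast.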
RawData.csv (1136 rows × 22 columns)
Participant-level baseline and follow-up responses exported from the trial database in the original coding used for monitoring.
- id: integer pseudonymous participant identifier.
- w0_PerProtocol: string; '1' marks participants meeting the pre-specified per-protocol criteria (1130 rows), '.' indicates not assessed/not in the per-protocol set (3 rows).
- w0_Sex: integer; 1 = Male, 2 = Female.
- w0_VisitHistory: categorical text; First visit, Routine appointments, or Visit as needed; '.' = not recorded.
- w0_ConsideringDietChange: categorical text with readiness options No Improvement Needed, No intention to improve, Intending to improve, Already working on improvement; '.' = missing.
- w0_ConsideringSmokingChange: categorical text with options Never smoked, Smoked but quit, Intending to improve, No intention to improve; '.' = missing.
- w0_AUDIT1: integer 1–4; Alcohol Use Disorders Identification Test (AUDIT-C) item 1 (drinking frequency).
- w0_AUDIT2: string digits 0–4; AUDIT-C item 2 (usual quantity). Convert to integer after treating '.' as missing.
- w0_AUDIT3: string digits 0–4; AUDIT-C item 3 (frequency of heavy drinking). '.' = missing.
- w0_ConsideringDrinkingChange: ordinal code 1–5: 1 = No Improvement Needed, 2 = No intention to improve, 3 = Interested but no intention to improve, 4 = Intending to improve, 5 = Already working on improvement; '.' = missing.
- w0_Allocation: integer cluster assignment (0 = Simplified assessment only (SAO), 1 = Ultra-brief intervention (Ultra-BI)).
- FacilityID: integer string 1–40; pseudonymised clinic identifier.
- w0_DrinkingAmountPer4weeks: numeric stored as string; ethanol consumption in grams per 4-week period at baseline ('.' = missing).
- w0_SmokingStatus: integer string; 1 = Never smoked, 2 = Smoked but quit, 3 = Current smoker, '.' = missing.
- w12_ConsideringDietChange: same categories as w0_ConsideringDietChange; '.' = missing.
- w12_ConsideringSmokingChange: same categories as w0_ConsideringSmokingChange; '.' = missing.
- w12_ConsideringDrinkingChange: ordinal code 1 = No Improvement Needed, 2 = No intention to improve, 3 = Intending to improve, 4 = Already working on improvement; '.' = missing.
- w12_DrinkingAmountPer4weeks: numeric stored as string; grams of ethanol per 4 weeks at 12-week follow-up ('.' = missing, '0' indicates abstinent).
- w24_ConsideringDietChange: same categories as w0_ConsideringDietChange; '.' = missing.
- w24_ConsideringSmokingChange: same categories as w0_ConsideringSmokingChange; '.' = missing.
- w24_ConsideringDrinkingChange: ordinal code with the same mapping as at 12 weeks (1 = No Improvement Needed … 4 = Already working on improvement); '.' = missing.
- w24_DrinkingAmountPer4weeks: numeric stored as string; grams of ethanol per 4 weeks at 24-week follow-up ('.' = missing).
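Because the AUDIT-C items in RawData.csv are partly stored as strings, a total score has to be assembled after conversion. The sketch below sums the three items into the 0–12 total in Python. It assumes that a participant with any missing item gets no total; this is an illustrative choice and may not match how the deposited scripts derive w0_AUDIT_c. The function name `audit_c_total` is hypothetical.

```python
def audit_c_total(item1, item2, item3):
    """Sum the three AUDIT-C items; return None if any item is '.' or blank.

    Items may arrive as integers or as digit strings, as in RawData.csv.
    Treating an incomplete questionnaire as missing is an assumption.
    """
    items = []
    for raw in (item1, item2, item3):
        text = str(raw).strip()
        if text in (".", ""):
            return None
        items.append(int(text))
    return sum(items)


complete = audit_c_total(3, "2", "1")    # all three items present
incomplete = audit_c_total(4, ".", "0")  # item 2 missing
```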
data_EASY_cluster_RCT_preprocessed.csv (1133 rows × 24 columns)
Analysis-ready participant-level dataset restricted to the per-protocol set and enhanced with derived variables. Values are cleaned and typed for immediate use.
- w0_PerProtocol: numeric; 1 for the 1130 participants in the per-protocol set, NA for the 3 participants excluded from that set.
- w0_Sex: categorical text; Male or Female.
- w0_Age: character string; age band (20–29, 30–39, …, 70–74).
- w0_VisitHistory: categorical text; First visit, Routine appointments, Visit as needed, or missing (NA).
- w0_PH_Hypertension: boolean; history of hypertension (TRUE/FALSE).
- w0_PH_Diabetes: boolean; history of diabetes.
- w0_PH_Gout: boolean; history of gout.
- w0_PH_Dyslipidemia: boolean; history of dyslipidaemia.
- w0_PH_LiverDisease: boolean; history of liver disease.
- w0_PH_DigestiveDisease: boolean; history of digestive disease.
- w0_ConsideringDrinkingChange: categorical text; No Improvement Needed, No intention to improve, Interested but no intention to improve, Intending to improve, Already working on improvement.
- w0_Allocation: numeric; 0 = SAO, 1 = Ultra-BI.
- FacilityID: character string 1–40; pseudonymised clinic ID (matching RawData.csv).
- w0_AgeCont: numeric; exact age in years (includes .5 where ages were rounded to half-years).
- w0_AUDIT_c: numeric; AUDIT-C total score (0–12).
- w0_DrinkingAmountPer4weeks: numeric; grams of ethanol per 4 weeks at baseline.
- w0_SmokingStatus: ordered factor; Never smoked, Smoked but quit, Smoking.
- invited: categorical text describing clinic invitation coverage; values are Invited <50% of eligible patients, Invited about 60% of eligible patients, Invited about 70% of eligible patients, Invited about 80% of eligible patients, Invited about 90% of eligible patients, Invited ~100% of eligible patients, or Selected only patients likely to drink heavily.
- w12_ConsideringDrinkingChange: categorical text; No Improvement Needed, No intention to improve, Intending to improve, Already working on improvement.
- w12_DrinkingAmountPer4weeks: numeric; grams of ethanol per 4 weeks at 12-week follow-up.
- w24_ConsideringDrinkingChange: categorical text; same scale as at 12 weeks.
- w24_DrinkingAmountPer4weeks: numeric; grams of ethanol per 4 weeks at 24-week follow-up.
- Received_Percentage: numeric; within-clinic percentage (0–100) of participants who reported receiving the intervention (calculated for intervention clinics only; missing for control clinics).
- Received_patient_report: boolean; TRUE if the participant reported receiving counselling/advice, otherwise FALSE or missing.
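To illustrate how a within-clinic percentage such as Received_Percentage relates to Received_patient_report, the Python sketch below aggregates participant reports by clinic for intervention clinics only. The denominator used here (participants with a non-missing report) is an assumption; the deposited scripts may define it differently. The records and the function name `received_percentage` are invented for illustration.

```python
from collections import defaultdict


def received_percentage(records):
    """Percentage of participants per intervention clinic who reported advice.

    Control clinics (w0_Allocation != 1) and missing reports (None) are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # clinic -> [reported yes, answered]
    for rec in records:
        if rec["w0_Allocation"] != 1 or rec["Received_patient_report"] is None:
            continue
        tally = counts[rec["FacilityID"]]
        if rec["Received_patient_report"]:
            tally[0] += 1
        tally[1] += 1
    return {clinic: 100.0 * yes / n for clinic, (yes, n) in counts.items()}


# Invented records: clinic "7" is an intervention clinic, clinic "9" a control clinic.
records = [
    {"FacilityID": "7", "w0_Allocation": 1, "Received_patient_report": True},
    {"FacilityID": "7", "w0_Allocation": 1, "Received_patient_report": False},
    {"FacilityID": "7", "w0_Allocation": 1, "Received_patient_report": None},
    {"FacilityID": "7", "w0_Allocation": 1, "Received_patient_report": True},
    {"FacilityID": "9", "w0_Allocation": 0, "Received_patient_report": False},
]
result = received_percentage(records)
```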
data_EASY_cluster_RCT_preprocessed.RData
R binary file containing a tibble named df with the same 1133 rows and 24 columns as data_EASY_cluster_RCT_preprocessed.csv. Variable types follow the descriptors above.
Baseline.xlsx (1136 rows × 18 columns)
Excel workbook with baseline questionnaire data in human-readable labels (all categorical responses are now supplied in English). String categories correspond to the coded variables in RawData.csv.
- id: pseudonymous participant identifier (matches RawData.csv).
- Sex: Male / Female.
- Age: decade band (20's, 30's, …, 70's).
- VisitHistory: clinic attendance description: First visit, Routine appointments, or Visit as needed.
- PH_Hypertension: logical; history of hypertension (TRUE/FALSE).
- PH_Diabetes: logical; history of diabetes.
- PH_Gout: logical; history of gout.
- PH_Dyslipidemia: logical; history of dyslipidaemia.
- PH_LiverDisease: logical; history of liver disease.
- PH_DigestiveDisease: logical; history of digestive disease.
- ConsideringDietChange: readiness statements: No Improvement Needed, No intention to improve, Intending to improve, Already working on improvement.
- ConsideringSmokingChange: readiness statements: Never smoked, Smoked but quit, No intention to improve, Intending to improve.
- AUDIT1: integer 0–4; AUDIT-C item 1.
- AUDIT2: integer 0–4; AUDIT-C item 2.
- AUDIT3: integer 0–4; AUDIT-C item 3.
- ConsideringDrinkingChange: readiness statements aligned with the numeric codes in RawData.csv (No Improvement Needed, No intention to improve, Interested but no intention to improve, Intending to improve, Already working on improvement).
- Allocation: integer (0 = SAO, 1 = Ultra-BI).
- FacilityID: integer 1–40; pseudonymised clinic identifier.
Scr_Select.csv (1133 rows × 2 columns)
Link table flagging clinics that did not invite every eligible patient.
- id: pseudonymous participant identifier (matches RawData.csv and Baseline.xlsx).
- w0_biased_invitation: boolean; TRUE when the clinic reported selectively screening only patients suspected of hazardous drinking, FALSE for universal screening invitation.
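Since Scr_Select.csv has fewer rows (1133) than RawData.csv (1136), attaching the flag by id works best as a left join that leaves unmatched participants without a flag. The Python sketch below shows one way to do this. It assumes the boolean is written as the text TRUE/FALSE in the CSV; the ids and rows are invented and the helper names are hypothetical.

```python
import csv
import io

# Invented excerpts; real files are Scr_Select.csv and RawData.csv.
SCR_SELECT = """id,w0_biased_invitation
101,FALSE
102,TRUE
"""
PARTICIPANTS = """id,w0_Allocation
101,1
102,0
103,1
"""


def load_flags(handle):
    """Map participant id -> bool, assuming TRUE/FALSE text in the CSV."""
    return {
        row["id"]: row["w0_biased_invitation"].strip().upper() == "TRUE"
        for row in csv.DictReader(handle)
    }


def attach_flag(handle, flags):
    """Left join: keep every participant row; flag is None when id is absent."""
    merged = []
    for row in csv.DictReader(handle):
        row["w0_biased_invitation"] = flags.get(row["id"])
        merged.append(row)
    return merged


flags = load_flags(io.StringIO(SCR_SELECT))
merged = attach_flag(io.StringIO(PARTICIPANTS), flags)
```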
Abbreviations
- AUDIT-C: Alcohol Use Disorders Identification Test, consumption items
- WHO DRL: WHO Drinking-Risk Level (Low / Medium / High / Very high)
- Ultra-BI: ultra-brief intervention
- SAO: Simplified assessment only
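As an illustration of how the WHO DRL categories relate to the consumption variables, the Python sketch below converts grams per 4 weeks to grams per day and applies the commonly cited WHO thresholds (men: up to 40, 60, and 100 g/day; women: up to 20, 40, and 60 g/day). Dividing by 28 days, the boundary handling, and the separate "Abstinent" label for zero consumption are assumptions and may not match table3.r. The function name is hypothetical.

```python
# Upper bounds in grams of ethanol per day for Low, Medium, and High risk;
# anything above the last bound is Very high.
WHO_CUTOFFS = {"Male": (40, 60, 100), "Female": (20, 40, 60)}


def who_drinking_risk_level(grams_per_4_weeks, sex):
    """Classify 4-week ethanol consumption into a WHO drinking risk level."""
    if grams_per_4_weeks is None:
        return None
    grams_per_day = grams_per_4_weeks / 28  # assumption: 4 weeks = 28 days
    if grams_per_day <= 0:
        return "Abstinent"  # assumption: zero intake kept as its own label
    low, medium, high = WHO_CUTOFFS[sex]
    if grams_per_day <= low:
        return "Low"
    if grams_per_day <= medium:
        return "Medium"
    if grams_per_day <= high:
        return "High"
    return "Very high"


level_male = who_drinking_risk_level(1120, "Male")      # 40 g/day
level_female = who_drinking_risk_level(1120, "Female")  # 40 g/day
```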
Code/software
table1.r
This script aggregates data for creating Table 1 (Baseline characteristics per participant by allocation group) of the paper.
stable1.r
This script aggregates data for creating sTable 1 (Baseline characteristics per cluster by allocation group) of the paper.
table2_and_4.sas
This script estimates the Local Average Treatment Effect (LATE) for creating Tables 2 and 4 of the paper.
table2_sensitivity_analyses.r
This script runs the sensitivity analyses for the Local Average Treatment Effect (LATE) estimates reported in Table 2 of the paper.
calculate_effect_sizes.r
This script calculates Hedges’ g for the outcome variables presented in Tables 2 and 4.
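For readers unfamiliar with the statistic, the Python sketch below computes Hedges' g for two independent groups using the standard pooled standard deviation and the usual small-sample correction 1 − 3/(4·df − 1). It ignores clustering and is not a transcription of calculate_effect_sizes.r; the function name and the example values are invented.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g: standardised mean difference with small-sample correction."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pooled standard deviation from the two sample variances.
    pooled_sd = math.sqrt(
        ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    )
    cohens_d = (mean(group_a) - mean(group_b)) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d * correction


# Invented values: means 4 and 3, both sample variances 4, so d = 0.5.
g = hedges_g([2, 4, 6], [1, 3, 5])
```

With these values df = 4, so the correction factor is 0.8 and g is 0.4.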
table3.r
This script aggregates data for creating Table 3 (Proportion of WHO drinking risk level at follow-ups) of the paper.
table5.r
This script aggregates data for creating Table 5 (Proportion of readiness to change drinking habits by category) of the paper.
CONSORT_flowchart.r
This script aggregates data for creating Figure 2 (CONSORT flow diagram of clusters and patients through the trial) of the paper.
Access information
Other publicly accessible locations of the data:
- N/A
Data was derived from the following sources:
- N/A
