Maximizing outcomes for preschoolers with developmental language disorders
Data files
Feb 27, 2026 version, files 102.35 KB total
- Maximizing_Outcomes_for_Toddlers_with_DLD_Data_Dictionary.csv (10.20 KB)
- Maximizing_Outcomes_for_Toddlers_with_DLD_Data.csv (74.47 KB)
- README.md (6.51 KB)
- U01_Analysis_2-1_Dryad.Rmd (4.30 KB)
- U01_Analysis_2-2_Dryad.Rmd (6.87 KB)
Abstract
This dataset contains de-identified baseline and short-term follow-up data from a randomized controlled trial examining the effects of Enhanced Milieu Teaching–Sentence Focused (EMT-SF), an 18-month, multi-phase caregiver-implemented intervention for children at risk for developmental language disorder (DLD) (ClinicalTrials.gov: NCT03782493). The trial enrolled 108 children between 30 and 31 months of age and their caregivers. Children were randomly assigned to either the EMT-SF treatment group or a control condition. Seven participants who did not consent to additional data sharing are excluded from this dataset.
The dataset includes assessments collected at three time points: baseline (T30; child age ~30 months), 6 months post-baseline (T36; child age ~36 months), and 12 months post-baseline (T42; child age ~42 months). Baseline measures include standardized language assessment data from the Preschool Language Scales, Fifth Edition (PLS-5), standardized language sample variables, child–caregiver interaction (CCX) measures, and demographic characteristics. At T36, data include repeated language sample and CCX measures as well as receptive and expressive vocabulary assessments from the Peabody Picture Vocabulary Test, Fifth Edition (PPVT-5) and the Expressive Vocabulary Test, Third Edition (EVT-3). At T42, data include repeated language sample and CCX measures and grammar assessments from the Structured Photographic Expressive Language Test–Preschool, Second Edition (SPELT-P2) and the Test of Early Grammatical Impairment (TEGI).
The primary analyses examined short-term vocabulary outcomes at T36 and grammar outcomes at T42 using structural equation modeling (SEM) with latent variables representing vocabulary and grammar constructs. Results of analyses conducted on this shared dataset may differ slightly from published findings due to the exclusion of participants who did not consent to data sharing.
This dataset enables replication of published analyses and supports secondary analyses examining early vocabulary development, grammar development, caregiver-implemented intervention effects, and longitudinal language trajectories in children at risk for DLD.
Dataset DOI: 10.5061/dryad.sj3tx96g9
Description of the data and file structure
The data included here were gathered from the U01 grant "Maximizing outcomes for preschoolers with developmental language disorders" awarded to Dr. Megan Roberts at Northwestern University.
The data are in long (longitudinal) format: each row represents a single participant at a single time point. The study included 7 major time points.
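As a rough illustration of what the long layout implies for analysis, the sketch below filters a toy table to one time point and reshapes it to one row per participant. The column names (`record_id`, `timepoint`, `ndw`) and all values are hypothetical placeholders; consult the data dictionary for the actual variable names.

```r
# Toy long-format table: one row per participant per time point.
# Column names and values are hypothetical, not taken from the shared dataset.
toy <- data.frame(
  record_id = c(1, 1, 1, 2, 2, 2),
  timepoint = c("T30", "T36", "T42", "T30", "T36", "T42"),
  ndw       = c(12, 45, NA, 20, NA, 88)
)

# Keep only the rows for a single time point (here, baseline).
baseline <- toy[toy$timepoint == "T30", ]

# Reshape to wide format: one row per participant, one column per time point.
wide <- reshape(
  toy,
  idvar     = "record_id",
  timevar   = "timepoint",
  direction = "wide"
)
print(wide)
```

Base R's `reshape()` is used here only to avoid extra dependencies; `tidyr::pivot_wider()` would be an equally reasonable choice.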
All participants included in this dataset consented to sharing their data for research purposes outside of our lab. Seven participants who were included in our original analysis did not give this consent and are therefore excluded. Because the shared sample differs from the analyzed sample, results reproduced from this dataset will not exactly match our original analysis. The R scripts used in our analysis are included for reference.
The data were stored in REDCap. Fields containing personally identifiable information (name, email, phone, etc.) were flagged as such in REDCap and were not exported to this dataset. Unverified text and notebox fields were removed from this export. Date and timestamp fields were randomly adjusted by +/- 365 days to further enhance anonymity.
Files and variables
File: Maximizing_Outcomes_for_Toddlers_with_DLD_Data_Dictionary.csv
Description:
Variables
- Variable / Field Name: Name of the variable
- Form Name: Assessment or Survey from which the field was obtained
- Field Type: Text entry, calculation, radio button, yes/no
- Field Label: A description of the field
- Choices or Calculations: Radio button options or calculated field equations
- Field Note: Any additional notes about the field
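A minimal sketch of looking up a variable in the data dictionary follows. It assumes the CSV header row matches the field names listed above exactly, which may not hold for the actual file, and the two dictionary rows are made up for illustration. The `code, label | code, label` layout of the choices column is the usual REDCap convention and is also an assumption here.

```r
# Two made-up dictionary rows; the header names are assumed to match the list above.
dict_csv <- 'Variable / Field Name,Form Name,Field Type,Field Label,Choices or Calculations,Field Note
example_score,example_form,text,Example raw score,,
example_group,example_form,radio,Example group,"0, A | 1, B",'

# check.names = FALSE keeps headers that contain spaces and slashes intact.
dict <- read.csv(text = dict_csv, check.names = FALSE, stringsAsFactors = FALSE)

# Look up one variable by name.
row <- dict[dict[["Variable / Field Name"]] == "example_group", ]
print(row[["Field Label"]])

# Split the radio-button choices into codes and labels.
choices <- strsplit(row[["Choices or Calculations"]], " | ", fixed = TRUE)[[1]]
codes   <- sub(",.*$", "", choices)
labels  <- trimws(sub("^[^,]*,", "", choices))
print(data.frame(code = codes, label = labels))
```

To use this with the real file, replace `text = dict_csv` with the path to the downloaded dictionary CSV and adjust the header names if they differ.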
File: Maximizing_Outcomes_for_Toddlers_with_DLD_Data.csv
Description:
All de-identified REDCap data used in the analysis phase of this study are included in this dataset. Please refer to the data dictionary file (Maximizing_Outcomes_for_Toddlers_with_DLD_Data_Dictionary.csv) for variable definitions.
Code/software
The CSV files can be opened in Excel or R. The included analysis scripts require R.
Missing Data and Empty Cells
Empty cells in the dataset represent intentionally missing values. No values have been removed or hidden.
Blank cells occur for the following reasons:
- Not Yet Collected – The measure was not administered at that time point (e.g., vocabulary assessments are not present at baseline; grammar assessments are not present at T36).
- Not Applicable – The measure was not relevant for that participant (e.g., group-specific variables).
- Participant-Level Missingness – The participant did not complete a specific assessment or was unavailable at that time point.
- Data Sharing Restrictions – Seven participants who did not consent to additional data sharing have been removed entirely from this dataset; therefore, no partial records are present for those individuals.
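In this dataset, blank cells are intended to be interpreted as missing values (NA). When the file is imported into R, readr::read_csv() reads blank cells as NA by default. Base read.csv() reads blank cells as NA in numeric columns but leaves them as empty strings in character columns unless na.strings = c("", "NA") is supplied.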
No blank cells represent zero values.
The accompanying analysis scripts assume that blank cells are read as NA. Users working in other statistical software should ensure that blank cells are treated as missing values.
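The short sketch below shows one way to make sure blank cells become NA when using base R. The three-row table and its column names are invented for illustration. It compares the default `read.csv()` behavior with an explicit `na.strings` setting.

```r
# Invented example data: row 2 has blank cells, row 3 has a true zero.
csv_text <- "id,group_label,score
1,control,10
2,,
3,treatment,0"

# Default import: blank numeric cells become NA, blank text cells stay "".
default_read <- read.csv(text = csv_text)

# Explicit import: blank cells become NA in every column.
explicit_read <- read.csv(text = csv_text, na.strings = c("", "NA"))

is.na(default_read$group_label[2])   # FALSE (empty string, not NA)
is.na(explicit_read$group_label[2])  # TRUE
is.na(explicit_read$score[2])        # TRUE
explicit_read$score[3]               # 0 (a real zero is preserved)
```

For the shared data file, the same `na.strings` argument can be passed alongside the file path instead of `text = csv_text`.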
Analysis Scripts Included in Submission
This submission includes two R Markdown scripts that reproduce the primary short-term outcome analyses reported in the associated manuscript.
1. U01_Analysis_2-1_Dryad.Rmd
This script evaluates short-term vocabulary outcomes at 6 months post-baseline (T36; child age ~36 months).
- Outcome: Latent variable representing Short-Term Vocabulary
- Indicators:
- Expressive Vocabulary Test–Third Edition (EVT-3)
- Peabody Picture Vocabulary Test–Fifth Edition (PPVT-5)
- Independent variable:
- Treatment group (EMT-SF vs. control)
- Covariate:
- Baseline number of different words (NDW) from language sample
The latent variable model is estimated using structural equation modeling (SEM).
2. U01_Analysis_2-2_Dryad.Rmd
This script evaluates short-term grammar outcomes at 12 months post-baseline (T42; child age ~42 months).
- Outcome: Latent variable representing Short-Term Grammar
- Indicators:
- Structured Photographic Expressive Language Test–Preschool, Second Edition (SPELT-P2)
- Test of Early Grammatical Impairment (TEGI)
- Independent variable:
- Treatment group (EMT-SF vs. control)
- Covariate:
- Baseline language sample score
The latent variable model is estimated using structural equation modeling (SEM).
Missing Data Handling in Analysis Scripts
Both scripts assume that blank cells in the dataset are read as missing values (NA) in R. Missing data are handled within the SEM framework using full information maximum likelihood (FIML), as specified in the analysis code.
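For readers unfamiliar with FIML in lavaan, here is a generic sketch of a two-indicator latent outcome regressed on group and a baseline covariate, fitted with `missing = "fiml"`. It is not the authors' script: the data are simulated and every variable name (`evt`, `ppvt`, `group`, `ndw_baseline`, `vocab`) is a hypothetical placeholder. The included .Rmd files remain the reference for the actual model specification.

```r
library(lavaan)

# Simulated data only; all names and values are hypothetical placeholders.
set.seed(1)
n <- 200
group        <- rbinom(n, 1, 0.5)         # 1 = treatment, 0 = control
ndw_baseline <- rnorm(n, 30, 10)          # baseline covariate
eta <- 5 * group + 0.5 * ndw_baseline + rnorm(n, 0, 5)
dat <- data.frame(
  group        = group,
  ndw_baseline = ndw_baseline,
  evt          = 80 + eta + rnorm(n, 0, 4),
  ppvt         = 85 + 0.9 * eta + rnorm(n, 0, 4)
)
dat$evt[sample(n, 20)] <- NA              # mimic blank cells read as NA

# Latent outcome with two indicators, regressed on group and the covariate.
model <- '
  vocab =~ evt + ppvt
  vocab ~ group + ndw_baseline
'

# missing = "fiml" uses all available data instead of dropping incomplete rows.
fit <- sem(model, data = dat, missing = "fiml")
summary(fit)
```

With `missing = "fiml"`, rows that lack one indicator still contribute to estimation, which is why blank cells must arrive in R as NA rather than as zeros or empty strings.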
Blank cells in the dataset represent:
- Structurally missing values (measure not administered at that time point)
- Participant-level missingness (assessment not completed)
- Not applicable values (measure not relevant for that participant)
Blank cells do not represent zero values.
Users working in other statistical software should ensure that blank cells are interpreted as missing values.
Sample Size Note
Seven participants who did not consent to additional data sharing are excluded from the shared dataset. As a result, reproduced estimates may differ slightly from those reported in the published manuscript.
Software Requirements
The scripts were written in R (version 4.4.1) and require the following packages:
- lavaan
- tidyverse
Users should ensure required packages are installed prior to running the scripts.
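One possible way to check the environment before running the scripts is sketched below. It only reports what is missing and does not install anything.

```r
# Packages named in the list above.
required <- c("lavaan", "tidyverse")

# requireNamespace() returns FALSE, rather than erroring, for uninstalled packages.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Install before running the scripts: ",
          paste(missing_pkgs, collapse = ", "))
}

# The scripts were written under R 4.4.1; older versions may still work.
if (getRversion() < "4.4.1") {
  message("Running R ", as.character(getRversion()),
          "; the scripts were written in 4.4.1.")
}
```

Missing packages can then be installed with `install.packages()`.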
Human subjects data
All participants included in this dataset consented to sharing their data for research purposes outside of our lab. Seven participants who were included in our original analysis did not give this consent and are therefore excluded. Because the shared sample differs from the analyzed sample, results reproduced from this dataset will not exactly match our original analysis.
The data were stored in REDCap. Fields containing personally identifiable information (name, email, phone, etc.) were flagged as such in REDCap and were not exported to this dataset. Unverified text and notebox fields were removed from this export. Date and timestamp fields were randomly adjusted by +/- 365 days to further enhance anonymity.
