Maximizing outcomes for preschoolers with developmental language disorders

Grauzer, Jeffrey¹; Roberts, Megan¹; Jones, Maranda¹

¹ Research facility: Northwestern University

Published Feb 27, 2026 on Dryad. https://doi.org/10.5061/dryad.sj3tx96g9

Data files

Feb 27, 2026 version files (102.35 KB total):

- Maximizing_Outcomes_for_Toddlers_with_DLD_Data_Dictionary.csv (10.20 KB)
- Maximizing_Outcomes_for_Toddlers_with_DLD_Data.csv (74.47 KB)
- README.md (6.51 KB)
- U01_Analysis_2-1_Dryad.Rmd (4.30 KB)
- U01_Analysis_2-2_Dryad.Rmd (6.87 KB)

Abstract

This dataset contains de-identified baseline and short-term follow-up data from a randomized controlled trial examining the effects of Enhanced Milieu Teaching–Sentence Focused (EMT-SF), an 18-month, multi-phase caregiver-implemented intervention for children at risk for developmental language disorder (DLD) (ClinicalTrials.gov: NCT03782493). The trial enrolled 108 children between 30 and 31 months of age and their caregivers. Children were randomly assigned to either the EMT-SF treatment group or a control condition. Seven participants who did not consent to additional data sharing are excluded from this dataset.

The dataset includes assessments collected at three time points: baseline (T30; child age ~30 months), 6 months post-baseline (T36; child age ~36 months), and 12 months post-baseline (T42; child age ~42 months). Baseline measures include standardized language assessment data from the Preschool Language Scales, Fifth Edition (PLS-5), standardized language sample variables, child–caregiver interaction (CCX) measures, and demographic characteristics. At T36, data include repeated language sample and CCX measures as well as receptive and expressive vocabulary assessments from the Peabody Picture Vocabulary Test, Fifth Edition (PPVT-5) and the Expressive Vocabulary Test, Third Edition (EVT-3). At T42, data include repeated language sample and CCX measures and grammar assessments from the Structured Photographic Expressive Language Test–Preschool, Second Edition (SPELT-P2) and the Test of Early Grammatical Impairment (TEGI).

The primary analyses examined short-term vocabulary outcomes at T36 and grammar outcomes at T42 using structural equation modeling (SEM) with latent variables representing vocabulary and grammar constructs. Results of analyses conducted on this shared dataset may differ slightly from published findings due to the exclusion of participants who did not consent to data sharing.

This dataset enables replication of published analyses and supports secondary analyses examining early vocabulary development, grammar development, caregiver-implemented intervention effects, and longitudinal language trajectories in children at risk for DLD.

Dataset DOI: 10.5061/dryad.sj3tx96g9

Description of the data and file structure

The data included here were gathered from the U01 grant "Maximizing outcomes for preschoolers with developmental language disorders" awarded to Dr. Megan Roberts at Northwestern University.

The data are in long (longitudinal) format: each row represents a single participant at a single time point. The study included 7 major time points; this dataset includes three of them (T30, T36, and T42).
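
Because each row is one participant at one time point, analyses of a single time point usually start by subsetting rows or reshaping to one row per participant. The sketch below uses toy data. The column names `record_id` and `redcap_event_name` are assumptions (they are common in REDCap longitudinal exports), and the event labels and scores are invented, so check the data dictionary for the actual field names.

```r
# Toy long-format data: one row per participant per time point.
# Column names, event labels, and values are hypothetical, not taken from the dataset.
long <- data.frame(
  record_id = c(1, 1, 1, 2, 2, 2),
  redcap_event_name = rep(c("t30", "t36", "t42"), times = 2),
  score = c(10, 14, NA, 8, NA, 15)
)

# Keep only the rows for one time point.
t36 <- long[long$redcap_event_name == "t36", ]

# Reshape to wide format: one row per participant, one score column per time point.
wide <- reshape(
  long,
  idvar = "record_id",
  timevar = "redcap_event_name",
  direction = "wide"
)
print(wide)
```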

All participants included in this dataset consented to sharing their data for additional research purposes outside of our lab. Seven participants included in our analysis did not consent to this and are therefore not included in this dataset. As a result, any replication of our analysis plan will not exactly match our original analysis, because the sample is different. R scripts used in our analysis are included for reference.

The data were stored in REDCap. Any fields containing personally identifiable information (name, email, phone, etc.) were marked as such in REDCap and not exported to this dataset. Unverified text and notes box fields were removed from this export. Date and timestamp fields were randomly adjusted by +/- 365 days to further enhance anonymity.

Files and variables

File: Maximizing_Outcomes_for_Toddlers_with_DLD_Data_Dictionary.csv

Description:

Variables

Variable / Field Name: Name of the variable
Form Name: Assessment or Survey from which the field was obtained
Field Type: Text entry, calculation, radio button, yes/no
Field Label: A description of the field
Choices or Calculations: Radio button options or calculated field equations
Field Note: Any additional notes about the field
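
As a rough illustration of using the dictionary alongside the data, the sketch below looks up one variable and lists the variables on one form. It assumes the CSV headers match the six names listed above exactly, which may not hold for the actual file. The two dictionary rows are invented examples.

```r
# Toy data dictionary using the column headers described above.
# The two rows are hypothetical examples, not actual entries.
dict_csv <- 'Variable / Field Name,Form Name,Field Type,Field Label,Choices or Calculations,Field Note
example_score,example_form,text,Example score,,
example_group,example_form,radio,Example group,"1, Option A | 2, Option B",
'

# check.names = FALSE keeps headers with spaces and slashes unchanged.
dict <- read.csv(text = dict_csv, check.names = FALSE, stringsAsFactors = FALSE)

# Look up the label and choices for one variable.
dict[dict[["Variable / Field Name"]] == "example_group",
     c("Field Label", "Choices or Calculations")]

# List all variables that belong to one form.
dict[["Variable / Field Name"]][dict[["Form Name"]] == "example_form"]
```

For the real file, replace `text = dict_csv` with the path to the downloaded dictionary CSV.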

File: Maximizing_Outcomes_for_Toddlers_with_DLD_Data.csv

Description:

All deidentified REDCap data used in the analysis phase of this study is included in this data set. Please use the data dictionary file for reference.

Code/software

The CSV files can be opened in Excel or R. The analysis scripts (.Rmd) require R.

Missing Data and Empty Cells

Empty cells in the dataset represent intentionally missing values. No values have been removed or hidden.

Blank cells occur for the following reasons:

Not Yet Collected – The measure was not administered at that time point (e.g., vocabulary assessments are not present at baseline; grammar assessments are not present at T36).
Not Applicable – The measure was not relevant for that participant (e.g., group-specific variables).
Participant-Level Missingness – The participant did not complete a specific assessment or was unavailable at that time point.
Data Sharing Restrictions – Seven participants who did not consent to additional data sharing have been removed entirely from this dataset; therefore, no partial records are present for those individuals.

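In this dataset, blank cells are intended to be interpreted as missing values (NA). When imported into R with readr::read_csv(), blank cells are read as NA by default. With base read.csv(), blank cells in numeric columns are read as NA, but blank cells in character columns are read as empty strings unless na.strings = c("", "NA") is specified.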

No blank cells represent zero values.

The accompanying analysis scripts assume that blank cells are read as NA. Users working in other statistical software should ensure that blank cells are treated as missing values.
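
One detail worth checking in base R is how blank cells in text columns are imported. The sketch below uses a small inline CSV with invented column names to show the default behavior of `read.csv()` and the effect of `na.strings`.

```r
# Toy CSV with blank cells in a numeric and a character column (hypothetical names).
csv_text <- "id,score,site\n1,12,A\n2,,\n3,7,B\n"

# Default read.csv(): the blank numeric cell becomes NA,
# but the blank character cell is read as an empty string "".
default <- read.csv(text = csv_text)
is.na(default$score)
default$site

# With na.strings, blank cells in character columns also become NA.
strict <- read.csv(text = csv_text, na.strings = c("", "NA"))
is.na(strict$site)
```

Passing `na.strings = c("", "NA")` when reading the data file keeps blank cells consistently missing across column types.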

Analysis Scripts Included in Submission

This submission includes two R scripts that reproduce the primary short-term outcome analyses reported in the associated manuscript.

1. `U01_Analysis_2-1_Dryad.Rmd`

This script evaluates short-term vocabulary outcomes at 6 months post-baseline (T36; child age ~36 months).

Outcome: Latent variable representing Short-Term Vocabulary
Indicators:
- Expressive Vocabulary Test–Third Edition (EVT-3)
- Peabody Picture Vocabulary Test–Fifth Edition (PPVT-5)
Independent variable:
- Treatment group (EMT-SF vs. control)
Covariate:
- Baseline number of different words (NDW) from language sample

The latent variable model is estimated using structural equation modeling (SEM).

2. `U01_Analysis_2-2_Dryad.Rmd`

This script evaluates short-term grammar outcomes at 12 months post-baseline (T42; child age ~42 months).

Outcome: Latent variable representing Short-Term Grammar
Indicators:
- Structured Photographic Expressive Language Test–Preschool, Second Edition (SPELT-P2)
- Test of Early Grammatical Impairment (TEGI)
Independent variable:
- Treatment group (EMT-SF vs. control)
Covariate:
- Baseline language sample score

The latent variable model is estimated using structural equation modeling (SEM).

Missing Data Handling in Analysis Scripts

Both scripts assume that blank cells in the dataset are read as missing values (NA) in R. Missing data are handled within the SEM framework using full information maximum likelihood (FIML), as specified in the analysis code.
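
For readers unfamiliar with this setup, the sketch below shows the general shape of a two-indicator latent outcome model estimated with FIML in lavaan. It is not the authors' script: the data are simulated, and the variable names (`evt3`, `ppvt5`, `group`, `ndw_t30`) are hypothetical stand-ins. The included .Rmd files contain the actual model specification.

```r
library(lavaan)

# Simulated stand-in data; all variable names and values are hypothetical.
set.seed(1)
n <- 200
group <- rbinom(n, 1, 0.5)            # 1 = treatment, 0 = control
ndw_t30 <- rnorm(n)                   # baseline covariate (standardized)
vocab_true <- 0.5 * group + 0.6 * ndw_t30 + rnorm(n)
dat <- data.frame(
  group = group,
  ndw_t30 = ndw_t30,
  evt3 = vocab_true + rnorm(n, sd = 0.5),
  ppvt5 = 0.9 * vocab_true + rnorm(n, sd = 0.5)
)
dat$evt3[sample(n, 20)] <- NA         # some outcome scores missing, like blank cells

model <- '
  # measurement model: latent vocabulary with two indicators
  vocab =~ evt3 + ppvt5
  # structural model: group effect, adjusting for the baseline covariate
  vocab ~ group + ndw_t30
'

# missing = "fiml" requests full information maximum likelihood estimation,
# so rows with a missing indicator still contribute to the fit.
fit <- sem(model, data = dat, missing = "fiml")
summary(fit, standardized = TRUE)
```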

Blank cells in the dataset represent:

Structurally missing values (measure not administered at that time point)
Participant-level missingness (assessment not completed)
Not applicable values (measure not relevant for that participant)

Blank cells do not represent zero values.

Users working in other statistical software should ensure that blank cells are interpreted as missing values.

Sample Size Note

Seven participants who did not consent to additional data sharing are excluded from the shared dataset. As a result, reproduced estimates may differ slightly from those reported in the published manuscript.

Software Requirements

The scripts were written in R (version 4.4.1) and require the following packages:

lavaan
tidyverse

Users should ensure required packages are installed prior to running the scripts.
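
A minimal sketch of one way to check for the listed packages before running the scripts is shown below. The install line is commented out and assumes installation from CRAN.

```r
# Check which required packages are installed, without attaching them.
required <- c("lavaan", "tidyverse")
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```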