
Expository and narrative discourse summary statistics and demographic information for adolescents with and without traumatic brain injury

Cite this dataset

Collins, Gavin; Lundine, Jennifer; Kaizar, Eloise (2021). Expository and narrative discourse summary statistics and demographic information for adolescents with and without traumatic brain injury [Dataset]. Dryad.


Purpose: Generalized linear mixed models (GLMM) and Bayesian methods together provide a framework capable of handling a wide variety of complex data commonly encountered across the communication sciences. Using language sample analysis (LSA), we demonstrate the utility of these methods in answering specific questions regarding the differences between discourse patterns of children who have experienced a traumatic brain injury (TBI), as compared to those with typical development (TD).

Methods: Language samples were collected from 55 adolescents ages 13-18, five of whom had experienced a TBI. We describe parameters relating to the productivity, syntactic complexity, and lexical diversity of language samples. A Bayesian GLMM is developed for each parameter of interest, relating these parameters to age, sex, prior history (TBI or TD), and socioeconomic status, as well as the type of discourse sample (compare-contrast, cause-effect, or narrative). Statistical models are thoroughly described.

Results: Comparing the discourse of adolescents with TBI to those with TD, substantial differences are detected in productivity and lexical diversity, while differences in syntactic complexity are more moderate. Females exhibited greater syntactic complexity, while males exhibited greater productivity and lexical diversity. Generally, our models suggest more advanced discourse among adolescents who are older or who have indicators of higher socioeconomic status. Differences relating to lecture type were also detected.

Conclusions: Bayesian and GLMM methods yield more informative and intuitive results than traditional statistical analyses, with a greater degree of confidence in model assumptions. We recommend that these methods be used more widely in LSA.


Fifty-five subjects, aged 13-18 years, 52% female, and of varying socioeconomic status (SES; measured using census tract data), verbally summarized three short lectures about a fictional nation called “Lifeland.” One of the lectures was compare-contrast (CC), one was cause-effect (CE), and the third was a story, i.e., a narrative (N). The CC and CE lectures were expository in nature, while the N lecture was narrative. The lectures were presented to each subject in a random order, via video on a computer monitor, and the subjects summarized each lecture immediately after viewing it. Each lecture was written at about the same reading level and contained about the same number of main and supporting ideas, sentences, and words. Each lecture was read by the same speaker in front of the same neutral background. Five adolescents had experienced a traumatic brain injury (TBI) meeting certain specifications, while the remaining 50 subjects had a history of typical development (TD).

The audio and video of each discourse sample were recorded and later transcribed using Systematic Analysis of Language Transcripts (SALT) software. SALT was also used to record the microstructural discourse statistics of interest. Each discourse summary contains a number of utterances; each utterance consists of an independent clause and its accompanying dependent clauses; and each clause is made up of a number of words. With this structure in mind, for each discourse summary (CC, CE, N) given by each participant (1, …, 55), we recorded four statistics that together provide an informative picture of the discourse as a whole: the total number of utterances (U), clauses (C), words (W), and distinct words (D). Transcription and the recording of relevant variables were checked for accuracy, as described in the corresponding manuscript.
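The nested structure above (utterances contain clauses, clauses contain words) suggests some simple descriptive ratios that can be formed from the four counts. The sketch below is illustrative only: the ratio names and the `DiscourseCounts` class are hypothetical, the example values are made up, and these ratios are not necessarily the parameters modeled in the corresponding manuscript.

```python
from dataclasses import dataclass


@dataclass
class DiscourseCounts:
    """Counts for one discourse summary (hypothetical container)."""
    U: int  # total utterances
    C: int  # total clauses
    W: int  # total words
    D: int  # total distinct words


def descriptive_ratios(counts: DiscourseCounts) -> dict:
    """Form simple ratios from the four counts.

    The ratio names are assumptions chosen for illustration,
    not terms taken from the dataset or the manuscript.
    """
    return {
        "clauses_per_utterance": counts.C / counts.U,
        "words_per_clause": counts.W / counts.C,
        "distinct_word_ratio": counts.D / counts.W,
    }


# Made-up counts for illustration; not taken from the dataset.
example = DiscourseCounts(U=10, C=15, W=120, D=60)
print(descriptive_ratios(example))
```

Because U is at least 1 and each of C and W is at least as large as the count before it, none of the denominators can be zero for a valid record.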

Usage notes

The dataset is a .csv file consisting of 10 columns. "subject" is the number assigned to each adolescent in the study (1, ..., 55). "lecture_type" is either cc (compare-contrast), ce (cause-effect), or n (narrative); each of the 55 subjects has one row for each lecture type. "development_type" is recorded at the subject level and is either TD (typically developing) or TBI (traumatic brain injury). "sex" (Male/Female), "age" (13-19), and "ses" (a summary of socioeconomic status, expressed as a standardized z-value) are also recorded at the subject level. "U" (>=1) is the total number of utterances in the discourse; "C" (>=U) is the total number of clauses; "W" (>=C) is the total number of words; and "D" (<=W) is the total number of distinct words.
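The column constraints listed above can be checked after loading the file. The following is a minimal sketch using only the Python standard library. The function names and the `SAMPLE` text are hypothetical, and the two sample rows are synthetic, written only to exercise the checks; in practice the text would be read from the downloaded .csv file, whose name is not given here.

```python
import csv
import io

# Column names as given in the usage notes.
EXPECTED_COLUMNS = [
    "subject", "lecture_type", "development_type",
    "sex", "age", "ses", "U", "C", "W", "D",
]


def check_counts(row: dict) -> list:
    """Return the count constraints violated by one row."""
    u, c, w, d = (int(row[k]) for k in ("U", "C", "W", "D"))
    problems = []
    if u < 1:
        problems.append("U < 1")
    if c < u:
        problems.append("C < U")
    if w < c:
        problems.append("W < C")
    if d > w:
        problems.append("D > W")
    return problems


def validate(csv_text: str) -> dict:
    """Map file line number -> violated constraints, for rows with problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    report = {}
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        problems = check_counts(row)
        if problems:
            report[line_no] = problems
    return report


# Synthetic rows for illustration only; the second row deliberately has C < U.
SAMPLE = (
    "subject,lecture_type,development_type,sex,age,ses,U,C,W,D\n"
    "1,cc,TD,Female,15,0.3,12,20,150,80\n"
    "1,ce,TD,Female,15,0.3,10,9,140,75\n"
)
print(validate(SAMPLE))
```

An empty report means every row satisfies the stated bounds on U, C, W, and D.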


The Ohio State University, Award: Alumni Grant for Graduate Research and Scholarship

The Ohio State University, Award: Laboratory Start-up Seed Grant