Dryad

Survey data on student expectations for faculty, teaching assistant, and peer support in engineering education

Data files

Mar 20, 2026 version files: 384.12 KB


Abstract

This study compares five short text topic modeling (STTM) techniques for analyzing qualitative student feedback on instructional support in engineering education. Student feedback was collected with short-answer questions as part of a larger survey conducted via convenience sampling in over 40 engineering courses offered at a single large university between 2016 and 2023, yielding 1,667, 1,592, and 1,376 expectations for faculty support, teaching assistant (TA) support, and peer support, respectively. After the data were cleaned and preprocessed, the short text responses were analyzed using five unsupervised topic models implemented in Python: four traditional models, namely Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), Non-Negative Matrix Factorization (NMF), and k-means clustering, and one deep learning model, BERTopic. Model performance was evaluated using topic coherence and external performance metrics (accuracy, F1 score, and interrater reliability). Two approaches to establishing ground truth were evaluated: (a) keywords from each topic model guided manual (human) coding of the data (a machine-led approach), and (b) themes in the data were extracted and coded independently by a domain expert (a human-led approach). NMF achieved the highest average performance on two of the three datasets, reaching 75.6% accuracy, a 75.7% F1 score, and 0.63 interrater reliability on the peer support dataset, and 72.6% accuracy, a 72.0% F1 score, and 0.57 interrater reliability on the TA support dataset. The human-led approach yielded higher accuracy and F1 scores for faculty and peer support but failed for TA support, where the topics extracted by the topic models did not align with the themes identified by the domain expert. These findings highlight the need for human involvement in analyzing short text data in contexts such as education research, where high performance is necessary to achieve appropriate rigor. Domain expert intervention also enables topic models to be used strategically in qualitative data analysis.
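The external evaluation described above compares the topic each model assigns to a response against a human-coded ground truth and reports accuracy, F1 score, and interrater reliability. The abstract does not say which F1 averaging scheme or which reliability statistic was used, so the following is only a minimal plain-Python sketch that assumes macro-averaged F1 and Cohen's kappa. The function names and the topic labels (`feedback`, `availability`, `clarity`) are hypothetical and are not drawn from the dataset.

```python
from collections import Counter


def _check(y_true, y_pred):
    # Both coders must label the same, non-empty set of responses.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")


def accuracy(y_true, y_pred):
    """Share of responses whose model-assigned topic matches the human code."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Assumed macro F1: unweighted mean of per-topic F1 over all labels seen."""
    _check(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohens_kappa(y_true, y_pred):
    """Assumed reliability statistic: observed agreement corrected for chance."""
    observed = accuracy(y_true, y_pred)  # also validates the inputs
    n = len(y_true)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability both coders independently pick the same label.
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both coders use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical topic labels for six responses (not taken from the dataset).
human = ["feedback", "availability", "feedback", "clarity", "availability", "clarity"]
model = ["feedback", "availability", "clarity", "clarity", "availability", "feedback"]

print(round(accuracy(human, model), 3))      # 0.667
print(round(macro_f1(human, model), 3))      # 0.667
print(round(cohens_kappa(human, model), 3))  # 0.5
```

Macro averaging weights every topic equally, so rare expectation themes count as much as common ones; a support-weighted average would instead favour the most frequent topics. Kappa is lower than raw accuracy here because one third of the agreement would be expected by chance alone.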