Code from: Beyond the classroom: Alicia’s multivariate journey
Data files
Nov 26, 2025 version files 1.64 MB
- april-2019-script-1-codes.rtf (218.64 KB)
- april-2019-script-1.R (3.86 KB)
- april-2019-script-2-codes.rtf (133.06 KB)
- april-2019-script-2.R (2.59 KB)
- december-2018-script-1-codes.rtf (118.42 KB)
- december-2018-script-1.R (1.62 KB)
- r-code-themes.xlsx (29.44 KB)
- README.md (1.21 KB)
- september-2019-script-1-codes.rtf (969.03 KB)
- september-2019-script-1.R (25.97 KB)
- september-2019-script-2-codes.rtf (136.05 KB)
- september-2019-script-2.R (3.08 KB)
Abstract
The importance of data science skills for modern scientific research cannot be overstated. Although policy documents increasingly recommend what skills should be included in undergraduate statistics and data science curricula, little is known about how students actually develop and apply these skills. This paper addresses this gap through an in-depth case study tracing one student’s learning progressions throughout her master’s program. Using a qualitative method to analyze student code, which has seen little use in statistics education research, I examined how Alicia transferred the data science skills from her applied statistics course into authentic research settings. The analysis shows that, while Alicia successfully navigated new challenges, she encountered persistent hurdles when extending bivariate techniques into multivariate contexts, particularly with visualizations and summary statistics. These findings highlight the obstacles students may face when applying classroom knowledge to real-world data problems. The results carry implications for instructors designing curricula, researchers studying how students learn data science, and policymakers shaping educational standards, underscoring the need to pair policy recommendations with research on the realities of student learning.
https://doi.org/10.5061/dryad.c59zw3rg6
This repository contains the R script files submitted by Alicia (pseudonym) throughout this study, files associated with the qualitative analysis of the code, and files associated with visualizations of the qualitative themes included in Alicia's code.
Description of the data and file structure
As this is a qualitative analysis, the usage of these "data" files differs from a typical quantitative analysis.
- The .R files contain the scripts generated by Alicia at each time point (December 2018, April 2019, September 2019).
- The -codes.rtf files contain the (qualitative) process codes for each R script.
- The r-code-themes.xlsx file contains information on every script and the qualitative code assigned to each line of code.
Code/Software
While the "data" for this analysis are R scripts, these scripts cannot be executed. The intention of the analysis was not to run these scripts, but to qualitatively describe the processes used throughout each script. As such, the scripts should be thought of as static documents.
R Script files submitted by Alicia (pseudonym) over the course of the study. The files are named according to when they were submitted:
- December 2018
  - R Script #1
- April 2019
  - R Script #1 (revised)
  - R Script #2
- September 2019
  - R Script #1 (revised)
  - R Script #2 (revised)
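The naming scheme above can be used to line up each script with its submission date and its qualitative codes file. The following is a minimal sketch in base R, not part of the original analysis: the file names are copied from the repository listing, while `parse_script_name` and the column names it produces are hypothetical names chosen for illustration.

```r
# File names copied from the repository listing.
script_files <- c(
  "december-2018-script-1.R",
  "april-2019-script-1.R",
  "april-2019-script-2.R",
  "september-2019-script-1.R",
  "september-2019-script-2.R"
)

# Hypothetical helper: split "<month>-<year>-script-<n>.R" into its parts.
parse_script_name <- function(file) {
  pattern <- "^([a-z]+)-([0-9]{4})-script-([0-9]+)\\.R$"
  parts <- regmatches(file, regexec(pattern, file))
  bad <- lengths(parts) != 4
  if (any(bad)) {
    stop("Unexpected file name: ", paste(file[bad], collapse = ", "))
  }
  data.frame(
    file = file,
    month = vapply(parts, `[`, character(1), 2),
    year = as.integer(vapply(parts, `[`, character(1), 3)),
    script = as.integer(vapply(parts, `[`, character(1), 4)),
    # Matching qualitative codes file, following the "-codes.rtf" convention.
    codes_file = sub("\\.R$", "-codes.rtf", file),
    stringsAsFactors = FALSE
  )
}

scripts <- parse_script_name(script_files)

# Sort chronologically: year, then month number, then script number.
month_num <- match(scripts$month, tolower(month.name))
scripts <- scripts[order(scripts$year, month_num, scripts$script), ]
print(scripts)
```

Because the scripts are meant to be read as static documents, this sketch only works with their names and never sources them.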
Qualitative Data Analysis Files (Rich text files)
- December 2018 Script #1
- April 2019 Script #1
- April 2019 Script #2
- September 2019 Script #1
- September 2019 Script #2
Quantitative Data Analysis Files
- r-code-themes.xlsx
  - Excel workbook with separate sheets for each R script
  - Each sheet contains the qualitative code assigned to each line of code and whether the code contained errors.
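One way to load the theme workbook is to read every sheet into a named list. This is a hedged sketch rather than the author's workflow: it assumes the `readxl` package is installed and that `r-code-themes.xlsx` (the name given in the file listing) has been downloaded to the working directory. `read_theme_sheets` is a hypothetical helper, and no column names are assumed because the repository description does not list them.

```r
# Hypothetical helper: read every sheet of the workbook into a named list.
# Assumes the readxl package is installed.
read_theme_sheets <- function(path = "r-code-themes.xlsx") {
  sheets <- readxl::excel_sheets(path)
  themes <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s))
  names(themes) <- sheets
  themes
}

# Only attempt the read if the workbook is present locally.
if (file.exists("r-code-themes.xlsx")) {
  themes <- read_theme_sheets()
  # Number of rows per sheet; inspect names(themes[[1]]) to see the
  # actual columns before doing any further summaries.
  print(vapply(themes, nrow, integer(1)))
}
```

Keeping the sheets in a list, one element per script, mirrors the one-sheet-per-script layout described above.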
- Theobold, Allison S. (2026). Beyond the Classroom: Alicia’s Multivariate Journey. Journal of Statistics and Data Science Education. https://doi.org/10.1080/26939169.2026.2616499
