Data for: Integrating open education practices with data analysis of open science in an undergraduate course
Data files (Jul 26, 2024 version, 10.16 KB total)
- 1._BestPracticesData.csv (788 B)
- 3._AssessmentWeights.csv (2.33 KB)
- 6._LikertData.csv (1.67 KB)
- README.md (5.38 KB)
Abstract
The open science movement produces vast quantities of openly published data connected to journal articles, creating an enormous resource for educators to engage students in current topics and analyses. However, educators face challenges using these materials to meet course objectives. I present a case study using open science (published articles and their corresponding datasets) and open educational practices in a capstone course. While engaging in current topics of conservation, students trace connections in the research process, learn statistical analyses, and recreate analyses using the programming language R. I assessed the presence of best practices in open articles and datasets, examined student selection in the open grading policy, surveyed students on their perceived learning gains, and conducted a thematic analysis on student reflections. First, articles and datasets met just over half of the assessed FAIRness practices, but this increased with the publication date. There was a marginal difference in how assessment categories were weighted by students, with reflections highlighting appreciation for student agency. In course content, students reported the greatest learning gains in describing variables, while collaborative activities (e.g., interacting with peers and instructor) were the most effective support. The most effective tasks to facilitate these learning gains included coding exercises and team-led assignments. Autocoding of student reflections identified 16 themes, and positive sentiments were written nearly 4x more often than negative sentiments. Students positively reflected on their growth in statistical analyses, and negative sentiments focused on how limited prior experience with statistics and coding made them feel nervous. As a group, we encountered several challenges and opportunities in using open science materials. I present key recommendations, based on student experiences, for scientists to consider when publishing open data to provide additional educational benefits to the open science community.
Author: Marja H Bakermans
Affiliation: Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609 USA
ORCID: https://orcid.org/0000-0002-4879-7771
Institutional IRB approval: IRB-24-0314
Data and file overview
The full dataset file, OEPandOSdata (.xlsx extension), contains 8 files.
Descriptions of the name and contents of each file are below; a minimal R sketch for loading the .csv files follows this overview.
NA = not applicable or no data available
- BestPracticesData.csv
- Description: Data to assess the adherence of articles and datasets to open science best practices.
- Column headers and descriptions:
- Article: articles used in the study, numbered randomly
- F1: Findable, Data are assigned a unique and persistent doi
- F2: Findable, Metadata includes an identifier of data
- F3: Findable, Data are registered in a searchable database
- A1: Accessible, Data are retrievable by their identifier
- A2: Accessible, Metadata are accessible and retrievable
- I1: Interoperable, Data use an accessible and shared language
- I2: Interoperable, Data include qualified references to other (meta)data
- R1: Reusable, (Meta)data are described with accurate and relevant attributes
- C1: Coding, Coding scripts, etc. used for analyses are included
- C2: Coding, Coding scripts provide an adequate explanation of the steps
- Score: Sum of the above criteria that were met (0 or 1), with a total possible score of ten
- Year: Year of the publication of the dataset
- Coh1: Coherence of dataset and article analysis (were data included in the dataset that were used in the analysis)
- R Code for BP Data.txt
- Description: R code for analyses related to the best practices data.
- AssessmentWeights.csv
- Description: Data to examine how students weighted assessment categories.
- Column headers and descriptions:
- Student: each student was randomly assigned a number (from 1-13)
- Group: categorical variable indicating whether the data are for a student or the instructor
- PercentageWeight: the percent given to a particular assessment category
- AssessmentCategory: the assessments used in the course to determine final grades
- R Code for Weights.txt
- Description: R code for analyses related to the open grading policy, where students could select the weight of assessments toward their final grade in the course.
- LikertDescriptiveStatistics.csv
- Description: Data summaries (e.g., mean, sd, n) for the SALG (student assessment of their learning gains) survey instrument. This report was generated through salgsite.net once the survey was closed.
- Column headers and descriptions:
- Number: this indicates the question number in the survey
- Question: gives text for each question in the survey
- Type: describes whether the response was multiple choice (type = select one) or open response (type = long answer); type = category marks question headers, where students received instructions.
- N: sample size; number of survey responses
- Mean: mean value for responses (on a scale of 1-5)
- Std dev: standard deviation around the response mean
- Choices: describes the choices students could select in their multiple choice questions. Example responses include no gain, a little gain, moderate gain, good gain, great gain, and not applicable.
- See Appendix B in the article for the SALG survey questions.
- LikertData.csv
- Description: Data to visualize the results of the SALG (student assessment of their learning gains) survey instrument.
- Column headers and descriptions:
- Qset: Question set.
- Subset: Question subset.
- CATEGORY: Students assessed their learning gains as they related to the understanding of course content (Grasp), increase in their skills (Skills), impact of their attitudes (Attitudes), course structure (Structure), assignments (Tasks), and support and resources offered to them (Resources).
- Measure: Brief description of the question, shortened for the sake of the visual.
- No gain: Number of responses where students perceived there was no gain toward the question prompt.
- A little gain: Number of responses where students perceived there was a little gain toward the question prompt.
- Moderate: Number of responses where students perceived there was moderate gain toward the question prompt.
- Good gain: Number of responses where students perceived there was good gain toward the question prompt.
- Great gain: Number of responses where students perceived there was great gain toward the question prompt.
- See Appendix B in the article for the SALG survey questions.
- R Code for Likert Graph.txt
- Description: R code for creating the graph to visualize the results of the SALG survey instrument.
- Codebook.csv
- Description: Codebook created in the autocoding of themes in student reflections. This codebook is a code list of themes (e.g., analysis) and the associated subterms (e.g., actual analysis, data analysis, etc.) used in organizing and creating themes from the student writing. Autocoding was performed using NVivo 14.
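As referenced above, here is a minimal sketch for loading the published .csv files in R. File names match the data files listed at the top of this page and may need path adjustments; the published analysis scripts are the accompanying .txt files.

```r
# Minimal sketch for loading the published data files (adjust paths as needed).
best_practices <- read.csv("1._BestPracticesData.csv")   # FAIR/coding criteria, Score, Year, Coh1
weights        <- read.csv("3._AssessmentWeights.csv")   # Student, Group, PercentageWeight, AssessmentCategory
likert         <- read.csv("6._LikertData.csv")          # Qset, Subset, CATEGORY, Measure, gain counts

str(best_practices)   # quick look at column types and values
str(weights)
str(likert)
```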
Article and dataset FAIRness
To assess the utility of open articles and their datasets as an educational tool in an undergraduate academic setting, I measured the congruence of each article-dataset pair with a set of best practices and guiding principles. I assessed ten guiding principles and best practices (Table 1), scoring each category '1' or '0' based on whether it met that criterion, for a total possible score of ten.
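As a quick check on the published data, the total score can be recomputed from the ten binary criteria columns in BestPracticesData.csv. The snippet below is a minimal sketch using the column names documented above, not the published analysis script (see R Code for BP Data.txt).

```r
# Sketch: recompute each article's score as the sum of the ten binary criteria,
# then look at the trend with publication year.
bp <- read.csv("1._BestPracticesData.csv")

criteria <- c("F1", "F2", "F3", "A1", "A2", "I1", "I2", "R1", "C1", "C2")
bp$ScoreCheck <- rowSums(bp[, criteria], na.rm = TRUE)
all(bp$ScoreCheck == bp$Score, na.rm = TRUE)   # TRUE if Score is the simple sum of criteria met

summary(lm(Score ~ Year, data = bp))           # does the score increase with publication year?
```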
Open grading policies
Students were allowed to specify the percentage weight for each assessment category in the course, including 1) six coding exercises (Exercises), 2) one lead exercise (Lead Exercise), 3) fourteen annotation assignments of readings (Annotations), 4) one final project (Final Project), 5) five discussion board posts and a statement of learning reflection (Discussion), and 6) attendance and participation (Participation). I examined whether assessment category (independent variable) affected the weight students assigned (dependent variable) using an analysis of variance (ANOVA) and examined pairwise differences with Tukey's HSD.
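A minimal sketch of this analysis is below, using the column names documented for AssessmentWeights.csv. The level label "Student" for the Group column is an assumption (check unique(aw$Group)); the published analysis is in R Code for Weights.txt.

```r
# Sketch of the weighting analysis: one-way ANOVA of assigned weight by
# assessment category, followed by Tukey HSD pairwise comparisons.
aw <- read.csv("3._AssessmentWeights.csv")
aw_students <- subset(aw, Group == "Student")   # assumed level label for student rows

fit <- aov(PercentageWeight ~ AssessmentCategory, data = aw_students)
summary(fit)      # do students weight assessment categories differently?
TukeyHSD(fit)     # pairwise differences among categories
```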
Assessment of perceived learning gains
I used a student assessment of learning gains (SALG) survey to measure students’ perceptions of learning gains related to course objectives (Seymour et al. 2000). This Likert-scale survey provided five response categories ranging from ‘no gains’ to ‘great gains’ in learning and the option of open responses in each category. A summary report that converted Likert responses to numbers and calculated descriptive statistics was produced from the SALG instrument website.
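For readers reproducing the summaries in LikertDescriptiveStatistics.csv, the conversion works as sketched below. The responses shown are hypothetical and only illustrate the five-category scale; the published report was generated directly by the SALG website.

```r
# Illustrative sketch of the Likert-to-numeric conversion (1 = no gain, 5 = great gain).
# The response vector is hypothetical; the published summaries came from salgsite.net.
salg_levels <- c("no gain", "a little gain", "moderate gain", "good gain", "great gain")
responses   <- c("good gain", "great gain", "moderate gain", "good gain")   # hypothetical

scores <- as.numeric(factor(responses, levels = salg_levels))   # maps categories to 1-5
c(n = length(scores), mean = mean(scores), sd = sd(scores))     # descriptive statistics
```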
Student reflections
In student reflections, I examined the frequency of the 100 most frequent words, with stop words excluded and a minimum length of four (letters), both “with synonyms” and “with generalizations”. Due to this paper's explorative nature, I used autocoding to identify students' broad themes and sentiments in their reflections. Autocoding examines the sentiment of each word and scores it as positive, neutral, mixed, or negative. In this process, I compared how students felt about each theme, focusing on positive (i.e., satisfaction) and negative (i.e., dissatisfaction) sentiments. The relationship of how sentiment was coded to themes was visualized in a treemap, where the size of a block is relative to the number of references for that code. All reflection processing and analyses were performed in NVivo 14 (Windows).
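The treemap itself was produced in NVivo 14. Purely as an illustration, an equivalent figure could be drawn in R from exported counts of coded references; the data frame below is hypothetical, and the treemap package is an assumed substitute for NVivo's built-in visualization.

```r
# Hypothetical illustration only: theme/sentiment counts are invented, and the
# published treemap was produced in NVivo 14, not with this code.
library(treemap)   # assumes the 'treemap' package is installed

coded <- data.frame(
  Theme      = c("analysis", "coding", "collaboration", "statistics"),
  Sentiment  = c("positive", "negative", "positive", "positive"),
  References = c(40, 10, 30, 20)
)

treemap(coded,
        index = c("Theme", "Sentiment"),   # nest blocks by theme, then sentiment
        vSize = "References",              # block area proportional to reference count
        title = "Sentiment coded to themes (illustrative)")
```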
All data were collected with institutional IRB approval (IRB-24-0314). All statistical analyses were performed in R (ver. 4.3.1; R Core Team 2023).