UCSC library survey on graduate student publishing
Data files
Mar 24, 2026 version files 176.56 KB
Abstract
This dataset contains responses from a survey of graduate students at the University of California, Santa Cruz (UCSC), examining publishing practices and attitudes toward open access. The survey was distributed to 1,907 graduate students (master's and doctoral) via email invitation, with responses collected through Qualtrics. The dataset includes 245 valid responses from students who had previously published, planned to publish, or were unsure about publishing during their graduate programs. The questionnaire comprised 48 questions (conditional branching applied) covering topics related to scholarly publishing with emphasis on open access, defined as "scholarly content that is free to access online, with limited or no copyright and licensing restrictions." Questions focused on the publishing process from manuscript submission onward, concentrating on article publication as the most universal format across disciplines. Data collection and de-identification were managed in compliance with IRB requirements.
Dataset DOI: 10.5061/dryad.9zw3r22vd
Dataset overview
This dataset contains survey responses collected from University of California, Santa Cruz (UCSC) graduate students. The survey examined graduate students' publishing experiences, behaviors, and information needs related to the scholarly publishing process. The survey was distributed by email, and responses were collected via the UCSC institutional Qualtrics platform. The dataset contains 245 respondent records.
Files included
UCSC_Library_Survey_on_Publishing_Graduate_Students_2026-03-18.sav
De-identified survey response data in SPSS format. Contains 245 rows (respondents) and 196 variables. All direct identifiers and verbatim responses have been removed (see "Human subjects de-identification" below). Value labels and variable labels are preserved.
ucsc_library_questionnaire_grad_student_publishing.rtf
Survey instrument used to collect the data. Distributed by email and administered via the UCSC institutional Qualtrics platform.
Variables and file structure
Most variables are named using Qualtrics-generated question identifiers (e.g., Q4, Q7_1); a small number are computed demographic variables created by the research team. Variable labels providing the full question text are embedded in the .sav file and visible in SPSS or compatible software.
Key demographic/classification variables:
- Ethnicity_3: Respondent ethnicity, collapsed into 3 categories (underrepresented minority (URM) groups combined; White non-Hispanic; International)
- DoctoralField_2: Doctoral field, collapsed into 2 broad divisions (Arts/Humanities/Social Sciences; Physical and Biological Sciences (PBSci)/Engineering)
- DoctoralProgressToDegree: Progress toward degree, 2 categories (first or second year; third year or more)
- Respondent: Flag indicating respondent has published or plans to publish
- Respondent_TopicsWanted / Respondent_PublishingGuidance: Flags indicating eligibility for specific question series
- Finished: Whether respondent completed the survey (1 = completed, 0 = did not complete)
- Remaining variables correspond to individual survey questions. Multiple-choice and select-all-that-apply questions are represented as binary indicator variables (1 = selected). Likert-scale questions are stored as numeric values with labeled categories embedded in the file.
Software
The .sav file can be opened with IBM SPSS Statistics. It is also readable in R (via the haven or foreign package), Python (via pyreadstat or savReaderWriter), and PSPP (a free, open-source alternative to SPSS).
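As a minimal sketch of the Python route, assuming the third-party pyreadstat package is installed and the .sav file is in the working directory, the following loads the data and formats a variable's embedded label and value codes. The helper names load_survey and describe_variable are illustrative only; they are not part of the dataset or of pyreadstat.

```python
from pathlib import Path

# File name as listed under "Files included"; adjust the path as needed.
SAV_PATH = Path("UCSC_Library_Survey_on_Publishing_Graduate_Students_2026-03-18.sav")


def load_survey(path=SAV_PATH):
    """Read the .sav file and return (DataFrame, metadata)."""
    import pyreadstat  # third-party; imported lazily so the helper below has no dependencies

    df, meta = pyreadstat.read_sav(str(path))
    return df, meta


def describe_variable(name, column_labels, value_labels):
    """Format one variable's label and value codes as a single line.

    column_labels: dict of variable name -> question text
    value_labels:  dict of variable name -> {code: label}
    """
    label = column_labels.get(name) or "(no label)"
    codes = value_labels.get(name, {})
    if not codes:
        return f"{name}: {label}"
    coded = ", ".join(
        f"{code}={text}" for code, text in sorted(codes.items(), key=lambda kv: kv[0])
    )
    return f"{name}: {label} [{coded}]"


# Example usage (requires the data file):
# df, meta = load_survey()
# print(df.shape)  # the README reports 245 rows and 196 variables
# print(describe_variable("Finished",
#                         meta.column_names_to_labels,
#                         meta.variable_value_labels))
```

The metadata object returned by pyreadstat exposes the variable labels as column_names_to_labels and the value labels as variable_value_labels, which is what the usage lines pass to the helper.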
Human subjects de-identification
IRB and Consent
Data collection was conducted under IRB protocol HS-FY2021-21. Survey data were collected in partnership with the UCSC Institutional Research, Analytics, and Planning Support (IRAPS) office. Data were received by the research team already de-identified. Participants provided consent for public sharing of their de-identified data.
De-identification Process
The dataset has been prepared in accordance with Dryad's human subjects data requirements. The following variables were removed prior to deposit:
- Direct identifiers removed: ResponseId (Qualtrics-generated unique response identifier), Duration (exact time in seconds to complete survey)
- Verbatim responses removed: All open-ended text fields were removed, including Q8 (reasons for not publishing), Q9 (program deliverable), Q42 (final open-ended comments), and 16 "other, please specify" free-text fields associated with multiple-choice questions (Q7_5_TEXT, Q5_5_TEXT, Q13_8_TEXT, Q14_12_TEXT, Q15_12_TEXT, Q16_5_TEXT, Q17_8_TEXT, Q18_4_TEXT, Q18_7_TEXT, Q20_7_TEXT, Q24_6_TEXT, Q30_5_TEXT, Q39_6_TEXT, Q40_10_TEXT, Q41_6_TEXT, Q47_5_TEXT)
The dataset retains three indirect identifiers required for analysis and reproducibility: Ethnicity_3, DoctoralField_2, and DoctoralProgressToDegree. Each has been collapsed into broad categories to minimize re-identification risk. DoctoralProgressToDegree was further generalized from three categories to two (first/second year; third year or more), removing a specific graduation timeline reference. Cross-tabulation of all three variables produces a minimum cell size well above Dryad's guidance threshold of n=5.
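The cell-size check described above could be reproduced along the following lines. This is a sketch under stated assumptions, not the procedure the research team used: it counts only the combinations that actually occur, and it skips respondents who are missing any of the three variables. The function name min_cell_size is hypothetical.

```python
from collections import Counter

# The three retained indirect identifiers named in this README.
QUASI_IDENTIFIERS = ("Ethnicity_3", "DoctoralField_2", "DoctoralProgressToDegree")


def _is_missing(value):
    """True for None or a float NaN (how system-missing values typically load)."""
    return value is None or (isinstance(value, float) and value != value)


def min_cell_size(records, keys=QUASI_IDENTIFIERS):
    """Smallest count among the observed combinations of `keys`.

    records: iterable of dicts, one per respondent.
    Respondents missing any key are skipped (an assumption of this sketch).
    Returns 0 if no respondent has all keys present.
    """
    counts = Counter(
        tuple(rec[k] for k in keys)
        for rec in records
        if not any(_is_missing(rec.get(k)) for k in keys)
    )
    return min(counts.values()) if counts else 0
```

With a pandas DataFrame, df.to_dict("records") yields a suitable list of records; a result of 5 or more is consistent with the threshold cited above.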
Missing Data
Not all respondents answered all questions. Some question series were shown only to subsets of respondents based on prior answers (e.g., questions about the publishing process were shown only to respondents who indicated they had published or planned to publish). Missing values are represented as system-missing in the .sav file. The Respondent, Respondent_TopicsWanted, and Respondent_PublishingGuidance flag variables indicate which question series each respondent was eligible to receive.
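Because a missing value can mean either "not shown" or "not answered," percentages are best computed over eligible respondents only. The sketch below illustrates one way to do that. It assumes that the flag variables are coded 1 for eligible, which this README does not state, so check the value labels first. The variable name Q13_1 in the usage note and the function name share_selected are hypothetical.

```python
def share_selected(records, indicator, flag, eligible_value=1):
    """Proportion of eligible respondents who selected an option.

    records:        iterable of dicts, one per respondent
    indicator:      binary indicator variable (1 = selected)
    flag:           eligibility flag variable, e.g. "Respondent"
    eligible_value: code meaning "eligible" (assumed to be 1; verify in the file)

    Returns None when no respondent is eligible.
    """
    eligible = [rec for rec in records if rec.get(flag) == eligible_value]
    if not eligible:
        return None
    # Anything other than 1 (including missing) counts as "not selected".
    selected = sum(1 for rec in eligible if rec.get(indicator) == 1)
    return selected / len(eligible)
```

For example, share_selected(df.to_dict("records"), "Q13_1", "Respondent") would use the number of respondents flagged by Respondent as the denominator rather than all 245 records.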
Human subjects data
Working under IRB protocol HS-FY2021-21, we partnered with the UCSC Institutional Research, Analytics, and Planning Support (IRAPS) office to design the data collection instrument. We received the data from IRAPS already de-identified. Participants consented to the public sharing of their de-identified data.
