Broadening participation: 20-year outcomes from undergraduate science training programs
Data files
May 27, 2026 version files (158.03 KB total)
- Interventions_shared.csv (19.92 KB)
- Interventions_shared.do (7.20 KB)
- Interventions_shared.dta (128.69 KB)
- README.md (2.22 KB)
Abstract
Open Source Data
We have submitted our raw data in both CSV (Interventions_shared.csv) and Stata (Interventions_shared.dta) formats, along with our Stata syntax do-file (Interventions_shared.do).
Descriptions
Variables: Interventions_shared
- attcolle - University attended
- MTP - Participation in the national undergraduate science training programs Research Initiative for Scientific Enhancement (RISE) and Minority Access to Research Careers (MARC)
- w_gpa - Participant grade point average (GPA) on 4.0 scale at baseline
- q154_0 - Intention to become a research scientist at baseline
- phd_complete - PhD Completion
Key Information Sources
University, Science Training Program Participation, GPA, Science Intent, and PhD Completion were derived from the following sources:
- Self-report via Qualtrics survey
- National Student Clearinghouse openly available data
Code/Software
Stata is required to run Interventions_shared.do; the script was created using version 18.0.
The script is annotated throughout, covering 1) library loading, 2) dataset loading and cleaning, and 3) analyses.
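Readers without Stata may still want to inspect the CSV copy. The following is a minimal Python sketch, not part of the deposited materials, that reads Interventions_shared.csv with the standard library and checks that the variables listed under Descriptions appear in the header. It assumes the CSV column names match the documented variable names exactly, including case; the name of the numeric ID column is not documented here, so it is not checked. The function names `missing_variables`, `read_records`, and `load_interventions` are illustrative.

```python
import csv
from typing import Iterable, TextIO

# Variable names as listed under Descriptions; assumed (not verified)
# to match the CSV header exactly, including case.
DOCUMENTED_VARIABLES = ["attcolle", "MTP", "w_gpa", "q154_0", "phd_complete"]


def missing_variables(header: Iterable[str]) -> list[str]:
    """Return the documented variables that are absent from a CSV header."""
    present = {name.strip() for name in header}
    return [name for name in DOCUMENTED_VARIABLES if name not in present]


def read_records(handle: TextIO) -> list[dict[str, str]]:
    """Read every row as a dict of strings, failing early if a variable is missing."""
    reader = csv.DictReader(handle)
    missing = missing_variables(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing documented variables: {missing}")
    return list(reader)


def load_interventions(path: str) -> list[dict[str, str]]:
    """Open the CSV at `path` (hypothetical location) and return its rows."""
    # utf-8-sig tolerates a byte-order mark if the file was exported with one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return read_records(handle)
```

With the file in the working directory, `load_interventions("Interventions_shared.csv")` would return one dictionary per participant record. Values are returned as strings, so any numeric coding (for example of `w_gpa`) needs to be converted by the caller.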
Human subjects data
This study was reviewed and approved by the California State University San Marcos and the Claremont Graduate University Institutional Review Boards (IRBs). All participants provided informed consent electronically prior to participation. The consent form clearly stated that de-identified data may be made available for research and educational purposes, and explicit consent for such data sharing was obtained from all participants.
To protect participant confidentiality, all identifying information (e.g., names, contact details, IP addresses, and other potentially identifying metadata) was removed from the dataset prior to deposit. Each participant record has been assigned a numeric ID number, which is the only identifier retained in the dataset. This ID cannot be traced back to any individual participant. Accordingly, the dataset shared on Dryad contains only de-identified information and poses minimal risk to participant privacy.
