Data from: Working groups, gender and publication impact of Canada’s ecology and evolution faculty
Data files
Mar 04, 2025 version files 2.69 MB
- README.md (7.84 KB)
- researcher_database.csv (2.57 MB)
- syn_sc_socio_public_citations.csv (104.93 KB)
- Synthesis_science_survey_public.csv (7.70 KB)
Abstract
Working groups are recognized as a highly effective method for synthesizing science. It is less clear if participating in working groups benefits individual researchers, or if benefits differ between men and women. This is a critical question, for the working group method is not sustainable if the benefit to science comes at a cost to academic careers or gender equity. Here, we analyze the publications of Canadian university faculty specialized in ecology and evolution (N=1244), a field that has embraced the working group method. Researchers were more likely to have participated in a working group as their academic age and prior H-index increased, but controlling for these factors there was no effect of gender. Using a longitudinal analysis, we find that researcher H-indices accrue 14% faster following their first working group publication, regardless of gender. Part of this acceleration may be the 3- to 5-fold higher citation rate of working group synthesis publications. In a survey (N=169), researchers also report indirect benefits of working groups, at similar rates for men and women. Working groups are therefore good not just for science but also for scientists. Formalized mechanisms for collaborations such as working groups may also offset gender inequities in science.
Description of the Data and file structure
This readme file describes the (1) scripts and (2) data files included in this repository. Missing data in data files are indicated as NA. All data files are in .csv format, meaning that “,” is used as the separator.
(1) Scripts
paper_analysis.do:
This do file is written using Stata 18.0.
This script analyses whether working group (WG) experience has a significant impact on researchers' H-index progression and whether this benefit or WG participation is gendered.
Input: researcher_database.csv
Output: Table 1, Table 2, Figure 1(a) (b) (c), Figure 3(a) (b)
SSS_citations_public.R:
This script is written in the programming language R.
The script analyses the effect of research type and research method on the citations of publications using generalized linear models.
The script also plots this data.
Input: syn_sc_socio_public_citations.csv
Output: pubs.tif, pubs.pdf
survey_sss_public.R:
This script is written in the programming language R.
The script compiles information from a questionnaire given to Canadian university faculty specialized in ecology and evolution about their experience of working groups.
The script uses Chi squared tests to examine if faculty experience of working groups differs between men and women.
The script also plots this data.
Input: Synthesis_science_survey_public.csv
Output: multibar.tif, multibar.pdf
(2) Data files
Description of "researcher_database.csv"
The file "researcher_database.csv" is a comma (",") delimited table with 83348 rows and 7 columns.
Our sample is structured in individual-year format. Each row represents a researcher observed in a particular year. We have binned the years of PhD completion to anonymize this data.
This data can be analysed with the "paper_analysis.do" script.
Missing data is indicated with NA.
The columns contain the following data:
id_1: integer variable which contains the random id numbers assigned to each researcher.
year: integer variable indicating the calendar years from 1953 to 2019.
hindex: integer variable showing the H-index of a researcher in a certain year.
gender: factor variable taking two values: "Men (0)" and "Women (1)".
pub_sc_wg_raw: numeric variable indicating the number of working group publications.
phd_year_bin: character variable indicating the range of years in which the researcher received the PhD degree.
wg1_year: numeric variable showing the year of the researcher's first working group publication.
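As a rough illustration of the individual-year layout described above, the following sketch reads a few rows with Python's standard library, converts "NA" to None, and groups rows by researcher. The three sample rows are invented, and the way gender and phd_year_bin are written in the sample (0/1 codes, a "2000-2004" style range) is an assumption, not something confirmed by the data files.

```python
import csv
import io

# Hypothetical excerpt: the values and the exact text encoding of
# gender and phd_year_bin are invented for illustration only.
SAMPLE = """id_1,year,hindex,gender,pub_sc_wg_raw,phd_year_bin,wg1_year
101,2004,3,0,0,2000-2004,2006
101,2005,4,0,0,2000-2004,2006
202,2005,7,1,NA,1995-1999,NA
"""

# Columns described as integer or numeric in the column list above.
NUMERIC_COLUMNS = ("id_1", "year", "hindex", "pub_sc_wg_raw", "wg1_year")


def parse_row(row):
    """Turn "NA" into None and numeric columns into int; keep other text."""
    parsed = {}
    for key, value in row.items():
        if value == "NA":
            parsed[key] = None
        elif key in NUMERIC_COLUMNS:
            parsed[key] = int(float(value))
        else:
            parsed[key] = value
    return parsed


def load_rows(handle):
    """Read researcher-year rows from an open comma-delimited file."""
    return [parse_row(row) for row in csv.DictReader(handle)]


rows = load_rows(io.StringIO(SAMPLE))

# Group the yearly observations by researcher id.
by_researcher = {}
for row in rows:
    by_researcher.setdefault(row["id_1"], []).append(row)
```

To use the real file, pass `open("researcher_database.csv", newline="")` to `load_rows` instead of the in-memory sample.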
Description of "syn_sc_socio_public_citations.csv"
The file "syn_sc_socio_public_citations.csv" is a comma (",") delimited table with 2486 rows and 8 columns.
Each row represents a selected publication published by a Canadian university faculty member with specialization in ecology and evolution.
Although publication citation information is technically in the public domain, we have removed author and title information to anonymize this data.
This data can be analysed with the "SSS_citations_public.R" script.
Missing data is indicated with NA.
The columns contain the following data:
type: character variable indicating the type of research in the publication, taking one of four values - "Meta-analysis", "Review", "Model", "Primary":
"Meta-analysis" is defined as a quantitative meta-analysis of previously published data
"Review" is defined as a qualitative synthesis or summary of previously published results
"Model" is defined as a mathematical or conceptual framework for predicting future results or understanding previous results, excluding purely statistical models
"Primary" is defined as the original collection of new data
method: character variable indicating the method used to conduct the research in the publication, either "Traditional" or "Working group".
"Working group" is defined as collaborative research conducted largely in a small group meeting assembled for that purpose, often at synthesis centres
"Traditional" is defined as research conducted by means other than a working group
level: character variable summarizing the level of research, either "Primary" (when type = "Primary") or "Synthesis" (when type = "Meta-analysis" or "Review" or "Model")
citations: numerical variable indicating the mean annual number of citations of the publication from the year of publication until 2019
synthesis_science_dummy: numerical variable taking the value "0" (level = "Primary") or "1" (level = "Synthesis")
wg_dummy: numerical variable taking the value "0" (method = "Traditional") or "1" (method = "Working group")
years_since: integer variable indicating the number of years that have elapsed between publication and the year 2019 (e.g. an article published in 2018 has years_since = 1)
citations_decade: integer variable which estimates the number of citations in a decade, calculated as "citations" x 10 rounded to the nearest integer.
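The derived columns follow mechanically from type, method and citations. A minimal sketch of those rules, assuming the definitions given in the column list above; the function name `derive_columns` is hypothetical, and Python's `round` (which rounds exact halves to the nearest even integer) may not match the rounding used to build the original file in every tie case.

```python
SYNTHESIS_TYPES = ("Meta-analysis", "Review", "Model")
METHODS = ("Traditional", "Working group")


def derive_columns(pub_type, method, citations):
    """Rebuild level, the two dummies and citations_decade for one row."""
    if pub_type != "Primary" and pub_type not in SYNTHESIS_TYPES:
        raise ValueError(f"unknown type: {pub_type!r}")
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    level = "Synthesis" if pub_type in SYNTHESIS_TYPES else "Primary"
    return {
        "level": level,
        "synthesis_science_dummy": 1 if level == "Synthesis" else 0,
        "wg_dummy": 1 if method == "Working group" else 0,
        # Mean annual citations scaled to a decade, as an integer.
        "citations_decade": round(citations * 10),
    }


# Invented example row: a working group review cited 3.2 times per year.
example = derive_columns("Review", "Working group", 3.2)
```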
Description of "Synthesis_science_survey_public.csv"
The file "Synthesis_science_survey_public.csv" is a comma (",") delimited table with 169 rows and 7 columns.
Each row represents one respondent to an online and in-person questionnaire.
Respondents are faculty at Canadian universities with speciality in ecology and evolution.
To preserve anonymity of respondents, we report here only the 7 responses analyzed in this study and provide no additional identifying information.
This data can be analysed with the "survey_sss_public.R" script.
Missing data is indicated with NA and often means that the question did not apply to the respondent.
The columns contain the following data:
gender: character variable which contains identity as male ("M") or female ("F"); other gender identities not given.
WG(yes or no): character variable which contains answer yes ("Y") or no ("N") to question "Are you currently or have you ever been part of a Working Group?"
decline_WG: character variable which contains answer yes ("Y") or no ("N") to question "Have you ever declined an invitation to participate in a working group?"
decline_reason: character variable which contains answer to question "If yes, why" when decline_WG = Y. Respondents chose one of:
"I was too busy or could not travel due to work-related duties",
"I was too busy or could not travel due to family-related duties",
"I was not interested in the topic",
"I do not like the working group method",
"If other, please specify"
subsequent_colla: character variable which contains answer to "How many participants from this working group have you subsequently collaborated with in other contexts?", with the answers generally being a number but also including text answers such as "Aucun" (French for "none"). This question referred just to the most recent working group of the respondent. We altered the entry “2-3” to “2 to 3” to prevent this entry being misinterpreted as a date.
reuse: character variable. If the respondent had constructed a dataset in their most recent working group, they were then asked "Has this dataset been used for other projects, besides the original project of the working group?". This variable contains the reply, either "Yes" or "No"
future_fund: character variable which contains answer to "Has your participation in this working group led to funding opportunities?", either "Yes" or "No".
This question referred just to the most recent working group of the respondent.
Sharing/access Information
Links to other publicly accessible locations of the data: No other publicly accessible location
Was data derived from another source? Yes (variable "citations" only, all other variables original)
If yes, list source(s): Clarivate Web of Science
We compiled information on 1,244 faculty members at Canadian universities who were funded by a NSERC Discovery grant (Evolution and Ecology subcommittee) between 1991 and 2019. This information included assumed binary gender from first names and institutional website use of pronouns and photographs (coded men, women); we acknowledge that we may have mis-assigned gender or failed to notice non-binary, transitional or fluid gender identities. We also collected information on the researcher’s year of PhD and all institutions they were affiliated with during their research career. This information was obtained from public curriculum vitae, institutional websites, personally-maintained researcher websites, academic networking platforms (LinkedIn, ResearchGate), Google Scholar, and other public sources such as obituaries. For each researcher, we reconstructed their H-index through time using (1) a compiled list of their peer-reviewed publications and (2) the citations for each publication, for each calendar year from the date of publication until 2019. We compiled their publications using a recursive procedure, which started by first downloading all publications for individuals with the researcher’s first initial and last name from Web of Science Core Collection (hereafter, WOS) starting from 5 years prior to their PhD until 2019, and then filtering this list by cross-referencing with known variants in authorship names for the researcher (from online curriculum vitae or Google Scholar profile) as well as their institutional affiliations, fuzzy matching of publication titles from their curriculum vitae or Google Scholar profile where possible, and recursive identification of previously unidentified affiliations to fine-tune the cross-referencing procedure. Once we had cleaned the publication record, we then calculated cumulative citations over years for each publication from WOS yearly citation counts as a precursor to calculating the H-index.
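The step from yearly citation counts to an H-index through time can be sketched as follows. This is a minimal illustration of the standard H-index definition applied year by year, not the authors' code; the function names and the three invented publications are assumptions.

```python
from itertools import accumulate


def h_index(citation_counts):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_index_by_year(yearly_citations, first_year, last_year):
    """Map each calendar year to the H-index reached by the end of that year.

    yearly_citations: {publication id: {year: citations received that year}}
    """
    years = list(range(first_year, last_year + 1))
    # Cumulative citations per publication, aligned with `years`.
    cumulative = [
        list(accumulate(per_year.get(year, 0) for year in years))
        for per_year in yearly_citations.values()
    ]
    return {
        year: h_index([series[i] for series in cumulative])
        for i, year in enumerate(years)
    }


# Invented citation histories for three publications.
history = {
    "A": {2001: 1, 2002: 2},
    "B": {2002: 1, 2003: 3},
    "C": {2003: 1},
}
trajectory = h_index_by_year(history, 2001, 2003)
# trajectory == {2001: 1, 2002: 1, 2003: 2}
```

A publication with no citations yet contributes zero and therefore never raises the H-index, so unpublished work does not need separate handling in this sketch.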
We identified a potential pool of publications from working groups by (1) matching WOS titles with known working group publications funded by the 15 synthesis centers that comprise the International Synthesis Consortium, and (2) searching the funding and acknowledgment sections of publications for synthesis centre names or acronyms, or keywords commonly used to describe working groups (“working group”, “synthesis group”, “synthesis working group”, “synthesis committee”, “synthesis workshop”, “catalysis group”). All publications from steps 1 and 2 were then manually coded as primary research vs. synthesis research, and as working group method vs. non-working group method. We further categorized synthesis research publications into the following types: statistical synthesis (statistical analysis of previously published or archived data collected by multiple different researchers and/or studies), conceptual synthesis (qualitative review of the literature or proposal of new frameworks for scientific concepts or investigation), or mathematical synthesis (theoretical mathematical models or specific application of general models for the purpose of prediction). We scored non-working group publications using similar criteria. However, given the large number of publications involved, we changed methods to allow for programmatic approaches based on keywords indicative of the three types of synthesis science. This data is presented in aggregated and anonymized form as needed to prevent the identification of individuals.
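The keyword part of step 2 can be illustrated with a simple case-insensitive screen of funding or acknowledgment text. This is only a sketch: the function name is hypothetical, the synthesis centre names and acronyms that were also searched are not listed here and so are omitted, and a match only marks a candidate, since all candidates were subsequently coded by hand.

```python
# Keywords quoted in the description above.
KEYWORDS = (
    "working group",
    "synthesis group",
    "synthesis working group",
    "synthesis committee",
    "synthesis workshop",
    "catalysis group",
)


def matched_keywords(text):
    """Return the keywords found in the text, ignoring case and spacing."""
    normalized = " ".join(text.lower().split())
    return [keyword for keyword in KEYWORDS if keyword in normalized]


# Invented acknowledgment strings for illustration.
hit = matched_keywords("We thank the Synthesis  Workshop participants.")
miss = matched_keywords("Funded by a standard research grant.")
# hit == ["synthesis workshop"], miss == []
```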
We conducted an online survey of current ecology and evolution faculty in Canada from July to September 2019, recruited by email and supplemented by in-person recruitment at the Canadian Society for Ecology and Evolution annual conference (Fredericton NB Canada, August 18-21 2019). The 169 valid responses represent an effective questionnaire response rate of 14.7%. The questionnaire asked for information designed to confirm or complete the researcher database (e.g. academic history, gender) as well as information about why researchers participated or not in working groups, and the perceived costs and benefits of participation. This data is presented in condensed and anonymized form only to maintain the privacy of personal information.
We used survival analysis to test if gender or pace of career progression (H-index adjusted for time since PhD) predicts the hazard rate of participation in working groups. We included an interaction between gender and H-index to assess whether potential selection effects tied to research record captured by the H-index are the same for women and men. We estimated hazard ratios for attending WGs using Cox proportional hazard models.
For the 183 researchers who participated in working groups, we used a fixed effects model with a linear spline to investigate the effects of working group participation and gender on researchers’ trajectory of H-indices over time. This model compares the trajectory of researchers’ H-indices in years before (0-5 years before) and after (1-5 years after, and >6 years after) participating in working groups, and then averages those differences across researchers. To account for autocorrelation within individuals and heteroscedasticity across individuals, we clustered on individuals. We used a 0.67 power transformation on the “time” variable to linearize the H-index ~ time relationship. We code the spline specification in marginal form, which makes interpretation simple: coefficients of the second and third intervals capture changes in H-index growth rates from their prior intervals.
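The marginal coding of a linear spline can be illustrated with a small sketch. In this form the first basis variable is the time variable itself and each later one is the amount by which time exceeds a knot, so each later coefficient is a change in slope relative to the preceding interval. The knot positions and coefficients below are hypothetical, and how the 0.67 power transform interacts with the knots in the authors' Stata model is not stated here, so the transform is shown only as a separate helper.

```python
from itertools import accumulate


def power_time(years, exponent=0.67):
    """Power-transform a non-negative time variable (exponent from the text)."""
    if years < 0:
        raise ValueError("time must be non-negative")
    return years ** exponent


def marginal_spline(x, knots):
    """Marginal linear spline basis: [x, max(0, x - k1), max(0, x - k2), ...]."""
    return [x] + [max(0.0, x - knot) for knot in knots]


def segment_slopes(coefficients):
    """Slope within each interval: the running sum of marginal coefficients."""
    return list(accumulate(coefficients))


# Hypothetical knots and coefficients, chosen only to keep the arithmetic exact.
knots = (5.0, 10.0)
coefficients = (1.0, 0.5, -0.25)

basis = marginal_spline(7.0, knots)             # [7.0, 2.0, 0.0]
prediction = sum(b * c for b, c in zip(basis, coefficients))
slopes = segment_slopes(coefficients)           # [1.0, 1.5, 1.25]
```

Here the second coefficient (0.5) is read directly as the increase in growth rate after the first knot, which is the interpretive convenience the marginal form provides.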
The effects of research type (synthesis vs primary) and method (working group vs traditional) on publication citation rates were evaluated with a zero-inflated generalized linear model based on a negative binomial error distribution with a log link (R package glmmTMB).
The survey results were evaluated with simple Chi-square tests of association.
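For readers unfamiliar with the test, a Pearson Chi-square test of association on a contingency table can be sketched in plain Python as below. The counts are invented, the function names are hypothetical, and the sketch applies no continuity correction, whereas R's `chisq.test` applies Yates' correction to 2 x 2 tables by default, so results from the survey script may differ. The p-value helper is valid only for one degree of freedom, and every expected count must be non-zero.

```python
import math


def chi_square(table):
    """Pearson Chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_df1(statistic):
    """Upper-tail p-value of a Chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Invented 2 x 2 table: rows are gender, columns are a yes/no answer.
counts = [[10, 20], [20, 10]]
statistic, dof = chi_square(counts)
p_value = p_value_df1(statistic)
```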
- Wei, Qian; Srivastava, Diane; Lachapelle, Francois; Fuller, Sylvia (2025). Data from: Working groups, gender and publication impact of Canada's ecology and evolution faculty. Zenodo. https://doi.org/10.5281/zenodo.10493629
- Wei, Qian; Srivastava, Diane; Lachapelle, Francois; Fuller, Sylvia (2025). Data from: Working groups, gender and publication impact of Canada's ecology and evolution faculty. Zenodo. https://doi.org/10.5281/zenodo.10493628
- Wei, Qian; Lachapelle, Francois; Fuller, Sylvia et al. (2020). Working groups, gender and publication impact of Canada’s ecology and evolution faculty [Preprint]. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2020.05.12.092247
