Seventy-five years of systematic biology: Looking back, moving forward
Data files
Nov 24, 2025 version files 36.48 MB
- README.md (3.53 KB)
- systbiol_articles_enhanced_250717.xlsx (3.15 MB)
- systbiol_articles_oxford_252905.xlsx (7.15 MB)
- texts_info.csv (25.90 MB)
- topics_info.csv (279.86 KB)
Abstract
What does “systematic biology” mean today, where has it been in the past, and where is it going? We explore these questions by considering five elements – collaboration, integration, discourse, infrastructure, and society – that we think have allowed systematic biology to adapt to change and sustain growth without losing its unique identity. In the spirit of celebrating the 75th anniversary of our flagship journal, we generated a comprehensive dataset for all Systematic Biology and Systematic Zoology articles (N = 5,150) that we could locate and used bibliometric and textual analyses to illustrate ways in which our field has transformed. We offer our humble opinions on how our community can ensure that systematic biologists inherit an enlightening, dynamic, and enduring future.
Dataset DOI: https://doi.org/10.5061/dryad.905qfttz8
GitHub repo: https://github.com/mlandis/systbiol75
Datasets:
- systbiol_articles_oxford_252905.xlsx: Original Oxford dataset of Systematic Biology/Zoology articles published during 1952-2024. Empty fields represent missing data. This dataset was created prior to May 29, 2025. Supplementary Information contains analysis details.
- systbiol_articles_enhanced_250717.xlsx: Enhanced dataset of Systematic Biology/Zoology articles published during 1952-2024. Based on original Oxford dataset with extra columns for citation counts from Google Scholar, article types, and page lengths from the journal website, and extra articles missing from the original dataset. Empty fields represent missing data. This dataset was created prior to July 17, 2025. Supplementary Information contains analysis details.
Code (stored on Zenodo, Software related work):
systbiol75_analysis.ipynb: Jupyter notebook with all code used to generate Figures 1-3 and Supp. Figures S1-S7. Functionality includes filtering the original dataset for relevant records, embedding research articles into vector-space with SPECTER, topic modeling with BERTopic, comparison of mean-similarity of articles across years, computing frequencies of number of articles per topic over time, comparing citation rates among different topic-clusters, comparing numbers of articles and article lengths for different article types, and visualizing and outputting results from these analyses.
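As a rough illustration of the "mean-similarity of articles across years" step, the sketch below averages pairwise cosine similarity among article vectors within each year. This is only one plausible reading of that step; the notebook's actual metric may differ. The function names and the toy two-dimensional vectors are hypothetical stand-ins for the SPECTER embeddings.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity_by_year(records):
    """Average pairwise cosine similarity among the vectors of each year.

    `records` is an iterable of (year, vector) pairs. Years with fewer
    than two articles are skipped because they have no pairs to compare.
    """
    by_year = defaultdict(list)
    for year, vec in records:
        by_year[year].append(vec)

    result = {}
    for year, vecs in sorted(by_year.items()):
        pairs = list(combinations(vecs, 2))
        if pairs:
            result[year] = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    return result


# Toy data (assumption): real inputs would be SPECTER embeddings per article.
toy = [
    (1952, [1.0, 0.0]),
    (1952, [1.0, 0.0]),
    (2024, [1.0, 0.0]),
    (2024, [0.0, 1.0]),
]
result = mean_similarity_by_year(toy)
```

In this toy case the two 1952 vectors are identical and the two 2024 vectors are orthogonal, so the within-year means sit at the two extremes for non-negative vectors.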
Output:
- topics_info.csv: Topics for Systematic Biology/Zoology research articles published during 1952-2025, as predicted by BERTopic. Provides topic keywords and the representative document for each topic.
  - Topic -1 represents the "null" topic and Topics 0 to 38 represent the main topics.
  - Count represents the number of articles assigned to each of Topics -1 to 38.
  - BestTopicCount represents the number of articles assigned to each of Topics 0 to 38 (excluding Topic -1) based on the highest topic probability.
  - Representative keywords are the keywords BERTopic associates with each topic.
  - Representative documents are the titles and abstracts of the articles most closely associated with each topic.
  - Analysis details are in the Supplementary Information and in the accompanying Jupyter notebook file, systbiol75_analysis.ipynb.
- texts_info.csv: Topics assigned to each Systematic Biology/Zoology research article published during 1952-2025, as predicted by BERTopic. Provides all topic assignment probabilities and the highest-probability topic assignment for each document.
  - Document gives the title and abstract for each article.
  - Topic is the assigned topic from Topics -1 to 38.
  - BestTopic is the assigned topic from Topics 0 to 38 (excluding Topic -1).
  - Representative keywords are the keywords BERTopic associates with each topic.
  - Representative documents are the titles and abstracts of the articles most closely associated with each topic.
  - Probability is the probability that the article is associated with Topic.
  - Prob_Topic_N fields are the probabilities that the article is associated with Topics 0 to 38 (highest probability used for BestTopic).
  - Analysis details are in the Supplementary Information and in the accompanying Jupyter notebook file, systbiol75_analysis.ipynb.
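The relationship between the Prob_Topic_N fields, BestTopic, and BestTopicCount can be sketched as follows. This is a minimal illustration rather than the notebook's code: it assumes the probability columns are literally named `Prob_Topic_0`, `Prob_Topic_1`, and so on, and it uses a small in-memory table in place of texts_info.csv. The helper name `best_topic` is hypothetical.

```python
import csv
import io
from collections import Counter


def best_topic(row, prefix="Prob_Topic_"):
    """Return the topic number whose probability column is largest in `row`.

    Topic -1 can never be returned because only the Prob_Topic_N columns
    (assumed naming) are considered. Empty fields are skipped.
    """
    probs = {
        int(name[len(prefix):]): float(value)
        for name, value in row.items()
        if name.startswith(prefix) and value != ""
    }
    return max(probs, key=probs.get)


# Toy stand-in for texts_info.csv (assumption: three topics, three articles).
toy_csv = io.StringIO(
    "Document,Topic,Prob_Topic_0,Prob_Topic_1,Prob_Topic_2\n"
    "Article A,-1,0.20,0.50,0.30\n"
    "Article B,0,0.70,0.10,0.20\n"
    "Article C,1,0.05,0.90,0.05\n"
)
rows = list(csv.DictReader(toy_csv))

# BestTopic per article: argmax over the Prob_Topic_N columns.
best = [best_topic(row) for row in rows]

# Tally of articles per BestTopic, analogous to BestTopicCount.
best_counts = Counter(best)
```

In the toy table, Article A carries the null topic (-1) in Topic but still receives a BestTopic, which is why BestTopicCount can differ from Count for the same topic.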
