The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice: synthetic citation networks for bibliometric null models
Data files
Jun 28, 2023 version (181.68 MB total)
- Dryad_OpenData.zip (181.37 MB)
- README.pdf (310.52 KB)
Jun 28, 2023 version (181.68 MB total)
- Dryad_OpenData.zip (181.37 MB)
- README.md (1.74 KB)
- README.pdf (310.52 KB)
Feb 05, 2025 version (303.95 MB total)
- Dryad_OpenData.zip (303.56 MB)
- README.md (2.52 KB)
- README.pdf (388.41 KB)
Abstract
README: The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice: synthetic citation networks for bibliometric null models
https://doi.org/10.6071/M3G674
Description of the data and file structure
Enclosed are two types of data: 1) empirical publication-level data (tabular file format) accompanied by code (do-files) for running multivariate regressions in STATA; 2) raw network data (sparse network representation format) produced for 6 citation network scenarios. For each scenario we include 4 synthetic networks, for a total of 24 citation networks. Each citation network comprises 125270 nodes that were systematically added in cohorts; the networks therefore represent null models for evolving citation networks and are useful for benchmarking existing and new bibliometric measures. The data and code for 1) and 2) are organized into subfolders, whose contents and functionality are described in detail in the enclosed README.pdf document.
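README.pdf documents the exact sparse network format, which is not reproduced here. As a rough sketch only, assuming each network is stored as a plain-text edge list with one whitespace-separated `citing cited` pair of integer node IDs per line (a hypothetical layout), a network could be loaded into adjacency sets in Python like this. The function names `load_edge_list` and `count_nodes` are illustrative, not part of the archive.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_edge_list(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Parse 'citing cited' integer pairs into a map: citing node -> set of cited nodes.

    ASSUMPTION (hypothetical format): whitespace-separated integer pairs,
    one citation per line; blank lines and lines starting with '#' are skipped.
    Check README.pdf for the actual sparse format before using this on the archive.
    """
    refs: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        citing, cited = map(int, line.split()[:2])
        refs[citing].add(cited)
    return dict(refs)


def count_nodes(refs: Dict[int, Set[int]]) -> int:
    """Count distinct nodes that appear in at least one edge (as citing or cited)."""
    nodes = set(refs)
    for cited in refs.values():
        nodes |= cited
    return len(nodes)


if __name__ == "__main__":
    sample = ["# citing cited", "3 1", "3 2", "4 3", "4 1"]
    refs = load_edge_list(sample)
    print(refs)               # {3: {1, 2}, 4: {1, 3}}
    print(count_nodes(refs))  # 4
```

With a real file, `load_edge_list(open(path))` works the same way, since a file object iterates over its lines. Nodes that neither cite nor are cited would not be counted by `count_nodes`.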
Files and variables
File: README.pdf
Description: An extended description of the files and their data format
File: Dryad_OpenData.zip
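Description: A compressed archive containing the synthetic networks and the Mathematica notebooks (code) for visualizing the results, plus (from the Feb 2025 version onward) the STATA data and do-files for the regression models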
Code/software
Data were analyzed and visualized using notebooks developed with Mathematica 13.0, which should be compatible with later software versions. To run a notebook, execute each cell with Shift+Enter: the initial cells load the data files, and the remaining cells should then be executed in linear order from start to end.
Multivariate regression models were implemented in STATA 13.0, and the do-files should be compatible with later software versions. To run a model, open the .dta file containing the tabular observation data, then execute the associated .do file containing the model specification.
Version changes
Feb-2025: Added the folders /QSS_STATA_Table1 and /JOI_STATA_Table_S3&S4, which contain tabular data and code for running several multivariate regressions in STATA 13; these models test relationships between the disruption index and other publication-level covariates (team size, citations, and reference-list length) for millions of publications, as reported in the published companion articles.
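The actual model specifications live in the enclosed do-files and are not reproduced here. As a hedged illustration of the general kind of multivariate model described, the sketch below fits ordinary least squares of a disruption score on team size, citation count, and reference-list length using only the Python standard library. The function name `ols`, the purely linear specification, and the toy data are all hypothetical; the published models may use transformations, fixed effects, or other terms.

```python
from typing import List, Sequence


def ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    Solves (A^T A) b = A^T y, where A is X with a leading column of ones,
    using Gaussian elimination with partial pivoting.
    Returns [intercept, b_1, ..., b_k].
    """
    A = [[1.0, *map(float, row)] for row in X]
    k = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(k)]
        + [sum(a[i] * yi for a, yi in zip(A, y))]
        for i in range(k)
    ]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (M[i][k] - sum(M[i][j] * b[j] for j in range(i + 1, k))) / M[i][i]
    return b


if __name__ == "__main__":
    # Hypothetical toy rows: (team_size, citations, reference_list_length).
    X = [(1, 10, 20), (2, 5, 30), (3, 40, 15), (5, 12, 50), (8, 3, 25), (4, 30, 40)]
    # Noise-free toy outcome built from known coefficients, so OLS should recover them.
    true_b = [0.1, -0.02, -0.001, 0.005]
    y = [true_b[0] + sum(bj * xj for bj, xj in zip(true_b[1:], row)) for row in X]
    print([round(v, 6) for v in ols(X, y)])
```

Because the toy outcome contains no noise, the fitted coefficients should match `true_b` up to floating-point error, which is a quick sanity check of the solver rather than a statement about the empirical results.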
Methods
Enclosed data accompany the following publications:
- Alexander M. Petersen, Felber Arroyave, Fabio Pammolli. The disruption index suffers from citation inflation: re-analysis of temporal CD trend and relationship with team size reveal discrepancies. Journal of Informetrics 19, 101605 (2025). DOI:10.1016/j.joi.2024.101605
- Alexander M. Petersen, Felber Arroyave, Fabio Pammolli. The disruption index is biased by citation inflation. Quantitative Science Studies 5, 936–953 (2024). DOI:10.1162/qss_a_00333
To summarize, enclosed are two types of data:
1) Empirical publication-level data accompanied by code (do-files) for running multivariate regressions in STATA
2) Raw network data produced for 6 citation network scenarios. For each scenario, we include 4 synthetic networks, for a total of 24 citation networks. Each citation network comprises 125270 nodes that were systematically added in cohorts; the networks therefore represent a null model for evolving citation networks and are useful for benchmarking existing and new bibliometric measures. These data were generated using a synthetic citation network model developed and reported in:
Pan, R. K., Petersen, A. M., Pammolli, F. & Fortunato, S. The memory of science: Inflation, myopia, and the knowledge network. Journal of Informetrics 12, 656–678 (2018).
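One measure these networks can benchmark is the disruption index. As a hedged sketch, the function below computes the commonly used CD form of that index for one focal node, D = (n_i - n_j) / (n_i + n_j + n_k), where, among nodes published after the focal node, n_i counts those citing the focal node but none of its references, n_j those citing the focal node and at least one of its references, and n_k those citing at least one reference but not the focal node. It applies no citation time window (such as a 5-year CD_5 variant) and is not necessarily the exact variant used in the companion articles; the input layout (a map from each citing node to its references, plus a node-to-cohort map) and the name `cd_index` are assumptions.

```python
from typing import Dict, Set


def cd_index(focal: int, refs: Dict[int, Set[int]], time: Dict[int, int]) -> float:
    """Disruption (CD) index of `focal`, with no citation time window.

    refs: maps each citing node to the set of nodes it cites (assumed layout).
    time: maps each node to its publication time, e.g. its cohort number;
          it must cover every citing node in refs.
    Only nodes published strictly after `focal` are counted.
    Returns NaN if no later node cites the focal node or its references.
    """
    focal_refs = refs.get(focal, set())
    n_i = n_j = n_k = 0
    for citing, cited in refs.items():
        if time[citing] <= time[focal]:
            continue  # skip the focal node itself and anything not later than it
        cites_focal = focal in cited
        cites_refs = not focal_refs.isdisjoint(cited)
        if cites_focal and not cites_refs:
            n_i += 1  # cites the focal node only: "disruptive" signal
        elif cites_focal and cites_refs:
            n_j += 1  # cites the focal node and its references: "consolidating" signal
        elif cites_refs:
            n_k += 1  # bypasses the focal node, cites only its references
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else float("nan")


if __name__ == "__main__":
    # Toy network: node 3 (cohort 1) cites nodes 1 and 2 (cohort 0);
    # nodes 4-7 (cohort 2) cite node 3 and/or its references.
    refs = {3: {1, 2}, 4: {3}, 5: {3, 1}, 6: {2}, 7: {3}}
    time = {1: 0, 2: 0, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2}
    print(cd_index(3, refs, time))  # 0.25, i.e. (2 - 1) / (2 + 1 + 1)
```

Here nodes 4 and 7 count toward n_i, node 5 toward n_j, and node 6 toward n_k. Because n_k grows with every later node that cites a focal paper's references, the denominator is sensitive to how many references later papers carry, which is the kind of citation-practice effect the synthetic null models are designed to probe.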
Usage notes
Enclosed code was developed using 1) STATA 13.0 and 2) Mathematica 13, and should also run on newer versions of both packages. The document README.pdf provides detailed descriptions of the enclosed data and code.