The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice: synthetic citation networks for bibliometric null models

Petersen, Alexander

Published Jun 28, 2023; Updated Feb 05, 2025 on Dryad. https://doi.org/10.6071/M3G674

Data files

Jun 28, 2023 version files 181.68 MB

Dryad_OpenData.zip

181.37 MB
README.pdf

310.52 KB

Jun 28, 2023 version files 181.68 MB

Dryad_OpenData.zip

181.37 MB
README.md

1.74 KB
README.pdf

310.52 KB

Feb 05, 2025 version files 303.95 MB

Dryad_OpenData.zip

303.56 MB
README.md

2.52 KB
README.pdf

388.41 KB

Abstract

We demonstrate that the disruption index (CD) recently applied to publication and patent citation networks by Park et al. (Nature, 2023) systematically decreases over time due to secular growth in research and patent production, following two distinct mechanisms unrelated to innovation – the first structural and the second behavioral. The structural explanation follows from ‘citation inflation’ (CI) (Petersen et al., Research Policy, 2018), an inextricable feature of real citation networks. One driver of CI is the ever-increasing length of reference lists, which causes the CD index to systematically decrease. The behavioral explanation reflects shifts in scholarly citation practice (e.g. self-citation) that increase the rate of triadic closure in citation networks and confound efforts to measure disruptive innovation using CD. Combined, these two mechanisms render CD unsuitable for cross-temporal analysis, and call into question the interpretations provided by Park et al.


Description of the data and file structure

Enclosed are two types of data: 1) Empirical publication-level data (tabular file format) accompanied by code (do-files) for running multivariate regressions in STATA; 2) Raw network data (sparse network representation format) produced for 6 citation network scenarios. For each scenario we include 4 synthetic networks, for a total of 24 citation networks. Each citation network comprises 125270 nodes that were systematically added in cohorts; the networks therefore represent a null model for evolving citation networks and are useful for benchmarking existing and new bibliometric measures. The data and code for 1) and 2) are organized into subfolders, the contents and functionality of which are described in detail in the enclosed README.pdf document.
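
As an illustration of how a citation network can be used to benchmark a bibliometric measure, the following is a minimal Python sketch that computes the CD index of a focal node from a list of (citing, cited) pairs. It uses the standard definition CD = (n_i - n_j) / (n_i + n_j + n_k), where n_i counts nodes citing only the focal node, n_j counts nodes citing both the focal node and at least one of its references, and n_k counts nodes citing only the focal node's references. The function name `cd_index`, the edge-list representation, and the toy network are assumptions made for illustration; they do not reflect the file formats in Dryad_OpenData.zip (see README.pdf for those) or the code used in the companion articles. The sketch also omits the fixed citation window (e.g. 5 years) commonly applied when computing CD.

```python
from collections import defaultdict


def cd_index(edges, focal):
    """Return the CD index of `focal` given (citing, cited) pairs.

    Hypothetical helper for illustration; no citation time window is applied.
    """
    refs = defaultdict(set)      # node -> set of nodes it cites
    cited_by = defaultdict(set)  # node -> set of nodes that cite it
    for citing, cited in edges:
        refs[citing].add(cited)
        cited_by[cited].add(citing)

    citing_focal = set(cited_by[focal])

    # Nodes citing at least one of the focal node's references,
    # excluding the focal node itself.
    citing_refs = set()
    for ref in refs[focal]:
        citing_refs |= cited_by[ref]
    citing_refs.discard(focal)

    n_i = len(citing_focal - citing_refs)  # cite the focal node only
    n_j = len(citing_focal & citing_refs)  # cite the focal node and its references
    n_k = len(citing_refs - citing_focal)  # cite the references only
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else float("nan")


# Toy network (assumed): focal node F cites R1 and R2.
edges = [
    ("F", "R1"), ("F", "R2"),
    ("A", "F"),
    ("B", "F"), ("B", "R1"),
    ("C", "R2"),
    ("D", "F"),
]
print(cd_index(edges, "F"))  # 0.25

# Same citing nodes, but A and D each carry one extra reference.
longer_lists = edges + [("A", "R1"), ("D", "R2")]
print(cd_index(longer_lists, "F"))  # -0.75
```

In the second call nothing about the focal node changes; only the reference lists of its citing nodes grow, which moves A and D from the n_i count to the n_j count and lowers CD. This is a toy instance of the structural mechanism described in the abstract.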

Files and variables

File: README.pdf

Description: An extended description of the files and their data format

File: Dryad_OpenData.zip

Description: A folder containing the synthetic networks and Mathematica notebooks (code) for visualizing the results

Code/software

Data were analyzed and visualized using notebooks developed with Mathematica 13.0; the notebooks should be compatible with future software versions. To run a notebook, press Shift+Enter to execute the commands contained in any given cell; the initial cells load the data files, and from there the notebook cells should be executed from start to end in linear order.

Multivariate regression models were implemented with STATA 13.0 software, which should be compatible with future software versions. The workflow for executing STATA do-files is to open the .dta file containing the tabular observation data, and then to execute the associated .do file containing the model specification.

Version changes

Feb-2025: Added the folders /QSS_STATA_Table1 and /JOI_STATA_Table_S3&S4, which contain tabular data and code that generate several multi-variate regressions using STATA 13 software; these models test relationships between the disruption index and various other publication-level covariates (team size, citations, and reference-list length) for millions of publications, as reported in the published companion articles.
