PLOS ONE publication and citation data

Cite this dataset

Petersen, Alexander (2023). PLOS ONE publication and citation data [Dataset]. Dryad.


Merged PLOS ONE and Web of Science data compiled in .dta files produced by Stata 13. Also included is a Do-file for reproducing the regression model estimates reported in the pre-print (Tables I and II) and the published version (Table 1). Each observation (one line of a .dta file) corresponds to a single PLOS ONE article, with various article-level and editor-level characteristics used as explanatory and control variables. This summary provides a brief description of each variable and its source.

If you use this data, please cite: A. M. Petersen. Megajournal mismanagement: Manuscript decision bias and anomalous editor activity at PLOS ONE. Journal of Informetrics 13, 100974 (2019). DOI: 10.1016/j.joi.2019.100974


We gathered the citation information for all PLOS ONE articles, indexed by A, from the Web of Science (WOS) Core Collection. From these data we obtained a master list of the unique digital object identifier, DOI_A, and the number of citations, c_A, of each article at the time of the data download (census) date:

(a) For the pre-print this corresponds to December 3, 2016;

(b) For the final published article this corresponds to February 25, 2019.

We then used each DOI_A to access the corresponding online XML version of each article at PLOS ONE by visiting the unique web address “” + “DOI_A”. After parsing the full-text XML (primarily the author byline data and reference list), we merged the PLOS ONE publication information and WOS citation data by matching on DOI_A.
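The parse-and-merge step described above can be sketched in Python as follows. This is only an illustration, not the procedure actually used to build the .dta files: it assumes the article XML follows JATS-style tagging (`article-id`, `contrib`, `ref-list`), and the sample record, the placeholder DOI, the citation counts, and the function names `parse_article` and `merge_on_doi` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS-style record with a placeholder DOI;
# real PLOS ONE full-text XML is much richer than this.
SAMPLE_XML = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="doi">10.1371/journal.pone.EXAMPLE1</article-id>
      <contrib-group>
        <contrib contrib-type="author"><name><surname>Doe</surname></name></contrib>
        <contrib contrib-type="author"><name><surname>Roe</surname></name></contrib>
        <contrib contrib-type="editor"><name><surname>Poe</surname></name></contrib>
      </contrib-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref id="r1"/><ref id="r2"/><ref id="r3"/>
    </ref-list>
  </back>
</article>"""


def parse_article(xml_text):
    """Extract the DOI, author count, and reference count from one article."""
    root = ET.fromstring(xml_text)
    doi = root.findtext(".//article-id[@pub-id-type='doi']")
    if doi is None:
        raise ValueError("no DOI found in article XML")
    return {
        # Normalise case and whitespace so both sources use the same key.
        "doi": doi.strip().lower(),
        "n_authors": len(root.findall(".//contrib[@contrib-type='author']")),
        "n_refs": len(root.findall(".//ref-list/ref")),
    }


def merge_on_doi(plos_records, wos_citations):
    """Inner join: keep only articles whose DOI appears in both sources."""
    merged = []
    for record in plos_records:
        citations = wos_citations.get(record["doi"])
        if citations is not None:
            merged.append({**record, "citations": citations})
    return merged


# Made-up citation counts keyed by lower-cased DOI.
wos = {"10.1371/journal.pone.example1": 7, "10.1371/journal.pone.example2": 1}
rows = merge_on_doi([parse_article(SAMPLE_XML)], wos)
print(rows)
```

The sketch uses an inner join, so an article missing from either source is dropped; whether the original merge handled unmatched records this way is not stated in this description.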

#allofplos: PLOS has since made all full-text XML data freely available; this option was not available at the time of our data collection.

Usage notes

Data enclosed in a single zipped folder:

A) DASH-V2 : Data files for final published analysis (J. Informetrics, 2019)

File A1: PubData_DOI_141986_Nc_0_2019.dta

File A2: PubData_DOI_141986_Nc_0_2019_DOFILE

B) DASH-V1 : Data files for preprint version (

File B1: PubData_Obs_102741_Nc_10_No2015_CitationsAnalysis.dta

File B2: PubData_Obs_128734_Nc_10_AcceptanceTimeAnalysis.dta


C) Data description common to all .dta files, which contain parsed and merged PLOS ONE and Web of Science metadata:

File A3: UC-DASH_DataDescription_Petersen_V2.pdf

File B4: UC-DASH_DataDescription_Petersen_V1.pdf