PLOS ONE publication and citation data
Data files
Jul 23, 2018 version files (17.60 MB)
- DASH_Petersen_2018.zip
- UC-DASH_DataDescription_Petersen.pdf
Jul 23, 2018 version files (29.40 MB)
- UCDASH_DataDescription_Petersen.zip
May 15, 2023 version files (29.40 MB)
- README.md
- UCDASH_DataDescription_Petersen.zip
Abstract
Merged PLOS ONE and Web of Science data, compiled in .dta files produced by Stata 13. Also included is a do-file for reproducing the regression model estimates reported in the preprint (Tables I and II) and in the published version (Table 1). Each observation (one row of a .dta file) corresponds to a single PLOS ONE article, with various article-level and editor-level characteristics used as explanatory and control variables. The accompanying data description gives a brief account of each variable and its source.
If you use this data, please cite: A. M. Petersen. Megajournal mismanagement: Manuscript decision bias and anomalous editor activity at PLOS ONE. Journal of Informetrics 13, 100974 (2019). DOI: 10.1016/j.joi.2019.100974
Methods
We gathered the citation information for all PLOS ONE articles, indexed by A, from the Web of Science (WOS) Core Collection. From these data we obtained a master list of each article's unique digital object identifier, DOI_A, and its number of citations, c_A, as of the data download (census) date:
(a) December 3, 2016, for the preprint;
(b) February 25, 2019, for the final published article.
We then used each DOI_A to access the corresponding online XML version of the article at PLOS ONE, using the unique web address formed by appending DOI_A to “http://journals.plos.org/plosone/article?id=”. After parsing the full-text XML (primarily the author byline data and the reference list), we merged the PLOS ONE publication information with the WOS citation data by matching on DOI_A.
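The retrieval and merge steps above can be sketched in Python using only the standard library. This is a minimal illustration, not the author's original code. It assumes the article XML follows the JATS tag set (contrib, surname, ref), and the function names article_url, parse_article_xml and merge_on_doi are hypothetical. Network access is left out, so the functions operate on XML text that has already been downloaded.

```python
import xml.etree.ElementTree as ET

# Base address quoted in the Methods; the DOI is appended as the "id" value.
PLOS_ARTICLE_BASE = "http://journals.plos.org/plosone/article?id="


def article_url(doi):
    """Build an article address by concatenating the base URL and the DOI."""
    return PLOS_ARTICLE_BASE + doi


def parse_article_xml(xml_text):
    """Extract author surnames and the reference count from article XML.

    Assumption: the XML uses JATS-style tags, i.e. <contrib contrib-type="author">
    elements holding <name><surname>, and one <ref> element per cited work.
    """
    root = ET.fromstring(xml_text)
    authors = [
        contrib.findtext("name/surname")
        for contrib in root.iter("contrib")
        if contrib.get("contrib-type") == "author"
    ]
    n_refs = sum(1 for _ in root.iter("ref"))
    return {"authors": authors, "n_refs": n_refs}


def merge_on_doi(wos_citations, plos_records):
    """Inner-join WOS citation counts with parsed PLOS records by DOI.

    wos_citations: {doi: citation count c_A at the census date}
    plos_records:  {doi: dict returned by parse_article_xml}
    DOIs are case-insensitive, so keys are lowercased before matching.
    """
    citations = {doi.lower(): c for doi, c in wos_citations.items()}
    merged = {}
    for doi, record in plos_records.items():
        key = doi.lower()
        if key in citations:  # keep only articles found in both sources
            merged[key] = {**record, "citations": citations[key]}
    return merged
```

Lowercasing the keys avoids losing matches when the two sources capitalize the same DOI differently. The inner join drops any article that appears in only one source.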
#allofplos: PLOS has since made all full-text XML data freely available (https://www.plos.org/text-and-data-mining); this option was not available at the time of our data collection.
Usage notes
All data are enclosed in a single zipped folder:
A) DASH-V2: Data files for the final published analysis (J. Informetrics, 2019)
File A1: PubData_DOI_141986_Nc_0_2019.dta
File A2: PubData_DOI_141986_Nc_0_2019_DOFILE
B) DASH-V1: Data files for the preprint version (https://ssrn.com/abstract=2901272)
File B1: PubData_Obs_102741_Nc_10_No2015_CitationsAnalysis.dta
File B2: PubData_Obs_128734_Nc_10_AcceptanceTimeAnalysis.dta
File B3: STATA13_DOFILE
C) Data description common to all .dta files, which contain parsed and merged PLOS ONE and Web of Science metadata:
File A3: UC-DASH_DataDescription_Petersen_V2.pdf
File B4: UC-DASH_DataDescription_Petersen_V1.pdf
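For readers working outside Stata, here is a minimal sketch of locating and loading the .dta files in Python. It assumes the zipped folder contains the .dta files listed above; the internal folder layout is not documented here. It also assumes pandas is installed for the loading step. The names list_dta_members and load_dta are hypothetical helpers. Reproducing the published estimates with the do-files still requires Stata.

```python
import zipfile


def list_dta_members(zip_path):
    """Return the sorted names of all Stata .dta files stored in the zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if n.lower().endswith(".dta"))


def load_dta(zip_path, member, out_dir="extracted"):
    """Extract one .dta member to out_dir and load it as a pandas DataFrame.

    pandas is imported here so that list_dta_members works without it.
    """
    import pandas as pd  # third-party dependency, needed only for loading

    with zipfile.ZipFile(zip_path) as zf:
        path = zf.extract(member, path=out_dir)
    # convert_categoricals=False keeps Stata value-labelled columns as raw codes
    return pd.read_stata(path, convert_categoricals=False)
```

For example, calling list_dta_members("UCDASH_DataDescription_Petersen.zip") should list the available .dta files. Passing one of the returned names to load_dta should then give a DataFrame with one row per PLOS ONE article.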