Forecasting the publication and citation outcomes of Covid-19 preprints

Pfeiffer, Thomas1; Gordon, Michael1; Bishop, Michael2; Chen, Yiling3; Goldfedder, Brandon4; Dreber, Anna5; Holzmeister, Felix6; Johannesson, Magnus5; Liu, Yang7; Twardy, Charles8; Wang, Juntao3; Tran, Luisa8

Published Sep 27, 2022 on Dryad. https://doi.org/10.5061/dryad.rfj6q57d0

Data files

Sep 27, 2022 version files (238.82 KB total):
Data.zip, 238.40 KB
README, 418 B

Abstract

The scientific community reacted quickly to the Covid-19 pandemic in 2020, generating an unprecedented increase in publications. Many of these publications were released on preprint servers such as medRxiv and bioRxiv. It is unknown, however, how reliable these preprints are and whether they will eventually be published in scientific journals. In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric scores. Most of these preprints (70%) were published within one year of upload on a preprint server, and 46% of the published preprints appeared in a high-impact journal with a Journal Impact Factor of at least 10. On average, the preprints received 162 citations within the first year. We found that forecasters can predict whether preprints will be published after one year and whether the publishing journal has high impact. Forecasts are also informative about preprints' rankings in terms of Google Scholar citations within one year of upload on a preprint server. For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While forecasts can provide a preliminary assessment of preprints faster than the traditional peer-review process, it remains to be investigated whether such an assessment is suited to identifying methodological problems in preprints.
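The abstract reports positive correlations between forecasts and observed outcomes, including preprints' rankings by Google Scholar citations. This page does not say which correlation statistic was used, so the following is only an illustrative sketch under an assumption: that a rank-based measure such as Spearman's rho is a natural fit for "rankings". The helper names (`average_ranks`, `pearson`, `spearman`) and all numbers are hypothetical and are not taken from Data.zip. A significance test is omitted; `scipy.stats.spearmanr` returns both rho and a p-value if SciPy is available.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based mean position of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (average) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: forecast vs. observed one-year citation counts
# for five made-up preprints (illustration only, not study data).
forecast = [300, 40, 150, 10, 90]
observed = [250, 60, 120, 5, 130]
rho = spearman(forecast, observed)
print(round(rho, 3))  # prints 0.9
```

Ranking first and then correlating makes the measure depend only on the ordering of preprints, not on the absolute citation numbers, which suits heavily skewed counts such as citations.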
