Forecasting the publication and citation outcomes of Covid-19 preprints

Cite this dataset

Pfeiffer, Thomas et al. (2022). Forecasting the publication and citation outcomes of Covid-19 preprints [Dataset]. Dryad.


The scientific community reacted quickly to the Covid-19 pandemic in 2020, generating an unprecedented increase in publications. Many of these publications were released on preprint servers such as medRxiv and bioRxiv. It is unknown, however, how reliable these preprints are and whether they will eventually be published in scientific journals. In this study, we use crowdsourced human forecasts to predict publication outcomes and future citation counts for a sample of 400 preprints with high Altmetric scores. Most of these preprints (70%) were published within one year of upload to a preprint server, and 46% of the published preprints appeared in a high-impact journal with a Journal Impact Factor of at least 10. On average, the preprints received 162 citations within the first year. We found that forecasters can predict whether preprints will be published within one year and whether the publishing journal has high impact. Forecasts are also informative with respect to preprints' rankings in terms of Google Scholar citations within one year of upload to a preprint server. For both types of assessment, we found statistically significant positive correlations between forecasts and observed outcomes. While the forecasts can help to provide a preliminary assessment of preprints at a faster pace than the traditional peer-review process, it remains to be investigated whether such an assessment is suited to identifying methodological problems in preprints.


The dataset consists of survey responses collected through Qualtrics. The data were formatted and stored as .csv files and analysed with R.


Defense Advanced Research Projects Agency, Award: N66001-19-C-4014