Irreproducibility in searches of scientific literature: a comparative analysis


Pozsgai, Gabor et al. (2022). Irreproducibility in searches of scientific literature: a comparative analysis [Dataset]. Dryad.


1. Repeatability is the cornerstone of science, and it is particularly important for systematic reviews. However, little is known about how researchers’ choice of database and search platform influences the repeatability of systematic reviews. Here, we aim to unveil how the computing environment and the location from which a search is initiated influence hit results.

2. We present a comparative analysis of time-synchronized searches run at different institutional locations around the world, and evaluate the consistency of the hits obtained for each search term across different search platforms.

3. We found large variation among search platforms. PubMed and Scopus returned consistent results to identical search strings from different locations, whereas Google Scholar and Web of Science’s Core Collection varied substantially, both in the number of returned hits and in the list of individual articles, depending on the search location and computing environment. The inconsistency in Web of Science results most likely arises from the different licensing packages held by different institutions.

4. To maintain scientific integrity and consistency, especially in systematic reviews, action is needed from both the scientific community and scientific search platforms to increase search consistency. Researchers are encouraged to report the search location and the databases used for systematic reviews, and database providers should make search algorithms transparent and revise access rules to titles behind paywalls. Additional options for increasing the repeatability and transparency of systematic reviews are storing both search metadata and hit results in open repositories and using Application Programming Interfaces (APIs) to retrieve standardized, machine-readable search metadata.
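As a concrete illustration of the API route suggested above, the sketch below builds a PubMed E-utilities `esearch` request that returns machine-readable (JSON) search metadata, including the hit count. The query string and `retmax` value are illustrative assumptions, not part of the study's protocol.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(query: str, retmax: int = 20) -> str:
    """Build a PubMed E-utilities esearch URL that returns JSON
    search metadata (hit count, PMIDs) for a given query."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}?{urlencode(params)}"

# Example: the study's simple search string, restricted to
# title/abstract via PubMed's [tiab] field tag.
url = build_esearch_url('"ecosystem services"[tiab]')
# Fetching this URL (e.g. with urllib.request.urlopen) yields JSON whose
# esearchresult.count field is the archivable, reproducible hit count.
```

Storing the request URL alongside the returned JSON in an open repository would let later readers re-run the exact query and compare counts.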


Three major scientific search platforms (PubMed, Scopus, and Web of Science), together with Google Scholar, were used in this study. We generated keyword expressions (search strings) at two complexity levels, using keywords focused on an ecological topic, and ran standardized searches from various institutions around the world (see below), all within a limited timeframe.

Simple search strings contained only one main key phrase, without logical (Boolean) operators, whereas complex ones used Boolean operators to combine inclusion and exclusion criteria for additional related keywords and key phrases (i.e. two-word expressions within quotation marks). The simple keyword was “ecosystem services”, while the complex one was “ecosystem service” AND “promoting” AND “crop” NOT “livestock”. The search language was set to English in every case, and only titles, abstracts, and keywords were searched. Since Google Scholar offers no option to limit the search to titles, keywords, and abstracts, we used its default search in this case. Because different search platforms use slightly different expressions for the same query, an exact search term format was generated for each platform.
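To make the platform-specific formats concrete, the mapping below sketches how the complex query could be expressed in each platform's field syntax (`[tiab]` for PubMed, `TITLE-ABS-KEY()` for Scopus, `TS=` for Web of Science, and the minus operator for Google Scholar). These are illustrative reconstructions under those syntax assumptions; the exact strings used in the study are given in its supplementary material.

```python
# Hypothetical per-platform renderings of the study's complex query:
# "ecosystem service" AND "promoting" AND "crop" NOT "livestock",
# limited to titles, abstracts, and keywords where the platform allows.
COMPLEX_QUERY = {
    "pubmed": '"ecosystem service"[tiab] AND promoting[tiab] '
              'AND crop[tiab] NOT livestock[tiab]',
    "scopus": 'TITLE-ABS-KEY("ecosystem service" AND promoting '
              'AND crop AND NOT livestock)',
    "web_of_science": 'TS=("ecosystem service" AND promoting '
                      'AND crop NOT livestock)',
    # Google Scholar has no field limits; "-" excludes a term.
    "google_scholar": '"ecosystem service" promoting crop -livestock',
}
```

Generating all variants from one canonical query, as above, makes it easy to report every string verbatim in a systematic review's methods.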

Searches were conducted on one or two machines at each of the 12 institutions in Australia, Canada, China, Denmark, Germany, Hungary, the UK, and the USA (Supplementary material 2), using three commonly used browsers (Mozilla Firefox, Internet Explorer, and Google Chrome). Searches were run manually (i.e. no APIs were used) according to strict protocols, which allowed standardization of the search date, the exact search term for every run, and the data recording procedure. Not all platforms were queried from every location: Google products are not available in China, and Scopus was not available at some institutions (Supplementary material 2). The original version of the protocol is provided in Supplementary material 3. The first run was conducted at 11:00 Australian Eastern Standard Time (01:00 GMT) on 13 April 2018, and the last search was run at 18:16 Eastern Daylight Time (22:16 GMT) on the same day. After each search run, the number of hits was recorded and the bibliographic data of the first 20 articles were extracted and saved in a file format that the website offered (.csv, .txt). Once all search combinations were completed, the browsers’ caches were emptied to ensure the testers’ previous searches did not influence the results, and the process was repeated. At four locations (Flakkebjerg, Denmark; Fuzhou, China; St. Catharines, Canada; Orange, Australia) the searches were also repeated on two different computers. This resulted in 228, 132, 228, and 144 search runs for Web of Science, Scopus, PubMed, and Google Scholar, respectively.

Results were collected from each contributor; bibliographic information was automatically extracted from the identically structured saved files using a loop in the R statistical software (R Core Team, 2012) and stored in a standardized MySQL database, allowing unique publications to be distinguished. If the unique identifier for an individual article was missing, the authors, the title, or a combination of these was used instead, and uniqueness was double-checked across the entire dataset. Saved data files with non-standard structures were handled manually. All data cleaning and manipulation were done in R.
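The extraction-and-deduplication step can be sketched as follows. The study did this with an R loop and a MySQL database; this is a minimal stand-alone Python sketch of the same logic, assuming hypothetical export files named `hits_*.csv` with `doi`, `authors`, and `title` columns.

```python
import csv
import glob

def load_unique_hits(pattern="hits_*.csv"):
    """Read identically structured export files (hypothetical filenames)
    and return one row per unique article."""
    seen, unique = set(), []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                # Prefer the DOI as the unique identifier; fall back to a
                # normalized authors+title combination when it is missing,
                # mirroring the study's double-checking of uniqueness.
                key = (row.get("doi") or "").strip().lower() or (
                    (row.get("authors", "").strip().lower(),
                     row.get("title", "").strip().lower()))
                if key and key not in seen:
                    seen.add(key)
                    unique.append(row)
    return unique
```

In practice each unique record would then be inserted into the database keyed on the chosen identifier, so that the same article returned by several platforms or locations is counted once.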