Data from: Detecting seminal research contributions to the development of ethnobotany by reference publication year spectroscopy (RPYS)


Malik, Zubair; Malik, Basharat; PM, Naushad Ali; Bussmann, Rainer W. (2022), Data from: Detecting seminal research contributions to the development of ethnobotany by reference publication year spectroscopy (RPYS), Dryad, Dataset,


This study assesses the growth in overall publication output in ethnobotany and provides a systematic examination of the history of ethnobotanical publications using reference publication year spectroscopy (RPYS). The study is based on 5201 papers published between 1974 and 2019, covering 290006 non-distinct cited references (CRs), indexed in the Science Citation Index-Expanded (SCI-Expanded) of Web of Science (WoS). Regression analysis indicated a global compound annual growth rate of approximately 11% in ethnobotanical publications, with the volume of publications doubling approximately every 6 years. The reference publication period was divided into four sub-periods in which a total of 31 peaks are clearly identifiable: five peaks in the first period (earliest to 1800), ten in the second (1801–1900), nine in the third (1901–1950) and seven in the last (1951–2000). A total of 44 publications were identified as especially influential landmark works. Of these, 31 (70%) were books and 11 (25%) were articles. Of the 11 articles, 5 (45%) were published in the same journal (Economic Botany). The first period had the lowest number of publications (5), including classic books such as the Spanish translation of Dioscorides’ Materia Medica and Carolus Linnaeus’ Systema naturæ. Interestingly, about 30% of the studies that laid the foundation of ethnobotany and are discussed in this paper come from South Africa, pointing to the contribution of the African continent to the foundation of the field.


It is difficult to retrieve all publications relating to a particular research area or subject from a literature database. To obtain a more exhaustive and inclusive publication data set, investigators applied a four-step retrospective search (Haunschild et al., 2016; Wang et al., 2014). This approach involves an initial search for key publications, followed by a renewed search based on the synonyms identified through keyword analysis of those publications. This kind of approach is termed “interactive query formulation” (Wacholder, 2011). In this study, investigators applied the same method for data extraction, using WoS data (1974–2019) from the database producer Clarivate Analytics, derived from the Science Citation Index Expanded (SCI-E).

The following four steps were used in this retrospective search method.

1. Investigators searched for the term “Ethnobotany” as a topic across all four topic search fields of Web of Science (title, abstract, author keywords and KeyWords Plus) for the publication years 1974 to 2019. This search strategy retrieved a publication set of 2729 papers.

2. Bibexcel software (Persson et al., 2009) was used to extract all author keywords from the 2729 publications for further analysis. The retrieved keywords were then ranked by their frequency of occurrence. The investigators looked for ethnobotanical synonyms and, by consensus among all authors of this paper, identified the following ‘full list’ of keywords and their variants: Ethnobotany, ethno-botany, Ethno-medicinal plants, Ethnomedicinal plants, ethnobotanical, Ethno-botanical, Ethnobotanics, Folk medicinal plants, Traditional ethnobotanical knowledge, Traditional botanical knowledge, Traditional plant use, Medico-ethnobotany, Paleoethnobotany, Paleo-ethnobotany.

3. Investigators ran a renewed topic search using the ‘full list’ of terms, joined by “OR”. The query was limited to the period 1974 to 2019 and resulted in 5201 publications.

4. Finally, the literature was organized and analysed. Investigators examined the retrieved publications with respect to overall growth, and carried out an in-depth analysis of the sources referenced in all retrieved publications.
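Steps 2 and 3 above can be illustrated with a minimal Python sketch. It is not the Bibexcel workflow the investigators used. The function names `rank_keywords` and `build_topic_query` are hypothetical, and the `TS=` and `PY=` field tags assume the Web of Science advanced search syntax.

```python
from collections import Counter


def rank_keywords(records):
    """Rank author keywords by frequency, most frequent first.

    `records` is an iterable of keyword lists, one list per publication
    (an assumed input shape, not a Bibexcel export format).
    Keywords are compared case-insensitively after trimming whitespace.
    """
    counts = Counter(
        kw.strip().lower()
        for rec in records
        for kw in rec
        if kw.strip()
    )
    return counts.most_common()


def build_topic_query(terms, start_year, end_year):
    """Join quoted terms with OR and restrict the publication years.

    The TS= (topic) and PY= (publication year) tags are assumed to follow
    the Web of Science advanced search conventions.
    """
    quoted = " OR ".join(f'"{t}"' for t in terms)
    return f"TS=({quoted}) AND PY=({start_year}-{end_year})"


if __name__ == "__main__":
    terms = ["Ethnobotany", "ethno-botany", "Ethnomedicinal plants"]
    print(build_topic_query(terms, 1974, 2019))
```

Passing the complete ‘full list’ from step 2 as `terms` would yield a single query string of the kind described in step 3.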

Investigators analysed ethnobotanical publications with respect to influential publications. To trace influential publications, investigators used a recently developed bibliometric method called “reference publication year spectroscopy” (RPYS) (Marx et al., 2014) in combination with a tool termed CRExplorer (http:// (Thor et al., 2016). To analyse the cited references, the publication set is imported into CRExplorer and all CRs are extracted. Variants of the same reference are then clustered and merged using the software’s clustering and merging facilities. References that appear less frequently than a chosen threshold are removed to reduce background noise and sharpen the corresponding spectrogram. Lastly, reference publication years (RPYs) are examined for frequently cited publications. Earlier RPYs require a different procedure, namely a relatively low cut-off for the minimum number of CRs, because the scale of the number of cited references (NCR, i.e., the count of publications that cited a specific reference) differs significantly across periods.
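The core of an RPYS spectrogram can be sketched in a few lines of Python. This is an illustration only, not CRExplorer's implementation. It assumes the common RPYS convention of comparing each year's count with the median of a five-year window (the two years before, the year itself and the two years after), which the text above does not spell out. The function name `rpys_spectrogram` is hypothetical.

```python
from collections import Counter
from statistics import median


def rpys_spectrogram(cited_years, window=2):
    """Return {year: (n_cited_refs, deviation_from_window_median)}.

    `cited_years` holds one reference publication year per cited reference.
    `window=2` gives a five-year window; this choice is an assumption.
    """
    counts = Counter(cited_years)
    spectrum = {}
    for year in range(min(counts), max(counts) + 1):
        # Years with no cited references count as zero inside the window.
        neighbours = [
            counts.get(y, 0) for y in range(year - window, year + window + 1)
        ]
        n = counts.get(year, 0)
        spectrum[year] = (n, n - median(neighbours))
    return spectrum


if __name__ == "__main__":
    # Toy input: invented years, not data from this study.
    toy = [1752] + [1753] * 5 + [1754] * 2 + [1755]
    for year, (n, dev) in rpys_spectrogram(toy).items():
        print(year, n, dev)
```

Years with a large positive deviation are candidate peaks. The publications behind each peak still have to be inspected individually.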

To identify the landmark papers, investigators used an advanced indicator of the software called N_TOP10. This indicator helps to identify those publications which belong to the 10% most frequently cited publications over many citing publication years (Thor et al., 2018).
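The idea behind such an indicator can be sketched as follows. This is a simplified reading, not the CRExplorer implementation. It assumes that a CR is compared only with CRs from the same reference publication year within each citing year, that ties at the cut-off count as top 10%, and that CRs not cited in a given citing year are left out of that year's comparison. The function name `n_top10` and the input layout are hypothetical.

```python
from collections import defaultdict


def n_top10(citations):
    """Count, per CR, the citing years in which it ranks in the top 10%.

    `citations` maps (cr_id, rpy) -> {citing_year: times cited that year}.
    """
    # Collect citation counts per (reference publication year, citing year).
    groups = defaultdict(list)
    for (_cr_id, rpy), per_year in citations.items():
        for citing_year, n in per_year.items():
            groups[(rpy, citing_year)].append(n)

    # The cut-off is the lowest count still inside the top 10% of a group.
    thresholds = {}
    for key, counts in groups.items():
        ranked = sorted(counts, reverse=True)
        top_k = (len(ranked) + 9) // 10  # integer ceiling of len / 10
        thresholds[key] = ranked[top_k - 1]

    return {
        cr_id: sum(
            1
            for citing_year, n in per_year.items()
            if n >= thresholds[(rpy, citing_year)]
        )
        for (cr_id, rpy), per_year in citations.items()
    }


if __name__ == "__main__":
    # Toy input with invented identifiers and counts.
    toy = {
        ("A", 1950): {2000: 9, 2001: 7},
        ("B", 1950): {2000: 2, 2001: 7},
        ("C", 1950): {2000: 1},
    }
    print(n_top10(toy))
```

A CR that scores highly has stayed among the most cited references of its publication year across many citing years, which is the behaviour expected of a landmark work.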

The 5201 ethnobotanical publications included in this study contained 290006 non-distinct CRs. Managing (clustering, merging, and analysing) such a substantial volume of CRs is not an easy task. Hence, investigators divided the analysis into four time periods: (1) 1555–1800, (2) 1801–1900, (3) 1901–1950 and (4) 1951–2000.

Between 1555 and 1800 the highest peak in citations was 24, between 1801 and 1900 it was more than 28, between 1901 and 1950 it was more than 75, and in the last period (1951–2000) it rose to about 2200. The exclusion criterion for references in the first three periods (i.e. 1555 (the earliest CRs) to 1800, 1801–1900 and 1901–1950) was 1. For the fourth period (1951–2000), investigators used a minimum of 10, consistent with Haunschild et al. (2019a), after clustering and merging of reference variants.
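The period-specific thresholds can be expressed as a small filter. This sketch reads the stated values as the minimum NCR a reference needs in order to be kept, which is an assumption: if references cited exactly once were removed instead, the first three thresholds would be 2. The names `PERIODS` and `filter_by_period` are hypothetical, and the filtering in the study itself was done in CRExplorer.

```python
# (first RPY, last RPY, assumed minimum NCR to keep a reference)
PERIODS = [
    (1555, 1800, 1),
    (1801, 1900, 1),
    (1901, 1950, 1),
    (1951, 2000, 10),
]


def filter_by_period(cited_refs):
    """Group references by period and drop those below the period threshold.

    `cited_refs` is an iterable of (label, rpy, ncr) tuples. References
    whose RPY falls outside all four periods are ignored.
    """
    kept = {(first, last): [] for first, last, _ in PERIODS}
    for label, rpy, ncr in cited_refs:
        for first, last, min_ncr in PERIODS:
            if first <= rpy <= last:
                if ncr >= min_ncr:
                    kept[(first, last)].append((label, rpy, ncr))
                break  # periods do not overlap, so stop at the first match
    return kept


if __name__ == "__main__":
    # Toy input with invented labels and counts.
    toy = [("ref_a", 1753, 3), ("ref_b", 1960, 9), ("ref_c", 1975, 10)]
    print(filter_by_period(toy))
```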