Thesaurus for Sustainable Development Goals
Data files
Aug 15, 2023 version files 220.79 KB
Abstract
This paper addresses the important question of how national research systems can support the implementation of the United Nations' 17 sustainable development goals (SDGs) set out in the 2030 agenda. Much attention on this topic has so far coalesced around understanding and measuring possible synergies and trade-offs that emerge in the SDGs. We contribute to this discussion by arguing that it is necessary to move from a focus on system interaction towards system transformation. A conceptual approach is presented based on the notion that research that “builds bridges” between science and technology and the social and environmental pillars of sustainable development can more fully support simultaneous achievement of the SDGs and thus be transformational. This proposition is put to the test empirically through a study of the Mexican research system using methods from bibliometrics and social network analysis. Our results can help to provide a diagnostic of how research systems are approaching SDGs and where potential exists for transformative research.
Methods
Our thesaurus with 2,103 search items was constructed by means of the following steps.
1. We extracted key terms from the UN official list of Goals, Targets and Indicators (884 terms).
2. To enrich the thesaurus, we searched this preliminary set of 884 search items in Web of Science and SciELO to identify a first set of papers related to the SDGs. This allowed us to identify synonyms and academic jargon related to the SDGs in scientific publications. We retained only very frequent keywords that were closely related to the SDGs.
3. We also found that some items do not by themselves carry a social or environmental sustainability perspective, for example, “economic growth” on its own. A more careful analysis of the words chosen for inclusion was therefore undertaken, particularly around SDGs 8, 9 and 17. Here it was decided to include in the search items only words that have some normative association with sustainable futures, whether social or environmental. For instance, in SDG 8, we only included bibliometric sources related to economic issues that also contain sustainability and/or directionality topics. As a result, our dataset and results are biased towards references that include social or sustainability objectives. This leads to some SDGs being underreported, but it more accurately reflects research that addresses the SDGs.
4. To partly overcome this bias, we enriched our dataset by selecting some of the keywords suggested by Duran-Silva et al. (2019) (SIRIS project) and the Colombian Green Book (2018). Their method was based on machine learning and a literature review around the SDGs. These keywords allowed us to enrich the synonyms in our dataset and therefore to extend the scope of the scientific publications labelled. However, we did not take all their keywords because of some drawbacks, including the wide variation in the number of keywords per SDG (SDG 3 has around 622 keywords compared to SDG 1 with 101), which can bias comparisons of the number of publications across SDGs.
5. We identified that the number of keywords may be highly correlated with the number of scientific publications labelled. Therefore, the number of search items for every SDG was limited to between 100 and 140.
6. Lastly, search items were categorized into two groups. A first group consisted of 781 search items (type N1) whose terms always appear together, such as “climate change” or “clean energy”. A second group consisted of 1,322 search items (type N2) that combine two or more terms that do not always appear together, such as “economic” and “sustainability”.
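The difference between the two search item types in step 6 can be illustrated with a minimal Python sketch. The function names and the matching rules are assumptions made for illustration only: N1 is treated as a contiguous phrase match and N2 as co-occurrence of all terms anywhere in the text. The data files may encode the items differently, and the sketch leaves out stemming.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def matches_n1(phrase, text):
    """Type N1 (assumed rule): the terms must appear together, in order."""
    p = tokenize(phrase)
    t = tokenize(text)
    if not p:
        return False
    n = len(p)
    return any(t[i:i + n] == p for i in range(len(t) - n + 1))

def matches_n2(terms, text):
    """Type N2 (assumed rule): every term must occur somewhere in the text."""
    tokens = set(tokenize(text))
    return bool(terms) and all(term.lower() in tokens for term in terms)

# Hypothetical title, used only to show the two rules side by side.
title = "Clean energy policy and economic incentives for sustainability"
print(matches_n1("clean energy", title))
print(matches_n2(["economic", "sustainability"], title))
```

Under these assumed rules, an N2 item such as “economic” and “sustainability” labels a publication only when both terms are present, which mirrors the decision in step 3 not to count economic terms on their own.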
Note:
1. As training data, we used scientific publications from WoS and the SciELO Citation Index on research related to Colombia, Mexico and Brazil. This thesaurus was developed to study SDG-related publications in Latin America. However, it could be extended to other regions.
2. Having identified the sources that carry at least one tag from the thesaurus, we evaluated what proportion of them were misidentifications by analysing a random sample of 1,000 sources in SciELO and 3,000 in WoS. We then excluded research areas with a proportion of misidentifications larger than 70% (acoustics, astronomy and astrophysics, crystallography, optics, history, literature, instrumentation, mineralogy, mining and mineral processing, sport sciences, metallurgy and metallurgical engineering, arts and humanities, mathematics and physics).
3. We used a word-stemming technique to transform our search terms into a standard form of every word. This increases the number of scientific publications matched by reducing mismatches due to plurals or different parts of speech (nouns, verbs, adjectives and adverbs). Therefore, to use this thesaurus, you must apply the same stemming technique to the text in which you search for these terms.
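To show why the searched text must be stemmed in the same way as the search terms, here is a minimal Python sketch. The stemmer is a toy suffix stripper written for this example, not the stemmer used to build the thesaurus (which is not specified here). In practice an established stemmer should replace `toy_stem`, and the same one must be applied to both the terms and the text. All function names are hypothetical.

```python
import re

def toy_stem(word):
    """Toy stand-in for a real stemmer: strip a few common English suffixes."""
    word = word.lower()
    for suffix in ("ies", "ing", "es", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_text(text):
    """Tokenize the text and stem every token with the same stemmer."""
    return [toy_stem(tok) for tok in re.findall(r"[a-z]+", text.lower())]

def stemmed_match(term, text):
    """True if every stemmed token of the term occurs in the stemmed text."""
    stemmed = set(stem_text(text))
    term_tokens = stem_text(term)
    return bool(term_tokens) and all(tok in stemmed for tok in term_tokens)

# Hypothetical examples: plural and verb forms map to one stem.
print(stem_text("Forests and farming"))
print(stemmed_match("farms", "Organic farming"))
```

Without stemming, “farms” and “farming” would be treated as different strings and the publication would be missed. With both sides stemmed by the same rule they reduce to one form and match.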