Knowledge from non-English-language studies broadens contributions to conservation policy and helps to tackle bias in biodiversity data
Data files
May 19, 2025 version files 63.30 KB
- README.md (3.94 KB)
- suppmat_datasetLPI.csv (42.47 KB)
- SuppMat_journals.xlsx (16.88 KB)
Abstract
Local ecological evidence is key to informing conservation. However, many global biodiversity indicators neglect local ecological evidence published in languages other than English, potentially biasing our understanding of biodiversity trends in areas where English is not the dominant language. Brazil is a megadiverse country with a thriving national scientific publishing landscape. Here, using Brazil and a species abundance indicator as examples, we assess how well bilingual literature searches can both improve data coverage for a country where English is not the primary language and help tackle biases in biodiversity datasets.
We conducted a comprehensive screening of articles containing abundance data for vertebrates published in 59 Brazilian journals (articles in Portuguese or English) and 79 international English-only journals. These were grouped into three datasets according to journal origin and article language (Brazilian-Portuguese, Brazilian-English and International). We analysed the taxonomic, spatial and temporal coverage of the datasets, compared their average abundance trends and investigated predictors of such trends with a modelling approach.
Our results showed that including data published in Brazilian journals, especially those in Portuguese, strongly increased representation of Brazilian vertebrate species (by 10.1 times) and populations (by 7.6 times) in the dataset. Meanwhile, international journals featured a higher proportion of threatened species. There were no marked differences in spatial or temporal coverage between datasets, despite differing biases towards infrastructure. Overall, while country-level trends in relative abundance did not substantially change with the addition of data from Brazilian journals, uncertainty considerably decreased. We found that population trends in international journals showed stronger and more frequent decreases in average abundance than those in national journals, regardless of whether the latter were published in Portuguese or English.
Policy implications. Collecting data from local sources markedly strengthens global biodiversity databases by adding species not previously included in international datasets. Furthermore, the addition of these data helps to understand spatial and temporal biases that potentially influence abundance trends at both national and global levels. We show how incorporating non-English-language studies in global databases and indicators could provide a more complete understanding of biodiversity trends and therefore better inform global conservation policy.
Dataset DOI: 10.5061/dryad.ngf1vhj68
Description of the data and file structure
Files and variables
File: SuppMat_journals.xlsx
Description: Journals from which data was screened
Variables
- Publication: Name of the journal
- Numbe_years_journal_19902015: Number of years in which the journal published issues (within the 1990-2015 period)
- Number_articles_19902015: Total number of articles published from 1990 to 2015
- Number_relevant_articles_19902015: Total number of articles with relevant data on changes of population abundance
- Percentage_relevant_articles_19902015: Relative number of articles with relevant data on changes of population abundance (number of relevant articles divided by total number of articles)
- Origin: Country of the journal, either Brazilian or International (non-Brazilian)
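As a minimal sketch of how the percentage variable relates to the two count variables above, the snippet below divides the number of relevant articles by the total number of articles. The journal names and counts are hypothetical placeholders, not values from SuppMat_journals.xlsx, and whether the file stores the result as a fraction or as a percentage is an assumption to check against the data.

```python
def encounter_rate(relevant_articles: int, total_articles: int) -> float:
    """Return the share of screened articles that held relevant abundance data."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    if not 0 <= relevant_articles <= total_articles:
        raise ValueError("relevant_articles must lie between 0 and total_articles")
    return relevant_articles / total_articles


# Hypothetical rows that mimic the column names described above.
journals = [
    {"Publication": "Journal A",
     "Number_articles_19902015": 1200,
     "Number_relevant_articles_19902015": 6},
    {"Publication": "Journal B",
     "Number_articles_19902015": 450,
     "Number_relevant_articles_19902015": 9},
]

for row in journals:
    rate = encounter_rate(row["Number_relevant_articles_19902015"],
                          row["Number_articles_19902015"])
    # Format as a percentage for display only.
    print(f"{row['Publication']}: {rate:.2%}")
```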
File: suppmat_datasetLPI.csv
Description:
Variables
- Language: Language of the article
- Authors: Names of the authors
- Year: Year of publication
- Title English (provided by translator if not available): Title in English
- Title_original: Title as published
- Journal: Name of the journal
- Volume: Volume in which the article was published
- Issue: Issue in which the article was published
- Pages: Pages of the article
Code/software
QGIS version 3.6 (QGIS Development Team, 2019)
R 4.4.1 (R Core Team, 2024)
- vegan (Oksanen et al., 2024)
- sampbias (Zizka, Antonelli and Silvestro, 2021)
- rlpi (Freeman et al., 2017)
Data collection
We collected time-series of vertebrate population abundance suitable for entry into the Living Planet Database (LPD; livingplanetindex.org), which is the data repository for one of the indicators in the Kunming-Montreal Global Biodiversity Framework (GBF), the Living Planet Index (LPI; Ledger et al., 2023). Despite the continuous addition of new data, LPI coverage remains incomplete for some regions (Living Planet Report 2024 – A System in Peril, 2024). We collected data from three sets of sources: a) Portuguese-language articles from Brazilian journals (hereafter “Brazilian-Portuguese” dataset), b) English-language articles from Brazilian journals (“Brazilian-English” dataset) and c) English-language articles from non-Brazilian journals (“International” dataset). For a) and b), we first compiled a list of Brazilian biodiversity-related journals using the list of non-English-language journals in ecology and conservation published by the translatE project (www.translatesciences.com) as a starting point. The International dataset was obtained from the LPD team and sourced from the 78 journals they routinely monitor as part of their ongoing data searches.
We excluded journals whose scope was not relevant to our work (e.g. those focusing on agroforestry or crop science), and taxon-specific journals (e.g. South American Journal of Herpetology), since they could introduce taxonomic bias to the data collection process. We considered only articles published between 1990 and 2015, and thus further excluded journals that published articles exclusively outside of this timeframe. We chose this period because of its higher data availability (Deinet et al., 2024): less monitoring took place in earlier decades, and data availability for the most recent decade is lower because of the lag between data being collected and trends becoming available in the literature. Finally, we excluded any journals that had inactive links or that were no longer available online. While we acknowledge that biodiversity data are available from a wider range of sources (grey literature, online databases, university theses etc.), here we limited our searches to peer-reviewed journals and articles published within a specific timeframe to standardise data collection and allow for comparison between datasets.
We screened a total of 59 Brazilian journals; of these, nine accept articles only in English, 13 only in Portuguese and 37 in both languages. We systematically checked all articles of all issues published between 1990 and 2015. Articles that appeared to contain abundance data for vertebrate species based on title and/or abstract were further evaluated by reading the material and methods section. For an article to be included in our dataset, we followed the criteria applied for inclusion into the LPD (livingplanetindex.org/about_index#data): a) data must have been collected using comparable methods for at least two years for the same population, and b) units must be of population size, either a direct measure such as population counts or densities, or indices, or a reliable proxy such as breeding pairs, capture per unit effort or measures of biomass for a single species (e.g. fish data are often available in one of the latter two formats).
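The two inclusion criteria can be read as a simple filter over candidate time-series. The sketch below is illustrative only: the `TimeSeries` record, the `ACCEPTED_UNITS` labels and the idea of tagging each survey year with a method label are hypothetical simplifications, and in practice judging whether methods are "comparable" requires reading the material and methods section.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical unit labels that paraphrase criterion b).
ACCEPTED_UNITS = {
    "count", "density", "index",
    "breeding pairs", "capture per unit effort", "biomass",
}


@dataclass
class TimeSeries:
    species: str
    population_id: str
    unit: str
    method_by_year: dict  # survey year -> method label (hypothetical coding)


def meets_criteria(ts: TimeSeries) -> bool:
    """Check criteria a) and b) for one population time-series."""
    # Criterion b): the unit must measure population size or a reliable proxy.
    if ts.unit not in ACCEPTED_UNITS:
        return False
    # Criterion a): at least two years surveyed with the same (comparable) method.
    years_per_method = Counter(ts.method_by_year.values())
    return max(years_per_method.values(), default=0) >= 2


candidate = TimeSeries(
    species="Hypothetical species",
    population_id="site-1",
    unit="density",
    method_by_year={1998: "line transect", 2003: "line transect"},
)
print(meets_criteria(candidate))  # True
```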
Assessing search effectiveness and dataset representation
We calculated the encounter rate of relevant articles (i.e. those that satisfied the criteria for inclusion in our datasets) for each journal as the proportion of such articles relative to the total number of articles screened for that journal. We assessed the taxonomic representation of each dataset by calculating the percentage of species of each vertebrate group (all fishes combined, amphibians, reptiles, birds and mammals) with relevant abundance data in relation to the number of species of these groups known to occur in Brazil. The total number of known species for each taxon was compiled from national-level sources (amphibians, Segalla et al., 2021; birds, Pacheco et al., 2021; mammals, Abreu et al., 2022; reptiles, Costa, Guedes and Bérnils, 2022) or from an online database (fishes, FishBase, Froese and Pauly, 2024). We calculated accumulation curves using 1,000 permutations and applying the rarefaction method, using the vegan package (Oksanen et al., 2024). These represent the cumulative number of new species added with each article containing relevant data, allowing us to assess how additional data collection could increase coverage of abundance data across datasets. To compare species threat status among datasets, we used the category for each species available in the Brazilian (‘Sistema de Avaliação do Risco de Extinção da Biodiversidade – SALVE’, 2024) and IUCN (IUCN, 2024) Red Lists, and calculated the percentage of species in each category per dataset.
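To illustrate what a permutation-based species accumulation curve measures, the sketch below shuffles the order of articles many times and averages the cumulative number of distinct species after each article. It is a plain-Python stand-in written for illustration, not the vegan implementation used in the analysis; the function name, the fixed seed and the example species sets are all hypothetical.

```python
import random


def accumulation_curve(articles, permutations=1000, seed=0):
    """Mean cumulative species count after 1..n articles over random orderings.

    articles: list of sets, each holding the species reported by one article.
    """
    if permutations < 1:
        raise ValueError("permutations must be at least 1")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    order = list(articles)
    totals = [0] * len(order)
    for _ in range(permutations):
        rng.shuffle(order)
        seen = set()
        for position, species in enumerate(order):
            seen |= species                 # add this article's species
            totals[position] += len(seen)   # cumulative richness so far
    return [total / permutations for total in totals]


# Hypothetical articles, each contributing a set of species.
articles = [{"sp1", "sp2"}, {"sp2", "sp3"}, {"sp4"}, {"sp1"}]
curve = accumulation_curve(articles)
print([round(value, 2) for value in curve])
```

A curve that is still rising steeply at its right-hand end suggests that screening further articles would keep adding species; a flattening curve suggests diminishing returns.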
To assess and compare the temporal coverage of the different datasets, we calculated the number of populations and species across time. To assess geographic gaps, we mapped the locations of each population using QGIS version 3.6 (QGIS Development Team, 2019). We then quantified the bias of terrestrial records towards proximity to infrastructure (airports, cities, roads and waterbodies) at a 0.5° resolution (circa 55.5 km x 55.5 km at the equator) and a 2° buffer using posterior weights from the R package sampbias (Zizka, Antonelli and Silvestro, 2021). Higher posterior weights indicate a stronger bias effect.
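The stated cell size follows from a simple conversion, sketched below under the assumption of roughly 111 km per degree, which is the value implied by 55.5 km for 0.5°. The function name and the cosine correction for east-west width away from the equator are illustrative additions based on a spherical approximation; they are not part of sampbias.

```python
import math

KM_PER_DEGREE = 111.0  # assumed approximation, implied by 55.5 km per 0.5 degrees


def cell_size_km(resolution_deg, latitude_deg=0.0):
    """Approximate (east-west, north-south) size of a grid cell in km."""
    north_south = resolution_deg * KM_PER_DEGREE
    # East-west extent shrinks with the cosine of latitude on a sphere.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return east_west, north_south


print(cell_size_km(0.5))  # (55.5, 55.5) at the equator
east_west, north_south = cell_size_km(0.5, latitude_deg=-30.0)
print(f"{east_west:.1f} km x {north_south:.1f} km at 30 degrees south")
```

Because Brazil spans latitudes from just north of the equator to about 34 degrees south, cells of a fixed angular resolution cover somewhat less ground in the south of the country.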
Generalised linear mixed models and population abundance trends
We used the rlpi R package (Freeman et al., 2017) to calculate trends in relative abundance. We calculated the average lambda (logged annual rate of change) for each time-series by averaging the lambda values across all years between the start and the end year of the time-series. We then built generalised linear mixed models (GLMMs) to test how average lambdas changed across language (Portuguese vs English), journal origin (national vs international) and taxonomic group, using location, journal name and species as random intercepts (Table 1). We offset these by the number of sampled years, adjusting the summed lambda to a standardised measure that allows comparison across time-series of different lengths, and plotted the beta coefficients (effect sizes) of all factors. Finally, we performed a post-hoc test to check pairwise differences between taxonomic groups (Table S2).
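As a rough illustration of the average lambda, the sketch below computes the logged annual rate of change between consecutive years and averages it over the time-series. It assumes base-10 logarithms (the convention generally used for the LPI) and a complete series with one strictly positive value per year; it does not reproduce how rlpi handles missing years or zeros, and the function names are hypothetical.

```python
import math


def annual_lambdas(abundance):
    """Logged annual rates of change for consecutive yearly abundance values."""
    if any(value <= 0 for value in abundance):
        raise ValueError("abundance values must be strictly positive")
    return [math.log10(later / earlier)
            for earlier, later in zip(abundance, abundance[1:])]


def average_lambda(abundance):
    """Mean of the annual lambdas between the start and end year."""
    lambdas = annual_lambdas(abundance)
    if not lambdas:
        raise ValueError("need at least two years of data")
    # Summed lambda divided by the number of annual steps.
    return sum(lambdas) / len(lambdas)


# Hypothetical population counted in four consecutive years.
counts = [120, 100, 90, 60]
print(round(average_lambda(counts), 4))
```

A negative average lambda indicates a population that declined on average over the period, and a value of zero indicates no net change.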
To assess the influence of national-level data on global trends in relative abundance, we calculated the trends for both the International dataset and the two combined Brazilian datasets (Brazilian-Portuguese and Brazilian-English), using only years for which data were available for more than one species, to be able to estimate trend variation. We also plotted the trends for the Brazilian datasets separately. All analyses were performed in R 4.4.1 (R Core Team, 2024).
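The sketch below shows, in simplified form, how yearly lambdas can be chained into a relative abundance index and how years with data for only one species can be left out. It assumes base-10 lambdas and an unweighted mean across species, and it simply skips years with too few species; the rlpi package applies its own aggregation steps, so this is an illustration of the idea rather than the procedure used here. The function name and input layout are hypothetical.

```python
def relative_abundance_index(lambdas_by_year, min_species=2):
    """Chain mean yearly lambdas into an index with a baseline of 1.0.

    lambdas_by_year: dict mapping year -> dict of species -> lambda (log10).
    Years with fewer than min_species species are skipped.
    """
    index = 1.0
    trend = {}
    for year in sorted(lambdas_by_year):
        values = list(lambdas_by_year[year].values())
        if len(values) < min_species:
            continue  # variation cannot be estimated from a single species
        mean_lambda = sum(values) / len(values)
        index *= 10 ** mean_lambda  # back-transform and accumulate
        trend[year] = index
    return trend


# Hypothetical lambdas for two species over three years.
example = {
    1991: {"sp1": 0.02, "sp2": -0.04},
    1992: {"sp1": -0.01, "sp2": -0.03},
    1993: {"sp1": 0.05},  # only one species, so this year is skipped
}
for year, value in relative_abundance_index(example).items():
    print(year, round(value, 3))
```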