Multinational evaluation of genetic diversity indicators for the Kunming-Montreal Global Biodiversity Framework
Data files
May 31, 2024 version files 17.91 MB
Abstract
Under the recently adopted Kunming-Montreal Global Biodiversity Framework, 196 Parties committed to report the status of genetic diversity for all species. To facilitate reporting, three genetic diversity indicators were developed, two of which focus on processes contributing to genetic diversity conservation: maintaining genetically distinct populations and ensuring populations are large enough to maintain genetic diversity. The major advantage of these indicators is that they can be estimated with or without DNA-based data. However, demonstrating their feasibility requires addressing the methodological challenges of using data gathered from diverse sources, across diverse taxonomic groups, and for countries of varying socioeconomic status and biodiversity levels. Here, we assess the genetic indicators for 919 taxa, representing 5,271 populations across nine countries, including megadiverse countries and developing economies. Eighty-three percent of taxa assessed had data available to calculate at least one indicator. Our results show that although the majority of species maintain most populations, 58% of species have populations too small to maintain genetic diversity. Moreover, genetic indicator values suggest that IUCN Red List status and other initiatives fail to assess genetic status, highlighting the critical importance of genetic indicators.
README: Multinational evaluation of genetic diversity indicators for the Kunming-Montreal Global Biodiversity Monitoring Framework
https://doi.org/10.5061/dryad.bk3j9kdkm
Data comes from the first multi-country assessment of genetic diversity status, with emphasis on the PM and Ne 500 indicators, including nine countries: Australia, Belgium, Colombia, France, Japan, Mexico, South Africa, Sweden, and the United States of America.
For all countries, some of the data collected was not cleaned or analysed in the associated publication. These variables are kept in the files so that the processing scripts can run, but their values were removed. Data for these variables will be published as part of a follow-up paper. See the data dictionary for details.
Code/Software
Data was collected using a KoboToolBox (https://www.kobotoolbox.org/) form specifically designed for this project. The resulting dataset was downloaded as a .csv file and processed in R version 4.2.1 using custom functions and a processing pipeline specifically developed for this study for quality checking, indicator calculation, and subsequent analyses. The R code is available from https://github.com/AliciaMstt/GeneticIndicators or a static version at Zenodo: https://zenodo.org/records/10620307
Description of the data and file structure
Note on *.csv files: Open them with "UTF-8" encoding to properly see special characters of non-English languages present in the text.
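As an illustration of the encoding note above, the following is a minimal Python sketch for reading one of the .csv files with an explicit UTF-8 encoding. It is not part of the authors' R pipeline; the helper name read_csv_utf8 is hypothetical, and the delimiter argument is included only because the raw Kobo export originally used ";" while the files shared here use ",".

```python
import csv


def read_csv_utf8(path, delimiter=","):
    """Read a CSV file into a list of dicts, decoding it explicitly as UTF-8.

    read_csv_utf8 is a hypothetical helper written for this README,
    not a function from the GeneticIndicators repository.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


# Example call (the file must be in the working directory):
# rows = read_csv_utf8("kobo_output_clean.csv")
```

Passing the encoding explicitly avoids relying on the platform default, which is often not UTF-8 on Windows.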
- kobo_form.xlsx: xlsx version of the Kobo form (set of questions that participants answered for each species, just the questions). This form can be imported into KoboToolBox to start a new blank project.
- genetic_diversity_indicators_dictionary_dryad.xlsx: data dictionary explaining the meaning of each variable in the Kobo form and data files. The tab "kobo_form_variables" outlines the variables in the Kobo form and in the Kobo outputs, including the output files (kobo_output_clean.csv, ind1_data.csv, ind2_data.csv, ind3_data.csv, indicators_full.csv, metadata.csv). For each variable it gives the name as recorded in the output data, a description, and an example of real output values. The column "comments" explains how the data was used in the paper associated with this repository, or why the data could not be shared (e.g. because it contains sensitive information on endangered species). The tab "values_for_categ_variables" lists all the options available to the user for Kobo questions of type "select_one" (the user can select only one of the provided options) or "select_multiple" (the user can select one or more of the provided options). The question_id is the unique identifier linked to a Kobo question (see the "kobo_form_variables" tab for information). The tab "processing_scripts_variables" describes variables created by the R functions or processing scripts provided in the GitHub repository https://github.com/AliciaMstt/GeneticIndicators as part of processing the kobo_output_clean.csv data. These variables are present in the output files ind1_data.csv, ind2_data.csv, ind3_data.csv, indicators_full.csv, or metadata.csv.
- International_Genetic_Indicator_testing_V_4.0_-latest_version-False-_2023-11-02-08-23-26.csv: raw output of the Kobo form (answers to the form), as downloaded from the KoboToolBox platform. Notice that the original separator was ";", but in this file, the separator was converted to "," after obscuring Japan's data.
- kobo_output_clean.csv: "clean" Kobo output data, obtained by processing the raw Kobo output with the script 2_cleaning.Rmd (see associated GitHub repository) to correct errors detected by 1_quality_check.Rmd, based on feedback from the people who collected the data. This file contains the clean data that was used for analyses.
- processed_files.zip: zip file containing the population data for the species with >25 populations, which was submitted using a text template instead of the Kobo form. The files in "processed_files/original_files/*" are the files as they were submitted, and the files in "processed_files/processed_files/*" are the corrected versions produced with the process_attached_files() function (see associated GitHub repository) to make them compatible with the format in which population data is expected when estimating the Ne >500 indicator.
- metadata.csv: a subset of variables from kobo_output_clean.csv after running get_metadata() (see associated GitHub repository) to extract the metadata for taxa and indicators, in some cases creating new useful variables, like taxon name (joining Genus, species, etc.) and whether the taxon was assessed only a single time or multiple times. Each row is a taxon.
- ind1_data.csv: data needed to estimate the Ne 500 indicator (the proportion of populations within species with an effective population size Ne greater than 500) in "long" format because, in the Kobo output, population data is in different columns. Data in this file comes from re-formatting the kobo_output_clean.csv data using the get_indicator1_data() function (see associated GitHub repository) so that population data is in rows, and then running transform_to_Ne() to get the Nc data from point or range estimates and transform it to Ne by multiplying it by an Ne:Nc ratio. This is needed for downstream analyses. In the ind1_data.csv file, each population is a row, and there are as many rows per taxon as there are populations within it.
- ind1_data_from_templates.csv: similar to ind1_data.csv, but for the species with >25 populations (see processed_files.zip).
- ind3_data.csv: subset of variables from kobo_output_clean.csv after running get_indicator3_data() (see associated GitHub repository) to extract the data needed to estimate the DNA-based genetic monitoring indicator (number of species in which genetic diversity has been or is being monitored using DNA-based methods). Each row is a taxon.
- indicators_full.csv: output of the script 3_manuscript_figures_analyses (see associated GitHub repository) stating the indicator values for indicator 1 (Ne 500 indicator) and indicator 2 (proportion of maintained populations, or PM indicator), plus some metadata for each species assessment. Each row is an assessment of a species.
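The Nc-to-Ne transformation and the Ne 500 indicator described in the file list above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' R implementation: the function names nc_to_ne and ne500_indicator are hypothetical, the Ne:Nc ratio of 0.1 is only an example value, and taking the midpoint of a range estimate is an assumption about how ranges might be handled.

```python
from collections import defaultdict

NE_NC_RATIO = 0.1   # assumed example ratio; see the R code for the values actually used
NE_THRESHOLD = 500  # threshold named in the indicator definition


def nc_to_ne(nc_point=None, nc_lower=None, nc_upper=None, ratio=NE_NC_RATIO):
    """Convert a census size (Nc) to an effective size (Ne) using an Ne:Nc ratio.

    Uses the point estimate when given; otherwise the midpoint of the range
    (an assumption made for this sketch). Returns None when no Nc is available.
    """
    if nc_point is not None:
        nc = nc_point
    elif nc_lower is not None and nc_upper is not None:
        nc = (nc_lower + nc_upper) / 2
    else:
        return None
    return nc * ratio


def ne500_indicator(populations):
    """Proportion of populations with Ne > 500, per taxon.

    `populations` is an iterable of (taxon, ne) pairs, one per population,
    mirroring the "long" layout of one row per population.
    Populations without an Ne value are skipped.
    """
    above = defaultdict(int)
    total = defaultdict(int)
    for taxon, ne in populations:
        if ne is None:
            continue
        total[taxon] += 1
        if ne > NE_THRESHOLD:
            above[taxon] += 1
    return {taxon: above[taxon] / total[taxon] for taxon in total}


# Made-up example values, not data from this repository.
example = [
    ("Taxon A", nc_to_ne(nc_point=10000)),                # Ne above 500
    ("Taxon A", nc_to_ne(nc_lower=1000, nc_upper=3000)),  # Ne below 500
    ("Taxon B", nc_to_ne(nc_point=6000)),                 # Ne above 500
]
print(ne500_indicator(example))
```

In this made-up example, one of the two populations of "Taxon A" exceeds the threshold, so its indicator value is 0.5, while "Taxon B" has a value of 1.0.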
Methods
Data comes from the first multi-country assessment of genetic diversity status, with emphasis on the PM and Ne 500 indicators, including nine countries: Australia, Belgium, Colombia, France, Japan, Mexico, South Africa, Sweden, and the United States of America. Within each country, teams of researchers and conservation practitioners from academia, government institutions, and non-governmental organizations aimed to assess 50-100 species per country. In total, 919 taxa, representing 5,271 populations, were assessed. Data comes from different sources depending on the country and the species.
Data was collected using a KoboToolBox (https://www.kobotoolbox.org/) form specifically designed for this project. The resulting dataset was downloaded as a .csv file and processed in R version 4.2.1 using custom functions and a processing pipeline specifically developed for this study for quality checking, indicator calculation, and subsequent analyses. The R code is available from https://github.com/AliciaMstt/GeneticIndicators or a static version at Zenodo: https://zenodo.org/records/10620307
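One of the indicators calculated by the pipeline is the PM indicator, the proportion of maintained populations. The Python sketch below illustrates the idea only. It assumes the indicator is computed as extant populations divided by extant plus extinct populations, and the function name pm_indicator and its arguments are hypothetical, not taken from the R code.

```python
def pm_indicator(n_extant, n_extinct):
    """Proportion of populations maintained: extant / (extant + extinct).

    Returns None when no populations are recorded, since the proportion
    is undefined in that case.
    """
    total = n_extant + n_extinct
    if total == 0:
        return None
    return n_extant / total


# Made-up example: 8 populations still present, 2 lost.
print(pm_indicator(8, 2))  # 0.8
```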
Japan's raw data comes from the ancillary data for the national red-list evaluation, based mainly on a field survey provided by the Japanese Society for Plant Systematics. Raw data was provided on the condition that no information that could lead to the species' location in the field would be published, as a way to protect the endangered species. Therefore, for this repository, part of the Japan data was obscured or removed, as explained for each variable in the data dictionary.