
A gap analysis modeling framework to prioritize collecting for ex situ conservation of crop landraces

Cite this dataset

Ramirez-Villegas, Julian et al. (2021). A gap analysis modeling framework to prioritize collecting for ex situ conservation of crop landraces [Dataset]. Dryad.


Aim: The conservation and effective use of crop genetic diversity is crucial for meeting challenges in human nutrition and agricultural sustainability. Farmers’ traditional varieties (“landraces”) are major sources of genetic variation. How well crop landrace diversity is represented in ex situ conservation is poorly understood, partly because there are few methods that account for both the anthropogenic and the environmental determinants of landrace geographic distributions. Here we describe a novel framework for spatial distribution modeling and ex situ conservation gap analysis of crop landraces, using common bean (Phaseolus vulgaris L.) as a case study.

Location: The Americas

Methods: The modeling framework includes five main steps: (1) defining relevant landrace groups from the literature and using them to develop and test classification models; (2) modeling the potential geographic distributions of these groups using occurrence data (landrace presences) combined with environmental and socioeconomic predictors; (3) calculating geographic and environmental gap scores for current genebank collections; (4) mapping ex situ conservation gaps; and (5) compiling expert input.
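The gap scores in step (3) are not defined in detail on this page. As a rough illustration only, the sketch below computes one simple kind of geographic coverage metric: the share of predicted-presence grid cells that lie within a fixed distance of at least one genebank accession, so that the uncovered share (1 minus coverage) acts as a crude geographic gap. The 50 km buffer radius, the haversine distance, and the function names are assumptions made for this example; this is not the framework's published gap-score definition.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def geographic_coverage(predicted_cells, accessions, buffer_km=50.0):
    """Fraction of predicted-presence cells within buffer_km of any accession.

    predicted_cells: (lon, lat) centres of cells where the model predicts presence.
    accessions: (lon, lat) sites where genebank accessions were collected.
    buffer_km: hypothetical collecting radius (an assumption, not from the source).
    """
    if not predicted_cells:
        return 0.0
    covered = 0
    for lon, lat in predicted_cells:
        # A cell counts as covered if any accession lies inside the buffer.
        if any(haversine_km(lon, lat, alon, alat) <= buffer_km for alon, alat in accessions):
            covered += 1
    return covered / len(predicted_cells)
```

For example, with three predicted cells of which two lie a few kilometres from a single accession, `geographic_coverage` returns 2/3 and the geographic gap is 1/3. A brute-force distance check is used here for clarity; a spatial index would be needed at continental scale.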

Results: The modeled distributions and conservation gaps for the two genepools of common bean (Andean and Mesoamerican) were robustly predicted and aligned well with expert opinion. Both genepools are relatively well conserved: Andean ex situ collections represent 78.5%, and Mesoamerican collections 98.2%, of their predicted geographic distributions. The modeling indicated that further collecting priorities for Andean landraces lie primarily in Chile, Peru, and Colombia and, to a lesser extent, in Venezuela. Collecting priorities for Mesoamerican landraces are concentrated in Mexico, Belize, and Guatemala.

Conclusions: The modeling framework represents an advance in tools that can be deployed to model the geographic distributions of cultivated crop diversity, to assess how comprehensively this diversity is conserved ex situ, and to highlight geographic areas where further collecting could fill gaps in ex situ conservation.


Our distribution modeling and conservation gap analysis framework requires three kinds of data: geographic occurrence (presence) data for landraces, the locations where these landraces have previously been collected for ex situ conservation, and characterization data on the landrace accessions. To assess the world’s common bean landrace collections, we compiled available accession-level genebank passport data (i.e., the site where each accession was collected) from major online germplasm databases, including the Genesys plant genetic resources portal and the Food and Agriculture Organization of the United Nations (FAO) World Information and Early Warning System on Plant Genetic Resources for Food and Agriculture (WIEWS). To ensure that the crop’s major germplasm collections were included, we also gathered occurrence and characterization data directly from the CIAT database, which is freely available online, and from the United States Department of Agriculture (USDA) Germplasm Resources Information Network (GRIN)–Global.

Additional occurrences were gathered from the Global Biodiversity Information Facility (GBIF), which contained 25,670 observations from herbaria, botanic gardens, and other plant repositories, to provide independent data from non-genebank sources. We compiled all datasets into a single database and performed a thorough quality check of every record. Duplicate observations were removed, keeping the copy from the original data source; for example, copies of USDA-GRIN or CGIAR records held in Genesys or WIEWS were discarded. Coordinates were corrected or, where correction was not possible, removed when latitude and longitude were both equal to zero, fell in inland water bodies or in the ocean, fell in the wrong country, had an inverted sign in the latitude and/or longitude, or had low precision (i.e., fewer than two decimal places).
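The quality checks that need no external data can be written as simple record filters. The sketch below assumes each record is a dict with hypothetical keys `source`, `accession_id`, `lon`, and `lat`, with coordinates stored as strings so the number of decimal places survives. It drops records at (0, 0) and records with fewer than two decimal places, then removes duplicate copies of the same accession, preferring the copy from an original provider (USDA-GRIN or CIAT here, an assumed priority rule). The water-body, ocean, wrong-country, and inverted-sign checks require land-mask and country polygons and are not shown.

```python
# Hypothetical set of original data providers; copies from aggregators rank lower.
ORIGINAL_SOURCES = {"USDA-GRIN", "CIAT"}

def decimal_places(value: str) -> int:
    """Number of digits after the decimal point in a coordinate string."""
    value = value.strip()
    return len(value.split(".", 1)[1]) if "." in value else 0

def passes_coordinate_checks(rec: dict) -> bool:
    """Reject (0, 0) placeholders and coordinates with fewer than two decimals."""
    lon, lat = float(rec["lon"]), float(rec["lat"])
    if lon == 0.0 and lat == 0.0:
        return False
    return decimal_places(rec["lon"]) >= 2 and decimal_places(rec["lat"]) >= 2

def source_rank(rec: dict) -> int:
    """0 for an original provider, 1 for any other source."""
    return 0 if rec["source"] in ORIGINAL_SOURCES else 1

def clean_records(records: list[dict]) -> list[dict]:
    """Filter out failing records, then keep one copy per accession ID,
    preferring the copy that comes from an original provider."""
    best: dict[str, dict] = {}
    for rec in records:
        if not passes_coordinate_checks(rec):
            continue
        key = rec["accession_id"]
        if key not in best or source_rank(rec) < source_rank(best[key]):
            best[key] = rec
    return list(best.values())
```

Keying duplicates on an accession ID is a simplification; records without shared identifiers would need matching on other passport fields.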

To compile a robust global dataset of important environmental and anthropogenic drivers of crop landrace distributions, we gathered and/or calculated spatially explicit (gridded) data for a total of 50 potential predictors, covering climate, topography, crop diversity and domestication, and socioeconomic variables. Predictor values were extracted at each occurrence point. For climate, we used 40 variables derived from a combination of the WorldClim version 2 and the Environmental Rasters for Ecological Modeling (ENVIREM) databases. Topography was taken from the Shuttle Radar Topography Mission (SRTM) dataset of the CGIAR Consortium for Spatial Information (CGIAR-CSI) portal. Two crop genetic diversity and domestication proxies were included: the distance to known populations of common bean wild relatives and the distance to human settlements predating AD 1500. The eight socioeconomic variables were the geographic distribution of ethnic groups; crop yield, harvested area, and production quantity (You et al., 2017); population density; population accessibility; distance to navigable rivers; and percentage of area under irrigation. All spatial predictor data were resampled to, or computed on, a common 2.5 arc-minute grid in the Geographic Coordinate System (GCS) with the WGS84 datum.
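Extracting predictor values at occurrence points amounts to mapping each (lon, lat) pair to a cell of the common grid. A minimal sketch, assuming a global WGS84 raster with 2.5 arc-minute (1/24 degree) cells whose top-left corner sits at (-180°, 90°) and whose values are held as `grid[row][col]`; in practice the rasters would be read with a GIS library, and the function names here are hypothetical.

```python
import math

CELL_DEG = 2.5 / 60.0            # 2.5 arc-minutes = 1/24 degree
N_COLS = round(360 / CELL_DEG)   # 8640 columns in a global grid
N_ROWS = round(180 / CELL_DEG)   # 4320 rows in a global grid

def cell_index(lon: float, lat: float) -> tuple[int, int]:
    """Return (row, col) of the grid cell containing the point.

    Rows count southward from 90°N; columns count eastward from 180°W.
    """
    col = math.floor((lon + 180.0) / CELL_DEG)
    row = math.floor((90.0 - lat) / CELL_DEG)
    # Points exactly on the east or south boundary fall in the last cell.
    return min(row, N_ROWS - 1), min(col, N_COLS - 1)

def extract_values(grid, points):
    """Look up the predictor value under each (lon, lat) occurrence point."""
    values = []
    for lon, lat in points:
        row, col = cell_index(lon, lat)
        values.append(grid[row][col])
    return values
```

Running `extract_values` once per predictor layer yields the occurrence-by-predictor table used for distribution modeling.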

Usage notes

See the README file.


CGIAR Genebanks Platform