Data and code from: Crop yields are not greater outside centers of origin
Data files
May 21, 2026 version (77.43 MB total)
DryadCropCenterOrigin.zip: 77.41 MB
ReadMe_Metadata_TabularData.xlsx: 13.08 KB
README.md: 3.49 KB
Abstract
Managing pest and pathogen populations for crops is vital to global food production. Anderson’s law states that the farther you get from a crop’s center of origin, the more pests you leave behind. This hypothesis has helped fuel the replacement of indigenous crops with introduced varieties. Here, we find no evidence for a yield benefit of growing crops outside their centers of origin. Instead, climate and agricultural inputs such as fertilizer are the best predictors of global crop yield. The data and code here can be used to reproduce the analyses in this study.
Dataset overview
A detailed description of the framework and methods used to generate and process all included files can be found in the associated publication (see below).
This dataset was generated from previously published datasets of crop distributions (GBIF), crop yields and fertilizer application (EarthStat), pesticide application (PEST-CHEMGRIDS), and climate (CHELSA).
ALL CONTENTS IN "DryadCropCenterOrigin.zip":
Directory and file descriptions
File "ReadMe_Metadata_TabularData.xlsx" has information on column names for all tabular files. Empty cells/fields are intentional unless otherwise noted.
Code:
gcos_wildrelatives.R: Main script for analyzing global centers of crop origin and wild relatives.
Code/main_model_pipeline.R: Primary modeling workflow for crop center analyses, including data preparation, model fitting, and output generation.
Code/SamplingRasters_fixed.R: Samples raster covariates at crop and wild relative point locations.
Code/gco_nullmodel_ols.R: Null model analysis using ordinary least squares.
Code/gco_nullmodels_spatial.R: Spatial null model analysis accounting for spatial structure.
Code/power_analysis.R: Power analysis for evaluating sampling design and model sensitivity.
Code/CorrelationPlot.R: Generates correlation plots among environmental and agricultural predictors.
Code/plots_R2.R: Generates model R² plots.
Code/plots_RMSE.R: Generates RMSE plots.
Code/plots_morani.R: Generates Moran’s I plots for spatial autocorrelation.
Code/plots_variableimportance.R: Generates variable importance plots.
Code/marginal_effects/me_binary_main.R: Marginal effects analysis for the main binary crop origin model.
Code/marginal_effects/me_MAP.R: Marginal effects analysis for mean annual precipitation.
Code/marginal_effects/me_MAT.R: Marginal effects analysis for mean annual temperature.
Code/marginal_effects/me_Phosphorus.R: Marginal effects analysis for soil phosphorus.
Code/marginal_effects/me_fertilizer.R: Marginal effects analysis for fertilizer-related predictors.
Code/marginal_effects/me_pesticide.R: Marginal effects analysis for pesticide-related predictors.
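The plotting scripts above report R², RMSE and Moran's I. As a point of reference only, the sketch below gives one common definition of each in base R. It is not taken from plots_R2.R, plots_RMSE.R or plots_morani.R, which may define these differently (for example, R² as a squared correlation, or Moran's I computed with a package such as spdep under a different spatial weights scheme). The inverse-distance weights and planar distances used here are assumptions.

```r
# Hypothetical helpers illustrating common definitions of the metrics named in
# plots_R2.R, plots_RMSE.R and plots_morani.R; not taken from those scripts.

# Root-mean-square error between observed and predicted values.
rmse <- function(obs, pred) {
  sqrt(mean((obs - pred)^2))
}

# Coefficient of determination, 1 - SS_res / SS_tot.
r_squared <- function(obs, pred) {
  ss_res <- sum((obs - pred)^2)
  ss_tot <- sum((obs - mean(obs))^2)
  1 - ss_res / ss_tot
}

# Global Moran's I with inverse-distance weights.
# x: numeric vector (e.g. model residuals); coords: two-column matrix of point
# coordinates. Planar Euclidean distance is assumed; pairs at distance 0
# (including each point with itself) get weight 0.
morans_i <- function(x, coords) {
  d <- as.matrix(dist(coords))
  w <- ifelse(d > 0, 1 / d, 0)
  z <- x - mean(x)
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Usage on simulated residuals with no spatial structure; values near
# -1 / (n - 1) are expected on average in that case.
set.seed(42)
xy <- cbind(runif(50), runif(50))
resid <- rnorm(50)
morans_i(resid, xy)
```

Positive values of Moran's I indicate that nearby residuals tend to be similar, and negative values that they tend to differ.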
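The marginal effects scripts are not described in detail here. One common way to summarise the marginal effect of a single predictor from a fitted model is partial dependence: set that predictor to each value on a grid for every observation and average the model's predictions. The sketch below assumes that approach and uses a linear model on simulated data; the function partial_dependence() and the columns fertilizer, temperature and yield are hypothetical, and the scripts in Code/marginal_effects/ may use a different method.

```r
# Hypothetical sketch: partial dependence of a fitted model on one predictor.
# For each grid value, the predictor is set to that value in every row and the
# model's predictions are averaged over the data.
partial_dependence <- function(model, data, var, grid = NULL, n_grid = 20) {
  if (is.null(grid)) {
    grid <- seq(min(data[[var]]), max(data[[var]]), length.out = n_grid)
  }
  avg_pred <- vapply(grid, function(v) {
    newdata <- data
    newdata[[var]] <- v
    mean(predict(model, newdata = newdata))
  }, numeric(1))
  data.frame(value = grid, mean_prediction = avg_pred)
}

# Usage with simulated data (column names invented for illustration).
set.seed(1)
sim <- data.frame(fertilizer = runif(200, 0, 100),
                  temperature = rnorm(200, 20, 5))
sim$yield <- 2 + 0.05 * sim$fertilizer - 0.1 * sim$temperature + rnorm(200)
fit <- lm(yield ~ fertilizer + temperature, data = sim)
pd <- partial_dependence(fit, sim, "fertilizer", n_grid = 5)
print(pd)
```

For a linear model without interactions the curve is a straight line whose slope equals the fitted coefficient. For a ranger model, the prediction line would become `mean(predict(model, data = newdata)$predictions)`, because predict() on a ranger object returns a prediction object rather than a numeric vector.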
Data:
Data/all_crop_points_100k_fixed.csv: Main crop point dataset used for sampling raster covariates and fitting models.
Data/gcos/crop_gco_wild_relatives.rds: Crop center and wild relative dataset used in the global center of origin analyses.
Data/models/results_gco_ols_1000_samples.csv: Model output from OLS analyses using 1,000 sampled GCOs.
Data/models/results_gco_nullmodels_sem_500_samples_1000GCOs.csv: Output from spatial null model analyses using 500 samples and 1,000 GCOs.
Data/models/yield/: Folder containing yield model outputs.
Data/models/yield_efficiency/: Folder containing yield efficiency model outputs.
Data/models/pesticide_efficiency/: Folder containing pesticide efficiency model outputs.
Data/rasters/: Folder containing raster covariates used for geospatial sampling and model prediction.
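Sampling raster covariates at point locations, as described for SamplingRasters_fixed.R, can be sketched with terra's extract(). The example below builds a small synthetic two-layer raster instead of reading files from Data/rasters/; the layer names (MAT, MAP) and coordinate columns (lon, lat) are hypothetical, and the actual column names of all_crop_points_100k_fixed.csv are described in ReadMe_Metadata_TabularData.xlsx.

```r
library(terra)

# Synthetic stand-in for covariate rasters: a 10 x 10 grid over 0-10 degrees
# with two hypothetical layers.
layer_mat <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 10,
                  ymin = 0, ymax = 10, crs = "EPSG:4326")
values(layer_mat) <- 1:100        # cell 1 is the top-left cell
layer_map <- layer_mat * 2
covariates <- c(layer_mat, layer_map)
names(covariates) <- c("MAT", "MAP")

# Hypothetical crop occurrence points with lon/lat columns.
pts_df <- data.frame(id = c("a", "b"),
                     lon = c(0.5, 9.5),
                     lat = c(9.5, 0.5))
pts <- vect(pts_df, geom = c("lon", "lat"), crs = "EPSG:4326")

# extract() returns one row per point: an ID column followed by one column
# per raster layer. Drop the ID column and bind the values to the points.
sampled <- extract(covariates, pts)
sampled <- cbind(pts_df, sampled[, -1])
print(sampled)
```

With real data the synthetic raster would be replaced by rast() on the covariate files (their format is not stated here). Because extract() returns rows in the same order as the input points, binding the sampled values back to the point table by position is safe.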
Figures:
figures/: Output folder for figures generated by the analysis scripts.
figures/sampled_points/: Figures showing sampled crop or wild relative points.
Software Requirements
R version 4.1 with packages including terra, sf, tidyverse, ggplot2, ranger, and caret.
These data were compiled through a literature review (see the manuscript for details) and come from EarthStat, TerraClimate, FAO, and the World Bank. Major files include crop_gco_wild_relatives.rds and gcos_wildrelatives.R.
Data can be opened in GIS software and R.
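As a minimal sketch of opening the tabular and R data files in R, the hypothetical helper load_crop_center_data() below reads four of the files listed above with base R and returns NULL for any file not found under the given root directory. The root path is an assumption (the unzipped DryadCropCenterOrigin.zip folder); rasters in Data/rasters/ can be opened separately with terra::rast().

```r
# Hypothetical helper: read the main tabular and R data files from the
# unzipped archive. Paths follow the file listing in this README; any file
# missing under `root` is returned as NULL instead of raising an error.
load_crop_center_data <- function(root = ".") {
  paths <- c(
    points = file.path(root, "Data", "all_crop_points_100k_fixed.csv"),
    gcos   = file.path(root, "Data", "gcos", "crop_gco_wild_relatives.rds"),
    ols    = file.path(root, "Data", "models", "results_gco_ols_1000_samples.csv"),
    sem    = file.path(root, "Data", "models",
                       "results_gco_nullmodels_sem_500_samples_1000GCOs.csv")
  )
  # Reader for each file type (base R only).
  readers <- list(points = utils::read.csv, gcos = readRDS,
                  ols = utils::read.csv, sem = utils::read.csv)
  out <- lapply(names(paths), function(nm) {
    if (file.exists(paths[[nm]])) readers[[nm]](paths[[nm]]) else NULL
  })
  names(out) <- names(paths)
  out
}

# Usage, assuming the working directory is the unzipped archive:
# dat <- load_crop_center_data(".")
# str(dat$points)
```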
