Data from: Over three-quarters of earthworm species lack protection in China, a crisis exacerbated by climate change
Data files
Mar 30, 2026 version files (66.29 MB)
- ECOG70050.zip (66.29 MB)
- README.md (3.44 KB)
Abstract
This dataset was systematically constructed to assess the distribution of earthworm diversity in China, the impacts of climate change, and conservation gaps. It integrates earthworm distribution records from the Global Biodiversity Information Facility (GBIF), the Chinese Earthworm Database, the Taiwan Earthworm Database, and published literature, spanning the period from 1986 to 2024. After rigorous coordinate validation, removal of duplicate records, and spatial thinning at a 5 km resolution, 5,334 valid records were obtained, covering 306 earthworm species. Additionally, for rare species with fewer than five records (accounting for approximately 51% of the total species), data were separately compiled for climate change exposure analysis (e.g., PCA climate distance calculation) to comprehensively evaluate the conservation needs of all known earthworm taxa. For environmental data, bioclimatic variables, soil properties, and topographic factors under current and future (2050s, 2090s) scenarios across three Shared Socioeconomic Pathways (SSP1-2.6, SSP3-7.0, SSP5-8.5) were collected. Following collinearity diagnostics and principal component analysis, 10 key predictor variables were selected. Based on the above data, a stacked species distribution model (SSDM) was employed to simulate the spatial patterns of earthworm species richness under current and future scenarios. This was combined with the boundaries of protected areas in China and the distribution of aboveground biodiversity (plants, vertebrates, etc.) to conduct conservation gap and spatial matching analyses. Below are the complete data files and variable descriptions.
Dataset DOI: 10.5061/dryad.k3j9kd5q8
Description of the data and file structure
This dataset was constructed through extensive compilation of earthworm distribution records from multiple sources, including the Global Biodiversity Information Facility (GBIF), the China Earthworm Database, the Taiwan Earthworm Database, and published literature. Records span the period from 1986 to July 2024 and underwent rigorous quality control: entries lacking precise coordinates or containing duplicate information were excluded, and spatial thinning (5-kilometer distance threshold) was applied to minimize sampling bias. The final dataset comprises 5,334 spatially unique records covering 306 earthworm species within China. Additionally, distribution data for species with fewer than five records (approximately 51% of total species) were retained for supplementary analyses (e.g., climatic distance calculations), ensuring that rare and endemic species remain considered in future analyses of earthworm species diversity distribution patterns.
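The deduplication and 5 km thinning step described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes thinning is applied per species, uses great-circle (haversine) distance, and keeps records greedily in input order, none of which is stated in the description. The function names and the example records are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thin_records(records, min_km=5.0):
    """Drop exact duplicates, then greedily keep a record only if no already
    kept record of the same species lies closer than min_km.

    records: iterable of (species, lat, lon) tuples (hypothetical layout).
    """
    kept = []
    seen = set()
    for species, lat, lon in records:
        key = (species, lat, lon)
        if key in seen:  # exact duplicate
            continue
        seen.add(key)
        too_close = any(
            s == species and haversine_km(lat, lon, klat, klon) < min_km
            for s, klat, klon in kept
        )
        if not too_close:
            kept.append(key)
    return kept


# Hypothetical records, for illustration only.
example = [
    ("species_a", 30.000, 120.000),
    ("species_a", 30.000, 120.000),  # exact duplicate: dropped
    ("species_a", 30.010, 120.000),  # about 1.1 km from the first: thinned
    ("species_a", 30.100, 120.000),  # about 11 km from the first: kept
    ("species_b", 30.010, 120.000),  # different species: kept
]
thinned = thin_records(example)
```

Greedy thinning depends on record order, so a different ordering can retain a different subset; dedicated thinning tools typically address this by randomizing or optimizing the selection.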
Files and variables
File: ECOG70050.zip
Description:
The complete data files and variable descriptions are listed below.
r/: Folder containing all R scripts
SSDM_modeling.R: Main script for the Stacked Species Distribution Model (SSDM).
pca_distance.R: Calculates the climate Euclidean distance to compare the climate change exposure of modeled versus unmodeled species.
gap_analysis.R: Protected area (PA) gap analysis. This script loads the earthworm range loss and PA data, then cleans and reshapes them. It runs Wilcoxon tests comparing range losses inside versus outside PAs and across future climate scenarios. It visualizes the results as plots of median range changes, time series of PA coverage, and the proportion of species meeting the 5% and 10% representation thresholds, then arranges all plots in a 2×2 grid and saves them as a PDF.
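The representation-threshold summary mentioned for gap_analysis.R can be sketched as follows. This is an illustration in Python, not the R script itself: it assumes each species has a single value giving the fraction of its modelled range inside protected areas, and that a species meets a target when that fraction is at least the target. The function name and the example values are hypothetical.

```python
def share_meeting_targets(protected_fraction, targets=(0.05, 0.10)):
    """Return, for each target, the proportion of species whose protected
    share of range is at least that target.

    protected_fraction: dict mapping species name -> fraction in [0, 1].
    """
    n = len(protected_fraction)
    if n == 0:
        raise ValueError("no species supplied")
    return {
        t: sum(1 for f in protected_fraction.values() if f >= t) / n
        for t in targets
    }


# Hypothetical values, for illustration only.
example = {"sp1": 0.02, "sp2": 0.07, "sp3": 0.12, "sp4": 0.00}
result = share_meeting_targets(example)
```

With these made-up inputs, two of the four species reach the 5% target and one reaches the 10% target.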
database/: Folder containing the processed earthworm distribution data.
model_earthworm.csv: 4,776 occurrence records for the 131 species with more than 5 occurrences each, retained after deduplication, coordinate cleaning, and spatial thinning (5 km threshold). Used for species distribution modeling and subsequent analysis.
unmodel_earthworm.csv: Occurrence records for species with fewer than 5 records. Not used for SSDM modeling, but used for the PCA climate distance analysis.
conservation analysis.csv: Data used for gap analysis (derived from the original earthworm.xlsx).
future_dssdm: Future climate environmental variables used for species distribution modeling.
current_dssdm: Current climate environmental variables used for species distribution modeling.
The current_dssdm and future_dssdm folders also contain metadata generated by ArcGIS Desktop. The metadata records the operation performed, the conversion of a raster band (Band5_126_2070) to an ASCII file (bio5.ASC) with the RasterToASCII tool, along with the tool path, time, and output location.
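Files written by the RasterToASCII tool use the Esri ASCII grid layout: a short header (ncols, nrows, xllcorner or xllcenter, yllcorner or yllcenter, cellsize, and an optional NODATA_value) followed by the cell values row by row from the top. The sketch below parses such a grid with the Python standard library only. The function name and the sample grid are hypothetical, and the values are not taken from the dataset.

```python
HEADER_KEYS = {
    "ncols", "nrows", "xllcorner", "yllcorner",
    "xllcenter", "yllcenter", "cellsize", "nodata_value",
}


def read_ascii_grid(text):
    """Parse an Esri ASCII grid held in a string.

    Returns (header, rows): header maps lower-cased keys to floats, rows is a
    list of lists ordered top to bottom, with NODATA cells replaced by None.
    """
    header = {}
    values = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        key = parts[0].lower()
        if key in HEADER_KEYS:
            header[key] = float(parts[1])
        else:
            values.extend(float(p) for p in parts)
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    if len(values) != ncols * nrows:
        raise ValueError("cell count does not match ncols * nrows")
    nodata = header.get("nodata_value")
    cells = [None if v == nodata else v for v in values]
    rows = [cells[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return header, rows


# Made-up 3 x 2 grid, for illustration only.
sample = """ncols 3
nrows 2
xllcorner 100.0
yllcorner 20.0
cellsize 0.5
NODATA_value -9999
25.1 26.3 -9999
24.8 -9999 27.0
"""
header, rows = read_ascii_grid(sample)
```

To read a layer from the archive, pass the file contents instead of the sample string, for example `read_ascii_grid(open(path).read())`, where `path` points to an .ASC file in one of the _dssdm folders.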
Code/software
R (version 3.6.2 or later) – All data processing and modeling.
ArcGIS Desktop 10.8 – Conversion of raster bands to ASCII files (RasterToASCII tool).
Tabular data (CSV files): Open with any spreadsheet software or text editor. Missing values are represented as blank cells or NA.
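Because missing values appear either as blank cells or as NA, it helps to normalize both when loading the tables. The following is a minimal sketch using only the Python standard library. The column names in the sample are hypothetical and may not match the headers in the archive's CSV files.

```python
import csv
import io

MISSING = {"", "NA"}  # the two missing-value encodings named above


def read_rows(handle):
    """Yield each CSV row as a dict, mapping blank or 'NA' cells to None."""
    for row in csv.DictReader(handle):
        yield {
            key: (None if value is None or value.strip() in MISSING else value)
            for key, value in row.items()
        }


# Made-up table, for illustration only.
sample = io.StringIO("species,longitude,latitude\nsp1,120.5,30.2\nsp2,NA,\n")
rows = list(read_rows(sample))
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `read_rows`. Values are returned as strings, so coordinates still need converting to numbers.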
Access information
Other publicly accessible locations of the data:
- None
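Data were derived from the following sources:
- Global Biodiversity Information Facility (GBIF)
- China Earthworm Database
- Taiwan Earthworm Database
- Published literature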
