Data from: Increased Holocene diversity in Europe linked to human-associated vegetation change

Gordon, Jonathan1,2; Fagan, Brennen1,2; Githumbi, Esther3; Milner, Nicky1,2; Thomas, Chris D1,2

Published Dec 21, 2025 on Dryad. https://doi.org/10.5061/dryad.h9w0vt4tm

Data files

Dec 21, 2025 version files (23.68 GB)

File	Size
code.tar	316.42 KB
data.tar	23.68 GB
README.md	34.52 KB

Abstract

It is widely reported that aspects of present-day global biodiversity are declining, with humans largely to blame. However, perhaps paradoxically, floristic diversity and human populations in Europe have grown in tandem for millennia. Disturbance intensity and habitat heterogeneity potentially explain this phenomenon, though we lack understanding of how human land use intensity affected biodiversity at numerous spatial scales over the Holocene. In this work, we examined the spatio-temporal dynamics of, and relationships between, floristic richness, evenness, and compositional turnover with an index of anthropogenic vegetation change (frequencies of human-associated pollen types) since 11,700 cal yr BP, analysing 7,853 pollen samples from 213 records (sites). We evaluated how changes to the proportional site occupancies of human-associated and other taxa related to diversity patterns. We found that (1) Floristic richness, evenness, and compositional turnover all increased from 9,000 years ago to 1850 CE. (2) Temporal increases in richness and evenness were positively associated with the anthropogenic vegetation index at the majority of vegetation zones (~biome) and sites, whereas compositional turnover was only associated with the anthropogenic index at the site level. (3) Holocene site occupancies of all human-associated taxa were positively associated with biodiversity gains, whereas the results for other taxa (that were not associated with people) were mixed. All data for these analyses are freely available and, where possible, provided. Where reuse licences prohibit the republishing of data, citations are provided for the user to download the data. These analyses are very computationally demanding and thus intermediate and output data products have been provided.

Authors

Jonathan D. Gordon*, Brennen Fagan, Esther Githumbi, Nicky Milner, Chris D. Thomas
*Corresponding author: jonny.gordon@york.ac.uk

This work was funded by a Leverhulme Trust Research Centre - The Leverhulme Centre for Anthropocene Biodiversity (grant number: RC-2018-021). Any queries should be sent to the corresponding author.

Project description

The majority of the analysis scripts for the manuscript 'Increased Holocene diversity in Europe linked to human-associated vegetation change' were run on the University of York's High Performance Computing cluster, Viking2.
Without comparable computing resources, running these analyses in full is not practical. However, the user may run the scripts in this folder on a reduced number of pollen records and with a reduced number of resamples. Alternatively, intermediate files have been provided (detailed under Processed data files) so that sections of the analyses can be run on their own.
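
For a reduced local trial, the subsetting of records and resamples could look like the sketch below. The helper `reduce_run()` and its argument names are hypothetical and do not come from the scripts in code.tar; the 213 records and 1,000 resamples are the full-analysis values stated in this README, and the IDs used here are placeholders rather than real Neotoma dataset IDs.

```r
# Hypothetical helper: choose a repeatable subset of records and resamples
# for a reduced local run. Names are illustrative, not taken from code.tar.
reduce_run <- function(record_ids, n_records_keep = 10, n_resamples_keep = 20,
                       n_resamples_full = 1000, seed = 1) {
  stopifnot(n_records_keep <= length(record_ids),
            n_resamples_keep <= n_resamples_full)
  set.seed(seed)  # fixed seed so the same subset is drawn each time
  keep_idx <- sample.int(length(record_ids), n_records_keep)
  list(
    records   = sort(record_ids[keep_idx]),   # subset of record IDs
    resamples = seq_len(n_resamples_keep)     # first n resample indices
  )
}

# Placeholder IDs 1..213 stand in for the 213 pollen records
subset_plan <- reduce_run(record_ids = 1:213)
length(subset_plan$records)    # 10
length(subset_plan$resamples)  # 20
```

The resulting `records` and `resamples` vectors would then need to be passed to whichever filtering step the relevant script uses; how that is wired in depends on the script and is not specified here.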

Data

Raw data data.tar

All pollen data are open access from Neotoma, accessed using the neotoma2 R package.
Climate data are open access from Arthur et al (2023), cited in-text.
AIV values digitised from Deza-Araujo et al (2022), ./data/aiv_vals.csv.
Pollen harmonisation table, ./data/Europe_harmtable_RV_types_UPDATED.xlsx.
REVEALS inputs available from Githumbi et al (2021), cited in-text.
LRA code from Abraham et al (2014), cited in-text.
Archaeological radiocarbon dates for SPDs are from Bird et al (2022), cited in-text.
KK10 Anthropogenic land cover change scenario from Kaplan et al (2010), cited in-text.
HYDE population and land use estimates from Klein Goldewijk et al (2017), cited in-text.

Processed data files

Predicted ages (1,000 simulations) per pollen sample, ./data/age_depth_mods.RDS.
Final geochronological control table, ./data/filtered_geochron_tables.RDS.
Per record & sample REVEALS estimates, ./data/REVEALS_estimates_Neotoma_data.RDS.
All 1,000 diversity/turnover and LUPi resamples, ./data/diversity_LUPi_resamples.RDS.
Average European diversity ~ pollen type occurrences, ./data/all_occurrences.RDS.
Per-record & sample climate layers, ./data/per_record_climate_vars_df.RDS.
Per-record & sample radiocarbon SPDs, ./data/radiocarbon_SPDs.
Per-record and sample KK10 estimates, ./data/KK10_per_record.RDS.
Per-record and sample HYDE estimates, ./data/HYDE_Population_PiP.RDS.
Final fitted spatiotemporal HGAM that underlies Fig. 1 (1 of 1,000 resamples), ./data/spatiotemporal_outs/.
Final fitted HGAMs that underlie Fig. 2, ./data/mixed_model_mods/ and ./data/perveg_model_mods.
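
The processed .RDS files can be read with base R's `readRDS()`. As a minimal sketch, assuming data.tar has been extracted so that `./data/diversity_LUPi_resamples.RDS` exists and that the file is a list of data frames with identical columns (as stated in the full data descriptions), the helper below stacks the first few resamples into a single data frame. The function name `stack_resamples()` and the `n_keep` argument are illustrative only.

```r
# Illustrative helper: stack the first n_keep elements of a list of
# identically structured data frames into one data frame.
stack_resamples <- function(resample_list, n_keep = 10) {
  stopifnot(is.list(resample_list), length(resample_list) >= 1)
  keep <- resample_list[seq_len(min(n_keep, length(resample_list)))]
  do.call(rbind, keep)
}

# Path assumes data.tar was extracted in the working directory
rds_path <- "./data/diversity_LUPi_resamples.RDS"
if (file.exists(rds_path)) {
  resamples <- readRDS(rds_path)  # list with one element per resample
  stacked <- stack_resamples(resamples, n_keep = 10)
  str(stacked)
}
```

Loading only a handful of resamples at first is a cautious way to check column names and memory use before working with all 1,000.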

All analyses are run in R (R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/).

Code

code.tar

Run the scripts in the following order:

01_neotoma_download_Europe.R (download pollen and geochronological data)
02_global_chronology_table_prep.R (prepare chronology tables; slice top and bottom depths for sequences of which only part meets the inclusion criteria)
03_HPC_europe_run_bchron.R (age-depth models)
04_HPC_load_age_depth_mods.R (load age depth models into df ready for analyses)
05_PiP_calculation.R (subset pollen records, calculate which vegetation zone each pollen record is present in)
06_RV_file_format_Neotoma_data.R (prepares pollen data for REVEALS)
07_RV_run_REVEALS_Neotoma_data.R (runs REVEALS)
08_HPC_REVEALS_diversity.R (filter, resample, calculate diversity metrics)
09_HPC_REVEALS_LUPi.R (filter, resample, calculate LUPi)
10_HPC_per_record_clim.R (subset per record climate data from climate surfaces)
11_load_per_record_palaeoclim.R (compile annual palaeoclim slices into timeseries)
12_load_diversity_LUPi_resamples.R (load data from HPC array outputs into dataframe)
13_HPC_human_clim_models_per_veg.R (per vegetation models, all combos of predictor variables)
14_HPC_human_clim_models_europe.R (whole Europe models, all combos of predictor variables)
15_AIC_comparisons.R (compute delta AICs for cohorts of models per metric * region)
16_HPC_human_clim_models_perveg_bestmods.R (per vegetation models, but only for the best set of predictors per metric * region combination)
17_HPC_human_clim_models_europe_bestmods.R (whole Europe models, but only for the best set of predictors per metric)
18_HPC_spatiotemporal_mods.R (fit spatiotemporal model to each pollen metric)
19_sample_sizes_map.R (compute sample sizes and plot supplementary data description figure)
20_HPC_generate_SPDs.R (generate local-in-space radiocarbon SPD curves per pollen record)
21_KK10_intersection.R (sample KK10 surface for each pollen record location)
22_human_indicator_comparison.R (sample HYDE surfaces and compare LUPi with the three other human indices)
23_HPC_diversity_taxon_occurrences.R (generate pollen record proportional site occupancies and average diversity and turnover data)
24_diversity_taxon_occurrences_ARIMA.R (ARIMA modelling of diversity and turnover ~ occupancy, and plots)
25_comparison_PseudoR2.R (extract and plot pseudo-R^2 values for models with increasingly complex sets of predictors)
26_plot_timseries.R (plot preds from spatiotemporal models through time and across a spatiotemporal grid, Fig. 1)
27_plot_model_estimates.R (extract and plot slope estimates, Figs 2 & 3, and save summaries for Appendix 3)
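
Because the run order is encoded in each script's numeric prefix, the order can be recovered programmatically. The sketch below sorts script names by prefix and flags those carrying the `HPC_` tag. The helper `order_scripts()` is hypothetical, and the `./code` folder name is an assumption about where code.tar has been extracted.

```r
# Hypothetical helper: sort script names by numeric prefix and flag the
# scripts tagged HPC_ (those run on the cluster in the original analysis).
order_scripts <- function(script_names) {
  prefix <- as.integer(sub("^([0-9]+)_.*$", "\\1", script_names))
  ordered <- script_names[order(prefix)]
  data.frame(
    script = ordered,
    needs_hpc = grepl("^[0-9]+_HPC_", ordered),
    stringsAsFactors = FALSE
  )
}

# Assumes code.tar was extracted into ./code (folder name is an assumption)
scripts <- list.files("./code", pattern = "^[0-9]+_.*\\.R$")
if (length(scripts) > 0) {
  print(order_scripts(scripts))
}
```

This only lists the scripts; it does not source them, since the HPC-tagged steps are too demanding to chain together on a typical workstation.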

The authors have provided the .txt files (BASH scripts) used to submit the HPC_* scripts to the Viking2 high performance computing cluster. Users may adapt these for their own HPC cluster, or use them as an indication of the memory and time allocations needed to run parts of the full analysis locally.
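
When adapting the HPC_* scripts to another cluster, each array task typically needs to know which resample it is responsible for. The sketch below shows one way an R script might read an array index from the environment. It assumes a Slurm-style scheduler that sets `SLURM_ARRAY_TASK_ID`; that variable, the helper `get_task_id()`, and the output file pattern are assumptions for illustration and are not confirmed by the provided scripts.

```r
# Read the array task index from the scheduler environment.
# SLURM_ARRAY_TASK_ID is an assumption about the scheduler; change `var`
# for other systems. Falls back to `default` when the variable is unset.
get_task_id <- function(var = "SLURM_ARRAY_TASK_ID", default = 1L) {
  raw <- Sys.getenv(var, unset = NA)
  id <- suppressWarnings(as.integer(raw))
  if (is.na(id)) default else id
}

task_id <- get_task_id()
# Hypothetical output name: one file per resample, labelled by task index
out_file <- sprintf("resample_%d.RDS", task_id)
```

Run outside a scheduler, the function returns the default of 1, which makes it straightforward to test a single resample locally before submitting an array job.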

Full data descriptions

File	Description
`./data/mixed_model_mods/`	Folder contains the {mgcv} Europe-wide model fits for all metrics (evenness, richness and turnover).
`./data/perveg_model_mods/`	Folder contains the {mgcv} per-vegetation zone model fits for all metrics. File name structure: metric_vegetationzone.
`./data/radiocarbon_SPDs/`	Folder contains the per-record SPD timeseries. Each file within `/radiocarbon_SPDs` is named by the Neotoma pollen record ID. Columns: calBP = calibrated years before radiocarbon present (date); PrDens = probability density (SPD estimate); datasetid = Neotoma unique pollen record ID; buff = buffer radius (in meters) surrounding pollen records from within which radiocarbon dates are sampled.
`./data/spatiotemporal_outs/resample_1.RDS`	Europe-wide {mgcv} spatiotemporal model
`./data/age_depth_mods.RDS`	1,000 age simulations per level, per pollen record. Columns: depths = depth of sample in sequence (in centimeters); median_age = median age in calibrated years before radiocarbon present across the 1,000 simulations; collunit = Neotoma unique collection unit ID. Columns 4 (draw_1) to 1004 (draw_1000) represent individual age simulations (in calibrated years before radiocarbon present) per depth.
`./data/all_occurrences.RDS`	Diversity and per-pollen type proportional range size dataset. Dataset is a list in which each element (1-588) represents a pollen type. All list elements have an identical structure. Columns: bin = time bin (in calibrated years before radiocarbon present); taxon = pollen type; LUPi_Rating = agricultural Land Use Probability Index rating (rating scale goes from worst to best, 1-4); av_prop_presence = average proportional presence of taxon across records (as a proportion); mean_richness = mean richness (number of unique pollen types); mean_evenness = mean evenness (dominance of abundances across pollen types, Pielou's J); mean_turnover = mean compositional change (Bray-Curtis); res = resample (1-1000 simulations).
`./data/best_models_prec_re_only.RDS`	Full set of model summaries, across the different sets of candidate predictor variables for the Europe-wide and per-vegetation zone models. Dataset is a list of 2 dataframes, with list element 1 representing the per-vegetation zone summaries, and element 2 representing the European summaries. Both elements have the same structure. Columns: model_name = model name; AIC = Akaike Information Criterion for model (model 'quality' estimate); r_sq = adjusted R^2 for model; metric = response variable; zone = vegetation zone the model relates to; vars = index for set of predictor variables from total candidate pool; min_AIC = minimum Akaike Information Criterion value across the total set of models computed per diversity metric x vegetation zone combination; delta_AIC = the difference between the model's Akaike Information Criterion value and the lowest Akaike Information Criterion for that diversity metric x vegetation zone combination.
`./data/diversity_LUPi_resamples.RDS`	Full set of 1,000 richness, evenness, turnover and LUPi resamples. Dataset is a list, with each list element representing a resample (1-1000). All list elements have the same structure. Columns: datasetid = Neotoma unique pollen record ID; age_draw = age estimate (calibrated years before radiocarbon present); depth = depth of sample in sequence (in centimeters); richness = richness (number of unique pollen types); evenness = evenness (dominance of abundances across pollen types, Pielou's J); turn_norm = compositional change (Bray-Curtis); resample = resample (1-1000 simulations); n_chron_controls = number of chronological control points per record; lat = latitude in decimal degrees; long = longitude in decimal degrees; collunitid = Neotoma unique collection unit ID; n_samples = number of samples (levels) per pollen sequence; resample_n = rarefaction value (number of pollen grains); analysis = analysis run (REVEALS or RAW); LUP_index = agricultural Land Use Probability Index (counts); Vegetati_1 = vegetation zone (minor); Zone = vegetation zone (major).
`./data/Europe_harmtable_RV_types_UPDATED.xlsx`	Pollen harmonisation table, updated for the set of pollen records included in this study from Birks et al (2023; cited in-text). Columns: variablename = Neotoma pollen type name; variablename_clean = cleaned Neotoma pollen type name; level_1 = harmonised type category name.
`./data/filtered_geochron_tables.RDS`	Table of geochronological control points for the included records. Columns: depth = depth of sample in sequence (in centimeters); thickness = section thickness (in centimeters); agelimitolder = upper age estimate (in years before radiocarbon present); agelimityounger = lower age estimate (in years before radiocarbon present); chroncontrolage = central age estimate (in years before radiocarbon present); chroncontroltype = type of chronological control (radiocarbon date, tephra, section top, etc.); chronologyid = Neotoma unique chronology ID; datasetid = Neotoma unique pollen record ID; lat = latitude in decimal degrees; long = longitude in decimal degrees; calcurve = radiocarbon calibration curve; error = chronological control point error (SD); n_dates = number of chronological control points per record; age_diff = interval between successive chronological control points (in years); contin_run = Continuous run (Y/N, does sample meet chronological control sampling criteria?); partial_record = Continuous run (T/F, does record meet chronological control sampling criteria?); record_duration = total record duration (years).
`./data/HYDE_Population_PiP.RDS`	Per record population estimates from the History Database of the Global Environment (HYDE version 3.2). Dataset is a list of 53 elements, each of which is a dataframe of the same structure. Columns: datasetid = Neotoma unique pollen record ID; value = HYDE population estimate; time = age in BC/AD.
`./data/KK10_per_record.RDS`	Per record past human land use estimates from the 'Krumhardt-Kaplan 2010' anthropogenic land cover change simulation. Columns: datasetid = Neotoma unique pollen record ID; median_landuse = human land use estimate; slice = age (in calibrated years before radiocarbon present).
`./data/LUP_harmonisation_table_and_taxa_ratings.xlsx`	Pollen type harmonisation table for the agricultural land use probability index. Columns: neotoma_name = Neotoma pollen type name; neotoma_name_clean = cleaned Neotoma pollen type name; LUP_name_clean = cleaned agricultural land use probability index name; Rating = agricultural land use probability index rating (rating scale goes from worst to best, 1-4).
`./data/per_record_climate_vars_df.RDS`	Per pollen record palaeoclimate data (temperature and precipitation). Columns: datasetid = Neotoma unique pollen record ID; Tann = temperature in degrees centigrade; Acc = precipitation in metres per year; index = model time (index); time = age (years before present).
`./data/PiP_filtered.RDS`	Per pollen record spatial partition into Lang's vegetation zones. Columns: datasetid = Neotoma unique pollen record ID; Id = vegetation zone ID (major); Vegetation = vegetation zone ID (major); Vegetati_1 = vegetation zone (minor); Zone = vegetation zone (major); lat = latitude in decimal degrees; long = longitude in decimal degrees; geometry = spatial polygon (coordinate reference system = 4326).
`./data/REVEALS_estimates_Neotoma_data.RDS`	Past vegetation cover estimates per pollen sample, from the Regional Estimates of VEgetation Abundance from Large Sites model. Dataset is a list of 694 elements, with each element representing a pollen record. Each element is a dataframe of proportional cover estimates per taxon (each of which is represented by its own column) and all list elements have the same structure. Columns: datasetid = Neotoma unique pollen record ID; depth = depth of sample in sequence (in centimeters). Columns 3 - 33 represent proportional vegetation estimates for individual taxa: "Abies.alba" "Alnus.glutinosa" "Amanranthaceae.Chenopodiaceae" "Artemisia" "Betula" "Buxus.sempervirens" "Calluna.vulgaris" "Carpinus.betulus" "Carpinus.orientalis" "Castanea" "Cerealia.t" "Corylus.avellana" "Cyperaceae" "Ericaceae" "Fagus.sylvatica" "Filipendula" "Fraxinus" "Juniperus" "Phillyrea" "Picea" "Pinus" "Pistacia" "Plantago.lanceolata.type" "Poaceae" "Quercus.deciduous" "Quercus.evergreen" "Rumex.acetosa.t" "Salix" "Secale" "Tilia" "Ulmus"
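
The `min_AIC` and `delta_AIC` columns described for the model summaries table can be illustrated with a small worked example. The sketch below uses invented AIC values and placeholder zone names, and the helper `add_delta_aic()` is hypothetical; it is not the code from 15_AIC_comparisons.R, only one way the stated definition (each model's AIC minus the lowest AIC within its metric x vegetation zone group) could be computed in base R.

```r
# Toy model summaries: all values are invented for illustration only.
model_summaries <- data.frame(
  model_name = c("m1", "m2", "m3", "m4"),
  metric     = c("richness", "richness", "evenness", "evenness"),
  zone       = c("zone_A", "zone_A", "zone_A", "zone_A"),
  AIC        = c(105.2, 100.0, 250.5, 251.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add min_AIC and delta_AIC per metric x zone group.
add_delta_aic <- function(df) {
  # ave() applies min() within each metric x zone group and returns a
  # vector aligned with the rows of df
  df$min_AIC <- ave(df$AIC, df$metric, df$zone, FUN = min)
  df$delta_AIC <- df$AIC - df$min_AIC
  df
}

add_delta_aic(model_summaries)
```

In this toy case the best model in each group has a `delta_AIC` of 0, and the remaining models show how far they sit above that minimum.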