Data from: Burrowing into the past: Extending niche space models of procellariiform breeding grounds by merging fossil and historic data
Data files
Jun 11, 2025 version, files 15.20 GB
- README.md (6.75 KB)
- seabird_paleoenm.zip (15.20 GB)
Abstract
Aim
Predicting species’ potential distributions and niches requires multi-scale data encompassing the past and present. Increasingly, researchers have advocated using historical context to inform ecological niche models (ENMs). Two key sources of past distributions are fossils and historical records. Fossils are subject to sampling and taphonomy biases but offer insights into temporal dynamics over millennia. Historical records are filtered by human perceptions over a shorter temporal window, but compared to fossils, they provide different contextual information from a potentially broader range of habitats. New Zealand (NZ) has a relatively short history of human occupation, with rich fossil and historical literature archives. Approximately 25% of the world’s seabirds breed in NZ, nearly half of which are burrowing procellariiforms. Since human arrival in NZ, most procellariiforms have declined in abundance and breeding ranges, primarily due to introduced mammalian predators. We combined record sources to improve ENMs of burrowing procellariiform breeding colonies and reconstruct narratives of decline.
Location
Aotearoa New Zealand.
Methods
We fitted ENMs using a maximum entropy algorithm and mixed principal component analysis for four sets of occurrence records (fossil, historic, historic + fossil, and post-1990) of burrowing procellariiform breeding colonies, where taxa were grouped by functional traits.
Results
For all procellariiform trait groups, the breeding niche spaces captured separately by the fossil and historical data had low overlap, reflecting different environmental conditions. The combined fossil + historic datasets predicted a niche that overlapped the post-1990 observed niche. Moreover, the fossil and historic datasets combined demonstrated that breeding grounds, now restricted mainly to predator-free settings, were once more widespread and extended further inland throughout NZ.
Main Conclusions
Historic and fossil occurrence records can complement each other by mitigating biases unique to either dataset, better resolving the ecological breeding niches of these procellariiform trait groups. Together, such records provide critical insights into the past drivers of species range contractions, contextualising current ecosystems and informing species management planning.
The attached project recreates the analyses conducted in the seabird paleoENM manuscript (https://doi.org/10.1111/ddi.70032).
Description of the data and file structure
The ./scripts/ directory contains all the necessary analysis scripts, each with a detailed header and comments describing its use and purpose. The seabird_sdm.R script calls each of the *_data.R scripts and contains the primary analysis for our manuscript. The class_trait_analysis.R script conducts the trait group analysis and produces the associated figures in our supplementary materials.
The environmental predictor layers used in this analysis were sourced from NZEnvDS, who explicitly request that these files not be shared; they are, however, freely available for download from their site. In addition, a potential vegetation layer was taken from Leathwick et al. (2003) and then converted to have the same classifications as the basic_ecosystems layer from NZEnvDS (see supplementary details for the conversion). These files need to be downloaded and then placed in the designated directories (see ./scripts/terra_data.R for the directory structure).
To remove this barrier for those wishing to explore our analysis, we have supplied the thinned and extracted data needed to reproduce it (see ./output/model_data.rds). This is a nested list containing the datasets for each trait group, organised by observation period. Each row in the dataframes corresponds to a colony location or one of the background points (denoted by the 'observation' column). The distance_coast, elevation, slope_deg, topo_geomorphons, wind_meanAnn, basic_ecosystems, and precip_warmQtr columns hold the environmental values for those cells (see our manuscript for more detail on these). The cell column indicates the raster cell the observation originated from.
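As a minimal sketch of the layout described above, the snippet below mirrors the nested list as a Python dictionary and tallies colony and background rows per trait group and observation period. The file itself is an R object and would be read with readRDS in R; the nesting order, the key names (trait_group_a, fossil, historic), the 'colony'/'background' labels, and all values here are assumptions made for illustration, not contents of model_data.rds.

```python
from collections import Counter

# Hypothetical stand-in for the nested list: trait group -> observation
# period -> rows. Keys, labels, and values are illustrative assumptions only.
model_data = {
    "trait_group_a": {
        "fossil": [
            {"observation": "colony", "cell": 101, "elevation": 12.0},
            {"observation": "colony", "cell": 102, "elevation": 30.0},
            {"observation": "background", "cell": 250, "elevation": 400.0},
        ],
        "historic": [
            {"observation": "colony", "cell": 310, "elevation": 5.0},
            {"observation": "background", "cell": 411, "elevation": 120.0},
            {"observation": "background", "cell": 512, "elevation": 800.0},
        ],
    },
}


def summarise(data, label_column="observation"):
    """Count rows per label for every (trait group, observation period) pair."""
    summary = {}
    for group, periods in data.items():
        for period, rows in periods.items():
            summary[(group, period)] = Counter(row[label_column] for row in rows)
    return summary


for key, counts in summarise(model_data).items():
    print(key, dict(counts))
```

A tally like this is a quick check that each dataset contains both colony and background rows before any model is refitted.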
The script (seabird_sdm.r) is heavily commented to make our analysis choices, and the data produced and used, as clear as possible. However, the heatmaps and niche analyses cannot be recreated without downloading the environmental variables.
The ./data/species_traits/species_trait_full.csv dataset contains all the data for the species trait analysis, with one row per species entry. Its columns are:
- species_scientific_name: the Latin binomial of the species.
- size_class: the trait group the species was assigned to after the trait analysis.
- body_mass: the average adult body mass of the species in grams.
- str_excavation: whether the species builds its nest by excavation (1 = it does, 0 = it does not). All species in the dataset excavate.
- loc_earth_hole, loc_ground, loc_rocks: whether the species is known to place its nests in earth holes, on the ground, or among rocks, respectively (1 = it does, 0 = it does not).
- tarsus_length, wing_length, beak_depth: measurements of adult birds in millimetres.
- hand_wing_index: the Hand-Wing Index of adult birds (a measure of dispersal ability), derived from wing measurements in millimetres.
- Notes: an author-generated column describing any modifications to the entries and their respective sources.
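The column descriptions above can be checked when loading the file. The following is a minimal sketch that parses a small in-memory sample with Python's standard csv module; the column names come from this README, but the two rows, the species names, and the size_class labels are made-up placeholders, and the sketch assumes there are no missing values and omits the notes column.

```python
import csv
import io

# Column names follow the README. The two data rows are hypothetical
# placeholders for illustration, not values from the real file.
SAMPLE_CSV = """species_scientific_name,size_class,body_mass,str_excavation,loc_earth_hole,loc_ground,loc_rocks,tarsus_length,wing_length,beak_depth,hand_wing_index
Genus exampleA,small,150,1,1,0,1,30.5,200.0,8.1,55.2
Genus exampleB,large,800,1,1,1,0,55.0,310.0,14.3,60.7
"""

BINARY_COLUMNS = ("str_excavation", "loc_earth_hole", "loc_ground", "loc_rocks")
NUMERIC_COLUMNS = (
    "body_mass", "tarsus_length", "wing_length", "beak_depth", "hand_wing_index",
)


def load_traits(handle):
    """Parse trait rows: binary flags become bool, measurements become float."""
    rows = []
    # Data starts on line 2 of the file because line 1 is the header.
    for line_no, raw in enumerate(csv.DictReader(handle), start=2):
        row = dict(raw)
        for col in BINARY_COLUMNS:
            if raw[col] not in ("0", "1"):
                raise ValueError(
                    f"line {line_no}: {col} must be 0 or 1, got {raw[col]!r}"
                )
            row[col] = raw[col] == "1"
        for col in NUMERIC_COLUMNS:
            row[col] = float(raw[col])
        rows.append(row)
    return rows


def nest_sites_by_species(rows):
    """Map each species to the nest locations flagged in its loc_* columns."""
    location_columns = [c for c in BINARY_COLUMNS if c.startswith("loc_")]
    return {
        row["species_scientific_name"]: [
            col[len("loc_"):] for col in location_columns if row[col]
        ]
        for row in rows
    }


traits = load_traits(io.StringIO(SAMPLE_CSV))
print(nest_sites_by_species(traits))
```

To run it against the real file, the io.StringIO sample could be replaced with an open handle on ./data/species_traits/species_trait_full.csv, adjusting the parsing if the file encodes missing values.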
The ./output/models/ directory contains RDS files for all the dismo::maxent models we fitted for our trait groups and observation periods; some of our code requires these files. See dismo::maxent for additional details. The ./output/replicated/ directory contains the maxent output files. As the dismo::maxent function outputs hundreds of files for each individual model, we do not document them exhaustively here. Instead, please see the dismo R package's vignette for a detailed description of the files and their contents.
The ./output/predictions/ directory contains the individual cross-validated model predictions we produced for each observation period × trait group combination. Each of these is a .TIFF raster in the NZTM2000 projection showing the probability of occurrence for the respective observation period × trait group combination, with values ranging from 0 to 1. These rasters were subsequently summarised for our project.
The ./output/niche_biplot.rds file is a nested R list containing the ggplot objects for each of the niche overlap plots we produced for our manuscript. The list elements are arranged by trait group, with the observation periods nested within each trait group. Each sub-element of an observation period is a ggplot object.
The ./output/niche_analysis/ directory contains three CSV files with the analysis of niche overlaps within each trait group across observation periods (see the supplementary materials for these tables). Each file shows the pairwise niche overlap scores between the observation periods for one trait group, with scores ranging from 0 to 1.
The burrowingIntoThePast_ODMAP.csv file contains the ODMAP for our paper, following the template provided by Fitzpatrick, M.C., Lachmuth, S. and Haydt, N.T., 2021. The ODMAP protocol: a new tool for standardized reporting that could revolutionize species distribution modeling. Ecography, 44(7), pp.1067-1070.
The NZEnvDS environmental predictor layers and the potential vegetation layer of Leathwick et al. (2003) are available from the following sources:
- https://datastore.landcareresearch.co.nz/ne/dataset/nzenvds
- https://doi.org/10.1111/j.1523-1739.2003.00469.x
Code/Software
Open the seabird_paleoenm.rproj after downloading the environmental data and placing it into the correct directories. From there, execute the code in the ./scripts/seabird_sdm.r file.
- Bellvé, André M.; Wilmshurst, Janet M.; Wood, Jamie R. et al. (2025). Burrowing Into the Past: Extending Niche Space Models of Procellariiform Breeding Grounds by Merging Fossil and Historic Data. Diversity and Distributions. https://doi.org/10.1111/ddi.70032
