An ensemble machine learning bioavailable strontium isoscape for Eastern Canada
Data files
May 12, 2025 version files (634.43 MB)
- isoscapes.zip (76.54 MB)
- projected_rasters.zip (557.03 MB)
- README.md (6.18 KB)
- SM1_table_S1.xlsx (778.06 KB)
- SM3_script_bedrock.R (18.89 KB)
- SM4_script_isoscape.R (47.24 KB)
Abstract
The distribution of bioavailable strontium isotope ratios (87Sr/86Sr) across the landscape mainly follows the underlying lithology, making 87Sr/86Sr baseline maps (isoscapes) powerful tools for provenance studies. 87Sr/86Sr has already been used in Eastern Canada (EC) to trace the origins of food and human remains, and to reconstruct animal mobility. While bioavailable 87Sr/86Sr isoscapes for EC can be extrapolated from global datasets using random forest (RF) modelling, no regionally calibrated isoscape exists. Here, we produce a regionally calibrated bioavailable 87Sr/86Sr isoscape by analysing plants collected at 136 sites across EC, incorporating updated geological variables and applying a novel ensemble machine-learning (EML) framework. We compared isoscapes generated by the traditional RF and the EML approaches. Adding local bioavailable 87Sr/86Sr data to a global dataset significantly improved the model prediction, with a drastic increase in predicted 87Sr/86Sr and increased spatial uncertainty in the northern Canadian craton. EML produced similar 87Sr/86Sr predictions but with a tighter spatial uncertainty distribution. Regionally calibrated RF and EML isoscapes significantly outperformed the global bioavailable RF isoscape, confirming the need to collect local data in data-poor regions. This isoscape provides a baseline in EC to monitor and manage the movements and provenance of agricultural products, natural resources, endangered/harmful migratory species, and archaeological human remains and artifacts.
https://doi.org/10.5061/dryad.9zw3r22qf
Description of the data and file structure
The bioavailable strontium (87Sr/86Sr) isoscape for Eastern Canada (EC) was generated using two machine learning approaches: random forest (RF), following the workflow described in Bataille et al. (2020, Advances in global bioavailable strontium isoscapes, Palaeogeography, Palaeoclimatology, Palaeoecology, 555, 109849, https://doi.org/10.1016/j.palaeo.2020.109849), and ensemble machine learning (EML).
Both approaches rely on a training dataset consisting of bioavailable 87Sr/86Sr measurements sampled globally, along with geological, environmental, and climatic variables in raster format used to predict 87Sr/86Sr across the landscape. Among these variables, predicted bedrock 87Sr/86Sr—a key driver of bioavailable 87Sr/86Sr—was updated for EC using more accurate geological data, following the methodology described in Bataille et al. (2014, A geostatistical framework for predicting variations in strontium concentrations and isotope ratios in Alaskan rivers, Chemical Geology, 389, https://doi.org/10.1016/j.chemgeo.2014.08.030), prior to running the RF and EML models.
The RF-based isoscape was generated using both the original dataset from Bataille et al. (2020) and an updated dataset incorporating plant samples from Eastern Canada. The EML-based isoscape was generated using the updated dataset only.
This folder contains:
- The updated bioavailable 87Sr/86Sr database (Excel format), based on Bataille et al. (2020), including new plant data from Eastern Canada.
- Rasters of geological variables (predicted bedrock 87Sr/86Sr, geological unit age) updated specifically for this study (additional variables are available from Bataille et al. 2020).
- R scripts to (1) run the bedrock model to predict bedrock 87Sr/86Sr and (2) run the RF and EML analyses.
- The isoscapes generated using RF (with and without EC plant data) and the EML-generated isoscape, along with their corresponding spatial error rasters.
Files and variables
File: SM1_table_S1.xlsx
Bioavailable 87Sr/86Sr dataset used to generate the isoscape. This is an updated version of the dataset compiled by Bataille et al. (2020), incorporating new samples from Eastern Canada. Due to variability in data formats and completeness across the original source studies, not all fields are available for every record. Missing information is indicated with “NA”.
Fields:
- Reference sheet: list of the source studies of the samples, with the reference ID used in the “samples” sheet.
- samples sheet:
  - ID: sample ID for this dataset
  - Original_ID: sample ID in the source study
  - Country: geographic origin of the sample
  - Type1: main type of sample (soil, plant, animal taxa)
  - Type2, Type3: more detailed type (e.g. species for plants and animals)
  - Material: description of the material analysed
  - 87Sr/86Sr: strontium (Sr) isotopic ratio. According to the source studies, this ratio was measured using either Thermal Ionization Mass Spectrometry (TIMS) or Inductively Coupled Plasma Mass Spectrometry (ICP-MS). For detailed information on analytical procedures, refer to the original studies.
  - Conc._Sr: strontium concentration (ppm)
  - Sr.1SD, Sr.2SD: measurement error reported at 1 or 2 standard deviations
  - Latitude/Longitude: sampling site coordinates in decimal degrees
  - Source.of.data.reference.ID: ID used to identify the source study in the “Reference” sheet.
File: SM3_script_bedrock.R
Description: R script to run the mechanistic bedrock model from Bataille et al. (2014) and predict modern 87Sr/86Sr values in bedrock (median, 1st quartile, 3rd quartile), based on the age and lithology of geological units. The script uses three different equations to estimate 87Sr/86Sr according to bedrock type (igneous/metamorphic rocks, siliciclastic sediments, and carbonate sediments).
File: SM4_script_isoscape.R
Description: R script to run the random forest (RF) and ensemble machine learning (EML) analyses and generate bioavailable 87Sr/86Sr isoscapes. The script includes: extraction of geological, environmental, and climatic predictors at sampling sites, variable selection for the RF analysis, execution of RF and EML models, and prediction of bioavailable 87Sr/86Sr values across the landscape.
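The first step listed above, extracting predictor values at sampling sites, amounts to looking up the raster cell that contains each site. The following Python sketch illustrates that lookup on a tiny in-memory grid. It is not taken from SM4_script_isoscape.R; the function name `extract_at_point`, the grid values, the origin, and the cell size are all hypothetical.

```python
def extract_at_point(grid, x_min, y_max, cell_size, x, y):
    """Return the value of the grid cell containing (x, y), or None if outside.

    grid is a list of rows with row 0 at the northern edge (y_max),
    as in a north-up raster. All names and values are illustrative.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical 2 x 3 predictor raster (e.g. predicted bedrock 87Sr/86Sr),
# 1-degree cells, upper-left corner at longitude -75, latitude 50.
bedrock = [
    [0.709, 0.712, 0.720],
    [0.706, 0.710, 0.715],
]

# Value at a hypothetical sampling site (second row, middle column).
site_value = extract_at_point(bedrock, -75.0, 50.0, 1.0, x=-73.4, y=48.6)
```

With real GeoTIFFs the origin, resolution, and coordinate reference system come from the file metadata, and a GIS library would normally handle this lookup.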
File: isoscapes.zip
Description: Output rasters (predicted values and standard deviations) from the RF and EML analyses, including the RF-based isoscapes with and without Eastern Canada (EC) samples, and the EML-based isoscape. All rasters are provided in .tif format (GeoTIFF) and can be opened in R using the raster or terra packages, or in any GIS software such as QGIS.
File: projected_rasters.zip
Description: Rasters of predicted bedrock 87Sr/86Sr (median, 1st quartile, 3rd quartile) and geological unit age (minimum, maximum, mean), updated from Bataille et al. (2020), and used as predictors for generating the Eastern Canada isoscapes. Additional predictor rasters used in the analyses are available from Bataille et al. (2020). All rasters are provided in .tif format (GeoTIFF) and can be opened in R using the raster or terra packages, or in any GIS software such as QGIS.
Sharing/Access information
The original bioavailable 87Sr/86Sr dataset and the predictor rasters used in the analysis are available from Bataille et al. (2020): https://doi.org/10.1016/j.palaeo.2020.109849
Code/software
Analyses were conducted using R 4.3. The scripts are compatible with more recent versions of R, except for the EML section. The EML analysis relies on the landmap R package, which depends on the deprecated rgdal package and will not run in recent R versions where rgdal is no longer supported. The landmap package is available at: https://github.com/Envirometrix/landmap
- Data collection and analyses
We supplemented the original global bioavailable 87Sr/86Sr database from Bataille et al. (2020) with plant samples from Eastern Canada collected during two independent campaigns between 2018 and 2022: moss, lichen and grass were collected at 28 sites across taiga and tundra habitats, and needles of balsam fir (Abies balsamea (Mill.)) and spruce (Picea sp.) were collected at 107 sites across the boreal forest. Samples were analysed for 87Sr/86Sr by multi-collector inductively coupled plasma mass spectrometry (MC-ICP-MS).
- Mechanistic 87Sr/86Sr bedrock model
The mechanistic 87Sr/86Sr bedrock model (Bataille et al. 2014) predicts the modern 87Sr/86Sr values of the bedrock based on the age and the nature of the geological units. The model assumes that all rocks of a given lithology derive from a common parent material whose 87Sr/86Sr value changes over time following a three-stage history: (1) the initial 87Sr/86Sr in the undifferentiated mantle, (2) the enrichment in 87Sr through 87Rb decay in the parent crustal rock reservoir, from its differentiation to the formation of the modern rock, and (3) the enrichment in 87Sr of the modern rock since its formation.
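The three-stage history can be illustrated with the standard radiogenic ingrowth relation, in which 87Sr/86Sr grows by (87Rb/86Sr) × (e^(λt) − 1) over a time t. The Python sketch below is a simplified illustration under stated assumptions, not the equations of SM3_script_bedrock.R: the decay constant, the function names, and every parameter value are assumptions introduced here.

```python
import math

# Assumed 87Rb decay constant (per year); the value used in the
# authors' script may differ.
LAMBDA_RB87 = 1.42e-11


def ingrowth(rb_sr, age_start, age_end):
    """87Sr/86Sr gained between two ages, given in years before present.

    rb_sr is the reservoir's 87Rb/86Sr expressed at its present-day value;
    age_start must be older than (greater than or equal to) age_end.
    """
    return rb_sr * (math.exp(LAMBDA_RB87 * age_start)
                    - math.exp(LAMBDA_RB87 * age_end))


def three_stage_sr(sr_mantle, rb_sr_parent, rb_sr_rock, age_diff, age_rock):
    """Modern bedrock 87Sr/86Sr from a three-stage history.

    Stage 1: sr_mantle, the mantle 87Sr/86Sr when the parent differentiated.
    Stage 2: ingrowth in the parent crustal reservoir until the rock formed.
    Stage 3: ingrowth in the rock itself from its formation to the present.
    """
    stage2 = ingrowth(rb_sr_parent, age_diff, age_rock)
    stage3 = ingrowth(rb_sr_rock, age_rock, 0.0)
    return sr_mantle + stage2 + stage3


# Hypothetical unit: parent differentiated 2.7 Ga ago, rock formed 1.0 Ga ago.
modern_sr = three_stage_sr(
    sr_mantle=0.7010, rb_sr_parent=0.3, rb_sr_rock=0.5,
    age_diff=2.7e9, age_rock=1.0e9,
)
```

In this form, older units and units with higher Rb/Sr yield higher modern 87Sr/86Sr, which is the qualitative behaviour the model relies on.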
We updated the global mechanistic bedrock model (median, 1st quartile, 3rd quartile) and the associated global geological unit age rasters (min, max, mean) used in Bataille et al. (2020) for the province of Québec, using fine-scale geological data from SIGEOM (Système d’Information Géominière, Ministère des Ressources naturelles et des Forêts [MRNF], Québec; Thériault and Beauséjour 2012). The geological data (age, nature) were updated in ArcGIS 10.3. The mechanistic bedrock model was run in R 4.3 (script provided). We provide the updated geological age rasters (min, max, mean) and the predicted bedrock 87Sr/86Sr rasters (median, 1st quartile, 3rd quartile) used in the random forest and ensemble machine learning analyses to produce the bioavailable 87Sr/86Sr isoscape for Eastern Canada.
- Bioavailable 87Sr/86Sr isoscape
We generated the bioavailable 87Sr/86Sr isoscape for Eastern Canada using the global bioavailable 87Sr/86Sr sample dataset from Bataille et al. (2020), updated with samples from Eastern Canada, together with geological, topographic, climate, soil, and aerosol deposition variables extracted at each sampling site.
We used two machine learning approaches: a random forest analysis and an ensemble machine learning analysis. The random forest grows multiple regression trees by bagging: for each tree, a bootstrap sample of the dataset serves as the training set and the observations left out of that sample (out-of-bag) serve as the validation set. Predictions are then aggregated across all trees. For the ensemble machine learning, we used the stacking approach: several different algorithms are trained on the dataset, and the predictions of these models are used to train a meta-learner, which generates the ensemble machine learning estimates.
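The two ideas described above, bagging and stacking, can be sketched in a few lines of Python. This is a toy illustration using only the standard library, not the RF or EML implementation in SM4_script_isoscape.R: each "tree" is replaced by a simple in-bag mean, the meta-learner is reduced to a single least-squares blending weight between two base models, and all function names are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; rows never drawn form the out-of-bag set."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def bagged_mean(values, n_trees=200, seed=1):
    """Bagging: average the estimates of many learners, each fitted on its
    own bootstrap sample (here each learner is just the in-bag mean)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trees):
        in_bag, _ = bootstrap_split(len(values), rng)
        estimates.append(sum(values[i] for i in in_bag) / len(in_bag))
    return sum(estimates) / len(estimates)


def fit_blend_weight(pred_a, pred_b, observed):
    """Stacking: fit the meta-learner w * pred_a + (1 - w) * pred_b
    by least squares on the base models' predictions."""
    num = sum((a - b) * (y - b) for a, b, y in zip(pred_a, pred_b, observed))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5


# Hypothetical 87Sr/86Sr observations and two base-model predictions.
observed = [0.705, 0.710, 0.715, 0.720]
model_a = [0.706, 0.709, 0.716, 0.719]
model_b = [0.710, 0.712, 0.711, 0.714]

bagged = bagged_mean(observed)
weight = fit_blend_weight(model_a, model_b, observed)
stacked = [weight * a + (1 - weight) * b for a, b in zip(model_a, model_b)]
```

In practice the meta-learner is trained on predictions for observations the base models did not see during fitting (for example from cross-validation), so that it does not simply reward overfitted base models.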
Two isoscapes were generated with the random forest approach: the first uses the global 87Sr/86Sr dataset without data from Eastern Canada, and the second integrates the Eastern Canada samples. The ensemble machine learning analysis uses the global 87Sr/86Sr dataset with the Eastern Canada data.
Analyses were conducted in R 4.3 (script provided), and we provide the 87Sr/86Sr prediction rasters as well as the associated spatial error (standard deviation) rasters for both approaches.
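One common way to use a prediction raster together with its standard deviation raster is to score how compatible each cell is with a measured sample. The Python sketch below shows this for a handful of cells. It is an illustration only and not part of the authors' workflow: it assumes normally distributed prediction errors and equal prior weight for every cell, and the function names and all values are hypothetical.

```python
import math


def normal_density(x, mean, sd):
    """Probability density of x under a normal distribution N(mean, sd)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def origin_probabilities(sample_value, predicted, error_sd):
    """Relative probability that each cell is the origin of a sample.

    predicted and error_sd hold the isoscape value and its standard
    deviation for each cell (rasters flattened to lists); the densities
    are rescaled so that they sum to 1 over the cells considered.
    """
    densities = [normal_density(sample_value, m, s)
                 for m, s in zip(predicted, error_sd)]
    total = sum(densities)
    return [d / total for d in densities]


# Hypothetical values for three cells and one measured sample.
predicted = [0.7090, 0.7150, 0.7300]
error_sd = [0.0010, 0.0020, 0.0050]
probs = origin_probabilities(0.7145, predicted, error_sd)
most_likely_cell = probs.index(max(probs))
```

Cells with a large standard deviation are penalised less for a mismatch between the sample and the prediction, which is why the error rasters matter when interpreting assignments.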
- References:
Bataille, C.P., Brennan, S.R., Hartmann, J., Moosdorf, N., Wooller, M.J., and Bowen, G.J., 2014. A geostatistical framework for predicting variations in strontium concentrations and isotope ratios in Alaskan rivers. Chemical Geology, 389, 1–15. https://doi.org/10.1016/j.chemgeo.2014.08.030
Bataille, C.P., Crowley, B.E., Wooller, M.J., and Bowen, G.J., 2020. Advances in global bioavailable strontium isoscapes. Palaeogeography, Palaeoclimatology, Palaeoecology, 555, 109849. https://doi.org/10.1016/j.palaeo.2020.109849
Thériault, R., and Beauséjour, S., 2012. Geological map of Québec, DV2012-07. Ministère des Ressources naturelles et des Forêts, Québec.