Data from: Mapping coastal redwoods (Sequoia sempervirens) across their natural range: An updateable and field-validated distribution map using Sentinel satellite data and cloud computing
Data files
Jan 05, 2026 version (174.03 MB total):
- Dryad_RunManifest_Metadata.csv (446 B)
- Dryad_Shareable_TrainingTable_960_NoCoordinates.csv (328.65 KB)
- README_GEE_Reproducibility.txt (1.99 KB)
- README.md (7.03 KB)
- Redwood_Extent_Boundary.zip (525.72 KB)
- Redwood_Presence_Probability_Map.zip (171.86 MB)
- Redwood_Presence_Raster.zip (1.31 MB)
Abstract
Coast redwood (Sequoia sempervirens) is a uniquely tall and long-lived tree species that occupies a narrow fog belt along the Pacific coast of North America. Despite its ecological and conservation significance, existing maps of redwood distribution remain limited in spatial resolution, accuracy, and timeliness. In this study, we present an updateable and field-validated distribution map of S. sempervirens across its entire native range, from southwestern Oregon to central California, developed using freely available Sentinel-2 multispectral and Sentinel-1 SAR data. We compiled a georeferenced canopy classification dataset of 960 points, combining field surveys conducted in October 2024, field-based ground truth points collected in 2017 for a prior mapping study, and externally sourced field-based redwood presence records. This dataset was used to train machine learning models (Random Forest and Gradient Boosted Trees) within a cloud computing framework to classify redwood presence and absence at 10 m spatial resolution. Binary classification models achieved high predictive performance, with the best model yielding over 88% overall accuracy and an AUC of 0.92 on a 30% hold-out validation set. Ten-fold cross-validation on the training data further confirmed model consistency, with high true positive rates and low false positive rates across folds. A secondary multi-class model differentiated between redwood-dominated and mixed-conifer forest types, achieving an overall accuracy of 73.82%. Comparison with previous redwood distribution datasets revealed substantial agreement but also significant discrepancies: the new model indicates redwood presence in previously unmapped redwood fragments and absence in locations previously mapped as redwood. Validation against field data confirmed that the new map is more accurate than the previous datasets.
The resulting range-wide redwood map offers a current, accurate, and updateable platform for conservation planning, habitat monitoring, and ecological research. It also establishes a high-confidence baseline for tracking redwood distribution dynamics under ongoing climate and land-use change.
Dataset DOI: 10.5061/dryad.34tmpg4xf
Description of the data and file structure
Dataset overview
This repository provides supporting data and code for a Coast Redwood (Sequoia sempervirens) distribution mapping study using Sentinel-1 and Sentinel-2 satellite data and Google Earth Engine (GEE).
To minimize potential risks associated with publishing precise locations of an Endangered species, this archive does not include point-level geographic coordinates. Instead, it provides a coordinate-free training dataset containing extracted satellite predictor values and class labels, together with spatially explicit raster outputs and a fully documented cloud-based reproducibility workflow.
Files included:
(i) a shareable, coordinate-free training table with extracted Sentinel-1/2 predictors,
(ii) the study area boundary,
(iii) a binary Coast Redwood presence map,
(iv) a pixel-wise Coast Redwood presence probability map,
(v) a Google Earth Engine reproducibility script, and
(vi) a run manifest and README documenting the analysis.
Note on third-party data
Ground reference points supplied by the Save the Redwoods League (STRL) were used during model development but are not included in this archive due to data-use restrictions and sensitive-species considerations. Access to those original point locations is subject to STRL approval and is not required to reproduce the modeling workflow presented here.
Files and descriptions
1) Dryad_Shareable_TrainingTable_960_NoCoordinates.csv
Type/format: CSV (UTF-8)
Description:
A coordinate-free training dataset containing extracted Sentinel-1 and Sentinel-2 predictor values at ground reference locations, together with vegetation composition labels, a binary Coast Redwood presence indicator, and cross-validation fold assignments. This table enables full reproduction of model training, evaluation, and validation steps without disclosing point-level locations.
Key fields (columns):
- Sentinel-2 spectral bands (B1–B12; resampled as described in the manuscript)
- Sentinel-1 backscatter bands (VV, VH) and VV/VH ratio
- Class – categorical vegetation composition class (string), representing dominant canopy composition at each reference location. The six vegetation composition classes used in the study are:
  - DOG – Douglas-fir dominated stands
  - SML – Smaller trees (shorter-stature or early successional forest vegetation)
  - RED – Coast Redwood (Sequoia sempervirens) only
  - RDF – Mixed Coast Redwood and Douglas-fir
  - ROT – Coast Redwood mixed with other taller tree species (excluding Douglas-fir)
  - TAL – Tall trees other than Coast Redwood and Douglas-fir
- NumericClass – integer encoding of the six vegetation composition classes (0–5), used as the response variable for multi-class Random Forest classification in Google Earth Engine. The mapping is:
  - DOG → 0
  - SML → 1
  - RED → 2
  - RDF → 3
  - ROT → 4
  - TAL → 5
- Redwood_Presence – binary response variable (1 = Coast Redwood present, 0 = absent)
- fold – spatial block fold identifier used for cross-validation
- has_fill – indicator flag (1 = at least one predictor value was filled because of masking in the Sentinel satellite composites; 0 = no predictor values were filled)
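The fold column makes it possible to rerun spatial block cross-validation outside Earth Engine. A minimal Python sketch follows, assuming a plain leave-one-fold-out scheme in which each fold is held out once while all remaining folds are used for training. The helper names `load_table` and `fold_splits` are hypothetical, and the GEE script's own procedure may differ in detail.

```python
import csv
from collections import defaultdict


def load_table(path):
    """Read the training table into a list of dicts (all values as strings)."""
    # utf-8-sig also copes with a leading byte-order mark, if one is present.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def fold_splits(rows, fold_key="fold"):
    """Yield (fold_id, train_rows, test_rows), holding out one fold at a time.

    ASSUMPTION: plain leave-one-fold-out over the `fold` column; the
    GEE script's exact cross-validation procedure may differ.
    """
    by_fold = defaultdict(list)
    for row in rows:
        by_fold[row[fold_key]].append(row)
    # Fold ids are compared as strings, so "10" sorts before "2".
    for k in sorted(by_fold):
        test = by_fold[k]
        train = [r for j, group in by_fold.items() if j != k for r in group]
        yield k, train, test
```

For example, `for k, train, test in fold_splits(load_table("Dryad_Shareable_TrainingTable_960_NoCoordinates.csv")): print(k, len(train), len(test))` reports the size of each split; any classifier can then be fitted on `train` and scored on `test`.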
Notes on class encoding:
NumericClass is a machine-readable representation of categorical vegetation composition. The numeric values do not imply any ordinal or hierarchical relationship among classes and are used solely for computational purposes. The mapping between Class and NumericClass is defined explicitly in the Google Earth Engine script (Redwood_Extent_Mapping_RF_v01.js) and is fully reproducible.
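Because the mapping is fixed, it can be written down as a lookup table and used to sanity-check the CSV. The sketch below (hypothetical helper `check_row`) flags rows where NumericClass disagrees with Class. It also compares Redwood_Presence against a grouping of RED, RDF, and ROT as the redwood-present classes; that grouping is an assumption inferred from the class descriptions above, not a rule stated in this README, so any mismatches it reports are worth inspecting rather than treating as errors.

```python
# Class -> NumericClass mapping exactly as listed in this README.
CLASS_TO_NUMERIC = {"DOG": 0, "SML": 1, "RED": 2, "RDF": 3, "ROT": 4, "TAL": 5}
NUMERIC_TO_CLASS = {v: k for k, v in CLASS_TO_NUMERIC.items()}

# ASSUMPTION (inferred from the class descriptions, not stated in the README):
# classes that contain Coast Redwood correspond to Redwood_Presence = 1.
REDWOOD_CLASSES = {"RED", "RDF", "ROT"}


def check_row(row):
    """Return a list of problems found in one table row (a dict of strings)."""
    problems = []
    cls = row["Class"]
    # float() first so that values written as "2.0" are also accepted.
    if CLASS_TO_NUMERIC.get(cls) != int(float(row["NumericClass"])):
        problems.append("NumericClass does not match Class")
    expected = 1 if cls in REDWOOD_CLASSES else 0
    if int(float(row["Redwood_Presence"])) != expected:
        problems.append("Redwood_Presence differs from assumed class grouping")
    return problems
```

Applied to every row read with `csv.DictReader`, an empty list for all rows indicates that the class columns agree with the mapping above.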
Notes on predictor completeness:
Rows with has_fill = 1 indicate locations where one or more predictor values were filled due to masking in the Sentinel composites. By default, the provided GEE script trains and evaluates models using only rows where has_fill = 0.
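The default has_fill filter can be mirrored outside GEE, for example to confirm how many complete rows remain before training. A minimal standard-library sketch; the function names are hypothetical.

```python
import csv


def filter_complete(rows):
    """Keep rows whose has_fill flag is 0 (no filled predictor values)."""
    # csv yields strings; float() accepts both "0" and "0.0".
    return [r for r in rows if float(r["has_fill"]) == 0]


def load_complete_rows(path):
    """Read the training table and apply the has_fill == 0 filter."""
    # The reader is consumed inside the with-block, before the file closes.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return filter_complete(csv.DictReader(f))
```

`len(load_complete_rows("Dryad_Shareable_TrainingTable_960_NoCoordinates.csv"))` then gives the number of rows available to the default workflow.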
2) Redwood_Extent_Boundary.zip
Type/format: ESRI Shapefile (zipped; includes .shp, .shx, .dbf, .prj)
Description:
Polygon boundary defining the study area (area of interest) over which models were trained, applied, and evaluated.
Provenance:
Derived from the union of redwood presence extents from CALVEG, LEMMA, and a custom map provided by the Save the Redwoods League.
Spatial reference:
Defined in the included .prj file.
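A shapefile is only usable when its companion files travel together, so a quick completeness check on the archive can prevent a failed import. The sketch below uses hypothetical helper names and reports which of the four components named above are missing from a zip archive, matching on file extension only.

```python
import os
import zipfile

# Components listed in this README for Redwood_Extent_Boundary.zip.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj"}


def missing_components(names):
    """Return the required extensions not found among the given file names."""
    found = {os.path.splitext(n)[1].lower() for n in names}
    return REQUIRED_EXTENSIONS - found


def check_boundary_zip(zip_path="Redwood_Extent_Boundary.zip"):
    """Open the archive and check its member names."""
    with zipfile.ZipFile(zip_path) as zf:
        return missing_components(zf.namelist())
```

An empty set from `check_boundary_zip()` means all four components are present.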
3) Redwood_Presence_Raster.zip
Type/format: GeoTIFF (zipped)
Description:
Final binary map of Coast Redwood presence across the study area.
Value encoding:
- 1 = Redwood presence
- NoData = outside the analysis boundary
Notes:
Derived by thresholding the probability surface using the decision rule described in the manuscript.
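The exact decision rule is given in the manuscript and is not reproduced here. As an illustration only, the sketch below applies a fixed cut-off to a probability grid held as nested Python lists. The default threshold of 0.5 is a placeholder, not the study's value, and `None` stands in for NoData cells outside the boundary. The sketch writes 0 for below-threshold cells; because the value encoding above lists only 1 and NoData, check the released raster before assuming how absence is stored.

```python
def threshold_probability(prob, threshold=0.5):
    """Turn a probability grid (list of rows) into a presence grid.

    Cells >= threshold become 1, cells below become 0, and None
    (NoData, e.g. outside the study boundary) stays None.
    The default threshold is a placeholder, not the manuscript's rule.
    """
    return [
        [None if v is None else int(v >= threshold) for v in row]
        for row in prob
    ]
```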
4) Redwood_Presence_Probability_Map.zip
Type/format: GeoTIFF (zipped)
Description:
Pixel-wise probability (0–1) of Coast Redwood presence across the study area.
Value range:
Continuous 0.0–1.0 (floating point).
Notes:
Suitable for alternative thresholding, sensitivity analyses, and uncertainty visualization.
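One simple sensitivity analysis is to track how the share of mapped presence changes as the cut-off moves. A standard-library sketch under the same conventions (nested lists, `None` for NoData); the helper name is hypothetical.

```python
def presence_fraction_by_threshold(prob, thresholds):
    """For each threshold, return the share of valid cells with prob >= threshold."""
    valid = [v for row in prob for v in row if v is not None]
    if not valid:
        return {t: 0.0 for t in thresholds}
    return {t: sum(v >= t for v in valid) / len(valid) for t in thresholds}
```

For the full-size GeoTIFF, reading the band into a NumPy array with a raster library such as rasterio and using vectorized comparisons would be far faster than Python lists.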
5) Redwood_Extent_Mapping_RF_v01.js
Type: Software (Google Earth Engine JavaScript)
Description:
Google Earth Engine script implementing the full modeling workflow using the shareable training table and AOI, including:
- Sentinel-1 and Sentinel-2 composite generation
- Random Forest model training
- Accuracy assessment and cross-validation
- Spatial block cross-validation
- Variable importance analysis
- Export of classified raster outputs
This script reproduces the analysis without requiring point-level geographic coordinates.
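For readers checking predictions outside Earth Engine, the standard definitions behind the metrics reported in the abstract (overall accuracy, true and false positive rates, AUC) are sketched below for binary 0/1 labels. This is not the GEE script's implementation, and the function names are hypothetical. AUC is computed here as the probability that a randomly chosen presence scores higher than a randomly chosen absence, with ties counted as one half, which equals the area under the empirical ROC curve.

```python
def binary_metrics(y_true, y_pred):
    """Overall accuracy, true positive rate and false positive rate for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    return {
        "overall_accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "tpr": tp / (tp + fn) if tp + fn else nan,
        "fpr": fp / (fp + tn) if fp + tn else nan,
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a random presence > score of a random absence); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    # Pairwise comparison is O(len(pos) * len(neg)), fine for a few hundred points.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```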
6) Dryad_RunManifest_Metadata.csv
Type/format: CSV
Description:
Metadata file documenting the exact satellite collections, temporal windows, spatial resolution, resampling methods, modeling parameters, and cross-validation settings used in the analysis.
7) README_GEE_Reproducibility.txt
Type/format: Text
Description:
Step-by-step instructions for reproducing the Google Earth Engine analysis, including asset upload procedures and script execution details.
Sensitive species location handling
Sequoia sempervirens is listed as Endangered on the IUCN Red List. To reduce the risk of revealing sensitive locations, this dataset does not include geographic coordinates of field observations. Model reproducibility is ensured through the release of extracted satellite predictor values, raster outputs, and a fully documented cloud-based analysis workflow.
Access to finer-resolution location data
Access to original field coordinates may be considered on a case-by-case basis for legitimate research purposes, subject to ethical considerations and third-party data permissions. Interested researchers may contact the corresponding author.
