Land cover classification and mapping of a polar desert in the Canadian Arctic Archipelago
Citation
Desjardins, Émilie et al. (2023), Land cover classification and mapping of a polar desert in the Canadian Arctic Archipelago, Dryad, Dataset, https://doi.org/10.5061/dryad.3bk3j9kpk
Abstract
We created a highly accurate land cover map of a polar desert in a 162 km² area surrounding the Canadian Forces Station Alert, on the north-eastern tip of Ellesmere Island (Nunavut, Canada). Our objective was to improve classification methodology in the High Arctic by testing different predictors and classifiers. The land cover classes were selected specifically for the study area and include: forb-dominated barren, forb-dominated tundra, grass-dominated wetland, sedge-dominated wetland, moss-dominated wetland, water, snow, human infrastructure and shadow.
We selected the most relevant predictors to discriminate the land cover classes using three complementary methods: Area Under the receiver operator characteristic Curve (AUC), Boruta, and correlation coefficients. The predictors were divided into four categories: spectral predictors (obtained from WorldView-2/3 multispectral satellite imagery), vegetation predictors (e.g., soil-adjusted vegetation indices), topographic predictors (e.g., aspect-slope, elevation, relief, slope, and terrain ruggedness index, all derived from the Arctic Digital Elevation Model), and hydrological predictors (e.g., distances to the nearest water sources).
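As an illustration of the vegetation predictors, the sketch below computes the standard Soil-Adjusted Vegetation Index, SAVI = (NIR - Red) / (NIR + Red + L) × (1 + L), for a single pixel. It is a minimal sketch, not the authors' code: this description does not state which soil-adjusted indices or WorldView bands were used, so the function name, the soil factor L = 0.5, and the reflectance values are assumptions.

```python
def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index for one pixel of reflectance (0-1 scale).

    soil_factor is the L term; 0.5 is a commonly used default and an
    assumption here. With soil_factor=0 the index reduces to NDVI.
    """
    denominator = nir + red + soil_factor
    if denominator == 0:
        return 0.0
    return (nir - red) / denominator * (1.0 + soil_factor)


# Hypothetical reflectance values, not taken from the dataset.
sparse_barren = savi(nir=0.22, red=0.18)
moist_wetland = savi(nir=0.40, red=0.08)
print(f"barren: {sparse_barren:.3f}, wetland: {moist_wetland:.3f}")
```

A densely vegetated pixel reflects much more near-infrared than red light, so it scores higher than a sparsely vegetated one; the soil factor dampens the influence of bare ground, which dominates in a polar desert.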
We classified seven land cover classes (human infrastructure and shadow were added in a post-classification step) with a supervised approach. We used 25 predictors selected out of 38 to train one parametric classifier, Maximum Likelihood in ArcGIS Pro (ESRI), and seven non-parametric classifiers: Artificial Neural Networks, Classification And Regression Trees, K-Nearest Neighbors, Linear Discriminant Analysis, Naive Bayes, Random Forests, and Support Vector Machines. We developed an ensemble classifier based on a majority voting algorithm, with each classifier having one vote and each pixel retaining the land cover class with the most votes. We evaluated the classification accuracy of the nine classifiers through visual inspection of the derived maps and through the confusion matrices, from which commonly used metrics were derived (i.e., overall accuracy, kappa coefficient, balanced accuracy, user's accuracy, and producer's accuracy). We retained the map generated by the ensemble classifier as the final map because it yielded satisfactory predictions (85% overall accuracy) and showed less visible classification bias.
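The accuracy metrics named above can be derived from a confusion matrix as sketched below. This is a Python illustration under stated assumptions, not the authors' R workflow: the function name, the row/column convention (rows are reference classes, columns are predicted classes), and the three-class matrix are hypothetical, and balanced accuracy is left out for brevity.

```python
def accuracy_metrics(matrix):
    """Derive accuracy metrics from a square confusion matrix.

    matrix[i][j] = number of validation points whose reference class is i
    and whose predicted class is j (convention assumed for this sketch).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n)]
    reference_totals = [sum(row) for row in matrix]
    predicted_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(r * p for r, p in zip(reference_totals, predicted_totals)) / total**2
    kappa = (overall - expected) / (1 - expected)
    # Producer's accuracy: correct points / reference points of that class.
    producers = [d / r if r else float("nan") for d, r in zip(diagonal, reference_totals)]
    # User's accuracy: correct points / points predicted as that class.
    users = [d / p if p else float("nan") for d, p in zip(diagonal, predicted_totals)]
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "producers_accuracy": producers,
        "users_accuracy": users,
    }


# Hypothetical 3-class confusion matrix, not results from the study.
example = [
    [18, 2, 0],
    [3, 15, 2],
    [0, 1, 9],
]
metrics = accuracy_metrics(example)
print(metrics)
```

Producer's accuracy reflects omission errors (reference points the map missed), while user's accuracy reflects commission errors (mapped pixels assigned to the wrong class), which is why both are reported alongside overall accuracy.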
The dataset includes (1) the reference points in shapefile format to train and evaluate the classifiers, (2) the final land cover map in GeoTIFF format with 0.5 × 0.5 meter resolution, and (3) two R scripts: one for predictor selection, classification (with the seven non-parametric classifiers), and validation, and the second for the majority voting ensemble algorithm. The digital elevation model that was used to generate the topographic predictors can be downloaded online at https://doi.org/10.7910/DVN/OHHUKH. The predictor layers are not provided because the spectral and vegetation predictors were based on the satellite imagery, and our license does not allow free sharing.
Methods
The methodology and the land cover classes are described in the related works:
Émilie Desjardins, Sandra Lai, Laurent Houle, Alain Caron, Véronique Thériault, Andrew Tam, François Vézina & Dominique Berteaux (Submitted) Revisiting algorithms and predictors for land cover classification of polar deserts: case study, challenges and recommendations. Remote Sensing.
Émilie Desjardins, Sandra Lai, Serge Payette, François Vézina, Andrew Tam & Dominique Berteaux (2021) Vascular plant communities in the polar desert of Alert (Ellesmere Island, Canada): Establishment of a baseline reference for the 21st century, Écoscience, 28:3-4, 243-267, DOI: 10.1080/11956860.2021.1907974
Usage notes
The dataset includes a shapefile named reference.shp, which consists of a collection of files with a common filename prefix, stored in the same directory. The shapefile stores the location, shape (point in this case), and attributes of the 467 reference points. The attribute of each point gives its land cover class, which is one of the following: bareground (refers to forb-dominated barren), mesic (refers to forb-dominated tundra), wetgrass (refers to grass-dominated wetland), wetsedge (refers to sedge-dominated wetland), wetmoss (refers to moss-dominated wetland), water, and snow. The shapefile can be opened in geographic information system (GIS) software such as QGIS (QGIS Development Team) and ArcGIS (ESRI). The geographic coordinate system is NAD 1983 (EPSG:4269) and the projected coordinate system is NAD 1983 UTM Zone 20N (EPSG:26920).
The dataset also includes a file named ensemble_classifier.tif. It is a raster GIS file in GeoTIFF format with 0.5 × 0.5 meter resolution. The uncompressed size is 382.48 MB. It can be opened in GIS software such as QGIS and ArcGIS. The geographic coordinate system is NAD 1983 (EPSG:4269) and the projected coordinate system is NAD 1983 UTM Zone 20N (EPSG:26920). Dimensions of the raster are 26870 rows × 29852 columns. The values in the raster represent the land cover classes: 1 = forb-dominated barren, 2 = forb-dominated tundra, 3 = snow, 4 = water, 5 = grass-dominated wetland, 6 = moss-dominated wetland, 7 = sedge-dominated wetland, 8 = human infrastructure, and 9 = shadow (NoData value = 15).
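The class codes and pixel size above are enough to convert pixel counts into areas. The following sketch shows the arithmetic; the helper name and the pixel counts are hypothetical, and in practice the counts would come from reading ensemble_classifier.tif with a raster library.

```python
# Class codes as listed for ensemble_classifier.tif.
CLASS_NAMES = {
    1: "forb-dominated barren",
    2: "forb-dominated tundra",
    3: "snow",
    4: "water",
    5: "grass-dominated wetland",
    6: "moss-dominated wetland",
    7: "sedge-dominated wetland",
    8: "human infrastructure",
    9: "shadow",
}
NODATA = 15
PIXEL_SIZE_M = 0.5
ROWS, COLS = 26870, 29852

pixel_area_m2 = PIXEL_SIZE_M ** 2                      # 0.25 m² per pixel
grid_area_km2 = ROWS * COLS * pixel_area_m2 / 1e6      # full grid, NoData included


def class_areas_km2(pixel_counts):
    """Convert {class code: pixel count} into {class name: area in km²}."""
    return {
        CLASS_NAMES[code]: count * pixel_area_m2 / 1e6
        for code, count in pixel_counts.items()
        if code != NODATA
    }


# Hypothetical pixel counts, not taken from the raster.
areas = class_areas_km2({1: 4_000_000, 4: 1_000_000, NODATA: 999})
print(f"grid extent: {grid_area_km2:.2f} km²", areas)
```

The full grid covers about 200.5 km², more than the 162 km² study area, so a share of the pixels presumably carries the NoData value and should be excluded from any area summary.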
The two other files in this dataset are R scripts that can be opened in RStudio software (RStudio Team). One script, named selection_classification_validation.R, contains the different steps necessary to select the most relevant predictors prior to classification, to train the seven non-parametric classifiers (Artificial Neural Networks, Classification And Regression Trees, K-Nearest Neighbors, Linear Discriminant Analysis, Naive Bayes, Random Forests, and Support Vector Machines), and to evaluate the accuracy of the resulting classifications. The second script, named majority_voting.R, is the ensemble classifier with the majority voting algorithm, in which classified maps (rasters in GeoTIFF format) of four of the classifiers (Random Forests, Linear Discriminant Analysis, Classification And Regression Trees, Maximum Likelihood) are combined. Note that the classified maps for the three non-parametric classifiers were created by the first script, while the classified map from the parametric classifier, Maximum Likelihood, was created using ArcGIS Pro.
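The idea behind per-pixel majority voting can be sketched in a few lines. The example below is written in Python for illustration and is not a translation of majority_voting.R: the function name, the tiny 2 × 2 grids, and in particular the tie-breaking rule (the lowest class code wins) are assumptions, since the way ties are resolved is not described here.

```python
from collections import Counter

NODATA = 15  # NoData value used in the final map


def majority_vote(class_maps, nodata=NODATA):
    """Combine equally sized classified grids, one vote per classifier.

    class_maps is a list of grids; each grid is a list of rows of integer
    class codes. Ties go to the lowest class code (an assumption made
    for this sketch only).
    """
    n_rows, n_cols = len(class_maps[0]), len(class_maps[0][0])
    result = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            votes = [grid[r][c] for grid in class_maps if grid[r][c] != nodata]
            if not votes:
                out_row.append(nodata)
                continue
            counts = Counter(votes)
            top = max(counts.values())
            out_row.append(min(code for code, n in counts.items() if n == top))
        result.append(out_row)
    return result


# Hypothetical 2 x 2 outputs of the four combined classifiers.
random_forests = [[1, 2], [4, 15]]
lda = [[1, 2], [4, 15]]
cart = [[2, 5], [3, 15]]
maximum_likelihood = [[1, 5], [3, 15]]

ensemble = majority_vote([random_forests, lda, cart, maximum_likelihood])
print(ensemble)
```

With an even number of classifiers, two-against-two ties are possible, so whichever tie rule is chosen affects the final map. A raster of the size described above would also call for a vectorized or block-wise implementation rather than nested Python loops.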
Funding
Natural Sciences and Engineering Research Council of Canada, Award: RGPIN-2019-05292
Natural Sciences and Engineering Research Council of Canada, Award: RGPNS-2019-305531
Canada Excellence Research Chairs, Government of Canada
Department of National Defence
Kenneth M. Molson Foundation
Fonds de recherche du Québec – Nature et technologies
Network of Centers of Excellence of Canada ArcticNet
Weston Family Foundation
Polar Knowledge Canada
BIOS2 NSERC Collaborative Research and Training Experience (CREATE) program, Award: FONCER 509948-2018