Mapping built infrastructure in semi-arid systems using data integration and open-source approaches for image classification
Data files
Apr 10, 2025 version, 972.37 MB in total:
- MBI_2015-2024.zip (486.18 MB)
- MBI_2015.zip (40.85 MB)
- MBI_2016.zip (49.79 MB)
- MBI_2017.zip (50.28 MB)
- MBI_2018.zip (46.35 MB)
- MBI_2019.zip (50.06 MB)
- MBI_2020.zip (48.25 MB)
- MBI_2021.zip (47 MB)
- MBI_2022.zip (51.02 MB)
- MBI_2023.zip (52.06 MB)
- MBI_2024.zip (50.51 MB)
- README.md (6.92 KB)
Abstract
Accurate land use land cover (LULC) maps that delineate built infrastructure are useful for numerous applications, from urban planning, humanitarian response, and disaster management to informing decision-making for reducing human exposure to natural hazards, such as wildfire. Existing products lack sufficient spatial, temporal, and thematic resolution, omitting critical information needed to capture LULC trends accurately over time. Advancements in remote sensing imagery, open-source software, and cloud computing offer opportunities to address these challenges. Using Google Earth Engine, we developed a novel built infrastructure detection method in semi-arid systems by applying a random forest classifier to a fusion of Sentinel-1 and Sentinel-2 time series. Our classifier performed well, differentiating three built environment types (residential, infrastructure, and paved) with overall accuracies ranging from 90 to 96%. Producer accuracies were highest for the infrastructure class (98–99%), followed by the residential class (91–96%). Sentinel-1 variables were important for differentiating built classes. We illustrated the utility of our mapped products by generating a time series of change across southern Idaho spanning 2015 to 2024 and comparing it with publicly available products: National Land Cover Database (NLCD), Microsoft Building Footprints (MBF), and the global Dynamic World (DW). For 2024, our product estimated 5.88% of the study area as built, aligning closely with NLCD (6%) and DW (4.64%). Our mapped built infrastructure products offer enhancements over NLCD spatially and temporally, over DW thematically, and over MBF both temporally and thematically. We demonstrate the potential of fusing data sources to improve LULC mapping and present a case for regionally parameterized models that can more accurately capture built infrastructure change over time.
We used open-source approaches for built infrastructure detection, aiming for broader adoption of this workflow across other ecosystems and environments to support decision-making.
Description of the data and file structure
These data are annual maps of built infrastructure, with six classes, spanning the Snake River Plain ecoregion in southern Idaho. The products are ready to use and can be imported into any geospatial software for analysis. They were generated from a fusion of Sentinel-1 radar and Sentinel-2 multispectral imagery. The final mapped built infrastructure (MBI) products are annual rasters, that is, gridded categorical data with six classes: 1. Residential, 2. Infrastructure, 3. Paved, 4. Agriculture, 5. Vegetation, and 6. Range/Scrub.
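As an illustration of how the class codes above might be used once a raster has been read into memory, here is a minimal Python sketch that tallies the share of pixels in each class and the share that is built (codes 1 to 3). The function names, the flat list of pixel values, and the choice to ignore any value outside 1 to 6 (for example, a nodata value) are assumptions made for this example; they are not part of the MBI scripts.

```python
from collections import Counter

# Class codes as listed in the dataset description.
MBI_CLASSES = {
    1: "Residential",
    2: "Infrastructure",
    3: "Paved",
    4: "Agriculture",
    5: "Vegetation",
    6: "Range/Scrub",
}

# The three built environment types named in the abstract.
BUILT_CODES = {1, 2, 3}


def class_counts(pixels):
    """Count pixels per class code, ignoring values outside 1-6 (assumed nodata)."""
    return Counter(p for p in pixels if p in MBI_CLASSES)


def class_shares(pixels):
    """Return {class name: fraction of valid pixels} for all six classes."""
    counts = class_counts(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid class codes found")
    return {name: counts.get(code, 0) / total for code, name in MBI_CLASSES.items()}


def built_share(pixels):
    """Return the fraction of valid pixels classified as built (codes 1-3)."""
    counts = class_counts(pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid class codes found")
    return sum(counts.get(code, 0) for code in BUILT_CODES) / total


# Hypothetical pixel values; 0 stands in for an assumed nodata value.
sample_pixels = [1, 2, 3, 4, 5, 6, 6, 6, 4, 4, 0]
print(class_shares(sample_pixels))
print(built_share(sample_pixels))
```

In practice the pixel values would come from one of the .tif files, read with a raster library of the user's choice and flattened to a one-dimensional sequence.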
If users want to generate these products themselves, or reproduce them for a similar area, Google Earth Engine (GEE) and QGIS are required. The user must have a GEE account, load the MBI scripts into their repository, and run the code. To apply this model outside the Snake River Plain Level III ecoregion, new training data must be generated, using either QGIS or GEE. All remote sensing imagery used for MBI is freely available in GEE.
Files
To access MBI for all years:
File: ‘MBI_2015-2024.zip’
Description: A zipped folder containing 30 separate .tif files for the MBI product spanning the Snake River Plain from 2015 to 2024. Each raster has six classes, coded as follows: 1. Residential, 2. Infrastructure, 3. Paved, 4. Agriculture, 5. Vegetation, and 6. Range/Scrub. For each year of the MBI product, there are three separate .tif files (-1.tif, -2.tif, and -3.tif) because of the size of the region. These are named RFYY, followed by the file number, where RF stands for Random Forest, the algorithm used to create the MBI product, and YY is the last two digits of a given year:
‘RFYY-1.tif’ (e.g., RF19-1.tif is one of three MBI .tif files for the year 2019)
‘RFYY-2.tif’
‘RFYY-3.tif’
To access MBI for only selected years:
File: ‘MBI_20YY.zip’
Description: 10 separate zipped folders, one for each year the MBI product is available (2015 to 2024). YY corresponds to the last two digits of a given year. Each zipped folder contains the three .tif files that hold the raster MBI product across the Snake River Plain for that year, for example:
‘MBI_2021.zip’ contains only the rasters for the year 2021, in three .tif files:
‘RF21-1.tif’
‘RF21-2.tif’
‘RF21-3.tif’
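The naming convention described above can be expressed as a small Python helper, which may be handy when scripting downloads or checking that an archive is complete. This is a sketch only: the function names are invented for this example, and the year range is taken from the file descriptions above.

```python
FIRST_YEAR = 2015
LAST_YEAR = 2024


def _check_year(year: int) -> None:
    """Reject years outside the period covered by the MBI product."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"MBI is available for {FIRST_YEAR}-{LAST_YEAR}, got {year}")


def mbi_zip_name(year: int) -> str:
    """Name of the per-year archive, e.g. MBI_2021.zip."""
    _check_year(year)
    return f"MBI_{year}.zip"


def mbi_tif_names(year: int) -> list[str]:
    """Names of the three rasters for a year, e.g. RF21-1.tif to RF21-3.tif."""
    _check_year(year)
    yy = year % 100  # last two digits of the year
    return [f"RF{yy:02d}-{part}.tif" for part in (1, 2, 3)]


print(mbi_zip_name(2021))   # MBI_2021.zip
print(mbi_tif_names(2019))  # ['RF19-1.tif', 'RF19-2.tif', 'RF19-3.tif']
```

Looping `mbi_tif_names` over 2015 to 2024 yields the 30 file names expected in the combined archive.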
Sharing/Accessing Information
The ready-to-use MBI product can also be accessed here:
https://drive.google.com/drive/folders/1faKaLfSwEiYhrunrSKAPdpGMEpogGR7V?usp=drive_link
Read more about these data here:
- Dolman et al. (2025) Mapping built infrastructure in semi-arid systems using data integration and open-source approaches for image classification. Remote Sensing Applications: Society and Environment, 37, 101472, https://doi.org/10.1016/j.rsase.2025.101472
- ArcGIS StoryMaps: https://storymaps.arcgis.com/stories/9a3e71e2b1a7469a8cb997f8a2ee231a
The MBI products were derived from the following sources:
- Sentinel-1 radar: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD
- Sentinel-2 multispectral imagery: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_HARMONIZED
- Spectral indices calculated from multispectral bands
- Topographic variables from National Elevation Dataset (https://www.usgs.gov/publications/national-elevation-dataset)
- Distance to existing features, specifically roads from US Census TIGER/Line - Roads for each year (https://catalog.data.gov/dataset/census-tiger-line-roads1), railways from US Census TIGER/Line Nation Rails Shapefile for each year (https://catalog.data.gov/dataset/tiger-line-shapefile-2019-nation-u-s-rails-national-shapefile), and rivers from National Hydrography Dataset (https://www.usgs.gov/national-hydrography/national-hydrography-dataset)
Code/software
A Google Earth Engine account is required to reproduce these products.
Here is a summary guide to using GEE, accessing and running MBI scripts in GEE, and making changes to these scripts.
A link to the Mapped built infrastructure product Google Earth Engine (GEE) repository:
https://code.earthengine.google.com/?accept_repo=users/megdolman/SIAC_BuiltInfrastructureMapping
The repository for creating the MBI product is separated into four folders, each with separate scripts:
- Train
  a. CreateTrainingStacks: creates atmospherically corrected bottom-of-atmosphere image stacks from which the training data are sampled.
  b. SampleTrainingStacks: samples random points across the region of interest. Pixels are then extracted at these random points to generate a pixel training dataset. For each class (six classes) and every year (four years of training polygons), 15,000 random points are generated.
- Classify
  c. ClassifyImageStacks: imports the training pixels, generates annual image stacks, applies the Random Forest classifier to the image stacks, and produces the annual mapped products of built infrastructure.
- Validate
  d. StratifySampleForValidation: applies a stratified sampling method to generate random points for algorithm validation.
  e. GenerateValidationData: adds point geometries at each location generated by the stratified random sample, for every year.
- Analysis
  f. AnalyseTimeSeries: visualizes each year of the MBI product, from 2015 to 2024 (as of December 2024), and calculates the area of each class for every year.
  g. CompareProducts: visually compares the four LULC products in Dolman et al. (2025) and calculates the area of selected classes. The other products compared are Dynamic World, the National Land Cover Database, and Microsoft Building Footprints.
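
To make the idea behind the stratified sampling step in the Validate folder more concrete, the following plain-Python sketch draws an equal number of random pixel locations from each class of a small classified grid. It is not the repository's GEE code: the function name, the nested-list grid, the per-class sample size, and the fixed seed are all assumptions chosen for illustration, and the actual script may define its strata and sample sizes differently.

```python
import random
from collections import defaultdict


def stratified_sample(grid, points_per_class, seed=0):
    """Draw up to `points_per_class` random (row, col) cells from each class.

    grid: list of rows, each a list of integer class codes.
    Returns {class code: [(row, col), ...]}.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group cell coordinates by class code (the strata).
    cells_by_class = defaultdict(list)
    for r, row in enumerate(grid):
        for c, code in enumerate(row):
            cells_by_class[code].append((r, c))

    # Sample without replacement within each stratum.
    sample = {}
    for code in sorted(cells_by_class):
        cells = cells_by_class[code]
        k = min(points_per_class, len(cells))
        sample[code] = rng.sample(cells, k)
    return sample


# Hypothetical 4 x 4 classified grid using the MBI class codes.
demo_grid = [
    [1, 1, 4, 4],
    [1, 2, 4, 6],
    [3, 3, 6, 6],
    [5, 5, 6, 6],
]
for code, points in stratified_sample(demo_grid, points_per_class=2).items():
    print(code, points)
```

Sampling within each class, rather than across the whole grid, keeps rare classes from being under-represented in the validation set, which is the usual motivation for a stratified design.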