Dataset accompanying Buscombe et al.: Human-in-the-loop segmentation of earth surface imagery
Data files
Jan 17, 2022 version files 335.97 MB
- BuscombeEtAl_HITL_Segmentation_of_Earth_Surface_Imagery_README.pdf
- datasetA.zip
- datasetB.zip
- datasetC.zip
- datasetD.zip
- datasetE.zip
- datasetF.zip
Abstract
The datasets used in this study are provided in 7 folders:
- “dataset A”, containing data from Sandwich Town Neck Beach on Cape Cod, Massachusetts. These images are published as a USGS data series (Sherwood et al., 2021) and are publicly available at https://doi.org/10.5066/P9BFD3YH
- “dataset B”, containing data from North and South Carolina collected immediately after Hurricane Florence in October 2018. National Geodetic Survey emergency response imagery courtesy of the National Oceanic and Atmospheric Administration, available at https://storms.ngs.noaa.gov
- “dataset C”, containing some examples of shoreline environments captured by a low-altitude aircraft. These images are published as a USGS data series (Kranenburg et al., 2020) and are publicly available at https://doi.org/10.5066/P9CA3D8P
- “dataset D”, containing data collected from the Pearl River and its tributary the Bogue Chitto, and from the Chickasawhay, Buoy and Leaf tributaries of the Pascagoula River, in spring 2021. Used with permission from U.S. Fish and Wildlife Service
- “dataset E”, containing Sentinel-2 satellite images of coastal lagoon environments in Salinas Rivermouth Natural Preserve and National Wildlife Refuge in Monterey, California. Sentinel-2 imagery courtesy of European Space Agency (ESA)
- “dataset F”, containing Landsat-8 satellite images of Cape Hatteras, Cape Hatteras National Seashore, North Carolina. Landsat-8 imagery is courtesy of U.S. Geological Survey
- “code”, containing a version of the code used to generate the results contained in this data repository. Full details about this code can be obtained from the GitHub code repository (https://github.com/dbuscombe-usgs/dash_doodler) and website (https://dbuscombe-usgs.github.io/dash_doodler/).
Methods
In each folder, there are three subfolders named 1) images, 2) label images, and 3) annotations, plus a text file called classes.txt that lists the classes for that imagery. The images folder contains the raw images used to generate label images with the program, the label images folder contains the label images generated by the program, and the annotations folder contains the raw annotations. All images are in standard image formats (jpeg and png).
The classes.txt file is the one used by the Doodler program to generate results. It consists of class names, each on a new line. The integer values in the label and annotation data are assigned to the classes in the order in which they are listed. For example, the classes.txt file for dataset B consists of the following class names:
- water
- sand
- veg
- dev
where `veg` is short for vegetated terrain, and `dev` is short for any type of human development such as buildings and roads. In the annotation and label data, water is encoded with integer 1, sand is 2, and so on.
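The ordering rule described above can be sketched in a few lines of Python. This is a minimal illustration, not code taken from Doodler: the function name `parse_classes` is hypothetical, and the file contents are typed in as a string rather than read from disk.

```python
def parse_classes(text):
    """Map each class name to its integer code, assigned in file order from 1."""
    # Skip blank lines so a trailing newline does not create an empty class.
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return {name: code for code, name in enumerate(names, start=1)}


# Contents of classes.txt for dataset B, as listed above.
dataset_b_classes = "water\nsand\nveg\ndev\n"

class_codes = parse_classes(dataset_b_classes)
print(class_codes)  # {'water': 1, 'sand': 2, 'veg': 3, 'dev': 4}
```

To use it on a real file, the string could be replaced by the result of reading classes.txt from the relevant dataset folder.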
Dataset A
The dataset consists of one three-band orthomosaic image, at 5-cm and 25-cm resolutions, for mapping beach substrates of Sandwich Town Neck Beach on Cape Cod, Massachusetts. The orthomosaics were created from photographs collected from a low-altitude Uncrewed Aircraft System (UAS) on September 21, 2016, using a structure-from-motion workflow for high-resolution elevation mapping of coasts from aerial imagery. The 5-cm and 25-cm pixel imagery is divided into 1024 x 1024 pixel, 3-band (RGB) tiles for annotation, which results in 99 and 6 tiles for the respective resolutions. Because there are no data at the boundaries, 64 tiles and one tile, respectively, are completely blank. The following categories are used: 1) water, 2) sand, 3) gravel, 4) cobble/boulder, 5) vegetated, 6) development. These images are published as a USGS data series (Sherwood et al., 2021) and are publicly available at https://doi.org/10.5066/P9BFD3YH. Each image and label file comes with a .wld file (ESRI world file format, see https://desktop.arcgis.com/en/arcmap/latest/manage-data/raster-and-images/world-files-for-raster-datasets.htm ) and also an XML file with coordinate reference system metadata.
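For readers unfamiliar with world files, the sketch below shows how the six numbers in a .wld file map a pixel position to map coordinates. It follows the standard ESRI line order (A, D, B, E, C, F). The function names and the example values are hypothetical and are not taken from the dataset; the coordinate reference system itself comes from the accompanying XML metadata, not from the world file.

```python
def parse_world_file(text):
    """Return the six affine coefficients in world-file order: A, D, B, E, C, F."""
    values = [float(token) for token in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six numbers")
    return tuple(values)


def pixel_to_map(coeffs, col, row):
    """Map coordinates of the center of the pixel at (col, row)."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical world file for a north-up tile with 5-cm pixels.
# E is negative because row numbers increase downward.
example_wld = "0.05\n0.0\n0.0\n-0.05\n376000.025\n4624999.975\n"

coeffs = parse_world_file(example_wld)
origin = pixel_to_map(coeffs, 0, 0)        # center of the upper-left pixel
corner = pixel_to_map(coeffs, 1023, 1023)  # center of the lower-right pixel
print(origin, corner)
```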
Dataset B
The dataset consists of a non-continuous spatial series of 80 three-band image tiles (1000 x 750 x 3 pixels). The tiles come from Emergency Response Imagery collected by the National Geodetic Survey Remote Sensing Division of the US National Oceanic and Atmospheric Administration (NOAA), with each original image divided into four tiles. The imagery is from North and South Carolina, taken after Hurricane Florence (October 2018). The images are labeled using the following classes: 1) water, 2) sand, 3) vegetated surface, and 4) development. The list of original NOAA image names is provided in the file noaa_images.txt.
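As an illustration of dividing one image into four tiles, the sketch below computes the pixel boxes for a 2 x 2 split. The 2 x 2 layout and the 2000 x 1500 pixel source size are assumptions made for this example; the source does not state how the original NOAA images were divided. The function name `quadrant_boxes` is hypothetical.

```python
def quadrant_boxes(width, height):
    """Return four (left, upper, right, lower) pixel boxes for a 2 x 2 split.

    If width or height is odd, the last column or row of pixels is dropped.
    """
    half_w, half_h = width // 2, height // 2
    return [
        (left, upper, left + half_w, upper + half_h)
        for upper in (0, half_h)
        for left in (0, half_w)
    ]


# Assumed source size: 2000 pixels wide by 1500 pixels high.
boxes = quadrant_boxes(2000, 1500)
for box in boxes:
    print(box)
```

The boxes use the (left, upper, right, lower) order accepted by Pillow's `Image.crop`, so each one could be passed directly to that method to cut out a tile.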
Dataset C
The dataset consists of a series of 10 three-band arbitrary images of shoreline environments, such as could be collected from a low-altitude aircraft in numerous locations, each labeled by five people using the following four classes: 1) deep water, 2) whitewater, 3) intertidal area (including all visibly shallow water where the surface below the water is visible, swash regions, and wet sand), and 4) dry land. These images are published as a USGS data series (Kranenburg et al., 2020) and are publicly available at https://doi.org/10.5066/P9CA3D8P.
Dataset D
The dataset consists of a non-continuous spatial series of 51 one-band (greyscale) image tiles, each a short section of port or starboard scan consisting of 1024 consecutive sonar pings stacked as image columns. The length of each ping varied with sonar range, so the number of image rows varies between 1300 and 2000 pixels. The scans were collected using a Humminbird Solix sidescan sonar emitting a frequency-modulated sound pulse with a nominal carrier frequency of 1.2 MHz, from sections of the Pearl River and its tributary the Bogue Chitto, and from the Chickasawhay, Buoy and Leaf tributaries of the Pascagoula River, in spring 2021, for mapping in-stream physical habitats in coastal plain rivers of Louisiana and Mississippi. The dataset consists of 10 example scans from the Bogue Chitto River, four from the Buoy River, two from the Chickasawhay River, 12 from the Leaf River, and the remaining 23 from the mainstem Pearl River. The samples were selected to cover a variety of substrate types, water depths and turbidities.
Dataset E
The dataset consists of a time-series of 40 three-band, false-color, 10-m (122 x 342 x 3 pixels) Sentinel-2 satellite images of coastal lagoon environments in Salinas Rivermouth Natural Preserve and National Wildlife Refuge in Monterey, California, collected between Dec 31, 2018 and May 19, 2021. The false-color images consist of near infrared (band eight), red (band four), and green (band three). The time-series depicts various changes on the landscape, including the dynamics of the Salinas River mouth into the coastal ocean, surf zone and river plume characteristics, changes to marsh and dune vegetation, and agricultural crop rotation. We therefore defined the following classes: 1) water, 2) whitewater, 3) bare sand, 4) marsh veg, 5) dune veg, 6) crop/woody, 7) soil.
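The band ordering described above (near infrared, red, green mapped to the three image channels) can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the per-channel linear stretch to 0-255 is a choice made for this example, not necessarily how the images in this dataset were scaled, and the band arrays are random stand-ins rather than real Sentinel-2 data. The function name `false_color_composite` is hypothetical.

```python
import numpy as np


def false_color_composite(nir, red, green):
    """Stack NIR, red and green bands into a (rows, cols, 3) uint8 image."""
    stacked = np.stack([nir, red, green], axis=-1).astype(np.float64)
    # Assumed scaling: stretch each channel independently to the 0-255 range.
    lo = stacked.min(axis=(0, 1), keepdims=True)
    hi = stacked.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat bands
    return np.round(255.0 * (stacked - lo) / span).astype(np.uint8)


# Random stand-ins for bands 8, 4 and 3 at the tile size quoted above.
rng = np.random.default_rng(0)
band8, band4, band3 = (rng.integers(0, 10000, size=(122, 342)) for _ in range(3))

composite = false_color_composite(band8, band4, band3)
print(composite.shape, composite.dtype)  # (122, 342, 3) uint8
```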
Dataset F
The dataset consists of a time-series of 43 three-band, visible-band, pan-sharpened 15-m Landsat-8 satellite images (768 x 768 x 3 pixels) of Cape Hatteras, Cape Hatteras National Seashore, North Carolina, collected between Feb 15, 2015 and Sept 27, 2021. We labeled the following classes: 1) water, 2) whitewater (surf), 3) sand, 4) land (all dry land that is not sand). There are also some small clouds and cloud shadows in the scene, all occurring above water; they are therefore labeled “water”.
References
- Kranenburg, C.J., Ritchie, A.C., Brown, J.A., Over, J.R., Buscombe, D., Sherwood, C.R., Warrick, J.A., and Wernette, P.A., 2020, Post-Hurricane Florence aerial imagery: Cape Fear to Duck, North Carolina, October 6–8, 2018: U.S. Geological Survey data release, https://doi.org/10.5066/P91KB9SF.
- Sherwood, C.R., Over, J.R., and Soenen, K., 2021, Structure from motion products associated with UAS flights in Sandwich, Massachusetts: U.S. Geological Survey data release, https://doi.org/10.5066/P9BFD3YH.