Global overview of cloud-, snow-, and shade-free Landsat (1982-2023) and Sentinel-2 (2015-2023) data
Data files
Mar 12, 2024 version files 17.97 GB
- GLOBAL_GrowingSeason.tar
- GLOBAL_LND_1982-2023_CSO.tar
- GLOBAL_S2_2015-2023_CSO.tar
- README.md
Abstract
Landsat and Sentinel-2 acquisitions are among the most frequently used medium-resolution (i.e., 10-30 m) optical data. The data are extensively used in terrestrial vegetation applications, including, but not limited to, land cover and land use mapping, vegetation condition and phenology monitoring, and disturbance and change mapping. While the Landsat archives alone provide over 40 years, and counting, of continuous and consistent observations, since mid-2015 Sentinel-2 has enabled a revisit frequency of up to 2 days. Although the spatio-temporal availability of both data archives is well known at the scene level, information on the actual availability of usable (i.e., cloud-, snow-, and shade-free) observations at the pixel level needs to be explored for each study to ensure correct parametrization of the algorithms used, and thus the robustness of subsequent analyses. However, a priori data exploration is time- and resource-consuming, and thus is rarely performed. As a result, the spatio-temporal heterogeneity of usable data is often inadequately accounted for in the analysis design, risking ill-advised selection of algorithms and hypotheses, and thus inferior quality of final results. Here we present a global dataset comprising precomputed daily availability of usable Landsat and Sentinel-2 data sampled at the pixel level in a regular 0.18°-point grid. We based the dataset on the complete 1982-2023 Landsat surface reflectance data (Collection 2) and 2015-2023 Sentinel-2 top-of-atmosphere reflectance scenes (pre-Collection-1 and Collection-1). Derivation of cloud-, snow-, and shade-free observations followed the methodology developed in our recent study on data availability over Europe (Lewińska et al., 2023; https://doi.org/10.20944/preprints202308.2174.v2). Furthermore, we expanded the dataset with growing season information derived from the 2001-2019 time series of the yearly 500 m MODIS land cover dynamics product (MCD12Q2; Collection 6).
As such, our dataset presents a unique overview of the spatio-temporal availability of usable daily Landsat and Sentinel-2 data at the global scale, hence offering much-needed a priori information aiding the identification of appropriate methods and challenges for terrestrial vegetation analyses at the local to global scales. The dataset can be viewed using the dedicated GEE App (link in Related Works).
README: Global overview of cloud-, snow-, and shade-free Landsat (1982-2023) and Sentinel-2 (2015-2023) data
Suggested citation: Lewińska, Katarzyna Ewa, et al. (Forthcoming 2024). Global overview of cloud-, snow-, and shade-free Landsat (1982-2023) and Sentinel-2 (2015-2023) data [Dataset]. Dryad. https://doi.org/10.5061/dryad.gb5mkkwxm
Description of the data and file structure
A global overview of usable (i.e., cloud-, snow-, and shade-free) 1982-2023 Landsat and 2015-2023 Sentinel-2 data, derived for a regular 0.18° x 0.18°-point grid.
Dataset featured in: Lewińska, K.E., Ernst, S., Frantz, D., Leser, U., Hostert, P. (year). Global overview of usable Landsat and Sentinel-2 data for 1982-2023. Submitted to Data in Brief.
The complete dataset comprises three .csv files:
- GLOBAL_LND_1982-2023_CSO.csv: Daily data availability derived from Landsat 1982-2023 archives
- GLOBAL_S2_2015-2023_CSO.csv: Daily data availability derived from Sentinel-2 2015-2023 archives
- GLOBAL_GrowingSeason.csv: Growing season information for normal and leap years
Each file consists of 475,150 observations representing the global sample-point grid. Each observation is characterized by a unique identifier and coordinates (id, Lat, and Lon). The binary information on the availability of cloud-, snow-, and shade-free observations (i.e., 1 – valid observation; 0 – no data, or invalid observation) is available daily in variables named L_YYYY_MM_dd, where YYYY indicates the year, MM indicates the month, and dd indicates the day.
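The column naming above can be decoded programmatically. The following is a minimal sketch, not part of the dataset tooling: the helper names `parse_day_column` and `usable_per_year` are hypothetical, the sample table is made up, and it assumes that cells hold the literal text `1` or `0` and that month and day are zero-padded.

```python
import csv
import io
from collections import Counter
from datetime import date


def parse_day_column(name, prefix="L_"):
    """Turn a column name such as 'L_1984_07_15' into a date, else None."""
    if not name.startswith(prefix):
        return None  # id, Lat, Lon and any other non-daily column
    year, month, day = (int(part) for part in name[len(prefix):].split("_"))
    return date(year, month, day)


def usable_per_year(row, prefix="L_"):
    """Count days flagged 1 (valid observation) per calendar year for one point."""
    counts = Counter()
    for name, value in row.items():
        day = parse_day_column(name, prefix)
        # Assumption: a valid observation is stored as the text "1".
        if day is not None and value.strip() == "1":
            counts[day.year] += 1
    return dict(counts)


# Made-up two-point table that only mimics the documented layout.
sample = io.StringIO(
    "id,Lat,Lon,L_1984_07_15,L_1984_07_16,L_1985_01_01\n"
    "1,52.5,13.4,1,0,1\n"
    "2,-1.2,36.8,0,0,1\n"
)
demo = {row["id"]: usable_per_year(row) for row in csv.DictReader(sample)}
```

Because each file holds one column per day for several decades, reading it row by row, as `csv.DictReader` does here, keeps memory use low compared with loading the full table at once.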
The GLOBAL_GrowingSeason.csv dataset comprises growing season masks (1 – within the growing season, 0 – outside the growing season, empty cells - no valid growing season) for leap and regular years. The former information is contained in columns Leap_MM_dd and the latter in columns Regular_MM_dd, where MM indicates the month, and dd indicates the day.
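Selecting the right growing-season column for a calendar date could look like the sketch below. The function names are hypothetical, and the code assumes zero-padded month and day fields and that mask cells parse as numbers when they are not empty.

```python
import calendar
from datetime import date


def growing_season_column(day):
    """Return the mask column for a date, e.g. 'Leap_02_29' or 'Regular_07_04'."""
    kind = "Leap" if calendar.isleap(day.year) else "Regular"
    return f"{kind}_{day.month:02d}_{day.day:02d}"


def in_growing_season(row, day):
    """True or False from the mask; None where the cell is empty (no valid season)."""
    value = row.get(growing_season_column(day), "").strip()
    if value == "":
        return None
    return float(value) == 1.0


# Made-up row fragment for illustration only.
demo_row = {"Regular_07_04": "1", "Regular_01_15": "0", "Leap_02_29": ""}
```

Combining this lookup with the daily availability columns allows counting usable observations within the growing season only.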
Sharing/Access information
The dataset was derived from freely and openly accessible Landsat and Sentinel-2 data archives available in Google Earth Engine. We used all Landsat surface reflectance Level 2, Tier 1, Collection 2 scenes acquired with the Thematic Mapper (TM) (https://doi.org/10.5066/P918ROHC), Enhanced Thematic Mapper (ETM+) (https://doi.org/10.5066/P9TU80IG), and Operational Land Imager (OLI) (https://doi.org/10.5066/P975CC9B) scanners between 22nd August 1982 and 31st December 2023, and Sentinel-2 TOA reflectance Level-1C scenes (pre-Collection-1 (https://doi.org/10.5270/S2_-d8we2fl) and Collection-1 (https://doi.org/10.5270/S2_-742ikth)) acquired with the MultiSpectral Instrument (MSI) between 23rd June 2015 and 31st December 2023.
Methods
We based our analyses on freely and openly accessible Landsat and Sentinel-2 data archives available in Google Earth Engine (Gorelick et al., 2017). We used all Landsat surface reflectance Level 2, Tier 1, Collection 2 scenes acquired with the Thematic Mapper (TM) (Earth Resources Observation And Science (EROS) Center, 1982), Enhanced Thematic Mapper (ETM+) (Earth Resources Observation And Science (EROS) Center, 1999), and Operational Land Imager (OLI) (Earth Resources Observation And Science (EROS) Center, 2013) scanners between 22nd August 1982 and 31st December 2023, and Sentinel-2 TOA reflectance Level-1C scenes (pre‑Collection-1 (European Space Agency, 2015, 2021) and Collection-1 (European Space Agency, 2022)) acquired with the MultiSpectral Instrument (MSI) between 23rd June 2015 and 31st December 2023.
We implemented a conservative pixel-quality screening to identify cloud-, snow-, and shade-free land pixels. For the Landsat time series, we relied on the inherent pixel quality bands (Foga et al., 2017; Zhu & Woodcock, 2012), excluding all pixels flagged as cloud, snow, or shadow, as well as pixels with the fill-in value of 20,000 (scale factor 0.0001; Zhang et al., 2022). Furthermore, due to the Landsat 7 orbit drift (Qiu et al., 2021), we excluded all ETM+ scenes acquired after 31st December 2020. Because Sentinel-2 Level-2A quality masks lack the desired scope and accuracy (Baetens et al., 2019; Coluzzi et al., 2018), we resorted to Level-1C scenes accompanied by the supporting Cloud Probability product. Furthermore, we employed a selection of conditions, including a threshold on Band 10 (SWIR-Cirrus), which is not available at Level-2A. Overall, our Sentinel-2-specific cloud, shadow, and snow screening comprised:
- exclusion of all pixels flagged as clouds and cirrus in the inherent ‘QA60’ cloud mask band;
- exclusion of all pixels with cloud probability >50% as defined in the corresponding Cloud Probability product available for each scene;
- exclusion of cirrus clouds (B10 reflectance >0.01);
- exclusion of clouds based on the Cloud Displacement Index (CDI < -0.5) (Frantz et al., 2018);
- exclusion of dark pixels (B8 reflectance <0.16) within cloud shadows modelled for each scene with scene-specific sun parameters for the clouds identified in the previous steps, assuming a cloud height of 2,000 m;
- exclusion of pixels within a 40-m buffer (two pixels at 20-m resolution) around each identified cloud and cloud shadow object;
- exclusion of snow pixels identified with a snow mask branch of the Sen2Cor processor (Main-Knorn et al., 2017).
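The Sentinel-2 screening rules listed above can be read as a chain of per-pixel exclusions. The sketch below only illustrates how the stated thresholds combine; it is not the Google Earth Engine implementation used to build the dataset. The `PixelObservation` fields and both function names are hypothetical, and the shadow geometry assumes a flat surface and a sun azimuth measured clockwise from north.

```python
import math
from dataclasses import dataclass


@dataclass
class PixelObservation:
    qa60_cloud_or_cirrus: bool   # flagged in the QA60 band
    cloud_probability: float     # percent, from the Cloud Probability product
    b10: float                   # Band 10 (cirrus) TOA reflectance
    cdi: float                   # Cloud Displacement Index
    b8: float                    # Band 8 (NIR) TOA reflectance
    in_modelled_shadow: bool     # inside a shadow projected from a detected cloud
    in_buffer: bool              # within 40 m of a cloud or cloud shadow object
    snow: bool                   # flagged by the snow mask


def is_usable(p):
    """Apply the listed exclusions; True means cloud-, snow-, and shade-free."""
    if p.qa60_cloud_or_cirrus or p.cloud_probability > 50:
        return False
    if p.b10 > 0.01 or p.cdi < -0.5:
        return False
    if p.in_modelled_shadow and p.b8 < 0.16:
        return False
    return not (p.in_buffer or p.snow)


def shadow_offset_m(sun_azimuth_deg, sun_zenith_deg, cloud_height_m=2000.0):
    """Horizontal (east, north) shift of a cloud shadow relative to its cloud."""
    distance = cloud_height_m * math.tan(math.radians(sun_zenith_deg))
    azimuth = math.radians(sun_azimuth_deg)
    # The shadow falls on the side opposite the sun.
    return (-distance * math.sin(azimuth), -distance * math.cos(azimuth))


clear = PixelObservation(False, 12.0, 0.002, 0.1, 0.30, False, False, False)
cirrus = PixelObservation(False, 12.0, 0.020, 0.1, 0.30, False, False, False)
dark_shadow = PixelObservation(False, 12.0, 0.002, 0.1, 0.10, True, False, False)
```

With the sun due south at a 45° zenith angle, `shadow_offset_m(180, 45)` places the shadow of a 2,000 m high cloud roughly 2,000 m to the north, which is the kind of displacement the shadow modelling step has to search.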
By applying the data screening, we generated a collection of daily availability records for the Landsat and Sentinel-2 data archives. We next subsampled the resulting binary time series with a regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection, obtaining 475,150 points located over land between 179.8867°W and 179.5733°E and between 83.50834°N and 59.05167°S. Owing to the substantial amount of data comprised in the Landsat and Sentinel-2 archives and the computationally demanding process of cloud-, snow-, and shade-screening, we performed the subsampling in batches corresponding to a 4° x 4° regular grid and consolidated the final data in post-processing.
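To illustrate the batching idea, the sketch below assigns sample points to 4° x 4° tiles. The tile origin at 180°W, 90°S and the function names are assumptions made for this example; the actual batch layout used during processing is not documented here.

```python
import math
from collections import defaultdict


def batch_index(lat, lon, tile_deg=4.0):
    """Return the (row, col) of the tile containing a point.

    Assumption: tiles are aligned to an origin at lon = -180, lat = -90.
    """
    col = int(math.floor((lon + 180.0) / tile_deg))
    row = int(math.floor((lat + 90.0) / tile_deg))
    return row, col


def group_by_batch(points, tile_deg=4.0):
    """Group (id, lat, lon) tuples by tile so each batch can be processed alone."""
    batches = defaultdict(list)
    for point_id, lat, lon in points:
        batches[batch_index(lat, lon, tile_deg)].append(point_id)
    return dict(batches)


# Made-up points; the first two are 0.18 degrees apart and share a tile.
demo_points = [(1, 52.50, 13.40), (2, 52.68, 13.40), (3, -59.05167, -179.8867)]
demo_batches = group_by_batch(demo_points)
```

Processing one tile at a time and concatenating the per-tile tables afterwards mirrors the consolidation in post-processing described above.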
We derived the pixel-specific growing season information from the 2001-2019 time series of the yearly 500-m MODIS land cover dynamics product (MCD12Q2; Collection 6) available in Google Earth Engine. We only used information on the start and the end of a growing season, excluding all pixels with quality below ‘best’. When a pixel went through more than one growing cycle per year, we approximated a growing season as the period between the beginning of the first growing cycle and the end of the last growing cycle. To fill in data gaps arising from low-quality data and insufficiently pronounced seasonality (Friedl et al., 2019), we used a 5 x 5 mean moving window filter to ensure better spatial continuity of our growing season datasets. Following Lewińska et al. (2023), we defined the start of the season as the pixel-specific 25th percentile of the 2001-2019 distribution of start-of-season dates, and the end of the season as the pixel-specific 75th percentile of the 2001-2019 distribution of end-of-season dates. Finally, we subsampled the start and end of the season datasets with the same regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection.
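The per-pixel percentile step can be sketched as follows. This is a simplified illustration under several assumptions: dates are expressed as day of year, seasons do not cross the calendar year boundary, the quality filter and the 5 x 5 mean filter are left out, and percentiles use linear interpolation (the `inclusive` method of `statistics.quantiles`). The function names and input values are made up.

```python
from statistics import quantiles


def merge_cycles(cycles):
    """Collapse one year's growing cycles into (first start, last end)."""
    starts = [start for start, _ in cycles]
    ends = [end for _, end in cycles]
    return min(starts), max(ends)


def season_bounds(yearly_cycles):
    """Return (25th percentile of starts, 75th percentile of ends) for one pixel.

    yearly_cycles holds one list of (start_doy, end_doy) tuples per year;
    an empty list marks a year without a valid growing season.
    """
    merged = [merge_cycles(cycles) for cycles in yearly_cycles if cycles]
    if len(merged) < 2:
        return None  # quantiles() needs at least two data points
    starts = [start for start, _ in merged]
    ends = [end for _, end in merged]
    start_q = quantiles(starts, n=4, method="inclusive")
    end_q = quantiles(ends, n=4, method="inclusive")
    return start_q[0], end_q[2]


# Made-up pixel history: the second year has two cycles, the fourth has none.
demo_years = [[(100, 250)], [(110, 180), (200, 260)], [(90, 240)], [], [(120, 270)]]
demo_bounds = season_bounds(demo_years)
```

Taking the 25th percentile for the start and the 75th for the end trims the earliest starts and the latest ends observed for a pixel, so single anomalous years have limited influence on the resulting season.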
References:
- Baetens, L., Desjardins, C., & Hagolle, O. (2019). Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. Remote Sensing, 11(4), 433. https://doi.org/10.3390/rs11040433
- Coluzzi, R., Imbrenda, V., Lanfredi, M., & Simoniello, T. (2018). A first assessment of the Sentinel-2 Level 1-C cloud mask product to support informed surface analyses. Remote Sensing of Environment, 217, 426–443. https://doi.org/10.1016/j.rse.2018.08.009
- Earth Resources Observation And Science (EROS) Center. (1982). Collection-2 Landsat 4-5 Thematic Mapper (TM) Level-1 Data Products [Other]. U.S. Geological Survey. https://doi.org/10.5066/P918ROHC
- Earth Resources Observation And Science (EROS) Center. (1999). Collection-2 Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Level-1 Data Products [dataset]. U.S. Geological Survey. https://doi.org/10.5066/P9TU80IG
- Earth Resources Observation And Science (EROS) Center. (2013). Collection-2 Landsat 8-9 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) Level-1 Data Products [Other]. U.S. Geological Survey. https://doi.org/10.5066/P975CC9B
- European Space Agency. (2015). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl
- European Space Agency. (2021). Sentinel-2 MSI Level-1C TOA Reflectance, Collection 0 [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl
- European Space Agency. (2022). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-742ikth
- Foga, S., Scaramuzza, P. L., Guo, S., Zhu, Z., Dilley, R. D., Beckmann, T., Schmidt, G. L., Dwyer, J. L., Joseph Hughes, M., & Laue, B. (2017). Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sensing of Environment, 194, 379–390. https://doi.org/10.1016/j.rse.2017.03.026
- Frantz, D., Haß, E., Uhl, A., Stoffels, J., & Hill, J. (2018). Improvement of the Fmask algorithm for Sentinel-2 images: Separating clouds from bright surfaces based on parallax effects. Remote Sensing of Environment, 215, 471–481. https://doi.org/10.1016/j.rse.2018.04.046
- Friedl, M., Gray, J., & Sulla-Menashe, D. (2019). MCD12Q2 MODIS/Terra+Aqua Land Cover Dynamics Yearly L3 Global 500m SIN Grid V006 [dataset]. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/MODIS/MCD12Q2.006
- Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. https://doi.org/10.1016/j.rse.2017.06.031
- Lewińska, K. E., Frantz, D., Leser, U., & Hostert, P. (2023). Usable Observations over Europe: Evaluation of Compositing Windows for Landsat and Sentinel-2 Time Series [Preprint]. Environmental and Earth Sciences. https://doi.org/10.20944/preprints202308.2174.v2
- Main-Knorn, M., Pflug, B., Louis, J., Debaecker, V., Müller-Wilm, U., & Gascon, F. (2017). Sen2Cor for Sentinel-2. In L. Bruzzone, F. Bovolo, & J. A. Benediktsson (Eds.), Image and Signal Processing for Remote Sensing XXIII (p. 3). SPIE. https://doi.org/10.1117/12.2278218
- Qiu, S., Zhu, Z., Shang, R., & Crawford, C. J. (2021). Can Landsat 7 preserve its science capability with a drifting orbit? Science of Remote Sensing, 4, 100026. https://doi.org/10.1016/j.srs.2021.100026
- Zhang, Y., Woodcock, C. E., Arévalo, P., Olofsson, P., Tang, X., Stanimirova, R., Bullock, E., Tarrio, K. R., Zhu, Z., & Friedl, M. A. (2022). A Global Analysis of the Spatial and Temporal Variability of Usable Landsat Observations at the Pixel Scale. Frontiers in Remote Sensing, 3, 894618. https://doi.org/10.3389/frsen.2022.894618
- Zhu, Z., & Woodcock, C. E. (2012). Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sensing of Environment, 118, 83–94. https://doi.org/10.1016/j.rse.2011.10.028