Global overview of cloud-, snow-, and shade-free Landsat (1982-2024) and Sentinel-2 (2015-2024) data

Lewińska, Katarzyna Ewa; Ernst, Stefan; Frantz, David; Leser, Ulf; Hostert, Patrick

Published Mar 12, 2024; Updated Apr 11, 2025 on Dryad. https://doi.org/10.5061/dryad.gb5mkkwxm

Data files

Mar 12, 2024 version files 17.97 GB

Feb 14, 2025 version files 18.67 GB

Apr 11, 2025 version files 18.67 GB

Abstract

Landsat and Sentinel-2 acquisitions are among the most frequently used medium-resolution (i.e., 10-30 m) optical data. The data are extensively used in terrestrial vegetation applications, including, but not limited to, land cover and land use mapping, vegetation condition and phenology monitoring, and disturbance and change mapping. While the Landsat archives alone provide over 40 years, and counting, of continuous and consistent observations, since mid-2015 Sentinel-2 has enabled a revisit frequency of up to 2 days. Although the spatio-temporal availability of both data archives is well known at the scene level, information on the actual availability of usable (i.e., cloud-, snow-, and shade-free) observations at the pixel level needs to be explored for each study to ensure correct parametrization of the algorithms used, and thus the robustness of subsequent analyses. However, a priori data exploration is time- and resource-consuming, and is therefore rarely performed. As a result, the spatio-temporal heterogeneity of usable data is often inadequately accounted for in the analysis design, risking ill-advised selection of algorithms and hypotheses, and thus inferior quality of final results. Here we present a global dataset comprising precomputed daily availability of usable Landsat and Sentinel-2 data sampled at the pixel level in a regular 0.18°-point grid. We based the dataset on the complete 1982-2024 Landsat surface reflectance data (Collection 2) and 2015-2024 Sentinel-2 top-of-the-atmosphere reflectance scenes (pre-Collection-1 and Collection-1). Derivation of cloud-, snow-, and shade-free observations followed the methodology developed in our recent study on data availability over Europe (Lewińska et al., 2023; https://doi.org/10.20944/preprints202308.2174.v2). Furthermore, we expanded the dataset with growing season information derived from the 2001-2019 time series of the yearly 500 m MODIS land cover dynamics product (MCD12Q2; Collection 6).
As such, our dataset presents a unique overview of the spatio-temporal availability of usable daily Landsat and Sentinel-2 data at the global scale, hence offering much-needed a priori information aiding the identification of appropriate methods and challenges for terrestrial vegetation analyses at the local to global scales. The dataset can be viewed using the dedicated GEE App (link in Related Works).

As of February 2025 the dataset has been extended with the 2024 data.

Suggested citation: Lewińska, Katarzyna Ewa, et al. (2024). Global overview of cloud-, snow-, and shade-free Landsat (1982-2024) and Sentinel-2 (2015-2024) data [Dataset]. Dryad. https://doi.org/10.5061/dryad.gb5mkkwxm

Description of the data and file structure

A global overview of usable (i.e., cloud-, snow-, and shade-free) 1982-2024 Landsat and 2015-2024 Sentinel-2 data, derived for a regular 0.18° x 0.18°-point grid.

Dataset featured in Lewińska K.E., Ernst S., Frantz D., Leser U., Hostert P., Global Overview of Usable Landsat and Sentinel-2 Data for 1982–2023. Data in Brief 57, (2024) https://doi.org/10.1016/j.dib.2024.111054

The complete dataset comprises three .csv files:

GLOBAL_LND_1982-2024_CSO.csv: Daily data availability derived from Landsat 1982-2024 archives
GLOBAL_S2_2015-2024_CSO.csv: Daily data availability derived from Sentinel-2 2015-2024 archives
GLOBAL_GrowingSeason.csv: Growing season information for normal and leap years

Each file consists of 475,150 observations representing the global sample-point grid. Each observation is characterized by a unique identifier and coordinates (id, Lat, and Lon). The binary information on the availability of cloud-, snow-, and shade-free observations (i.e., 1 – valid observation; 0 – no data, or invalid observation) is available daily in variables named L_YYYY_MM_dd, where YYYY indicates the year, MM indicates the month, and dd indicates the day.
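The column layout described above can be read one sample point at a time. The following is a minimal sketch, not the authors' code: the helper names `parse_day_column` and `usable_per_year` are hypothetical, the values are assumed to be stored as the strings "1" and "0", and the demo row is made up. The prefix of the daily columns is deliberately not checked, because only the L_YYYY_MM_dd pattern is documented.

```python
import csv
import io
from collections import Counter
from datetime import date


def parse_day_column(name):
    """Convert a column name such as 'L_1999_07_15' into a date, else None."""
    parts = name.split("_")
    if len(parts) != 4:
        return None  # id, Lat, Lon and any other non-daily column
    try:
        return date(int(parts[1]), int(parts[2]), int(parts[3]))
    except ValueError:
        return None


def usable_per_year(row):
    """Count days flagged 1 (valid observation) per year for one sample point."""
    counts = Counter()
    for name, value in row.items():
        day = parse_day_column(name)
        # Assumption: cells hold the literal strings "1" or "0".
        if day is not None and (value or "").strip() == "1":
            counts[day.year] += 1
    return dict(counts)


# Tiny in-memory stand-in for GLOBAL_LND_1982-2024_CSO.csv (made-up values).
demo = io.StringIO(
    "id,Lat,Lon,L_1999_07_15,L_1999_07_16,L_2000_01_01\n"
    "7,52.5,13.4,1,0,1\n"
)
for row in csv.DictReader(demo):
    print(row["id"], usable_per_year(row))
```

Because the files hold several gigabytes each, iterating with `csv.DictReader` over an open file handle keeps memory use at one row at a time; replacing `demo` with `open(path, newline="")` would apply the same loop to the real file.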

The GLOBAL_GrowingSeason.csv dataset comprises growing season masks (1 – within the growing season, 0 – outside the growing season, empty cells - no valid growing season) for leap and regular years. The former information is contained in columns Leap_MM_dd and the latter in columns Regular_MM_dd, where MM indicates the month, and dd indicates the day.
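As an illustration of how the growing season masks might be looked up for a given calendar date, here is a small sketch. It assumes that MM and dd are zero-padded to two digits and that cells hold "1", "0", or an empty string; `season_column` and `in_growing_season` are hypothetical helper names, not part of the dataset.

```python
import calendar
from datetime import date


def season_column(day):
    """Name of the growing-season column that applies to a calendar date."""
    prefix = "Leap" if calendar.isleap(day.year) else "Regular"
    # Assumption: month and day are zero-padded (e.g. Regular_03_01).
    return f"{prefix}_{day.month:02d}_{day.day:02d}"


def in_growing_season(day, season_row):
    """True/False for a date, or None where the cell is empty or missing."""
    value = (season_row.get(season_column(day)) or "").strip()
    if value == "":
        return None  # no valid growing season for this sample point
    return float(value) == 1.0


# Made-up row for a single sample point.
row = {"id": "7", "Regular_03_01": "0", "Regular_07_01": "1", "Leap_02_29": ""}
print(in_growing_season(date(2019, 7, 1), row))
print(in_growing_season(date(2020, 2, 29), row))
```

Joining this lookup with the daily availability files on the shared id column would restrict counts of usable observations to the growing season.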

Sharing/Access information

The dataset was derived based on freely and openly accessible Landsat and Sentinel-2 data archives available in Google Earth Engine. We used all Landsat surface reflectance Level 2, Tier 1, Collection 2 scenes acquired with the Thematic Mapper (TM) (https://doi.org/10.5066/P918ROHC), Enhanced Thematic Mapper (ETM+) (https://doi.org/10.5066/P9TU80IG), and Operational Land Imager (OLI) (https://doi.org/10.5066/P975CC9B) scanners between 22nd August 1982 and 31st December 2024, and Sentinel-2 TOA reflectance Level-1C scenes (pre-Collection-1 (https://doi.org/10.5270/S2_-d8we2fl) and Collection-1 (https://doi.org/10.5270/S2_-742ikth)) acquired with the MultiSpectral Instrument (MSI) between 23rd June 2015 and 31st December 2024.

Version changes

12 March 2024: Data availability comprises 1982-2023 for Landsat and 2015-2023 for Sentinel-2

11 April 2025: The data availability record has been extended with the 2024 data and now covers 1982-2024 for Landsat and 2015-2024 for Sentinel-2. Consequently, the GLOBAL_LND_1982-2023_CSO dataset has been replaced with GLOBAL_LND_1982-2024_CSO, and GLOBAL_S2_2015-2023_CSO with GLOBAL_S2_2015-2024_CSO. The README.md files in each tar archive have been updated accordingly.

Methods

We based our analyses on freely and openly accessible Landsat and Sentinel-2 data archives available in Google Earth Engine (Gorelick et al., 2017). We used all Landsat surface reflectance Level 2, Tier 1, Collection 2 scenes acquired with the Thematic Mapper (TM) (Earth Resources Observation And Science (EROS) Center, 1982), Enhanced Thematic Mapper (ETM+) (Earth Resources Observation And Science (EROS) Center, 1999), and Operational Land Imager (OLI) (Earth Resources Observation And Science (EROS) Center, 2013) scanners between 22nd August 1982 and 31st December 2024, and Sentinel-2 TOA reflectance Level-1C scenes (pre-Collection-1 (European Space Agency, 2015, 2021) and Collection-1 (European Space Agency, 2022)) acquired with the MultiSpectral Instrument (MSI) between 23rd June 2015 and 31st December 2024.

We implemented a conservative pixel-quality screening to identify cloud-, snow-, and shade-free land pixels. For the Landsat time series, we relied on the inherent pixel quality bands (Foga et al., 2017; Zhu & Woodcock, 2012), excluding all pixels flagged as cloud, snow, or shadow, as well as pixels with the fill-in value of 20,000 (scale factor 0.0001; Zhang et al., 2022). Furthermore, due to the Landsat 7 orbit drift (Qiu et al., 2021), we excluded all ETM+ scenes acquired after 31st December 2020. Because Sentinel-2 Level-2A quality masks lack the desired scope and accuracy (Baetens et al., 2019; Coluzzi et al., 2018), we resorted to Level-1C scenes accompanied by the supporting Cloud Probability product. Furthermore, we employed a selection of conditions, including a threshold on Band 10 (SWIR-Cirrus), which is not available at Level-2A. Overall, our Sentinel-2-specific cloud, shadow, and snow screening comprised:

exclusion of all pixels flagged as clouds and cirrus in the inherent ‘QA60’ cloud mask band;
exclusion of all pixels with cloud probability >50% as defined in the corresponding Cloud Probability product available for each scene;
exclusion of cirrus clouds (B10 reflectance >0.01);
exclusion of clouds based on Cloud Displacement Analysis (CDI<‑0.5) (Frantz et al., 2018);
exclusion of dark pixels (B8 reflectance <0.16) within cloud shadows modelled for each scene with scene-specific sun parameters for the clouds identified in the previous steps, assuming a cloud height of 2,000 m;
exclusion of pixels within a 40-m buffer (two pixels at 20-m resolution) around each identified cloud and cloud shadow object;
exclusion of snow pixels identified with the snow mask branch of the Sen2Cor processor (Main-Knorn et al., 2017).
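The way these Sentinel-2 conditions combine for a single pixel can be sketched as follows. This is an illustration of the per-pixel logic only, using the thresholds stated above; it is not the Google Earth Engine implementation used to build the dataset. The spatial steps (shadow modelling, buffering, the Sen2Cor snow branch) are assumed to have been computed beforehand and enter as boolean inputs, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PixelFlags:
    """Per-pixel inputs; the field names are illustrative, not from the source."""
    qa60_cloud_or_cirrus: bool  # flagged in the QA60 cloud mask band
    cloud_probability: float    # percent, from the Cloud Probability product
    b10: float                  # Band 10 (cirrus) TOA reflectance
    cdi: float                  # cloud displacement value (Frantz et al., 2018)
    b8: float                   # Band 8 TOA reflectance
    in_modelled_shadow: bool    # inside a shadow projected from a detected cloud
    in_buffer: bool             # within 40 m of a cloud or cloud shadow object
    snow: bool                  # flagged by the snow mask


def is_cloud(p):
    """Any one of the four cloud conditions marks the pixel as cloud."""
    return (
        p.qa60_cloud_or_cirrus
        or p.cloud_probability > 50
        or p.b10 > 0.01
        or p.cdi < -0.5
    )


def is_usable(p):
    """A pixel is usable only if no exclusion condition applies."""
    shadow = p.in_modelled_shadow and p.b8 < 0.16  # dark pixels only
    return not (is_cloud(p) or shadow or p.in_buffer or p.snow)


clear = PixelFlags(False, 5.0, 0.002, 0.1, 0.30, False, False, False)
print(is_usable(clear))
```

Treating the conditions as a logical OR of exclusions is what makes the screening conservative: a pixel is kept only when every test agrees that it is clear.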

By applying the data screening, we generated a collection of daily availability records for the Landsat and Sentinel-2 data archives. We next subsampled the resulting binary time series with a regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection, obtaining 475,150 points located over land between 179.8867°W and 179.5733°E, and between 83.50834°N and 59.05167°S. Owing to the substantial amount of data comprised in the Landsat and Sentinel-2 archives and the computationally demanding process of cloud-, snow-, and shade-screening, we performed the subsampling in batches corresponding to a 4° x 4° regular grid and consolidated the final data in post-processing.
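The batching idea can be illustrated by assigning each sample point to a 4° x 4° tile and grouping points per tile. This is only a sketch: the origin of the batch grid is not stated, so alignment to 90°S and 180°W is an assumption, and `tile_key` and `group_by_tile` are hypothetical names.

```python
import math
from collections import defaultdict

TILE_DEG = 4.0  # batch size in degrees, as described in the text


def tile_key(lat, lon, tile_deg=TILE_DEG):
    """(row, column) index of the tile containing a point.

    Assumption: tiles are aligned to latitude -90 and longitude -180.
    """
    return (
        math.floor((lat + 90.0) / tile_deg),
        math.floor((lon + 180.0) / tile_deg),
    )


def group_by_tile(points):
    """Group (id, lat, lon) tuples by tile so each batch is processed on its own."""
    tiles = defaultdict(list)
    for point_id, lat, lon in points:
        tiles[tile_key(lat, lon)].append(point_id)
    return dict(tiles)


# Made-up sample points: two fall in the same tile, one elsewhere.
points = [(1, 52.5, 13.4), (2, 52.6, 13.5), (3, -10.0, -60.0)]
print(group_by_tile(points))
```

Consolidation in post-processing then amounts to concatenating the per-tile results, since every point belongs to exactly one tile.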

We derived the pixel-specific growing season information from the 2001-2019 time series of the yearly 500-m MODIS land cover dynamics product (MCD12Q2; Collection 6) available in Google Earth Engine. We only used information on the start and the end of a growing season, excluding all pixels with quality below ‘best’. When a pixel went through more than one growing cycle per year, we approximated a growing season as the period between the beginning of the first growing cycle and the end of the last growing cycle. To fill in data gaps arising from low-quality data and insufficiently pronounced seasonality (Friedl et al., 2019), we used a 5 x 5 mean moving window filter to ensure better spatial continuity of our growing season datasets. Following Lewińska et al. (2023), we defined the start of the season as the pixel-specific 25th percentile of the 2001-2019 distribution of start-of-season dates, and the end of the season as the pixel-specific 75th percentile of the 2001-2019 distribution of end-of-season dates. Finally, we subsampled the start and end of the season datasets with the same regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection.
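The percentile step for a single pixel can be sketched with the standard library. Several details here are assumptions rather than the documented method: dates are represented as day-of-year values, the percentile interpolation follows the "inclusive" method of `statistics.quantiles`, results are rounded to whole days, and seasons that cross the turn of the year are only handled when building the mask, not when computing percentiles. The function names and input values are made up.

```python
from statistics import quantiles


def season_bounds(starts, ends):
    """25th percentile of start days and 75th percentile of end days.

    starts/ends: day-of-year values, one per year with best-quality data.
    Returns None when fewer than two years are available.
    """
    if len(starts) < 2 or len(ends) < 2:
        return None
    start = quantiles(starts, n=4, method="inclusive")[0]  # first quartile
    end = quantiles(ends, n=4, method="inclusive")[2]      # third quartile
    return round(start), round(end)


def daily_mask(start, end, days_in_year=365):
    """1 inside the growing season, 0 outside, for each day of the year."""
    if start <= end:
        return [1 if start <= d <= end else 0 for d in range(1, days_in_year + 1)]
    # Assumption: start > end means the season wraps around the year's end.
    return [1 if d >= start or d <= end else 0 for d in range(1, days_in_year + 1)]


# Made-up start and end days for five years at one pixel.
bounds = season_bounds([100, 110, 120, 130, 140], [250, 260, 270, 280, 290])
print(bounds, sum(daily_mask(*bounds)))
```

Calling `daily_mask` with `days_in_year=366` gives the leap-year variant, mirroring the separate Leap and Regular columns in GLOBAL_GrowingSeason.csv.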

References:

Baetens, L., Desjardins, C., & Hagolle, O. (2019). Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. Remote Sensing, 11(4), 433. https://doi.org/10.3390/rs11040433
Coluzzi, R., Imbrenda, V., Lanfredi, M., & Simoniello, T. (2018). A first assessment of the Sentinel-2 Level 1-C cloud mask product to support informed surface analyses. Remote Sensing of Environment, 217, 426–443. https://doi.org/10.1016/j.rse.2018.08.009
Earth Resources Observation And Science (EROS) Center. (1982). Collection-2 Landsat 4-5 Thematic Mapper (TM) Level-1 Data Products [Other]. U.S. Geological Survey. https://doi.org/10.5066/P918ROHC
Earth Resources Observation And Science (EROS) Center. (1999). Collection-2 Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Level-1 Data Products [dataset]. U.S. Geological Survey. https://doi.org/10.5066/P9TU80IG
Earth Resources Observation And Science (EROS) Center. (2013). Collection-2 Landsat 8-9 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) Level-1 Data Products [Other]. U.S. Geological Survey. https://doi.org/10.5066/P975CC9B
European Space Agency. (2015). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl
European Space Agency. (2021). Sentinel-2 MSI Level-1C TOA Reflectance, Collection 0 [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl
European Space Agency. (2022). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-742ikth
Foga, S., Scaramuzza, P. L., Guo, S., Zhu, Z., Dilley, R. D., Beckmann, T., Schmidt, G. L., Dwyer, J. L., Joseph Hughes, M., & Laue, B. (2017). Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sensing of Environment, 194, 379–390. https://doi.org/10.1016/j.rse.2017.03.026
Frantz, D., Haß, E., Uhl, A., Stoffels, J., & Hill, J. (2018). Improvement of the Fmask algorithm for Sentinel-2 images: Separating clouds from bright surfaces based on parallax effects. Remote Sensing of Environment, 215, 471–481. https://doi.org/10.1016/j.rse.2018.04.046
Friedl, M., Gray, J., & Sulla-Menashe, D. (2019). MCD12Q2 MODIS/Terra+Aqua Land Cover Dynamics Yearly L3 Global 500m SIN Grid V006 [dataset]. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/MODIS/MCD12Q2.006
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. https://doi.org/10.1016/j.rse.2017.06.031
Lewińska, K. E., Ernst, S., Frantz, D., Leser, U., & Hostert, P. (2024). Global Overview of Usable Landsat and Sentinel-2 Data for 1982–2023. Data in Brief, 57, 111054. https://doi.org/10.1016/j.dib.2024.111054
Main-Knorn, M., Pflug, B., Louis, J., Debaecker, V., Müller-Wilm, U., & Gascon, F. (2017). Sen2Cor for Sentinel-2. In L. Bruzzone, F. Bovolo, & J. A. Benediktsson (Eds.), Image and Signal Processing for Remote Sensing XXIII (p. 3). SPIE. https://doi.org/10.1117/12.2278218
Qiu, S., Zhu, Z., Shang, R., & Crawford, C. J. (2021). Can Landsat 7 preserve its science capability with a drifting orbit? Science of Remote Sensing, 4, 100026. https://doi.org/10.1016/j.srs.2021.100026
Zhang, Y., Woodcock, C. E., Arévalo, P., Olofsson, P., Tang, X., Stanimirova, R., Bullock, E., Tarrio, K. R., Zhu, Z., & Friedl, M. A. (2022). A Global Analysis of the Spatial and Temporal Variability of Usable Landsat Observations at the Pixel Scale. Frontiers in Remote Sensing, 3, 894618. https://doi.org/10.3389/frsen.2022.894618
Zhu, Z., & Woodcock, C. E. (2012). Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sensing of Environment, 118, 83–94. https://doi.org/10.1016/j.rse.2011.10.028
