Usable observations over Europe: Evaluation of compositing windows for Landsat and Sentinel-2 time series
Data files (version of Jul 08, 2024; 21.94 MB in total):
- dryad.5tb2rbp94.7z (21.93 MB)
- README.md (5.13 KB)
Abstract
Landsat and Sentinel-2 data archives provide ever-increasing amounts of satellite data. However, the availability of usable observations varies greatly in space and time. Pixel-based compositing, which generates temporally equidistant, cloud-free synthetic images, can mitigate this temporal variability by constructing uninterrupted time series using different compositing windows. Here, we evaluated the feasibility of using compositing windows ranging from five days to one year for 1984-2021 Landsat and 2015-2021 Sentinel-2 time series to derive uninterrupted time series across Europe. We considered separate and joint use of both data archives, analyzed the spatio-temporal availability of composites during each calendar year and each pixel-specific growing season across a variety of time windows, and assessed the effect of hypothetical data interpolation. Our results demonstrated opportunities and limitations of the available data records for supporting medium- and long-term analyses that require uninterrupted time series of composites with sub-annual temporal resolution. Spatial disparities across compositing windows provide guidance on the feasibility of workflows relying on different data densities and on the challenges of wall-to-wall analyses. Consistent time series based on composites with sub-monthly aggregation periods were mostly feasible only with the combined Landsat and Sentinel-2 archives after 2015, and even then required interpolation of up to 50% of the data in some regions.
Data description:
Daily data availability records over Europe produced for the paper:
Lewińska, K.E., Frantz, D., Leser, U., Hostert, P., 2024. Usable observations over Europe: evaluation of compositing windows for Landsat and Sentinel-2 time series. European Journal of Remote Sensing. http://dx.doi.org/10.1080/22797254.2024.2372855.
The datasets are distributed as CSV files:
DailyData.csv - 1984-2021 daily data record from Landsat and Sentinel-2 combined;
DailyData_L.csv - 1984-2021 daily data record from Landsat;
DailyData_S2.csv - 1984-2021 daily data record from Sentinel-2 (Sentinel-2 observations start in 2015);
DailyData_CEF_MAKS.csv - daily growing-season masks for the leap year 2020.
Each file comprises the following columns:
- id: unique point ID
- LAT: latitude coordinate (northing) in the LAEA projection
- LON: longitude coordinate (easting) in the LAEA projection
- Lat: latitude in the geographic coordinate system
- Lon: longitude in the geographic coordinate system
- daily data, with column names following the L_YYYY_MM_DD pattern, where YYYY is the year, MM the month, and DD the day of observation
- auxiliary columns such as ‘top’, ‘bottom’, ‘left’, and ‘right’.
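A minimal Python sketch (standard library only) of how the daily columns could be read, assuming the header follows the L_YYYY_MM_DD pattern described above and that daily values are stored as 0/1. The function names `daily_columns` and `valid_days_per_point` are hypothetical, and the sample rows are synthetic.

```python
import csv
import io
import re
from datetime import date

# Daily columns are assumed to follow the L_YYYY_MM_DD naming pattern.
DAY_COLUMN = re.compile(r"^L_(\d{4})_(\d{1,2})_(\d{1,2})$")


def daily_columns(fieldnames):
    """Map each daily column name to its calendar date; other columns are skipped."""
    columns = {}
    for name in fieldnames:
        match = DAY_COLUMN.match(name)
        if match:
            year, month, day = (int(part) for part in match.groups())
            columns[name] = date(year, month, day)
    return columns


def valid_days_per_point(handle):
    """Count days flagged as valid (value 1) for every point id in an open CSV."""
    reader = csv.DictReader(handle)
    columns = daily_columns(reader.fieldnames)
    counts = {}
    for row in reader:
        # float() tolerates values stored as "1.0"; empty cells are treated
        # as "no valid observation" (an assumption).
        counts[row["id"]] = sum(
            int(float(row[name])) for name in columns if row[name].strip()
        )
    return counts


# Synthetic in-memory sample; for the real data pass
# open("DailyData.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "id,LAT,LON,Lat,Lon,L_2020_01_01,L_2020_01_02,top\n"
    "1,3200000,4331000,52.0,10.0,1,0,5\n"
)
print(valid_days_per_point(sample))  # {'1': 1}
```

Auxiliary columns such as `top` are ignored because only names matching the daily pattern are summed.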
Furthermore, the DailyData_CEF_MAKS.csv file comprises the following columns:
- SOS: day of the year identified as the start of the first phenological season
- EOS: day of the year identified as the end of the last phenological season
- SL: the length of the season, in days, between SOS and EOS
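The growing-season columns can be related to a daily mask as sketched below. This assumes SOS and EOS are day-of-year values within 2020 with SOS ≤ EOS and that both bounds count as season days; whether SL equals EOS − SOS or EOS − SOS + 1 should be checked against the file. `season_mask` is a hypothetical helper and the example values are made up.

```python
from datetime import date


def season_mask(sos: int, eos: int, year: int = 2020) -> list[int]:
    """Daily growing-season mask (1 inside the season, 0 outside) for one point.

    Assumes SOS and EOS are day-of-year values with SOS <= EOS and that both
    bounds belong to the season (inclusive).
    """
    n_days = (date(year + 1, 1, 1) - date(year, 1, 1)).days  # 366 in 2020
    if not 1 <= sos <= eos <= n_days:
        raise ValueError(f"unsupported SOS/EOS pair: {sos}, {eos}")
    return [1 if sos <= doy <= eos else 0 for doy in range(1, n_days + 1)]


mask = season_mask(sos=100, eos=280)
print(len(mask), sum(mask))  # 366 181
```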
A complete description of the methodology is available in the paper featuring the data (Lewińska et al., 2024).
To convert the CSV files to gridded data (e.g., GeoTIFF), use the following projection definition:
"+proj=laea +lat_0=52 +lon_0=10 +x_0=4331000 +y_0=3200000 +ellps=GRS80 +units=m +no_defs"
This results in a raster layer with the following size and coordinates:
Size is 233, 197
Coordinate System is:
PROJCRS["unknown",
BASEGEOGCRS["unknown",
DATUM["Unknown based on GRS80 ellipsoid",
ELLIPSOID["GRS 1980",6378137,298.257222101004,
LENGTHUNIT["metre",1],
ID["EPSG",7019]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]],
CONVERSION["Lambert Azimuthal Equal Area",
METHOD["Lambert Azimuthal Equal Area",
ID["EPSG",9820]],
PARAMETER["Latitude of natural origin",52,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",10,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["False easting",4331000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",3200000,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]]]
Data axis to CRS axis mapping: 1,2
Origin = (2646541.000000000000000,5395879.000000000000000)
Pixel Size = (20000.000000000000000,-20000.000000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
COMPRESSION=LZW
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 2646541.000, 5395879.000) ( 31d14' 0.04"W, 67d 3' 5.89"N)
Lower Left ( 2646541.000, 1455879.000) ( 8d18'26.72"W, 34d25'54.27"N)
Upper Right ( 7306541.000, 5395879.000) ( 70d23'53.48"E, 59d11'39.47"N)
Lower Right ( 7306541.000, 1455879.000) ( 41d28' 0.11"E, 30d41'15.61"N)
Center ( 4976541.000, 3425879.000) ( 19d47'33.92"E, 53d38'13.43"N)
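The geotransform listed above is enough to burn the point records into the 233 × 197 grid without GIS software. The sketch below assumes that the LAEA columns hold projected coordinates (LON as x/easting, LAT as y/northing) lying inside the grid cells, and writes an ESRI ASCII grid; `to_cell` and `to_esri_ascii` are hypothetical names. The ASCII grid carries no CRS, so the PROJ string above still has to be assigned when the file is opened in a GIS or converted to GeoTIFF.

```python
import math

# Grid definition copied from the listing above.
ORIGIN_X, ORIGIN_Y = 2646541.0, 5395879.0  # upper-left corner (m)
CELL = 20000.0                              # pixel size (m)
NCOLS, NROWS = 233, 197


def to_cell(x: float, y: float) -> tuple[int, int]:
    """Return the (row, col) of the 20-km cell containing projected point (x, y)."""
    col = math.floor((x - ORIGIN_X) / CELL)
    row = math.floor((ORIGIN_Y - y) / CELL)
    if not (0 <= col < NCOLS and 0 <= row < NROWS):
        raise ValueError(f"point ({x}, {y}) lies outside the grid")
    return row, col


def to_esri_ascii(points, nodata: int = -9999) -> str:
    """Burn (x, y, value) triples into the grid and serialize it as ESRI ASCII."""
    grid = [[nodata] * NCOLS for _ in range(NROWS)]
    for x, y, value in points:
        row, col = to_cell(x, y)
        grid[row][col] = value
    header = [
        f"ncols {NCOLS}",
        f"nrows {NROWS}",
        f"xllcorner {ORIGIN_X}",
        f"yllcorner {ORIGIN_Y - NROWS * CELL}",  # lower-left corner: 1455879.0
        f"cellsize {CELL}",
        f"NODATA_value {nodata}",
    ]
    body = [" ".join(str(v) for v in row) for row in grid]
    return "\n".join(header + body) + "\n"


print(to_cell(2656541.0, 5385879.0))  # (0, 0)
```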
Sharing & Access information
CC license: CC0
Data are available at datadryad.org as: https://doi.org/10.5061/dryad.5tb2rbp94
and through a Google Earth Engine (GEE) app: https://katarzynaelewinska.users.earthengine.app/view/europedataval
The data were derived from freely available Landsat surface reflectance Level 2, Tier 1 (Collection 2) scenes from 1984 through 2021 and Sentinel-2 TOA reflectance Level-1C scenes (pre-Collection-1; European Space Agency, 2021).
The growing season was defined using the 2001-2019 time series of the yearly 500 m MODIS land cover dynamics product (MCD12Q2; Collection 6).
Disclaimer:
The authors accept no responsibility for errors or omissions in this work and shall not be liable for any damage caused by these.
Funding Information
We gratefully acknowledge support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 414984028 – SFB 1404 FONDA.
Please cite the data as:
https://doi.org/10.5061/dryad.5tb2rbp94 and http://dx.doi.org/10.1080/22797254.2024.2372855.
We used all Landsat surface reflectance Level 2, Tier 1 (Collection 2) scenes from 1984 through 2021 and Sentinel-2 TOA reflectance Level-1C (pre-Collection-1; European Space Agency, 2021) scenes from 2016 through 2021 acquired over Europe, as available in Google Earth Engine (data accessed in June 2022; Gorelick et al., 2017). We used Sentinel-2 Level-1C data instead of Level-2A because the Level-2A inherent quality data lack the desired scope and accuracy (Baetens et al., 2019; Coluzzi et al., 2018). In addition, Level-1C products are accompanied by cloud probabilities (Zupanc, 2017), which facilitate improved cloud screening. For cloud screening, we also used Band 10 (Cirrus), which is not included in Level-2A products. Because we analyzed data availability, i.e., tallied the daily presence or absence of usable observations, differences between Landsat and Sentinel-2 reflectance values were irrelevant here, and no inter-sensor normalization was needed. The difference in processing levels, however, affected the accuracy of cloud, shadow, and snow masking: the Sentinel-2 workflow assembles several approaches with known accuracies (Skakun et al., 2022) but has not been evaluated as a whole. We acknowledge that for real-life reflectance-based applications, data from corresponding processing levels need to be used and the reflectance normalized among the sensors (Okujeni et al., 2024). We therefore recommend either preprocessing the Sentinel-2 TOA data to achieve the desired quality of masks, or linking Sentinel-2 Level-2A data with Level-1C Band 10 and the relevant Cloud Probability scenes for more rigorous cloud screening.
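The exact Sentinel-2 screening rules are not spelled out here. As an illustration only, the sketch below shows one conservative way of combining the Level-1C cloud probability with Band 10 (cirrus): a pixel is kept only if it passes both tests. The thresholds, the 0-100 probability scale, and the function name `s2_pixel_usable` are assumptions, not the values or code used in the paper.

```python
def s2_pixel_usable(cloud_probability: float, b10_reflectance: float,
                    prob_threshold: float = 50.0,
                    cirrus_threshold: float = 0.01) -> bool:
    """Conservative per-pixel test combining two independent cloud screens.

    cloud_probability: cloud probability in percent (0-100), assumed scale.
    b10_reflectance:   Band 10 (cirrus) TOA reflectance as a unitless fraction.
    Both thresholds are hypothetical placeholders.
    """
    # Reject the pixel as soon as either screen flags it.
    if cloud_probability > prob_threshold:
        return False
    if b10_reflectance > cirrus_threshold:
        return False
    return True


# Level-1C bands are commonly stored as scaled integers (reflectance * 10000),
# so a stored Band 10 value of 40 corresponds to a reflectance of 0.004.
print(s2_pixel_usable(12.0, 40 / 10000))  # True
```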
To ensure that only the highest-quality pixels entered the analysis, we applied conservative pixel-quality screening. For Landsat scenes, we excluded all pixels flagged as cloud, shadow, or snow in the inherent pixel-quality bands (Foga et al., 2017; Z. Zhu & Woodcock, 2012) and discarded saturated pixels (Zhang et al., 2022). We further used the quality bands to exclude all data gaps in Landsat 7 acquisitions caused by the Scan Line Corrector (SLC) failure (Andréfouët et al., 2003). Although the accuracy of the inherent pixel-quality bands differs among the Landsat sensors, owing to differences in sensor design and thus in the availability of thermal and cirrus-specific bands (Foga et al., 2017), the Landsat quality bands are a widely accepted, standardized quality product. Finally, owing to Landsat 7's orbit drift (Qiu et al., 2021), we excluded all ETM+ scenes acquired after 31 December 2020.
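The Landsat screening rules above can be expressed as a per-pixel test on the quality bands. This is a minimal sketch assuming the Landsat Collection 2 QA_PIXEL bit layout (bit 0 fill, 1 dilated cloud, 2 cirrus, 3 cloud, 4 cloud shadow, 5 snow) and assuming SLC-off gaps appear as fill; it treats every non-zero QA_RADSAT value as unusable, which is stricter than checking only the band-saturation bits. The function names are hypothetical and the exact bits used in the paper are not stated.

```python
from datetime import date

# Assumed bit positions in the Landsat Collection 2 QA_PIXEL band.
FILL, DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW, SNOW = range(6)
REJECT_BITS = (FILL, DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW, SNOW)
ETM_PLUS_CUTOFF = date(2020, 12, 31)


def bit_is_set(value: int, bit: int) -> bool:
    return (value >> bit) & 1 == 1


def landsat_pixel_usable(qa_pixel: int, qa_radsat: int,
                         sensor: str, acquired: date) -> bool:
    # Drop every ETM+ acquisition after 31 December 2020 (orbit drift).
    if sensor == "ETM+" and acquired > ETM_PLUS_CUTOFF:
        return False
    # Treat any non-zero radiometric saturation flag as unusable.
    if qa_radsat != 0:
        return False
    # Reject fill (assumed to cover SLC-off gaps), clouds, shadows and snow.
    return not any(bit_is_set(qa_pixel, bit) for bit in REJECT_BITS)


# 21824 has only the clear bit and low-confidence bits set.
print(landsat_pixel_usable(21824, 0, "OLI", date(2020, 6, 1)))  # True
```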
We derived time series of usable Landsat and Sentinel-2 observations over Europe by sampling individual pixels spaced systematically every 20 km in the north-south and east-west directions. We placed the sampling locations according to the Lambert azimuthal equal-area projection (LAEA, EPSG:3035), which is the preferred projection for EU-wide products. Although LAEA is an equal-area rather than an equidistant projection, distance distortion within our study area was mostly below 10 m, i.e., less than one pixel of the 10-m Sentinel-2 bands. Systematic point sampling is commonly used to derive overview statistics for large datasets and underlies nearest-neighbor resampling of rasters. The 20-km sampling interval resulted in 16,642 locations over land, ensuring good representation of the west-east and south-north climatic and phenological gradients and facilitating graphical presentation of the results.
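A systematic 20-km sample can be generated directly from the grid definition given earlier. The sketch below assumes one point per cell placed at the cell centre; `grid_points` is a hypothetical helper and `is_land` is a placeholder for whatever land mask is applied (the paper reports 16,642 land points, which this sketch does not reproduce without that mask).

```python
def grid_points(origin_x: float, origin_y: float, spacing: float,
                ncols: int, nrows: int, is_land=lambda x, y: True):
    """Systematic sample: one point per grid cell, placed at the cell centre."""
    points = []
    for row in range(nrows):
        for col in range(ncols):
            x = origin_x + (col + 0.5) * spacing
            y = origin_y - (row + 0.5) * spacing
            if is_land(x, y):  # placeholder land filter (assumption)
                points.append((x, y))
    return points


candidates = grid_points(2646541.0, 5395879.0, 20000.0, 233, 197)
print(len(candidates))  # 45901 candidate points before land masking
```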
For each sampled pixel, we recorded the dates of all valid (cloud-, shadow-, and snow-free) Landsat and Sentinel-2 acquisitions. We used the information at the original resolution and treated each sampled pixel as a probabilistic sample of the surrounding 20 × 20 km area, making the process analogous to nearest-neighbor resampling. We excluded duplicated data entries arising from vertical overlaps among Landsat scenes from the same path (i.e., adjacent rows), and from vertical and horizontal overlaps among Sentinel-2 granules from the same swath. This resulted in a daily data-availability record for 1984-2021 (1 – valid observation; 0 – no data or no valid observation), which we used to derive availability information for composites with aggregation periods of five, 10, 15, 20, and 25 days, and one, two, three, four, six, and 12 months. These non-overlapping compositing windows divided each calendar year into 73, 37, 24, 18, 15, 12, six, four, three, two, and one composites, respectively. We used January 1st as the starting date of the compositing window sequence in each year. When the last compositing window was shorter than half the window width, we merged it with the penultimate composite. For each data point and every aggregation period, we recorded the number of available observations and considered a composite ‘successful’ if at least one valid observation was available.
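The compositing rules above can be sketched in a few functions: same-day duplicates collapse into a single daily flag, day-based windows start on 1 January with a short trailing window merged into the previous one, and a composite counts as successful if it holds at least one valid observation. This is a minimal illustration, not the authors' code; the function names are hypothetical, and treating month-based windows as calendar months is an assumption.

```python
from datetime import date, timedelta


def daily_flags(acquisitions, year):
    """1/0 availability per day; same-day duplicates (overlapping tiles) collapse."""
    valid_days = set(acquisitions)
    first = date(year, 1, 1)
    n_days = (date(year + 1, 1, 1) - first).days
    flags = {}
    for offset in range(n_days):
        day = first + timedelta(days=offset)
        flags[day] = 1 if day in valid_days else 0
    return flags


def day_windows(year, width):
    """Non-overlapping windows of `width` days starting on 1 January."""
    first, last = date(year, 1, 1), date(year, 12, 31)
    windows, start = [], first
    while start <= last:
        end = min(start + timedelta(days=width - 1), last)
        windows.append((start, end))
        start = end + timedelta(days=1)
    # Merge a trailing window shorter than half the width into the previous one.
    tail = (windows[-1][1] - windows[-1][0]).days + 1
    if len(windows) > 1 and tail < width / 2:
        windows[-2:] = [(windows[-2][0], last)]
    return windows


def month_windows(year, months):
    """Calendar-month windows of 1, 2, 3, 4, 6 or 12 months (assumed definition)."""
    windows = []
    for first_month in range(1, 13, months):
        start = date(year, first_month, 1)
        next_month = first_month + months
        end = date(year + 1, 1, 1) if next_month > 12 else date(year, next_month, 1)
        windows.append((start, end - timedelta(days=1)))
    return windows


def composite_counts(flags, windows):
    """Valid observations per window; a composite succeeds if its count >= 1."""
    return [sum(v for d, v in flags.items() if start <= d <= end)
            for start, end in windows]


flags = daily_flags([date(2021, 1, 3), date(2021, 1, 3), date(2021, 2, 10)], 2021)
counts = composite_counts(flags, month_windows(2021, 1))
print(counts[:3], [n >= 1 for n in counts[:3]])  # [1, 1, 0] [True, True, False]
```

With these rules, the window counts per year come out as 73, 37, 24, 18 and 15 for the 5- to 25-day periods in both leap and non-leap years, matching the numbers stated above.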
The data are distributed as CSV and GeoTIFF files (both formats contain exactly the same information) and can be opened and queried with any software that handles these formats.