Dryad

Patterns discovery dataset for particulate matter (pm2.5) pollution trends in Japan

Data files

Dec 12, 2024 version files (52.98 MB)

Abstract

Air pollution presents a significant environmental risk: it harms human health, accelerates climate change, and disrupts ecosystems. A central aim of air pollution research is to pinpoint the pollutants that previous studies have identified as most harmful and to map the regions exposed to high pollution levels. This study introduces a large-scale, high-quality dataset to advance the analysis of PM2.5 pollution and to reveal hidden patterns through pattern mining techniques. The dataset covers five years of hourly PM2.5 measurements collected from approximately 1,900 sensors across Japan, sourced from the Ministry of the Environment's Soramame platform. This platform offers hourly pollutant records, downloadable as monthly raw data files. The raw data files, which are unorganised as downloaded, are systematically organised and stored in database tables using an Entity-Relationship (ER) schema.

The primary objective of this dataset is to aid in developing and validating pattern mining models that accurately detect frequent patterns in PM2.5 data under diverse conditions. The dataset collection includes the "FINAL_DATASET" CSV file, which contains timestamps, sensor location IDs, and recorded PM2.5 values. Due to storage limitations, the raw data files are excluded from the compressed ZIP (AEROS) file but can be accessed directly via the link provided in the README (Data). Because it supports the discovery of complex patterns, the dataset is a valuable resource for researchers applying pattern mining techniques to PM2.5 analysis. Sharing it publicly promotes collaboration and advances efforts to identify frequently polluted sensors or regions. Researchers are invited to use and contribute to the dataset, broadening its relevance and potential impact.
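
As a rough illustration of what "identifying frequently polluted sensors" can look like in practice, the sketch below counts how often each sensor reports a reading at or above a threshold and keeps the sensors whose share of timestamps meets a minimum support. It is only a minimal frequency count, not the pattern mining method used by the dataset authors. The column names (`timestamp`, `sensor_id`, `pm25`), the long one-reading-per-row layout, the threshold, the minimum support, and the sample values are all assumptions made for illustration; check the header of the actual "FINAL_DATASET" CSV before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; the real FINAL_DATASET header may differ.
TIME_COL, SENSOR_COL, VALUE_COL = "timestamp", "sensor_id", "pm25"


def frequent_high_sensors(rows, threshold, min_support):
    """Map each sensor to the fraction of timestamps with a reading >= threshold.

    Only sensors whose fraction is at least min_support are returned.
    Rows with an empty PM2.5 value are treated as missing readings.
    """
    timestamps = set()
    high_counts = Counter()
    for row in rows:
        timestamps.add(row[TIME_COL])
        raw = (row[VALUE_COL] or "").strip()
        if not raw:
            continue  # missing reading: counts toward time span, not toward highs
        if float(raw) >= threshold:
            high_counts[row[SENSOR_COL]] += 1

    total = len(timestamps)
    if total == 0:
        return {}
    return {
        sensor: count / total
        for sensor, count in high_counts.items()
        if count / total >= min_support
    }


# Tiny made-up sample in the assumed layout (not real measurements).
SAMPLE = """timestamp,sensor_id,pm25
2018-01-01 01:00,S1,40
2018-01-01 01:00,S2,10
2018-01-01 02:00,S1,38
2018-01-01 02:00,S2,
2018-01-01 03:00,S1,12
2018-01-01 03:00,S2,36
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Illustrative threshold and support; choose values suited to the analysis.
result = frequent_high_sensors(rows, threshold=35.0, min_support=0.5)
print(result)
```

To run the same count on the real file, replace the `io.StringIO(SAMPLE)` source with an opened file handle and adjust the three column-name constants to match the CSV header.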