Z-ENG: AI-based air pollution prediction

Spring 2025-2026

Not specified

Topic description

Dive into an innovative project that combines data science, machine learning, and environmental awareness by developing an Air Pollution Prediction System. This platform will analyze historical and real-time air quality data to forecast pollution levels, helping individuals and policymakers make informed decisions. Users can visualize pollution trends, receive alerts for hazardous conditions, and explore predictive models that incorporate meteorological and environmental factors.

Part 1: Project Lab (Design & MVP)

Design and implement an Air Pollution Prediction System that forecasts pollution levels for a selected city using historical monitoring data.

The main requirements include:

  • Data pipeline: download/import air quality measurements (e.g., PM2.5) from a reputable source (OpenAQ / EEA / EPA)
  • Local database (SQLite/PostgreSQL) storing measurements and station metadata
  • Forecasting models (at least two):
    • a simple baseline (e.g., “persistence”: tomorrow = today)
    • a machine learning model (e.g., linear regression / random forest / gradient boosting)
  • Clear evaluation on a time-based split using forecasting metrics (MAE/RMSE) and comparison against the baseline
  • User interface (web or desktop) showing:
    • current/latest readings
    • predicted curve for the next hours/day
    • simple alerts when predicted values exceed a chosen threshold
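
As a rough illustration of the baseline, evaluation, and alert requirements above, the following minimal sketch splits a series chronologically, applies a persistence forecast, and scores it with MAE and RMSE. The PM2.5 values, the 80/20 split, the 20.0 threshold, and all function names are assumptions made up for this example; a real project would read measurements from the database and compare an ML model against the same baseline.

```python
import math


def time_based_split(series, train_fraction=0.8):
    """Split chronologically: earlier values for training, later ones for testing."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def persistence_forecast(history, test):
    """Predict each test value as the value observed one step earlier."""
    return [history[-1]] + list(test[:-1])


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def exceeds_threshold(predicted, threshold):
    """Return the indices of forecast steps above the alert threshold."""
    return [i for i, value in enumerate(predicted) if value > threshold]


# Made-up daily PM2.5 values (µg/m³), for illustration only.
pm25 = [12.0, 15.0, 14.0, 20.0, 26.0, 24.0, 18.0, 16.0, 22.0, 30.0]

train, test = time_based_split(pm25, train_fraction=0.8)
baseline = persistence_forecast(train, test)

print("Baseline MAE:", mae(test, baseline))
print("Baseline RMSE:", rmse(test, baseline))
print("Alert steps:", exceeds_threshold(baseline, threshold=20.0))
```

Any ML model added later can be scored with the same `mae` and `rmse` functions on the same test slice, so its results are directly comparable to the baseline.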

MVP outcome: a working system that ingests real data, produces forecasts, and visualizes them clearly for end users.


Part 2: Thesis (Enhancements & Evaluation)

Extend the system to be more realistic and academically stronger:

  • Add feature engineering (lag features, rolling averages, time-of-day, weekday/seasonality)
  • Extend to multiple pollutants (PM10, NO₂, O₃, etc.) as supported by the chosen dataset
  • Add an Air Quality Index (AQI) view (optional but attractive) using official AQI calculation guidance
  • Improve evaluation (walk-forward validation, error analysis by season)
  • Add interpretability (feature importance) and basic robustness (missing data handling)
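
The feature engineering and walk-forward validation items above could be prototyped along the following lines. This is only a sketch under stated assumptions: the sample values, the lag count, the window length, `min_train`, and every function name are hypothetical, and the two "models" are trivial stand-ins for whatever ML model the thesis actually uses.

```python
from statistics import mean


def make_features(series, n_lags=3, window=3):
    """Build (features, target) rows using only values observed before each target."""
    start = max(n_lags, window)
    rows = []
    for t in range(start, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]  # most recent first
        rolling = mean(series[t - window:t])  # rolling average of past values
        rows.append((lags + [rolling], series[t]))
    return rows


def walk_forward_mae(series, predict, min_train=5):
    """Expanding-window evaluation: forecast step t from series[:t] only."""
    errors = []
    for t in range(min_train, len(series)):
        history = series[:t]
        errors.append(abs(series[t] - predict(history)))
    return mean(errors)


def persistence(history):
    """Baseline: the next value equals the last observed value."""
    return history[-1]


def rolling_mean_3(history):
    """Stand-in model: the next value equals the mean of the last three values."""
    return mean(history[-3:])


# Made-up daily PM2.5 values (µg/m³), for illustration only.
pm25 = [12.0, 15.0, 14.0, 20.0, 26.0, 24.0, 18.0, 16.0, 22.0, 30.0]

rows = make_features(pm25)
print("First feature row:", rows[0])
print("Walk-forward MAE, persistence:", walk_forward_mae(pm25, persistence))
print("Walk-forward MAE, rolling mean:", walk_forward_mae(pm25, rolling_mean_3))
```

Because every feature and every forecast uses only data before the target time step, the evaluation avoids leaking future measurements into the model, which is the main point of a walk-forward scheme.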

Prerequisites

  • Basic knowledge of: (1) Python and/or Java; (2) artificial intelligence and machine learning basics; (3) data collection and processing; (4) visualization and reporting; plus any other related tools

Maximum number of students: 2