A comprehensive Python-based platform for building energy analysis, including:
- Building Energy Benchmarking: Compare building performance against industry standards
  - ETL Pipeline: Extract, Transform, Load functionality for building energy data
  - Benchmarking Engine: Calculate Energy Use Intensity (EUI) and compare buildings
  - RESTful API: FastAPI-based web service for accessing benchmarking functionality
  - Sample Data Generation: Generate synthetic building energy data for testing
- HVAC Fault Detection System: Advanced anomaly detection for HVAC systems
  - Synthetic Data Generation: Realistic HVAC sensor data with labeled faults
  - Dual Anomaly Detection: Rules-based + ML-based (Isolation Forest) detection
  - PostgreSQL Database: Efficient storage and querying of detected anomalies
  - REST API: FastAPI endpoints for programmatic access to alerts
  - Interactive Dashboard: Real-time Streamlit dashboard with visualizations
  - 4 Fault Types: Clogged filter, compressor failure, temperature drift, oscillating control
This HVAC component is a slim demo aligned with the dedicated HVAC-Fault-Detection-with-Anomaly-Pipeline repository; use that project for deeper fault-detection workflows.
├── api/ # Original benchmarking API
│ ├── __init__.py
│ └── main.py # API endpoints and server
├── benchmarking/ # Core benchmarking logic
│ ├── __init__.py
│ └── model.py # Benchmarking functions and models
├── src/ # HVAC Fault Detection System (NEW!)
│ ├── __init__.py
│ ├── generate_hvac_data.py # Synthetic HVAC data generation
│ ├── pipeline_batch.py # ETL pipeline with feature engineering
│ ├── models.py # Anomaly detection (rules + ML)
│ ├── db.py # PostgreSQL database interface
│ ├── api.py # FastAPI service for alerts
│ └── dashboard_app.py # Streamlit interactive dashboard
├── data/ # Data storage
│ ├── raw/ # Raw HVAC data (86,400 records)
│ └── processed/ # Processed features and anomalies
├── notebooks/ # Jupyter notebooks for analysis
├── sample_data/ # Sample benchmarking data
├── tests/ # Test suite for benchmarking + HVAC components (run `make test`)
│ ├── test_benchmarking.py
│ ├── test_generate_hvac_data.py
│ ├── test_pipeline_batch.py
│ ├── test_models.py
│ └── test_db.py
├── docker-compose.yml # PostgreSQL setup for HVAC system
├── generate_sample_data.py # Original ETL script
├── requirements.txt # Python dependencies
├── README.md # This file
└── HVAC_README.md # Detailed HVAC system documentation
Get started with the project in 4 simple steps:
# 0. Create and activate virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# 1. Install dependencies
make install
# Or: pip install -r requirements.txt
# 2. Generate sample data (one command for all systems!)
make sample-data
# 3. Run tests to verify everything works
make test
That's it! You now have all sample data generated and can start using the system.
Run the full pipeline and export data for the frontend demo:
# 1. Setup and activate virtual environment
python -m venv .venv
source .venv/bin/activate
# 2. Install dependencies
make install
# 3. Generate all sample data (benchmarking + HVAC)
make sample-data
# 4. Export canonical JSON for frontend
make export-json
# 5. Validate JSON schema
make validate-json
Artifact paths:
| Output | Path | Size |
|---|---|---|
| Benchmarking data (demo slice) | sample_data/buildings.csv | ~5 KB (6 buildings) |
| HVAC raw data | data/raw/hvac_raw.parquet | ~5 MB |
| HVAC processed | data/processed/hvac_features.parquet | ~8 MB |
| Anomalies | data/processed/anomalies.parquet | ~500 KB |
| Frontend JSON | artifacts/json/building_benchmarking.json | ~5 KB |
Sync to frontend demo:
cd ../energy-pipeline-demo && ./sync-data.sh
- Clone the repository:
git clone https://github.com/shahabsalehi/Sustainable-Building-Energy-Benchmarking-Pipeline.git
cd Sustainable-Building-Energy-Benchmarking-Pipeline
- Create virtual environment (recommended):
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
- Install dependencies:
make install
# Or: pip install -r requirements.txt
Generate all sample data with a single command:
make sample-data
This generates:
- Benchmarking data: a compact demo slice (6 buildings) in sample_data/ by default; use generator flags to produce larger sets if needed
- HVAC data: 30 days of sensor data (86,400 records) with fault detection in data/
Alternatively, generate data for each system separately:
# Benchmarking data only
python generate_sample_data.py
# HVAC data only
python src/generate_hvac_data.py
python src/pipeline_batch.py
python src/models.py
Start the FastAPI server using Make:
make run-api
Or directly:
python -m uvicorn api.main:app --reload
# Or: python api/main.py
The API will be available at http://localhost:8000, with interactive docs at http://localhost:8000/docs.
- GET / - API information and available endpoints
- GET /health - Health check endpoint
- POST /benchmark - Benchmark a building's energy performance
curl -X POST "http://localhost:8000/benchmark" \
-H "Content-Type: application/json" \
-d '{
"building_id": "B001",
"area": 1000,
"energy_consumption": 50000,
"building_type": "office"
}'
Example response:
{
"building_id": "B001",
"eui": 50.0,
"performance_rating": "Good",
"recommendations": [
"Consider LED lighting upgrades",
"Review HVAC system efficiency",
"Implement building automation system"
]
}
The same benchmark can be run directly from Python:
from benchmarking.model import benchmark_building
building_data = {
'building_id': 'B001',
'area': 1000, # square meters
'energy_consumption': 50000, # kWh per year
'building_type': 'office'
}
result = benchmark_building(building_data)
print(f"EUI: {result['eui']} kWh/m²/year")
print(f"Rating: {result['performance_rating']}")
Run all tests with a single command:
make test
Or with coverage report:
make test-cov
Or directly with pytest:
# All tests
pytest tests/ -v
# With coverage
pytest tests/ --cov=benchmarking --cov=api --cov=src --cov-report=html
# Specific test modules
pytest tests/test_benchmarking.py -v
pytest tests/test_generate_hvac_data.py -v
pytest tests/test_pipeline_batch.py -v
pytest tests/test_models.py -v
pytest tests/test_db.py -v
The HVAC Fault Detection System is a complete anomaly detection pipeline for commercial building HVAC systems.
make docker-up
# Or: docker-compose up -d
make sample-data
This automatically runs the complete HVAC pipeline (data generation, ETL, anomaly detection).
python src/db.py
python src/api.py
Access the API at http://localhost:8000 and interactive docs at http://localhost:8000/docs
make run-dashboard
# Or: streamlit run src/dashboard_app.py
Access the dashboard at http://localhost:8501
- Synthetic Data: 86,400 realistic HVAC sensor records with 4 fault types
- ETL Pipeline: 28 features including rolling statistics and lag features
- Anomaly Detection:
- Rules-based: 4 detection rules (temp_drift, clogged_filter, compressor_failure, oscillating_control)
- ML-based: Isolation Forest trained on normal data
- Total: ~3,209 anomalies detected
- PostgreSQL Database: Efficient storage with indexed queries
- REST API: 3 endpoints for querying alerts and summaries
- Interactive Dashboard: Real-time monitoring with time-series plots and analytics
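To make the feature-engineering and rules-based steps listed above more concrete, here is a minimal standard-library sketch of a rolling mean, a lag feature, and one drift rule. It is an illustration only, not the code in src/pipeline_batch.py or src/models.py: the function names, the window size, the setpoint, and the offset threshold are all assumptions, and the Isolation Forest step is not covered.

```python
from collections import deque
from statistics import mean


def add_rolling_and_lag(readings, window=3, lag=1):
    """Return one feature dict per reading: raw value, rolling mean, lagged value."""
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    features = []
    for i, value in enumerate(readings):
        recent.append(value)
        features.append({
            "value": value,
            "rolling_mean": mean(recent),
            "lag": readings[i - lag] if i >= lag else None,
        })
    return features


def flag_temp_drift(features, setpoint, max_offset):
    """Hypothetical rule: flag rows whose rolling mean strays too far from the setpoint."""
    return [abs(f["rolling_mean"] - setpoint) > max_offset for f in features]


# Illustrative zone temperatures in degrees Celsius (made-up values).
temps = [21.0, 21.2, 20.9, 23.5, 24.1, 24.4]
rows = add_rolling_and_lag(temps, window=3, lag=1)
flags = flag_temp_drift(rows, setpoint=21.0, max_offset=1.5)
print(flags)
```

Using the rolling mean instead of the raw reading keeps a single noisy sample from triggering the rule, at the cost of flagging a sustained drift a few samples late.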
# Get high-severity alerts
curl "http://localhost:8000/alerts?severity=high&limit=100"
# Get alerts for specific zone
curl "http://localhost:8000/alerts?zone_id=Z1&start=2024-01-01T00:00:00&end=2024-01-31T23:59:59"
# Get summary statistics
curl "http://localhost:8000/alerts/summary"
- 4 Interactive Tabs:
  - Time Series: View sensor data with anomaly markers
  - Anomaly Table: Sortable, filterable table with CSV export
  - Analytics: Charts and distributions (severity, rules, zones, trends)
  - About: System documentation
- Filters: Time range, zone, severity, detection rule
- Visualizations: 6 interactive charts using Plotly
- Real-time: Auto-refresh with 60-second cache
For complete HVAC system documentation, see HVAC_README.md
For convenience, a Makefile is provided with common commands:
make help # Show all available commands
make install # Install dependencies
make sample-data # Generate all sample data
make test # Run all tests
make test-cov # Run tests with coverage
make docker-up # Start PostgreSQL
make docker-down # Stop PostgreSQL
make run-api # Start API server
make run-dashboard # Start dashboard
make export-json # Export benchmarking data to JSON
make clean # Remove generated data and caches
Export benchmarking data to JSON for integration with the energy-pipeline-demo frontend:
# Create and activate virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Or: make install
# Export benchmarking data to JSON
make export-json
# Output: artifacts/json/building_benchmarking.json (canonical)
The canonical export, building_benchmarking.json, uses the following schema:
{
"pipeline": "sustainable_building_benchmarking",
"generated_at": "2025-11-28T12:00:00Z",
"portfolio_summary": {
"total_buildings": 12,
"total_floor_area_m2": 45680,
"avg_energy_intensity_kwh_m2": 98.4,
"portfolio_co2_tons": 892.5,
"top_performer_pct": 25,
"needs_improvement_pct": 33
},
"benchmark_categories": {
"energy_intensity": {
"excellent": "< 70 kWh/m²",
"good": "70-90 kWh/m²",
"average": "90-110 kWh/m²",
"poor": "> 110 kWh/m²"
}
},
"buildings": [
{
"building_id": "BLD-001",
"name": "Main Office Tower",
"location": "Stockholm",
"floor_area_m2": 5200,
"building_type": "Office",
"year_built": 2018,
"energy_intensity_kwh_m2": 72.5,
"co2_intensity_kg_m2": 16.2,
"energy_percentile": 15,
"rating": "Excellent",
"certifications": ["LEED Gold", "BREEAM Excellent"]
}
]
}
| Section | Type | Description |
|---|---|---|
| pipeline | string | Pipeline identifier |
| generated_at | ISO 8601 UTC | Export timestamp |
| portfolio_summary | object | Portfolio-wide aggregate metrics |
| benchmark_categories | object | Rating thresholds definition |
| buildings | array | Individual building records with metrics |
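The top-level sections in the table above can be checked with a few lines of Python before the file is handed to the frontend. This is a sketch under stated assumptions, not what `make validate-json` runs: the helper name `find_schema_problems` is hypothetical, and it checks only that each section exists with the expected JSON type.

```python
import json

# Expected top-level sections and their Python types after json.loads().
REQUIRED_SECTIONS = {
    "pipeline": str,
    "generated_at": str,
    "portfolio_summary": dict,
    "benchmark_categories": dict,
    "buildings": list,
}


def find_schema_problems(export):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    for key, expected_type in REQUIRED_SECTIONS.items():
        if key not in export:
            problems.append(f"missing section: {key}")
        elif not isinstance(export[key], expected_type):
            problems.append(f"{key} should be of type {expected_type.__name__}")
    return problems


# A stripped-down export with empty sections, used only to exercise the check.
sample = json.loads(
    '{"pipeline": "sustainable_building_benchmarking",'
    ' "generated_at": "2025-11-28T12:00:00Z",'
    ' "portfolio_summary": {}, "benchmark_categories": {}, "buildings": []}'
)
print(find_schema_problems(sample))
```

For a real export, replace the inline sample with the parsed contents of artifacts/json/building_benchmarking.json.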
After export, copy to the frontend demo:
# From workspace root
cp Sustainable-Building-Energy-Benchmarking-Pipeline/artifacts/json/building_benchmarking.json \
energy-pipeline-demo/public/data/
# Or use the sync script
cd energy-pipeline-demo && ./sync-data.sh
This pipeline is designed for easy transition to production databases.
The canonical JSON schema maps directly to relational tables:
| JSON Section | Database Table | Notes |
|---|---|---|
| portfolio_summary | fact_portfolio_metrics | Aggregate portfolio KPIs |
| benchmark_categories | dim_benchmark_thresholds | Rating definitions |
| buildings | dim_buildings + fact_building_metrics | Building data + metrics |
-- Dimension table for buildings
CREATE TABLE dim_buildings (
building_id VARCHAR(50) PRIMARY KEY,
name VARCHAR(255),
location VARCHAR(255),
floor_area_m2 NUMERIC,
building_type VARCHAR(100),
year_built INT
);
-- Fact table for building metrics
CREATE TABLE fact_building_metrics (
id SERIAL PRIMARY KEY,
building_id VARCHAR(50) REFERENCES dim_buildings(building_id),
energy_intensity_kwh_m2 NUMERIC,
co2_intensity_kg_m2 NUMERIC,
energy_percentile INT,
rating VARCHAR(50),
certifications JSONB,
measured_at TIMESTAMPTZ DEFAULT NOW()
);
-- Portfolio summary view
CREATE VIEW vw_portfolio_summary AS
SELECT
COUNT(*) as total_buildings,
SUM(floor_area_m2) as total_floor_area_m2,
AVG(energy_intensity_kwh_m2) as avg_energy_intensity,
SUM(co2_intensity_kg_m2 * floor_area_m2) / 1000 as portfolio_co2_tons
FROM dim_buildings b
JOIN fact_building_metrics m ON b.building_id = m.building_id;
- Development: Use JSON exports via make export-json
- Staging: Add a database loader and integrate it with the existing docker-compose.yml PostgreSQL service
- Production: Use the same schema and add a scheduled refresh via Airflow/Prefect
The benchmarking API (api/main.py) already supports database-backed queries and can be switched from file to DB mode.
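A database loader for the staging step could begin by splitting each entry of the `buildings` array into one row for dim_buildings and one for fact_building_metrics. The sketch below assumes the column names from the example schema and uses hypothetical helpers (`split_building`, `summarize`); it stops short of opening a database connection, and `summarize` mirrors the arithmetic of vw_portfolio_summary for the case of one metrics row per building.

```python
import json
from statistics import mean

DIM_COLUMNS = ("building_id", "name", "location",
               "floor_area_m2", "building_type", "year_built")
METRIC_COLUMNS = ("energy_intensity_kwh_m2", "co2_intensity_kg_m2",
                  "energy_percentile", "rating")


def split_building(record):
    """Split one JSON building record into (dim_buildings row, fact_building_metrics row)."""
    dim_row = {col: record.get(col) for col in DIM_COLUMNS}
    fact_row = {col: record.get(col) for col in METRIC_COLUMNS}
    fact_row["building_id"] = record["building_id"]
    # Serialized so it can be bound to the JSONB certifications column.
    fact_row["certifications"] = json.dumps(record.get("certifications", []))
    return dim_row, fact_row


def summarize(buildings):
    """Same aggregates as vw_portfolio_summary, computed in Python."""
    return {
        "total_buildings": len(buildings),
        "total_floor_area_m2": sum(b["floor_area_m2"] for b in buildings),
        "avg_energy_intensity": mean(b["energy_intensity_kwh_m2"] for b in buildings),
        # kg CO2 per m2 times m2 gives kg; divide by 1000 for tons.
        "portfolio_co2_tons": sum(
            b["co2_intensity_kg_m2"] * b["floor_area_m2"] for b in buildings
        ) / 1000,
    }


# The single building from the example export.
example = {
    "building_id": "BLD-001", "name": "Main Office Tower", "location": "Stockholm",
    "floor_area_m2": 5200, "building_type": "Office", "year_built": 2018,
    "energy_intensity_kwh_m2": 72.5, "co2_intensity_kg_m2": 16.2,
    "energy_percentile": 15, "rating": "Excellent",
    "certifications": ["LEED Gold", "BREEAM Excellent"],
}
dim_row, fact_row = split_building(example)
summary = summarize([example])
print(dim_row["name"], round(summary["portfolio_co2_tons"], 2))
```

Each returned dict can then be passed as bound parameters to an INSERT statement through whichever database driver the project uses.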
- Benchmarking Engine (benchmarking/model.py):
  - Calculate Energy Use Intensity (EUI)
  - Compare building performance against benchmarks
  - Generate energy efficiency recommendations
- FastAPI Application (api/main.py):
  - RESTful API for benchmarking services
  - Input validation using Pydantic models
  - Health check and status endpoints
- ETL Pipeline (generate_sample_data.py):
  - Extract: Generate/load building data
  - Transform: Calculate metrics and categorize performance
  - Load: Save processed data in multiple formats
- To add new benchmarking metrics, extend benchmarking/model.py
- To add new API endpoints, modify api/main.py
- To add data sources, update the ETL functions in generate_sample_data.py
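As an illustration of what a new metric might look like, the sketch below pairs the EUI formula (annual energy divided by floor area) with a hypothetical percentile metric. Both function names are assumptions and are not taken from benchmarking/model.py, and the percentile definition here is one possible choice that may differ from how `energy_percentile` in the export is computed.

```python
def calculate_eui(energy_consumption_kwh, area_m2):
    """Energy Use Intensity in kWh/m2/year: annual energy divided by floor area."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    return energy_consumption_kwh / area_m2


def eui_percentile(eui, portfolio_euis):
    """Hypothetical metric: percentage of portfolio buildings with an EUI at or below this one."""
    if not portfolio_euis:
        raise ValueError("portfolio_euis must not be empty")
    at_or_below = sum(1 for other in portfolio_euis if other <= eui)
    return 100.0 * at_or_below / len(portfolio_euis)


# Values from the API example: 50,000 kWh per year over 1,000 m2.
eui = calculate_eui(50000, 1000)
# Illustrative portfolio of four EUI values (made-up numbers).
rank = eui_percentile(eui, [50.0, 80.0, 100.0, 120.0])
print(eui, rank)
```

Keeping each metric as a small pure function makes it straightforward to cover in tests/test_benchmarking.py.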
This pipeline includes a Databricks notebook for enterprise-scale processing using the medallion architecture.
Raw CSV → Bronze (Delta) → Silver (Delta) → Gold (Delta) → JSON → HF Dataset
| Layer | Description | Location |
|---|---|---|
| Bronze | Raw ingested data with metadata | /FileStore/benchmarking/bronze |
| Silver | Cleaned, validated, enriched data | /FileStore/benchmarking/silver |
| Gold | Aggregated business metrics | /FileStore/benchmarking/gold |
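The role of each layer can be illustrated with a plain-Python analogy. This is not the Spark/Delta code in the notebook: the function names, the metadata field names, and the validation rules are assumptions, and the input field names are borrowed from the benchmarking API request example.

```python
from datetime import datetime, timezone
from statistics import mean


def to_bronze(raw_rows, source):
    """Bronze: keep raw values untouched, add ingestion metadata."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{**row, "_source": source, "_ingested_at": ingested_at} for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: cast types, drop invalid rows, enrich with EUI."""
    silver = []
    for row in bronze_rows:
        try:
            area = float(row["area"])
            energy = float(row["energy_consumption"])
            building_type = row["building_type"]
        except (KeyError, TypeError, ValueError):
            continue  # unparseable or incomplete row
        if area <= 0 or energy < 0:
            continue  # physically implausible row
        silver.append({**row, "area": area, "energy_consumption": energy,
                       "building_type": building_type, "eui": energy / area})
    return silver


def to_gold(silver_rows):
    """Gold: aggregate business metrics per building type."""
    by_type = {}
    for row in silver_rows:
        by_type.setdefault(row["building_type"], []).append(row["eui"])
    return {t: {"buildings": len(v), "avg_eui": mean(v)} for t, v in by_type.items()}


# CSV-like raw rows (all strings); the second row is deliberately invalid.
raw = [
    {"building_id": "B001", "area": "1000", "energy_consumption": "50000", "building_type": "office"},
    {"building_id": "B002", "area": "0", "energy_consumption": "40000", "building_type": "office"},
    {"building_id": "B003", "area": "2000", "energy_consumption": "120000", "building_type": "office"},
]
gold = to_gold(to_silver(to_bronze(raw, source="buildings.csv")))
print(gold)
```

The same separation applies in the notebook: each layer is written to its own location, so a later stage can be rerun without re-ingesting the raw CSV.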
- Create Databricks Workspace (Community Edition is free)
- Import Notebook: Upload
notebooks/benchmarking_medallion.py - Configure Cluster: Single-node, auto-terminate enabled
- Run Pipeline: Execute the notebook cells in sequence
After running the Databricks pipeline, export the Gold summary to Hugging Face:
# Option 1: From local JSON (after downloading from DBFS)
make push-gold-to-hf
# Option 2: Direct from DBFS (requires databricks-cli)
python scripts/databricks_to_hf.py \
--dbfs-path /FileStore/benchmarking/gold_summary.json \
--dataset-name shahabsalehi/building-benchmarking
| Secret | Purpose |
|---|---|
| HF_TOKEN | Write access to HF Dataset |
| DATABRICKS_HOST | (Optional) Databricks workspace URL |
| DATABRICKS_TOKEN | (Optional) Databricks PAT |
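If these secrets are exposed as environment variables (an assumption; the table does not say how scripts/databricks_to_hf.py reads them), a script could fail early when the required one is missing. The helper name `load_secrets` is hypothetical.

```python
import os


def load_secrets(env=None):
    """Collect the secrets from the table above; only HF_TOKEN is required."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required for write access to the HF Dataset")
    return {
        "hf_token": token,
        # Optional: only needed when reading directly from DBFS.
        "databricks_host": env.get("DATABRICKS_HOST"),
        "databricks_token": env.get("DATABRICKS_TOKEN"),
    }


# A plain dict stands in for the real environment; the token is a placeholder.
secrets = load_secrets({"HF_TOKEN": "placeholder-token"})
print(sorted(secrets))
```

Passing the environment in as a parameter keeps the function testable without touching real credentials.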
The Gold layer maintains the same JSON schema as the standard make export-json output, ensuring frontend compatibility:
Databricks Gold → gold_summary.json → HF Dataset → Frontend
↓
Local Pipeline → building_benchmarking.json → HF Dataset → Frontend
- Python 3.8+
- FastAPI - Modern web framework for building APIs
- Streamlit - Interactive data dashboards
- Pandas - Data manipulation and analysis
- NumPy - Numerical computing
- Scikit-learn - Machine learning (Isolation Forest)
- SQLAlchemy - Database ORM
- PostgreSQL - Relational database (via Docker)
- Plotly - Interactive visualizations
- Pytest - Testing framework
- Uvicorn - ASGI server
- Databricks - Enterprise Spark processing (optional)
- Delta Lake - ACID transactions for medallion layers
See LICENSE file for details.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines on:
- Setting up your development environment
- Running tests
- Coding standards and best practices
- Submitting pull requests
Quick start for contributors:
make install # Set up environment
make sample-data # Generate test data
make test # Run test suite
- For larger benchmarking datasets, adjust generator parameters and switch API mode for pagination.
- For HVAC fault demos beyond the light slice here, use the dedicated HVAC-Fault-Detection-with-Anomaly-Pipeline.
- For Databricks workflows, the medallion notebook is portable to Unity Catalog with minimal changes.