
Dry Bridge Solar Data ETL

An ETL (Extract, Transform, Load) pipeline for collecting and processing solar production data from Dry Bridge Solar Farm. This pipeline extracts data via web scraping, transforms it into standardized energy metrics, and loads it into a PostgreSQL database for analysis.

Installation

This project uses uv, a fast Python package installer and resolver, for dependency management and development.

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install the project and its dependencies
uv sync

# Activate the virtual environment
source .venv/bin/activate

Configuration

Environment Setup

Copy the example environment file:

cp env.example .env

The env.example file contains credentials for the Docker development database. When you're ready to upload data to production, edit .env with the production database credentials:

DB_HOST=localhost
DB_PORT=5432
DB_NAME=dry_bridge_db
DB_USER=your_username
DB_PASSWORD=your_password
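
If you want to sanity-check that these variables are being picked up, a minimal sketch is shown below. It assumes python-dotenv is available; the project itself may read the environment differently.

# check_db_config.py -- minimal sketch; assumes python-dotenv is installed
import os
from dotenv import load_dotenv

load_dotenv()  # reads variables from .env in the current directory

dsn = (
    f"host={os.environ['DB_HOST']} port={os.environ['DB_PORT']} "
    f"dbname={os.environ['DB_NAME']} user={os.environ['DB_USER']} "
    f"password={os.environ['DB_PASSWORD']}"
)
print("Connecting with:", dsn.replace(os.environ['DB_PASSWORD'], '***'))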

Database Setup

The database is set up and managed by the DBA team. The Office of Sustainability and Resiliency is the data owner.

Usage

The application provides a CLI interface with three main commands: extract, load, and refresh.

Docker Deployment

For production deployment in Kubernetes as a CronJob, see docs/DOCKER.md.

Extracting Data

Extract solar data from the web dashboard:

# Extract all available data (from July 1, 2023 to now) into ./output
dry-bridge extract

# Extract data for a specific date range
dry-bridge extract 2023-08-01 2024-08-01

# Extract to a custom output directory
dry-bridge extract all now ./custom_output

Re-run the extract command to resume an interrupted extraction; it automatically skips files that already exist.
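
Conceptually, the skip logic only needs to check whether a day's file is already on disk before fetching it again. The sketch below is illustrative only; the file naming is an assumption, not the scraper's actual convention.

# Sketch of skip-existing behaviour; file naming is illustrative only.
from datetime import date, timedelta
from pathlib import Path

def dates_to_fetch(start: date, end: date, output_dir: Path) -> list[date]:
    """Return only the dates whose CSV is not already in output_dir."""
    missing = []
    day = start
    while day <= end:
        if not (output_dir / f"{day.isoformat()}.csv").exists():
            missing.append(day)
        day += timedelta(days=1)
    return missing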

Loading Data

Load extracted data into the database:

# Load both raw and processed data
dry-bridge load

# Load only raw data
dry-bridge load --no-transform

# Load only processed data  
dry-bridge load --no-raw

# Load from custom output directory
dry-bridge load ./custom_output
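
Because timestamp is the primary key of the processed table, loads can be made idempotent. The sketch below uses psycopg2 and ON CONFLICT; the actual loader in load.py may use a different driver or batching strategy.

# Illustrative only: inserts processed rows, skipping timestamps already loaded.
import psycopg2

def load_processed(rows, conn_params):
    """rows: iterable of (timestamp, kw, kwh, mmbtu, mtco2e) tuples."""
    with psycopg2.connect(**conn_params) as conn, conn.cursor() as cur:
        cur.executemany(
            """
            INSERT INTO dry_bridge_solar_processed (timestamp, kw, kwh, mmbtu, mtco2e)
            VALUES (%s, %s, %s, %s, %s)
            ON CONFLICT (timestamp) DO NOTHING
            """,
            rows,
        )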

Refreshing Data

Fill gaps and add new data (recommended for scheduled jobs):

# Query database for missing timestamps and fill gaps
dry-bridge refresh

The refresh command is designed to run on a schedule (e.g., every 15 minutes via cron or Kubernetes CronJob). It automatically:

  • Detects missing 15-minute intervals in the raw data table (see the sketch after this list)
  • Scrapes only the necessary dates to fill gaps
  • Adds new data as it becomes available
  • Retries failed fetches on the next run (no retry limits)
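
A sketch of the gap-detection idea, using PostgreSQL's generate_series to find 15-minute slots with no raw rows. The actual query in the project may differ, and since the raw timestamp column is stored as text, the cast here is illustrative.

# Sketch: find 15-minute timestamps between the first raw row and now that have no data.
import psycopg2

GAP_QUERY = """
SELECT g.ts
FROM generate_series(
         (SELECT min(timestamp::timestamptz) FROM dry_bridge_solar_raw),
         now(),
         interval '15 minutes'
     ) AS g(ts)
LEFT JOIN dry_bridge_solar_raw r ON r.timestamp::timestamptz = g.ts
WHERE r.timestamp IS NULL
ORDER BY g.ts;
"""

def missing_intervals(conn_params):
    with psycopg2.connect(**conn_params) as conn, conn.cursor() as cur:
        cur.execute(GAP_QUERY)
        return [row[0] for row in cur.fetchall()]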

Complete ETL Pipeline

For a complete initial ETL run:

# Extract all data and load into database
dry-bridge extract && dry-bridge load

Database Schema

The application creates two tables:

dry_bridge_solar_processed

  • timestamp (TIMESTAMP, PRIMARY KEY): UTC timestamp
  • kw (FLOAT): Power in kilowatts
  • kwh (FLOAT): Energy in kilowatt-hours
  • mmbtu (FLOAT): Energy in million British thermal units
  • mtco2e (FLOAT): Carbon dioxide equivalent in metric tons avoided

dry_bridge_solar_raw

  • timestamp (TEXT): Original timestamp string
  • name (TEXT): Measurement source name
  • type (TEXT): Measurement type
  • units (TEXT): Units of measurement
  • value (FLOAT): Raw measurement value

Note: The raw table is the source of truth for what data has been fetched from the API.
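
For reference, the processed metrics are related by standard unit conversions. The sketch below uses the physical constant 1 kWh = 0.003412 MMBtu, assumes a 15-minute interval for the kW-to-kWh step, and uses a placeholder grid emissions factor; the pipeline's actual factors live in transform.py and may differ.

# Illustrative conversions; the real factors are defined in transform.py.
KWH_PER_INTERVAL_HOURS = 0.25   # 15-minute interval
MMBTU_PER_KWH = 0.003412        # physical constant
MTCO2E_PER_KWH = 0.000386       # placeholder grid emissions factor (assumption)

def derive_metrics(kw: float) -> dict[str, float]:
    kwh = kw * KWH_PER_INTERVAL_HOURS
    return {
        "kw": kw,
        "kwh": kwh,
        "mmbtu": kwh * MMBTU_PER_KWH,
        "mtco2e": kwh * MTCO2E_PER_KWH,
    }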

Database Migration

The dry_bridge_fetch_attempts table is no longer used and can be safely dropped:

DROP TABLE IF EXISTS dry_bridge_fetch_attempts;

Data Export

Export processed data to CSV:

\copy (SELECT * FROM dry_bridge_solar_processed ORDER BY timestamp) TO 'solar_production_export.csv' WITH CSV HEADER;
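
If you prefer to export from Python rather than psql, a rough alternative using pandas and SQLAlchemy is sketched below (not part of the CLI; connection details follow the example .env values).

# Alternative export sketch using pandas and SQLAlchemy.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://your_username:your_password@localhost:5432/dry_bridge_db"
)
df = pd.read_sql("SELECT * FROM dry_bridge_solar_processed ORDER BY timestamp", engine)
df.to_csv("solar_production_export.csv", index=False)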

Development

Development Setup

# Install with development dependencies
uv sync

Code Quality

This project uses several tools for code quality:

# Linting with ruff
uv run ruff check .

# Code formatting with ruff
uv run ruff format .

# Type checking
uv run mypy src/

# Dependency checking
uv run deptry .

Testing

Run the test suite:

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=dry_bridge

# Run specific test file
uv run pytest tests/transform_test.py

Note: Tests require a test database named test_dry_bridge_db with the same user credentials.

Project Structure

src/dry_bridge/
├── __init__.py          # Package initialization and documentation
├── __main__.py          # CLI application entry point
├── scrape.py           # Web scraping functionality
├── transform.py        # Data transformation and calculations
└── load.py             # Database operations and loading

tests/
├── data/               # Test data files
└── transform_test.py   # Unit tests

# Configuration files
├── pyproject.toml      # Project metadata and dependencies
├── mise.toml          # Tool version management
├── uv.lock            # Locked dependency versions
└── .env.example       # Environment variable template

Troubleshooting

Common Issues

  1. Authentication Errors: The scraper relies on session cookies from the dashboard. If you encounter authentication issues, the dashboard may have changed its authentication mechanism.

  2. Database Connection: Ensure PostgreSQL is running and your credentials in .env are correct.

  3. Missing Data: Some days may have no data available from the solar farm. This is normal and will be logged as failed downloads.

  4. Timezone Issues: The application handles Eastern time to UTC conversion automatically, accounting for daylight saving time transitions.
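
For illustration, the Eastern-to-UTC conversion can be done with the standard library's zoneinfo, which accounts for DST automatically. The project's own conversion may be implemented differently.

# Sketch of Eastern -> UTC conversion using the standard library.
from datetime import datetime
from zoneinfo import ZoneInfo

def eastern_to_utc(naive_local: datetime) -> datetime:
    """Interpret a naive timestamp as America/New_York time and return it in UTC."""
    return naive_local.replace(tzinfo=ZoneInfo("America/New_York")).astimezone(ZoneInfo("UTC"))

print(eastern_to_utc(datetime(2023, 7, 1, 12, 0)))   # 2023-07-01 16:00:00+00:00 (EDT)
print(eastern_to_utc(datetime(2023, 12, 1, 12, 0)))  # 2023-12-01 17:00:00+00:00 (EST)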

Logging

The application uses Python's standard logging. To increase verbosity:

import logging
logging.basicConfig(level=logging.INFO)

License

This project is for internal use in monitoring Dry Bridge solar farm data.
