A web scraping application that collects daily flavor information from frozen custard shops in the Milwaukee area and displays it in a modern web UI.
- Technology Stack
- Supported Locations
- Features
- Quick Start
- Testing & Quality
- Ecosystem Testing
- Scraper Architecture
- Development Workflow
- Docker Deployment
- Troubleshooting
- Contributing
- Language: Python 3.12+
- Web Scraping:
- BeautifulSoup (static sites)
- Selenium with ChromeDriver (dynamic sites)
- Playwright (Facebook pages)
- Testing: pytest with unittest, comprehensive mocking
- Code Quality: black, flake8, isort, autoflake, pre-commit hooks
- CI/CD: GitHub Actions with automated testing and deployment
- Deployment: Docker, Docker Compose, Cloudflare Pages
- Package Management: pip with pyproject.toml
- Bubba's Frozen Custard
- Culver's
- Gilles
- Kopp's Frozen Custard
- Murf's Frozen Custard
- Oscar's Frozen Custard
- Big Deal Burgers
- Robust scrapers for each shop using multiple strategies:
- BeautifulSoup for static sites (Kopp's, Bubba's, Murf's)
- Selenium for dynamic sites (Oscar's, Culver's)
- Playwright for Facebook pages (Big Deal Burgers)
- Base scraper architecture with common functionality and error handling
- Location registry (locations.yaml) for easy configuration
- Modern UI with date-anchored cards and interactive map
- Comprehensive testing:
- Unit tests
- Integration tests
- Ecosystem tests against live sites
- CI/CD with linting, formatting, security, and coverage checks
- Pre-commit hooks for code quality
```bash
# Clone the repository
git clone <your-repo-url>
cd daily-flavors-app

# Run with Docker Compose
docker compose up

# Access the app in your browser
open http://localhost:8080
```

Or run locally without Docker:

```bash
# Create and activate virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .[dev]

# Generate the static flavors data
python scripts/generate_flavors.py

# Serve the static site
python -m http.server --directory static 8080
```
- Run all unit tests: `pytest tests/ -v`
- Run tests for a specific scraper: `pytest tests/test_bigdeal_scraper.py -v`
- Run with coverage: `pytest --cov=app tests/`
- Run pre-commit hooks manually: `pre-commit run --all-files`
- Install pre-commit hooks locally: `pre-commit install`
- Format code: `black app/ tests/` and `isort app/ tests/`
- Lint code: `flake8 app/ tests/`
All checks (flake8, black, isort, autoflake, pip-audit) run automatically in CI and pre-commit hooks.
A dedicated ecosystem test suite checks that all scrapers are working against live sites. This is run daily via GitHub Actions and can be run locally:
```bash
pytest --ecosystem
```

Important notes:

- The ecosystem tests are skipped by default unless you pass the `--ecosystem` flag (a sketch of how that flag is wired up follows this list)
- Tests run against live websites, so a network connection is required
- Each scraper must return at least one valid flavor with all required fields
- The daily workflow alerts you if a site changes or breaks scraping
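The skip-by-default behavior is typically wired up through a custom pytest option in `conftest.py`. The following is a minimal sketch of that mechanism, assuming the live tests carry an `ecosystem` marker; the marker name and exact implementation are assumptions, not necessarily what this repository does:

```python
# conftest.py (illustrative sketch of the --ecosystem opt-in flag)
import pytest


def pytest_addoption(parser):
    # Register the custom --ecosystem command-line flag
    parser.addoption(
        "--ecosystem",
        action="store_true",
        default=False,
        help="run ecosystem tests against live sites",
    )


def pytest_collection_modifyitems(config, items):
    # Without the flag, skip anything marked as an ecosystem test
    if config.getoption("--ecosystem"):
        return
    skip_live = pytest.mark.skip(reason="needs --ecosystem to run against live sites")
    for item in items:
        if "ecosystem" in item.keywords:
            item.add_marker(skip_live)
```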
All scrapers inherit from `BaseScraper`, which provides common functionality. To add a new scraper:

- Create the scraper class:

  ```python
  from app.scrapers.scraper_base import BaseScraper


  class MyNewScraper(BaseScraper):
      def __init__(self):
          super().__init__("brand_key")  # Must match locations.yaml

      def scrape(self):
          self.log_start()
          if not self.locations:
              self.log_error("No locations found")
              return []
          results = []
          # Scraping logic here
          self.log_complete(len(results))
          return results
  ```

- Add locations to `app/locations.yaml`:

  ```yaml
  brand_key:
    - id: unique-location-id
      name: "Display Name"
      brand: BrandName
      address: "123 Main St"
      lat: 43.0
      lng: -88.0
      url: "https://website.com"
      facebook: "https://facebook.com/page"  # Optional
      enabled: true
  ```

- Write comprehensive unit tests (`tests/test_brand_scraper.py`); a test sketch follows this list:
  - Test flavor extraction with various patterns
  - Mock external services (no real network calls)
  - Cover edge cases and error handling
  - Aim for 40+ test cases
- Add to the ecosystem test (`tests/test_scraper_ecosystem.py`)
- Test manually with real data before committing
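To give an idea of the mocking style, here is a minimal unit test that patches the HTTP call so nothing touches a real site. The module path, class name, and sample HTML are illustrative assumptions, not the project's actual tests:

```python
# tests/test_mynew_scraper.py (illustrative sketch)
from unittest.mock import MagicMock, patch

from app.scrapers.mynew_scraper import MyNewScraper  # hypothetical module

SAMPLE_HTML = "<div class='flavor-of-the-day'>Turtle Sundae</div>"


@patch("app.scrapers.mynew_scraper.requests.get")
def test_scrape_returns_flavor(mock_get):
    # Serve canned HTML instead of making a real network call
    mock_get.return_value = MagicMock(status_code=200, text=SAMPLE_HTML)

    results = MyNewScraper().scrape()

    assert len(results) >= 1
```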
`BaseScraper` provides these helpers (a usage sketch follows the list):

- `self.create_flavor()` - Creates a standardized flavor dict
- `self.log_start()`, `self.log_complete()`, `self.log_error()` - Logging helpers
- `self.locations` - Auto-loaded from locations.yaml
- Built-in session management and user agents
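To make the helpers concrete, here is how a static-site scraper might combine them. The CSS selector and the `create_flavor()` keyword arguments are illustrative assumptions; check the real signature in `app/scrapers/scraper_base.py` before copying this:

```python
# Illustrative use of the BaseScraper helpers; the selector and the
# create_flavor() arguments are assumed, not copied from the project source.
import requests
from bs4 import BeautifulSoup

from app.scrapers.scraper_base import BaseScraper


class MyNewScraper(BaseScraper):
    def __init__(self):
        super().__init__("brand_key")

    def scrape(self):
        self.log_start()
        results = []
        for location in self.locations:  # entries loaded from locations.yaml
            response = requests.get(location["url"], timeout=30)
            soup = BeautifulSoup(response.text, "html.parser")
            element = soup.select_one(".flavor-of-the-day")  # hypothetical selector
            if element is None:
                self.log_error(f"No flavor found for {location['id']}")
                continue
            # Assumed keyword arguments; adjust to the real create_flavor() signature
            results.append(
                self.create_flavor(location=location, flavor=element.get_text(strip=True))
            )
        self.log_complete(len(results))
        return results
```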
- GitHub Actions workflow runs on PRs and the main branch:
  - Linting, formatting, security audit, and test coverage
This project uses Cloudflare Pages for hosting. Preview deployments are automatically created for each pull request and cleaned up when the PR is merged or closed.
To enable automatic cleanup of preview deployments, add these secrets to your GitHub repository (Settings → Secrets and variables → Actions):
- `CLOUDFLARE_API_TOKEN`: Create an API token in the Cloudflare Dashboard
  - Go to My Profile → API Tokens → Create Token
  - Use the "Edit Cloudflare Pages" template or a custom token with:
    - Account → Cloudflare Pages → Edit permissions
- `CLOUDFLARE_ACCOUNT_ID`: Found in the Cloudflare Dashboard
  - Click on any site → Copy your Account ID from the right sidebar
- `CLOUDFLARE_PROJECT_NAME`: Your Cloudflare Pages project name
  - Example: `daily-flavors-app`
The cleanup workflow automatically runs when pull requests are closed (merged or abandoned) and removes all associated preview deployments.
Static content is served from the `static/` directory. The daily data file is generated at `static/data/flavors.json` by the scraper script.
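Conceptually, the generation step just runs each scraper and writes the combined results to that JSON file. A rough sketch of the idea follows; the scraper registry and output shape are assumptions, so see `scripts/generate_flavors.py` for the real logic:

```python
# Conceptual sketch only, not the project's actual generator script.
import json
from datetime import date
from pathlib import Path

from app.scrapers.mynew_scraper import MyNewScraper  # hypothetical scraper import

OUTPUT = Path("static/data/flavors.json")


def main():
    scrapers = [MyNewScraper()]  # the real script would register every enabled scraper
    flavors = []
    for scraper in scrapers:
        flavors.extend(scraper.scrape())

    OUTPUT.parent.mkdir(parents=True, exist_ok=True)
    payload = {"generated": date.today().isoformat(), "flavors": flavors}
    OUTPUT.write_text(json.dumps(payload, indent=2))


if __name__ == "__main__":
    main()
```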
```bash
# Build the container (uses pyproject.toml, not requirements.txt)
docker build -t daily-custard-app .

# Run the container (serves static site on port 8000 inside container)
docker run -p 8080:8000 daily-custard-app

# Or use Docker Compose (recommended for local dev)
docker compose up
```

- The Dockerfile now uses `pyproject.toml` for dependency management. You do not need `requirements.txt`.
- The image is automatically tagged with the release version and `latest` in CI/CD.
- ChromeDriver not found:

  ```bash
  # Install ChromeDriver manually
  brew install chromedriver  # macOS
  ```

- Module import errors:

  ```bash
  # Ensure all dependencies are installed
  pip install .[dev]
  ```

- Encoding issues:
  - Enable debug logging to see detailed encoding attempts
  - The scrapers include multiple encoding fallback strategies
- Site structure changes:
  - Enable debug logging to inspect the HTML structure
  - Update selectors in the relevant scraper function
- Facebook scraper issues (Big Deal Burgers):
  - Posts may be truncated; the scraper automatically clicks "See more" buttons
  - Flavor announcements must match specific patterns, e.g., "is our flavor of the day" (see the pattern-matching sketch after this list)
  - The scraper checks up to 10 recent posts for flavor announcements
- Playwright browser issues:

  ```bash
  # Install Playwright browsers
  playwright install chromium
  ```
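For reference, the pattern matching mentioned in the Facebook scraper notes can be pictured as a few regular expressions applied to each post's text. The specific patterns below are illustrative, not the scraper's actual list:

```python
# Illustrative flavor-announcement matching; the real patterns may differ.
import re

PATTERNS = [
    re.compile(r"(?P<flavor>[A-Za-z' ]+?)\s+is our flavor of the day", re.IGNORECASE),
    re.compile(r"today'?s flavor(?: of the day)? is\s+(?P<flavor>[A-Za-z' ]+)", re.IGNORECASE),
]


def extract_flavor(post_text: str) -> str | None:
    """Return the first flavor found in a post, or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.search(post_text)
        if match:
            return match.group("flavor").strip().title()
    return None


print(extract_flavor("Caramel Cashew is our flavor of the day!"))  # Caramel Cashew
```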
See the Scraper Architecture section above for detailed instructions.
Quick checklist:
- ✅ Create a scraper class inheriting from `BaseScraper`
- ✅ Add location(s) to `locations.yaml`
- ✅ Write comprehensive unit tests
- ✅ Add to the ecosystem test
- ✅ Update the README with the new location
- ✅ Test manually with real data
- ✅ Ensure all tests pass and code is formatted
- Create a feature branch:

  ```bash
  git checkout -b feature/my-changes
  ```

- Make your changes following project conventions
- Run tests and checks:

  ```bash
  pytest tests/ -v              # Run unit tests
  pytest --ecosystem            # Test against live sites
  black app/ tests/             # Format code
  flake8 app/ tests/            # Lint code
  pre-commit run --all-files    # Run all hooks
  ```

- Commit with descriptive messages:

  ```bash
  git commit -m "Add Big Deal Burgers scraper with Facebook integration"
  ```

- Push and create a PR:

  ```bash
  git push origin feature/my-changes
  ```
This project includes GitHub Copilot instructions at `.github/copilot-instructions.md` to help AI-assisted development follow project conventions.
All contributions are welcome! Please follow these guidelines:
- Code Quality: All code must pass linting (flake8), formatting (black, isort), and security checks (pip-audit)
- Testing: Write comprehensive unit tests (40+ tests for new scrapers) and ensure all tests pass
- Documentation: Update README and docstrings for any new features
- Feature Branches: All changes must go through pull requests - no direct commits to main
- Commit Messages: Use descriptive commit messages explaining what and why
- Pre-commit Hooks: Install and run pre-commit hooks before committing
Run `pre-commit install` to automatically check code quality before each commit.