
Add Franklin County Board of Commissioners spider #9

Open
1992tw wants to merge 2 commits into City-Bureau:main from 1992tw:feature/colum-franklin-boc

Conversation


@1992tw commented on Feb 13, 2026

What's this PR do?

This PR adds a new spider for Franklin County Board of Commissioners meetings that scrapes data from their calendar API. The spider uses a two-step pattern: POST to the calendar endpoint for meeting items, then GET each detail endpoint for full meeting info (location, agenda/minutes links, descriptions). It covers ~14 months of meetings (2 months back through 12 months forward).
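For reviewers unfamiliar with the pattern, here is a minimal sketch of the two-step flow described above. The endpoint paths, payload fields, and response keys are hypothetical placeholders, not the spider's actual values:

import json

from scrapy import Request, Spider


class TwoStepCalendarSpider(Spider):
    name = "two_step_example"

    def start_requests(self):
        # Step 1: POST to the calendar endpoint to get the list of meeting items.
        yield Request(
            "https://www.franklincountyohio.gov/api/calendar",  # hypothetical path
            method="POST",
            body=json.dumps({"start": "2025-12-01", "end": "2027-02-28"}),
            headers={"Content-Type": "application/json"},
            callback=self.parse,
        )

    def parse(self, response):
        # Step 2: GET each meeting's detail endpoint for location, links, description.
        for item in response.json().get("items", []):  # "items"/"id" keys are assumed
            yield Request(
                f"https://www.franklincountyohio.gov/api/calendar/{item['id']}",
                callback=self.parse_detail,
            )

    def parse_detail(self, response):
        # The real spider builds a city_scrapers_core Meeting item here.
        yield {"detail_url": response.url}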

Why are we doing this?

This addresses the need to track Franklin County Board of Commissioners meetings as part of the City Scrapers project, and resolves the corresponding ticket (Batch 3, Agency ID 1872).

Steps to manually test

  1. Ensure the project is installed:
pipenv sync --dev
  2. Activate the virtual env and enter the pipenv shell:
pipenv shell
  3. Run the spider:
scrapy crawl colum_franklin_boc -O test_output.csv
  4. Monitor the output and ensure no errors are raised.
  5. Inspect test_output.csv to ensure the data looks valid (it should contain General Sessions, Briefing Sessions, etc.).
  6. Ensure all tests pass:
pytest tests/test_colum_franklin_boc.py -v

Are there any smells or added technical debt to note?

No major debt. There is a commented-out MEETING_NAME_RE filter (lines 39, 84-85) that could be cleaned up or re-enabled if non-meeting calendar items (holidays, community events) need to be excluded. The spider sets ROBOTSTXT_OBEY = False because the API endpoints are not crawlable otherwise. The test suite covers 24 cases, including General Session, Briefing Session, invalid responses, and parametrized field checks.
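For reference, roughly how the commented-out filter could work if it's re-enabled later; the actual pattern in the spider may differ, so treat this as an illustration rather than the committed code:

import re

# Illustrative pattern only -- the spider's actual regex may differ.
MEETING_NAME_RE = re.compile(r"\b(Session|Hearing|Meeting)\b", re.IGNORECASE)


def is_board_meeting(title):
    """Return True for calendar items that look like actual board meetings."""
    return bool(MEETING_NAME_RE.search(title or ""))


# Holidays and community events would be skipped:
assert is_board_meeting("General Session")
assert not is_board_meeting("Presidents Day - Offices Closed")

The ROBOTSTXT_OBEY = False override is scoped to this spider (presumably via its custom_settings) rather than being a project-wide settings change.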


coderabbitai bot commented Feb 13, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@gemini-code-assist

Summary of Changes

Hello @1992tw, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new web scraping spider designed to collect and standardize meeting data for the Franklin County Board of Commissioners. The spider leverages a multi-stage API interaction to gather detailed meeting information, ensuring accurate capture of event specifics while intelligently filtering out irrelevant calendar entries. This enhancement expands the project's data coverage by integrating a new public body's meeting schedule.

Highlights

  • New Spider Implementation: Introduced a new Scrapy spider, ColumFranklinBocSpider, to scrape meeting information from the Franklin County Board of Commissioners website.
  • Two-Step API Interaction: The spider employs a two-step process, first making POST requests to a calendar API to retrieve meeting IDs, then sending GET requests to a detail endpoint for each meeting to extract comprehensive information.
  • Event Filtering Logic: Implemented regular expression-based filtering to distinguish actual board meetings (Session, Hearing, Meeting) from other calendar events like holidays or community activities.
  • Robust Data Extraction: Includes custom parsing methods for extracting meeting start times, descriptions, locations (handling varied formats), and document links (agendas, minutes) from the HTML responses (see the sketch after this list).
  • Comprehensive Testing: Added a dedicated test file (tests/test_colum_franklin_boc.py) along with multiple JSON fixture files to ensure the new spider functions correctly, including tests for filtering, data extraction, and handling invalid responses.
  • Dependency Lock File Update: The Pipfile.lock was added, reflecting the updated or new Python package dependencies required for the project, likely due to the new spider.
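A hedged sketch of the kind of start-time parsing referred to above; the selector and date format here are assumptions for illustration, not the spider's actual code:

from datetime import datetime

from scrapy.selector import Selector


def parse_start(sel):
    # Hypothetical markup: <div class="meeting-date">March 3, 2026 9:00 AM</div>
    raw = sel.css("div.meeting-date::text").get("")
    return datetime.strptime(raw.strip(), "%B %d, %Y %I:%M %p")


sel = Selector(text='<div class="meeting-date">March 3, 2026 9:00 AM</div>')
print(parse_start(sel))  # 2026-03-03 09:00:00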
Changelog
  • Pipfile.lock
    • Added a new dependency lock file.
  • city_scrapers/spiders/colum_franklin_boc.py
    • Added a new Scrapy spider for the Franklin County Board of Commissioners.
  • tests/files/colum_franklin_boc.json
    • Added a JSON fixture for calendar API response testing.
  • tests/files/colum_franklin_boc_detail.json
    • Added a JSON fixture for general session detail testing.
  • tests/files/colum_franklin_boc_detail_briefing.json
    • Added a JSON fixture for briefing session detail testing.
  • tests/files/colum_franklin_boc_detail_invalid.json
    • Added a JSON fixture for invalid detail response testing.
  • tests/test_colum_franklin_boc.py
    • Added unit tests for the new Franklin County Board of Commissioners spider.
Activity
  • No human activity (comments, reviews) has been recorded on this pull request yet.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new spider for the Franklin County Board of Commissioners. The spider implementation is well-structured, using a two-step API scraping process, and is accompanied by a comprehensive set of tests. However, I've identified a critical issue with the Pipfile.lock file: it appears to be corrupted with non-existent package versions, which will prevent dependencies from being installed. Additionally, I've provided a couple of suggestions to improve the robustness of the spider's parsing logic, aligning it more closely with Scrapy best practices.

Comment on lines +146 to +147
desc = sel.css("div.meeting-container > p::text").get()
return desc.strip() if desc else ""


Severity: medium

The current implementation using css(...).get() is a bit fragile. It will only extract the first text node from the first <p> tag. If the <p> tag contains other tags like <br> or <strong>, their text content will be missed. Using xpath('string(...)') is more robust as it concatenates all descendant text nodes of the selected element.

Suggested change

desc = sel.xpath("string(//div[@class='meeting-container']/p[1])").get()
return desc.strip() if desc else ""
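To see the difference concretely, here is a tiny standalone example (the HTML fragment is made up for illustration):

from scrapy.selector import Selector

# Made-up fragment for demonstration only, not from the spider's fixtures.
sel = Selector(
    text="<div class='meeting-container'><p>Meets at <strong>9:00 AM</strong> in Columbus</p></div>"
)
sel.css("div.meeting-container > p::text").get()
# -> 'Meets at '  (only the first text node; the <strong> content is lost)
sel.xpath("string(//div[@class='meeting-container']/p[1])").get()
# -> 'Meets at 9:00 AM in Columbus'  (all descendant text, concatenated)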

Comment on lines +183 to +193
def _parse_links(self, sel):
    """Extract document links (agendas, minutes) from meeting detail HTML."""
    links = []
    for doc_div in sel.css("div.meeting-document"):
        title = doc_div.css("h3::text").get("").strip()
        href = doc_div.css("div.alt-formats a::attr(href)").get()
        if href:
            if not href.startswith("http"):
                href = "https://www.franklincountyohio.gov" + href
            links.append({"href": href, "title": title or "Document"})
    return links


Severity: medium

To make URL joining more robust, I suggest passing the response object to this method and using response.urljoin(). This is the standard Scrapy way to handle relative URLs and is more reliable than string concatenation.

You would also need to update the call in parse_detail at line 126 from self._parse_links(sel) to self._parse_links(response, sel).

Suggested change

def _parse_links(self, response, sel):
    """Extract document links (agendas, minutes) from meeting detail HTML."""
    links = []
    for doc_div in sel.css("div.meeting-document"):
        title = doc_div.css("h3::text").get("").strip()
        href = doc_div.css("div.alt-formats a::attr(href)").get()
        if href:
            href = response.urljoin(href)
            links.append({"href": href, "title": title or "Document"})
    return links
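As a quick illustration of why response.urljoin() is preferred over string concatenation (the detail URL below is hypothetical):

from scrapy.http import HtmlResponse

# Hypothetical detail URL, used only to demonstrate urljoin behavior.
response = HtmlResponse(
    url="https://www.franklincountyohio.gov/calendar/detail?id=123", body=b"<html></html>"
)
response.urljoin("/Document/agenda.pdf")
# -> 'https://www.franklincountyohio.gov/Document/agenda.pdf'
response.urljoin("https://cdn.example.com/minutes.pdf")
# -> absolute URLs pass through unchanged
response.urljoin("//cdn.example.com/minutes.pdf")
# -> protocol-relative URLs also resolve correctly, which plain concatenation would mangle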

@1992tw force-pushed the feature/colum-franklin-boc branch from 7381a93 to abccfb2 on February 13, 2026 at 08:51
@msrezaie (Collaborator) left a comment

Aside from tweaking the location parsing, it looks good.

@msrezaie marked this pull request as ready for review on February 18, 2026 at 16:06
@msrezaie (Collaborator) left a comment

Looks good
