Add Franklin County Board of Commissioners spider by 1992tw · Pull Request #9 · City-Bureau/city-scrapers-coloh

1992tw · 2026-02-13T08:45:32Z

What's this PR do?

This PR adds a new spider for Franklin County Board of Commissioners meetings that scrapes data from their calendar API. The spider uses a two-step pattern: POST to the calendar endpoint for meeting items, then GET each detail endpoint for full meeting info (location, agenda/minutes links, descriptions). It covers ~14 months of meetings (2 months back through 12 months forward).
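As a rough illustration of the date window described above, the sketch below enumerates the months a calendar request might cover. The helper name `month_window` and the inclusive handling of both ends are assumptions, not taken from the spider. With both ends inclusive the window spans 15 calendar months, so the spider's actual bounds may differ slightly.

```python
from datetime import date


def month_window(today, months_back=2, months_forward=12):
    """Return (year, month) pairs from months_back before today's month
    through months_forward after it, inclusive on both ends (an assumption)."""
    base = today.year * 12 + (today.month - 1)  # zero-based month count
    window = []
    for offset in range(-months_back, months_forward + 1):
        year, month_index = divmod(base + offset, 12)
        window.append((year, month_index + 1))
    return window


months = month_window(date(2026, 2, 13))
print(months[0], months[-1], len(months))
```

Each `(year, month)` pair could then drive one POST to the calendar endpoint, with the detail GETs issued from the items that come back.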

Why are we doing this?

This adds tracking of Franklin County Board of Commissioners meetings to the City Scrapers project and resolves the corresponding ticket (Batch 3, Agency ID 1872).

Steps to manually test

Ensure the project is installed:

pipenv sync --dev

Activate the virtual env and enter the pipenv shell:

pipenv shell

Run the spider:

scrapy crawl colum_franklin_boc -O test_output.csv

Monitor the output and ensure no errors are raised.
Inspect test_output.csv to ensure the data looks valid (should contain General Sessions, Briefing Sessions, etc.).
Ensure all tests pass:

pytest tests/test_colum_franklin_boc.py -v

Are there any smells or added technical debt to note?

No major debt. There is a commented-out MEETING_NAME_RE filter (lines 39, 84-85) that could be cleaned up or re-enabled if non-meeting calendar items (holidays, community events) need to be excluded. The spider sets ROBOTSTXT_OBEY = False because the API endpoints are not crawlable otherwise. Test suite covers 24 cases including General Session, Briefing Session, invalid responses, and parametrized field checks.

coderabbitai · 2026-02-13T08:45:45Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


gemini-code-assist · 2026-02-13T08:45:55Z

Summary of Changes

Hello @1992tw, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new web scraping spider designed to collect and standardize meeting data for the Franklin County Board of Commissioners. The spider leverages a multi-stage API interaction to gather detailed meeting information, ensuring accurate capture of event specifics while intelligently filtering out irrelevant calendar entries. This enhancement expands the project's data coverage by integrating a new public body's meeting schedule.

Highlights

New Spider Implementation: Introduced a new Scrapy spider, ColumFranklinBocSpider, to scrape meeting information from the Franklin County Board of Commissioners website.
Two-Step API Interaction: The spider employs a two-step process, first making POST requests to a calendar API to retrieve meeting IDs, then sending GET requests to a detail endpoint for each meeting to extract comprehensive information.
Event Filtering Logic: Implemented regular expression-based filtering to distinguish actual board meetings (Session, Hearing, Meeting) from other calendar events like holidays or community activities.
Robust Data Extraction: Includes custom parsing methods for extracting meeting start times, descriptions, locations (handling varied formats), and document links (agendas, minutes) from the HTML responses.
Comprehensive Testing: Added a dedicated test file (tests/test_colum_franklin_boc.py) along with multiple JSON fixture files to ensure the new spider functions correctly, including tests for filtering, data extraction, and handling invalid responses.
Dependency Lock File Update: The Pipfile.lock was added, reflecting the updated or new Python package dependencies required for the project, likely due to the new spider.
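The name-based filtering mentioned in the highlights (and the commented-out MEETING_NAME_RE noted in the PR description) could look roughly like the sketch below. The pattern and the helper `is_meeting` are hypothetical: the keywords come from the highlight text, and the spider's real expression may differ.

```python
import re

# Hypothetical pattern: keywords are taken from the PR summary,
# not from the spider source.
MEETING_NAME_RE = re.compile(r"\b(session|hearing|meeting)s?\b", re.IGNORECASE)


def is_meeting(title):
    """Return True when a calendar item title looks like a board meeting."""
    return bool(MEETING_NAME_RE.search(title))


titles = ["General Session", "Briefing Session", "Independence Day Holiday"]
kept = [t for t in titles if is_meeting(t)]
print(kept)
```

A keyword filter like this is simple to reason about, but it would also keep any community event whose title happens to contain "meeting", which may be one reason to leave it disabled.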

Changelog

Pipfile.lock
- Added a new dependency lock file.
city_scrapers/spiders/colum_franklin_boc.py
- Added a new Scrapy spider for the Franklin County Board of Commissioners.
tests/files/colum_franklin_boc.json
- Added a JSON fixture for calendar API response testing.
tests/files/colum_franklin_boc_detail.json
- Added a JSON fixture for general session detail testing.
tests/files/colum_franklin_boc_detail_briefing.json
- Added a JSON fixture for briefing session detail testing.
tests/files/colum_franklin_boc_detail_invalid.json
- Added a JSON fixture for invalid detail response testing.
tests/test_colum_franklin_boc.py
- Added unit tests for the new Franklin County Board of Commissioners spider.

Activity

No human activity (comments, reviews) has been recorded on this pull request yet.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new spider for the Franklin County Board of Commissioners. The spider implementation is well-structured, using a two-step API scraping process, and is accompanied by a comprehensive set of tests. However, I've identified a critical issue with the Pipfile.lock file, which appears to be corrupted with non-existent package versions, and this will prevent dependencies from being installed. Additionally, I've provided a couple of suggestions to improve the robustness of the spider's parsing logic, aligning it more closely with Scrapy best practices.

Pipfile.lock

gemini-code-assist · 2026-02-13T08:48:07Z

city_scrapers/spiders/colum_franklin_boc.py

+        desc = sel.css("div.meeting-container > p::text").get()
+        return desc.strip() if desc else ""


The current implementation using css(...).get() is a bit fragile. It will only extract the first text node from the first <p> tag. If the <p> tag contains other tags like <br> or <strong>, their text content will be missed. Using xpath('string(...)') is more robust as it concatenates all descendant text nodes of the selected element.

Suggested change

-        desc = sel.css("div.meeting-container > p::text").get()
+        desc = sel.xpath("string(//div[@class='meeting-container']/p[1])").get()
         return desc.strip() if desc else ""
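To make the difference concrete, here is a small standard-library sketch of "first text node only" versus "all descendant text". It uses `xml.etree.ElementTree` as a stand-in for Scrapy selectors, and the sample markup is invented for illustration, so treat it as an analogy to the `::text` versus `string(...)` behavior and not as the spider's code.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; the real detail page may look different.
html = (
    '<div class="meeting-container">'
    "<p>Weekly <strong>General Session</strong> of the board.<br/>"
    "Open to the public.</p>"
    "</div>"
)

p = ET.fromstring(html).find("p")

# Analogous to p::text with .get(): only the text before the first child tag.
first_text_node = (p.text or "").strip()

# Analogous to XPath string(p): every descendant text node, concatenated.
full_text = "".join(p.itertext()).strip()

print(repr(first_text_node))
print(repr(full_text))
```

Note that concatenation inserts no whitespace where the `<br/>` was, so a description built this way may still need whitespace normalization.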

gemini-code-assist · 2026-02-13T08:48:07Z

city_scrapers/spiders/colum_franklin_boc.py

+    def _parse_links(self, sel):
+        """Extract document links (agendas, minutes) from meeting detail HTML."""
+        links = []
+        for doc_div in sel.css("div.meeting-document"):
+            title = doc_div.css("h3::text").get("").strip()
+            href = doc_div.css("div.alt-formats a::attr(href)").get()
+            if href:
+                if not href.startswith("http"):
+                    href = "https://www.franklincountyohio.gov" + href
+                links.append({"href": href, "title": title or "Document"})
+        return links


To make URL joining more robust, I suggest passing the response object to this method and using response.urljoin(). This is the standard Scrapy way to handle relative URLs and is more reliable than string concatenation.

You would also need to update the call in parse_detail at line 126 from self._parse_links(sel) to self._parse_links(response, sel).

Suggested change

-    def _parse_links(self, sel):
+    def _parse_links(self, response, sel):
         """Extract document links (agendas, minutes) from meeting detail HTML."""
         links = []
         for doc_div in sel.css("div.meeting-document"):
             title = doc_div.css("h3::text").get("").strip()
             href = doc_div.css("div.alt-formats a::attr(href)").get()
             if href:
-                if not href.startswith("http"):
-                    href = "https://www.franklincountyohio.gov" + href
+                href = response.urljoin(href)
                 links.append({"href": href, "title": title or "Document"})
         return links
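A quick standard-library comparison shows where the two approaches diverge. It uses `urllib.parse.urljoin`, which follows the same resolution rules that `response.urljoin()` applies against the response URL. The page URL and document paths are hypothetical, and `concat` only mirrors the string concatenation from the original method.

```python
from urllib.parse import urljoin

BASE = "https://www.franklincountyohio.gov"
page_url = BASE + "/meetings/detail"  # hypothetical detail page URL

# Hypothetical hrefs: root-relative, page-relative, and already absolute.
hrefs = ["/docs/agenda.pdf", "minutes.pdf", "https://example.org/agenda.pdf"]


def concat(href):
    """Mirror of the original approach: prefix the host unless href starts with http."""
    return href if href.startswith("http") else BASE + href


joined = [urljoin(page_url, h) for h in hrefs]
concatenated = [concat(h) for h in hrefs]

for j, c in zip(joined, concatenated):
    print(j, "|", c)
```

The two agree for root-relative and absolute links. Only the page-relative `minutes.pdf` case breaks under concatenation, which yields a malformed host.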

msrezaie

Aside from tweaking the location parsing, it looks good.

city_scrapers/spiders/colum_franklin_boc.py

msrezaie

Looks good

gemini-code-assist bot reviewed Feb 13, 2026

Add Franklin County Board of Commissioners spider

abccfb2

1992tw force-pushed the feature/colum-franklin-boc branch from 7381a93 to abccfb2 on February 13, 2026 08:51

msrezaie requested changes Feb 17, 2026

msrezaie marked this pull request as ready for review February 18, 2026 16:06

Fix location parsing to separate room/building info from street address

8d81c90

msrezaie approved these changes Feb 19, 2026

1992tw wants to merge 2 commits into City-Bureau:main from 1992tw:feature/colum-franklin-boc
