Eliminate duplicate inner_text() calls in Facebook scrapers #42

Closed
Copilot wants to merge 23 commits into main from copilot/sub-pr-30

Contributor

Copilot AI commented Feb 22, 2026

is_facebook_post_from_today() called article.inner_text() internally, and the scraper then called it again immediately afterward. That is two cross-process Playwright DOM calls per post when only one is needed.

Changes

  • utils.py: Added optional article_text parameter to is_facebook_post_from_today(); when supplied, skips the internal inner_text() call
  • leons.py / bigdeal.py: Fetch article.inner_text() once per post into text_content, validate it, then pass it via article_text=text_content to the date-check helper

    # Before: two inner_text() calls per post
    if not is_facebook_post_from_today(article, self.logger):  # calls inner_text() internally
        continue
    text_content = article.inner_text()  # called again

    # After: one inner_text() call per post
    text_content = article.inner_text()
    if not is_facebook_post_from_today(article, self.logger, article_text=text_content):
        continue

  • tests/test_utils.py: Added test_pre_fetched_text_is_used_without_calling_inner_text to assert inner_text() is never invoked on the article mock when article_text is provided
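
The optional-parameter pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from utils.py: `is_post_from_today_sketch` and `looks_like_today` are hypothetical names, the date check is a placeholder because the real helper's logic is not shown in this PR, and the article is a `unittest.mock.Mock` standing in for a Playwright element handle.

```python
import logging
from typing import Optional
from unittest.mock import Mock


def looks_like_today(text: str) -> bool:
    """Placeholder date check (assumption): the real logic is not shown in the PR."""
    return "just now" in text.lower()


def is_post_from_today_sketch(article, logger, article_text: Optional[str] = None) -> bool:
    """Hypothetical stand-in for is_facebook_post_from_today()."""
    # Reuse the caller's text when supplied; otherwise make the single DOM call here.
    # `is None` (rather than truthiness) keeps an empty string from triggering a re-fetch.
    if article_text is None:
        article_text = article.inner_text()
    is_today = looks_like_today(article_text)
    logger.debug("post from today: %s", is_today)
    return is_today


logger = logging.getLogger("sketch")

# Caller side: fetch the text once, then hand it to the helper.
article = Mock()
article.inner_text.return_value = "Just now - Flavor of the day: Mint Chip"
text_content = article.inner_text()
result_with_text = is_post_from_today_sketch(article, logger, article_text=text_content)
calls_after_prefetch = article.inner_text.call_count

# Backward-compatible path: no article_text, so the helper fetches it itself.
fallback_article = Mock()
fallback_article.inner_text.return_value = "Posted yesterday"
result_fallback = is_post_from_today_sketch(fallback_article, logger)

# Mirrors the intent of the new test: pre-fetched text means no inner_text() call at all.
untouched_article = Mock()
is_post_from_today_sketch(untouched_article, logger, article_text="just now")
untouched_article.inner_text.assert_not_called()
```

Defaulting `article_text` to `None` keeps existing callers working unchanged, while callers that already hold the text avoid the second round trip to the browser.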

jjlauterbach and others added 22 commits February 18, 2026 22:44
…32)

* Initial plan

* fix: Cache Playwright inner_text() to avoid duplicate cross-browser calls

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
* Initial plan

* feat: Add retry logic with exponential backoff to Big Deal scraper

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
* Initial plan

* Add See more expansion and scroll logic to BigDeal scraper

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
* Initial plan

* Add HTML entity decoding to Big Deal Burgers scraper

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
…aper (#33)

* Initial plan

* fix: Add page scroll and See more expansion to Big Deal scraper

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

* fix: Merge bigdeal branch changes - add _sanitize_flavor_name and fix formatting

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…al scraper (#39)

* Initial plan

* fix: Guard inner_text() against stale element errors; fix test mocks

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
…er-article (#38)

* Initial plan

* Mirror Leon's pattern in bigdeal scraper: collect articles, filter top-level, expand See more per-article

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

* Integrate new test from base branch to resolve merge conflict

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

* Fix test conflicts: keep bigdeal inline mocks, only add helper and update See more test

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

* Move query_selector mock setup to avoid hunk conflict with bigdeal's evaluate addition

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Initial plan

* fix: apply black formatting to fix CI build

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

* fix: reformat with black 24.4.2 to match pre-commit config

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
…rapers (#41)

* Initial plan

* Reduce duplicate inner_text() calls in Facebook scrapers

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

* Fix black formatting in leons.py and bigdeal.py to pass CI

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

* Fix formatting: use black 24.4.2 (pinned CI version) to reformat files

Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
Co-authored-by: Jeff Lauterbach <jjlauterbach@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 22, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Copilot AI changed the title from "[WIP] WIP Address feedback on Big Deal Burgers scraper implementation" to "Eliminate duplicate inner_text() calls in Facebook scrapers" Feb 22, 2026
Copilot AI requested a review from jjlauterbach February 22, 2026 01:57
Base automatically changed from bigdeal to main February 23, 2026 03:13

cloudflare-workers-and-pages bot commented Feb 23, 2026

Deploying daily-custard with Cloudflare Pages

Latest commit: 2ed3029
Status: ✅  Deploy successful!
Preview URL: https://7d4e29d5.daily-custard.pages.dev
Branch Preview URL: https://copilot-sub-pr-30.daily-custard.pages.dev

@jjlauterbach jjlauterbach deleted the copilot/sub-pr-30 branch February 25, 2026 17:37