
fix: Guard article.inner_text() against stale element errors in BigDeal scraper #39

Merged
jjlauterbach merged 2 commits into bigdeal from copilot/sub-pr-30-again
Feb 21, 2026

Conversation

Contributor

Copilot AI commented Feb 21, 2026

A detached/stale Playwright element during article.inner_text() would abort the entire scrape attempt with no retry, since the outer wrapper only retries on PlaywrightTimeoutError.
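To illustrate the failure mode, here is a minimal sketch with stand-in exception classes (hypothetical names, not the real Playwright imports or the project's actual wrapper): a retry loop that only catches timeout errors lets any other Playwright error escape and abort the whole scrape attempt.

```python
# Stand-in exception classes; the real ones live in playwright's error module.
class PlaywrightTimeoutError(Exception):
    pass

class PlaywrightError(Exception):
    pass

def scrape_with_retry(scrape, attempts=3):
    """Sketch of an outer wrapper that retries only on timeouts."""
    for _ in range(attempts):
        try:
            return scrape()
        except PlaywrightTimeoutError:
            continue  # only timeouts are retried; other errors propagate
    return None

def stale_scrape():
    # Simulates inner_text() called on a detached element.
    raise PlaywrightError("element is not attached to the DOM")
```

Calling `scrape_with_retry(stale_scrape)` raises `PlaywrightError` immediately, with no retry and no fallback to later posts, which is exactly what the per-article guard below prevents.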

Changes

  • bigdeal.py: Wrap article.inner_text() in try/except PlaywrightError plus a catch-all except Exception, each with continue, matching Leon's scraper pattern — a bad element skips to the next post instead of aborting:

    ```python
    try:
        text_content = article.inner_text()
    except PlaywrightError as e:
        self.logger.debug(f"Failed to extract text from post {i}: {e}")
        continue
    except Exception as e:
        self.logger.debug(f"Unexpected error extracting text from post {i}: {e}")
        continue
    ```
  • tests/test_bigdeal_scraper.py: Fixed 5 broken tests where mock articles lacked evaluate.return_value = False — the comment-filtering step calls article.evaluate(...), and an unset Mock() returns a truthy value, causing all articles to be silently dropped as nested comments

  • tests/test_bigdeal_scraper.py: Added test_scrape_facebook_inner_text_error_continues_to_next_post — verifies that a PlaywrightError on one article's inner_text() skips that element and returns the flavor from the next valid post
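The test fix hinges on a `unittest.mock` subtlety worth spelling out. A sketch (the JS expression passed to `evaluate` here is a placeholder, not the scraper's exact filter):

```python
from unittest.mock import Mock

article = Mock()
# An unconfigured Mock returns a new Mock from any call, and Mocks are
# truthy, so a nested-comment check like `if article.evaluate(...)` would
# silently drop every mocked post as a nested comment.
assert bool(article.evaluate("el => /* placeholder filter */ true"))

# Explicitly marking the mock as a top-level post lets it pass the filter.
article.evaluate.return_value = False
assert article.evaluate("el => /* placeholder filter */ true") is False
```

This is why the five previously passing-for-the-wrong-reason tests needed `evaluate.return_value = False` on each mock article.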



Co-authored-by: jjlauterbach <1447549+jjlauterbach@users.noreply.github.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Feb 21, 2026

Deploying daily-custard with Cloudflare Pages

Latest commit: 0d1dbe6
Status: ✅  Deploy successful!
Preview URL: https://470bccd1.daily-custard.pages.dev
Branch Preview URL: https://copilot-sub-pr-30-again.daily-custard.pages.dev


Contributor Author

Copilot AI commented Feb 21, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback.

Copilot AI changed the title from "[WIP] Refactor Big Deal Burgers scraper based on feedback" to "fix: Guard article.inner_text() against stale element errors in BigDeal scraper" Feb 21, 2026
Copilot AI requested a review from jjlauterbach February 21, 2026 18:33
@jjlauterbach jjlauterbach marked this pull request as ready for review February 21, 2026 18:34
Copilot AI review requested due to automatic review settings February 21, 2026 18:34
@jjlauterbach jjlauterbach merged commit 8901daf into bigdeal Feb 21, 2026
1 check passed
@jjlauterbach jjlauterbach deleted the copilot/sub-pr-30-again branch February 21, 2026 18:34
Contributor

Copilot AI left a comment


Pull request overview

This PR improves the resilience of the Big Deal Facebook Playwright scraper by preventing a single stale/detached post element from aborting an entire scrape, and updates unit tests to reflect the scraper’s nested-comment filtering behavior.

Changes:

  • Guard article.inner_text() in the BigDeal scraper so Playwright DOM errors skip the bad post and continue scanning.
  • Fix existing BigDeal scraper tests by setting article.evaluate.return_value = False so mocked posts aren’t filtered out as nested comments.
  • Add a regression test ensuring an inner_text() Playwright error on one post still allows a later valid post to be returned.

Reviewed changes

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| app/scrapers/bigdeal.py | Handles inner_text() Playwright errors per-article to avoid aborting the scrape attempt on a single bad element. |
| tests/test_bigdeal_scraper.py | Updates mocks to pass nested-article filtering and adds a test covering inner_text() PlaywrightError continuation behavior. |

```python
mock_article2.inner_text.return_value = "Today's flavor is Vanilla!"
mock_article2.evaluate.return_value = False

mock_page.query_selector_all.return_value = [mock_article1, mock_article2]
```

Copilot AI Feb 21, 2026


In this test, mock_page.query_selector_all.return_value is used for all selectors, but the scraper calls query_selector_all('text="See more"') before query_selector_all('[role="article"]'). With the current setup the article mocks can be treated as "See more" buttons (calling is_visible()/click()), which makes the test less representative and could hide regressions. Consider routing query_selector_all via side_effect (as done in test_scrape_facebook_page_expands_see_more_buttons) to return [] for the "See more" selector and the articles list for [role="article"].

Suggested change

```diff
-mock_page.query_selector_all.return_value = [mock_article1, mock_article2]
+def query_selector_all_side_effect(selector: str):
+    if selector == 'text="See more"':
+        # No "See more" buttons in this scenario
+        return []
+    if selector == '[role="article"]':
+        # Return the mocked posts
+        return [mock_article1, mock_article2]
+    return []
+
+mock_page.query_selector_all.side_effect = query_selector_all_side_effect
```
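To see why the suggestion works, here is the routing in isolation as a self-contained sketch (mock names match the test excerpt above; the selectors are the ones the review comment says the scraper uses):

```python
from unittest.mock import Mock

mock_article1, mock_article2 = Mock(), Mock()
mock_page = Mock()

def query_selector_all_side_effect(selector: str):
    # Route each selector to its own result instead of one shared return_value.
    if selector == 'text="See more"':
        return []  # no "See more" buttons in this scenario
    if selector == '[role="article"]':
        return [mock_article1, mock_article2]
    return []

mock_page.query_selector_all.side_effect = query_selector_all_side_effect

# The "See more" pass now sees no buttons; the article pass sees both posts,
# so the article mocks are never mistaken for clickable "See more" elements.
assert mock_page.query_selector_all('text="See more"') == []
assert mock_page.query_selector_all('[role="article"]') == [mock_article1, mock_article2]
```

With a single `return_value`, both calls would have received the article list, which is the mismatch the review comment flags.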

