fix(tasks): handle GitHub app rate limits and improve notification debouncing#625
fix(tasks): handle GitHub app rate limits and improve notification debouncing#625seer-by-sentry[bot] wants to merge 1 commit intomainfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #625 +/- ##
========================================
Coverage 93.35% 93.36%
========================================
Files 1292 1293 +1
Lines 47128 47281 +153
Branches 1567 1573 +6
========================================
+ Hits 43996 44143 +147
- Misses 2823 2829 +6
Partials 309 309
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
@sentry review |
apps/worker/tasks/base.py
Outdated
| except MaxRetriesExceededError: | ||
| # Log checkpoint if flows have begun | ||
| if UploadFlow.has_begun(): | ||
| UploadFlow.log(UploadFlow.UNCAUGHT_RETRY_EXCEPTION) | ||
| if TestResultsFlow.has_begun(): | ||
| TestResultsFlow.log(TestResultsFlow.UNCAUGHT_RETRY_EXCEPTION) | ||
| # Return None to indicate task failure after max retries exceeded | ||
| return None | ||
| except LockRetryLimitExceededError: |
There was a problem hiding this comment.
I am absolutely not happy with these two exceptions being named so similar, but they have different purposes and can't be combined, since one is from Celery.
8a412bd to
53e7e2d
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
…bouncing Add notification debouncing with Redis fencing tokens and improve lock handling across notification tasks. Handle GitHub app rate limits by retrying with appropriate delays from the API response. Changes: - Add NotificationDebouncer with fencing tokens to coalesce concurrent notifications - Standardize lock acquisition via notification_lock helper - Add LockAcquisitionLimitError and handle_lock_retry for consistent retry handling - Handle NoConfiguredAppsAvailable in BundleAnalysisProcessorTask - Use earliest_retry_after_seconds from exception or fall back to next hour - Add NOTIF_LOCK_ERROR checkpoint to TestResultsFlow for observability - Add _partition_apps_by_rate_limit_status for improved rate limit handling Fixes: - WORKER-WJ5 - NoConfiguredAppsAvailable in bundle analysis - WORKER-WJ8 - "Not retrying lock acquisition - max retries exceeded" - WORKER-WJA - LockNotOwnedError from lock timeout - WORKER-WJ4 - LockError in BundleAnalysisProcessor - WORKER-WJ3 - Bundle analysis processor exceeded max retries
53e7e2d to
5cba3e8
Compare
Fixes WORKER-WJ5
Also addresses:
Summary
The core issue was that persistent
NoConfiguredAppsAvailableerrors prevented repository service calls, causing tasks to exhaust lock retries and fail. This PR:NotificationDebouncerwith Redis fencing tokens to coalesce concurrent notificationsnotification_lock, newLockAcquisitionLimitError, andhandle_lock_retryhelperearliest_retry_after_secondsfrom the exception or fall back toget_seconds_to_next_hourNOTIF_LOCK_ERRORadded toTestResultsFlowfor observabilityChanges
NoConfiguredAppsAvailableexception inBundleAnalysisProcessorTaskLockRetryLimitExceededErrortoLockAcquisitionLimitErrorfor clarityThis fix was generated by Seer in Sentry, triggered by Joe Becher. 👁️ Run ID: 7765586
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. In 2022 this entity acquired Codecov and as result Sentry is going to need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.
Note
Introduces Redis-based notification debouncing and unified lock/retry handling across worker tasks, plus improved GitHub App rate-limit backoff.
NotificationDebouncerwith fencing tokens (SKIP_DEBOUNCE_TOKEN) andnotification_lock; integrate intonotify,test_analytics_notifier, andbundle_analysis_notifyto coalesce concurrent runs and exit when tokens are stalehandle_lock_retryand newLockAcquisitionLimitError; logNOTIF_LOCK_ERRORinUploadFlow/TestResultsFlow;BaseCodecovTasknow catches lock-acquisition limits and factors DB-retry logic into_handle_database_error_retryearliest_retry_after_secondswhen present; otherwise fall back toget_seconds_to_next_hour; apply in base/notify/bundle-analysis processor_get_earliest_rate_limit_ttl), and surface it viaNoConfiguredAppsAvailableWritten by Cursor Bugbot for commit 5cba3e8. This will update automatically on new commits. Configure here.