Retry if driver throws a JobQueueDriverError connectionError #77
adam-fowler wants to merge 4 commits into main
Conversation
Codecov Report

Attention: Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   92.20%   91.75%   -0.45%
==========================================
  Files          23       24       +1
  Lines        1296     1347      +51
==========================================
+ Hits         1195     1236      +41
- Misses        101      111      +10
==========================================
```
```swift
    return try await operation()
} catch let error as JobQueueDriverError where error.code == .connectionError {
    logger.debug("\(message()) failed")
    if self.options.driverRetryStrategy.shouldRetry(attempt: attempt, error: error) {
```
Should this call still be made, given that the default maxAttempt is set to the maximum int value? We can have a maximum of two states here: either a job was popped off a queue and we lose connection to the driver and retry until reconnected, or the job lost connection while polling.
For the first case, I am wondering if we should have a background task running that finds jobs with state 'processing' that do not exist in a queue? Or should we by default move jobs with such a state back to their specific queue?
So if we hit the retry limit, the error is propagated further up, the job queue handler exits, and we'll have to restart the queue process to continue processing jobs. The default is set to .max as the alternative is exiting the process.
If the default is set to a lower number and we exit the handler, then the cleanup at start can fix up any jobs left in the processing state.
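The retry loop being discussed can be sketched roughly as below. This is illustrative, not the library's actual API: `RetryStrategy` and `withRetries` are hypothetical names standing in for the handler's real retry machinery, and the backoff formula is one common choice among several.

```swift
import Foundation

// Hypothetical strategy type; the real driverRetryStrategy may differ.
struct RetryStrategy {
    let maxAttempts: Int   // discussed default is Int.max, so retries continue indefinitely
    let baseDelay: Double  // seconds

    func shouldRetry(attempt: Int) -> Bool {
        attempt < maxAttempts
    }

    // Exponential backoff: baseDelay * 2^(attempt - 1)
    func delay(forAttempt attempt: Int) -> Double {
        baseDelay * pow(2.0, Double(attempt - 1))
    }
}

// Retry an operation while the strategy allows it; once the limit is hit,
// the error propagates up and (as described above) the handler exits.
func withRetries<T>(
    strategy: RetryStrategy,
    isRetryable: (Error) -> Bool,
    operation: () async throws -> T
) async throws -> T {
    var attempt = 1
    while true {
        do {
            return try await operation()
        } catch let error where isRetryable(error) {
            guard strategy.shouldRetry(attempt: attempt) else { throw error }
            try await Task.sleep(
                nanoseconds: UInt64(strategy.delay(forAttempt: attempt) * 1_000_000_000)
            )
            attempt += 1
        }
    }
}
```

With `maxAttempts` at `.max` the loop never gives up, which is the trade-off under discussion: a lower limit lets startup cleanup reclaim `processing` jobs after a restart, at the cost of the process exiting.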
By default all the drivers are set up to do nothing on boot. I think this should be documented.
There is a lot of documentation to add. We have made a lot of changes since the last release
Indeed! I will help with documents too
Also, I forgot to mention this earlier: how will this work with the Postgres driver? PostgresNIO seems to keep on retrying after a connection is lost. I am not that familiar with the Redis driver; I suppose it'll be the same, since the connection pool logic seems very similar between the two?
Yeah, PostgresNIO will retry connections ad infinitum, so in theory it isn't an issue when using the Postgres driver.
Redis is different in that it will eventually throw an error and has different errors for when an open connection was closed and when a connection couldn't be made.
Without this change the error would be propagated up and end the job queue handler and eventually the application.
We could move the retry to the drivers instead. I'm already asking the drivers to recognise connection errors.
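Asking each driver to recognise its own connection failures might look like the sketch below. The `JobQueueDriverError` shape matches the diff above; the `RedisConnectionFailure` enum is hypothetical, standing in for whatever the real Redis driver throws when an open connection closes versus when a connection can't be made.

```swift
// Shared error type matching the shape used in the diff above.
struct JobQueueDriverError: Error {
    enum Code { case connectionError, queueError }
    let code: Code
    let message: String
}

// Hypothetical driver-side failures; as noted above, Redis distinguishes
// a closed open connection from a failure to connect at all.
enum RedisConnectionFailure: Error {
    case connectionClosed
    case connectionRefused
}

// A driver maps its own failures onto the shared code, so the job queue
// handler's retry logic can match on .connectionError generically.
func mapDriverError(_ error: Error) -> JobQueueDriverError {
    switch error {
    case RedisConnectionFailure.connectionClosed:
        return JobQueueDriverError(code: .connectionError, message: "connection closed")
    case RedisConnectionFailure.connectionRefused:
        return JobQueueDriverError(code: .connectionError, message: "connection refused")
    default:
        return JobQueueDriverError(code: .queueError, message: "\(error)")
    }
}
```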
I'm going to put this on hold while I think about it. I might push this functionality down to the drivers where needed.
Adds `withExponentialBackoff`, which retries an operation with exponential backoff if it throws a `JobQueueDriverError` with code set to `.connectionError`. The retry behaviour of `withExponentialBackoff` is configured via `JobQueueOptions`.