Retry if driver throws a JobQueueDriverError connectionError #77
adam-fowler wants to merge 4 commits into main
Conversation
Codecov Report

Attention: Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #77      +/-   ##
==========================================
- Coverage   92.20%   91.75%   -0.45%
==========================================
  Files          23       24       +1
  Lines        1296     1347      +51
==========================================
+ Hits         1195     1236      +41
- Misses        101      111      +10
==========================================
```
```swift
    return try await operation()
} catch let error as JobQueueDriverError where error.code == .connectionError {
    logger.debug("\(message()) failed")
    if self.options.driverRetryStrategy.shouldRetry(attempt: attempt, error: error) {
```
Should this call still be made, given that the default maxAttempt is set to the maximum int value? We can have a maximum of two states here: either a job was popped off a queue and we lose connection to the driver and retry until reconnected, or the job lost connection while polling.
For the first case, I am wondering if we should have a background task running that finds jobs with state 'processing' that do not exist in a queue? Or should we by default move jobs with such a state back to their specific queue?
So if we hit the retry limit, the error is propagated further up, the job queue handler exits, and we'll have to restart the queue process to continue processing jobs. The default is set to .max as the alternative is exiting the process.
If the default is set to a lower number and we exit the handler, then the cleanup at start can fix up any jobs left in the processing state.
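The retry loop being discussed can be sketched roughly as below. This is illustrative, not the library's actual API: `RetryStrategy` and `withRetries` are hypothetical names standing in for the handler's real retry machinery, and the backoff formula is one common choice among several.

```swift
import Foundation

// Hypothetical strategy type; the real driverRetryStrategy may differ.
struct RetryStrategy {
    let maxAttempts: Int   // discussed default is Int.max, so retries continue indefinitely
    let baseDelay: Double  // seconds

    func shouldRetry(attempt: Int) -> Bool {
        attempt < maxAttempts
    }

    // Exponential backoff: baseDelay * 2^(attempt - 1)
    func delay(forAttempt attempt: Int) -> Double {
        baseDelay * pow(2.0, Double(attempt - 1))
    }
}

// Retry an operation while the strategy allows it; once the limit is hit,
// the error propagates up and (as described above) the handler exits.
func withRetries<T>(
    strategy: RetryStrategy,
    isRetryable: (Error) -> Bool,
    operation: () async throws -> T
) async throws -> T {
    var attempt = 1
    while true {
        do {
            return try await operation()
        } catch let error where isRetryable(error) {
            guard strategy.shouldRetry(attempt: attempt) else { throw error }
            try await Task.sleep(
                nanoseconds: UInt64(strategy.delay(forAttempt: attempt) * 1_000_000_000)
            )
            attempt += 1
        }
    }
}
```

With `maxAttempts` at `.max` the loop never gives up, which is the trade-off under discussion: a lower limit lets startup cleanup reclaim `processing` jobs after a restart, at the cost of the process exiting.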
By default all the drivers are set up to do nothing on boot. I think this should be documented.
There is a lot of documentation to add. We have made a lot of changes since the last release
Indeed! I will help with documents too
Also, I forgot to mention this earlier: how will this work with the Postgres driver? PostgresNIO seems to keep on retrying after a connection is lost. I am not that familiar with the Redis driver; I suppose it'll be the same, since the connection pool logic seems very similar between the two?
Yeah, PostgresNIO will retry connections ad infinitum, so in theory it isn't an issue when using the Postgres driver.
Redis is different in that it will eventually throw an error and has different errors for when an open connection was closed and when a connection couldn't be made.
Without this change the error would be propagated up and end the job queue handler and eventually the application.
We could move the retry to the drivers instead. I'm already asking the drivers to recognise connection errors.
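Asking each driver to recognise its own connection failures might look like the sketch below. The `JobQueueDriverError` shape matches the diff above; the `RedisConnectionFailure` enum is hypothetical, standing in for whatever the real Redis driver throws when an open connection closes versus when a connection can't be made.

```swift
// Shared error type matching the shape used in the diff above.
struct JobQueueDriverError: Error {
    enum Code { case connectionError, queueError }
    let code: Code
    let message: String
}

// Hypothetical driver-side failures; as noted above, Redis distinguishes
// a closed open connection from a failure to connect at all.
enum RedisConnectionFailure: Error {
    case connectionClosed
    case connectionRefused
}

// A driver maps its own failures onto the shared code, so the job queue
// handler's retry logic can match on .connectionError generically.
func mapDriverError(_ error: Error) -> JobQueueDriverError {
    switch error {
    case RedisConnectionFailure.connectionClosed:
        return JobQueueDriverError(code: .connectionError, message: "connection closed")
    case RedisConnectionFailure.connectionRefused:
        return JobQueueDriverError(code: .connectionError, message: "connection refused")
    default:
        return JobQueueDriverError(code: .queueError, message: "\(error)")
    }
}
```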
I'm going to put this on hold while I think about it. I might push this functionality down to the drivers where needed.
Adds `withExponentialBackoff`, which retries an operation with exponential backoff if it throws a `JobQueueDriverError` with code set to `.connectionError`. The retry behaviour of `withExponentialBackoff` is configured via `JobQueueOptions`.