Noticed the following error would happen a lot on OpenShift:

> [2025-08-05T07:07:10Z INFO core_dump_agent] Setting s3 endpoint location to: <REDACTED>
> [2025-08-05T07:07:10Z INFO core_dump_agent] Uploading: /var/mnt/core-dump-handler/cores/<REDACTED>.zip
> [2025-08-05T07:07:10Z INFO core_dump_agent] zip size is 129879392
> [2025-08-05T07:07:19Z ERROR core_dump_agent] Upload Failed hyper: channel closed

Restarting the pod or retrying the upload would not help. After upgrading, the uploads finally worked again:

> Retrying reqwest: error sending request for url (http://<REDACTED>/core-dumps-storage-bucket-<SNIP>.zip?partNumber=3&uploadId=<REDACTED>)

Signed-off-by: Nathan Monfils <nathan.monfils@destiny.eu>
b6c0d77 to 350152f
added 2 commits
January 23, 2026 16:42
Prevents zip files from being lost if the upload fails for whatever reason.

Signed-off-by: Nathan Monfils <nathan.monfils@destiny.eu>
Signed-off-by: Nathan Monfils <nathan.monfils@destiny.eu>
350152f to 40a7c34
Hi!
We have been using `core-dump-handler` for a little while but frequently encountered the following behavior on OpenShift:

Restarting the pod would attempt the upload again, but it would not succeed either. We've had to resort to fetching the zip file using `kubectl`, which is quite painful operationally.

For me the issue is twofold:
For the first problem, simply updating `rust-s3` and its dependencies worked a treat:

The lack of retry isn't nearly as painful, but in order to offer a harder guarantee that we won't have missing uploads (e.g. because of an issue with the S3 bucket itself), I have added a retry mechanism with exponential backoff.
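For readers unfamiliar with the pattern, a retry loop with exponential backoff could look roughly like the following. This is a minimal, std-only sketch and not the code from this PR: `backoff_delay`, `retry_with_backoff`, and their parameters are hypothetical names chosen for illustration, and the closure stands in for wherever the agent performs its S3 upload.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// Hypothetical helper, not part of core-dump-handler.
fn backoff_delay(base: Duration, max: Duration, attempt: u32) -> Duration {
    // Saturate instead of overflowing for large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Calls `op` until it succeeds or `max_attempts` calls have been made,
/// sleeping between failed attempts. `op` is always called at least once.
/// On exhaustion, the error from the last attempt is returned.
fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt: u32 = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: give up and surface the last error.
            Err(err) if attempt.saturating_add(1) >= max_attempts => return Err(err),
            Err(_) => {
                sleep(backoff_delay(base, max, attempt));
                attempt += 1;
            }
        }
    }
}
```

With a base of 1 s and a cap of 60 s, the waits between attempts would be 1 s, 2 s, 4 s, and so on up to 60 s. The cap keeps a long outage from pushing the next attempt arbitrarily far into the future; a production version would probably also add jitter and log each failed attempt.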
On that front the application behavior is a bit spaghetti, and I am not 100% sure that `use_inotify == "true"` is the right condition to check to enable the retry behavior. However, in my setup (k8s using inotify) this works a treat.

This is actually a rebase of a broader change we have implemented internally, which also includes Prometheus metrics for core dumps to allow us to write some alerts (which is why I didn't raise an issue beforehand; the work had to be done anyway). If this PR gets merged I will create a follow-up for that.