
Fix Memory Leaks and Improve Cleanup in S3 Multipart Upload#15210

Open
kumarpritam863 wants to merge 18 commits into apache:main from kumarpritam863:fix/s3-multipart-upload-cleanup

Conversation

@kumarpritam863
Contributor

Clear the staging files list and the multipart map after close() and abortUpload(), once all the staging files have been closed.

github-actions bot added the AWS label Feb 1, 2026
@kumarpritam863
Contributor Author

@singhpk234 can you please review?

@kumarpritam863
Contributor Author

@RussellSpitzer can you please take a look?

Comment on lines +410 to +412
// clear staging files and multipart map
stagingFiles.clear();
multiPartMap.clear();
Contributor

@singhpk234 Feb 1, 2026


It's fine to do eager cleanup.
Though wouldn't the stream be closed post-abort and hence GCed? Is this staying in memory for long?

Contributor Author


Thanks @singhpk234 for the review.

Regarding memory management:
While the staging files list will eventually allow objects to be garbage-collected once they go out of scope, I’m concerned that retaining strong references to many FileAndDigest objects (especially in upload-heavy / long-running workloads) can still cause practical issues:

  • Increased heap pressure during periods of high concurrent or sequential uploads
  • Longer object lifetime → more frequent / longer GC pauses
  • Higher risk of OutOfMemoryError during peak load (I’ve sometimes observed OOMs in similar scenarios when large numbers of parts accumulate without cleanup while running Iceberg-Kafka-Connect)

Even though the theoretical lifetime is finite, the practical memory pressure and GC overhead seem non-negligible in our use case.

Also, although it does not affect the AWS multipart upload itself (AWS only requires the part number to be unique), starting the part number from 1 and keeping it low-bounded makes managing CompleteMultipartUpload requests easier. Currently the part number comes from the index of the part file in the staging files list, which can start from a higher number if the previous files are not cleared.
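The eager-cleanup pattern being discussed can be sketched as follows. This is a minimal illustration, not the actual S3OutputStream implementation; the class name StagingState and its methods are hypothetical, and only the stagingFiles/multiPartMap field names come from the diff above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of index-derived part numbers plus eager cleanup.
class StagingState implements AutoCloseable {
  private final List<String> stagingFiles = new ArrayList<>();
  private final Map<Integer, String> multiPartMap = new HashMap<>();

  void addPart(String file) {
    stagingFiles.add(file);
    // Part number is derived from the list position, so it starts at 1
    // only while the list is cleared between uploads.
    multiPartMap.put(stagingFiles.size(), file);
  }

  int nextPartNumber() {
    return stagingFiles.size() + 1;
  }

  @Override
  public void close() {
    // Eagerly drop strong references so FileAndDigest-like entries become
    // GC-eligible immediately, instead of living as long as the stream.
    stagingFiles.clear();
    multiPartMap.clear();
  }
}
```

Without the clear() calls, a reused or long-lived stream keeps every part reachable and the index-derived part numbers keep climbing across uploads.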

Please let me know your thoughts on these.

@kumarpritam863
Contributor Author

Hi @singhpk234, I have also added tests for proper cleanup of the staging files and the multipart map. Can you please review?

