
fix(common): add exponential backoff to MinIO HTTP downloads #716

Open
unidel2035 wants to merge 3 commits into OpenSPG:master from unidel2035:issue-712-9f36f806


@unidel2035 unidel2035 commented Nov 1, 2025

Summary

This PR fixes the 503 Service Unavailable errors when downloading files from MinIO, as reported in issue #712.

Problem

Users were experiencing 503 Server Error: Service Unavailable errors when uploading files to the knowledge base. The error occurred during file download from MinIO storage through the download_from_http function in kag/common/utils.py:300.

The root cause was that the retry mechanism (@retry(stop=stop_after_attempt(3))) was retrying immediately without any delay between attempts, giving MinIO no time to recover from temporary overload or unavailability.

Solution

Added exponential backoff to the retry logic:

  • Import: Added wait_exponential from tenacity library
  • Retry Strategy: Updated decorator to @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
    • First retry: waits ~2 seconds
    • Second retry: waits ~4 seconds
    • Maximum wait: 10 seconds
  • Documentation: Enhanced function docstring to document retry behavior and optional dest parameter
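The wait schedule implied by these parameters can be sketched in plain Python. The helper `backoff_waits` below is hypothetical and is not part of this PR or of tenacity; it clamps `multiplier * 2**k` between the minimum and maximum wait. Whether the exponent for the first retry starts at 0 or 1 is an assumption here and should be checked against the installed tenacity version; with a starting exponent of 1, the schedule matches the ~2 s and ~4 s figures listed above.

```python
def backoff_waits(attempts, multiplier=1.0, min_wait=2.0, max_wait=10.0, first_exponent=1):
    """Return the sleep (in seconds) before each retry, given `attempts` total attempts.

    Hypothetical helper for illustration only; it mirrors the idea of
    exponential backoff with a floor and a ceiling, not tenacity's internals.
    """
    waits = []
    # N attempts means at most N - 1 sleeps (no sleep after the final attempt).
    for retry_index in range(attempts - 1):
        raw = multiplier * 2 ** (first_exponent + retry_index)
        # Clamp the raw exponential value into [min_wait, max_wait].
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


print(backoff_waits(3))                    # [2.0, 4.0]
print(backoff_waits(3, first_exponent=0))  # [2.0, 2.0]
print(backoff_waits(6))                    # [2.0, 4.0, 8.0, 10.0, 10.0]
```

With only three attempts the 10-second cap is never reached; it would only matter if the attempt limit were raised.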

Changes

  • kag/common/utils.py: Added exponential backoff to download_from_http function
  • tests/unit/common/test_utils.py: Added comprehensive unit tests covering:
    • Successful download scenarios
    • Retry behavior on 503 errors
    • Max retry exhaustion handling

Testing

  • ✅ All code formatted with black
  • ✅ Passes flake8 linting
  • ✅ Added unit tests with mocked HTTP responses
  • ✅ Tests verify retry count and exponential backoff behavior
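As a rough illustration of how retry count and wait times can be checked with mocks, here is a minimal standard-library sketch. It does not reproduce the tests in `tests/unit/common/test_utils.py`: `download_with_retry` and `ServiceUnavailable` are hypothetical stand-ins for the tenacity-decorated `download_from_http` and an HTTP 503 error, and the sleep function is injected so the test runs without real delays.

```python
import time
from unittest import mock


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 error raised by the download call."""


def download_with_retry(fetch, attempts=3, waits=(2, 4), sleep=time.sleep):
    """Call fetch() up to `attempts` times, sleeping between failed attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except ServiceUnavailable:
            if attempt == attempts:
                raise  # retries exhausted: surface the last error
            # Reuse the last configured wait if there are more retries than waits.
            sleep(waits[min(attempt - 1, len(waits) - 1)])


# Simulate two 503 failures followed by a successful download.
fetch = mock.Mock(side_effect=[ServiceUnavailable(), ServiceUnavailable(), b"payload"])
fake_sleep = mock.Mock()  # records requested delays instead of sleeping

result = download_with_retry(fetch, sleep=fake_sleep)
recorded_waits = [call.args[0] for call in fake_sleep.call_args_list]

print(result)            # b'payload'
print(fetch.call_count)  # 3
print(recorded_waits)    # [2, 4]
```

Injecting the sleep function keeps the test fast and lets the assertions cover both the number of attempts and the delay sequence.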

Impact

This change significantly improves reliability when uploading files to knowledge bases by gracefully handling transient MinIO service unavailability, consistent with retry patterns used elsewhere in the codebase (e.g., LLM client, hybrid retrieval).

Fixes #712


🤖 Generated with Claude Code

unidel2035 and others added 2 commits November 1, 2025 16:01
Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: undefined
This fix addresses the 503 Service Unavailable errors reported in issue OpenSPG#712
when downloading files from MinIO. The download_from_http function now includes
exponential backoff retry logic to handle transient service unavailability.

Changes:
- Added wait_exponential to the retry decorator with multiplier=1, min=2, max=10
- Updated function docstring to document retry behavior
- Added comprehensive unit tests for retry mechanism

The exponential backoff gives MinIO time to recover from temporary overload,
significantly improving reliability when uploading files to the knowledge base.

Fixes OpenSPG#712

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@unidel2035 unidel2035 changed the title from "[WIP] [Bug] [Module Name] MinIO error" to "fix(common): add exponential backoff to MinIO HTTP downloads" Nov 1, 2025
@unidel2035 unidel2035 marked this pull request as ready for review November 1, 2025 16:07
@unidel2035
Author

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

📎 Log file uploaded as GitHub Gist (226KB)
🔗 View complete solution draft log


The working session has now ended. Feel free to review the solution draft and add any feedback.

