Handle retries without returning errors for non-network errors by paymanblu · Pull Request #22 · IBM/watsonx-go

paymanblu · 2025-07-25T23:10:05Z

Fix retry logic to avoid losing the request body on retries and to return the response body on non-200 responses:

Issue #22

This PR addresses an issue in the watsonx-go client library where HTTP request bodies were being lost during retry attempts, causing 400 Bad Request errors from the server.

Description

Root Cause: When an HTTP request body is created with bytes.NewBuffer(), the buffer is consumed by the first request attempt. On subsequent retries the request body was empty, so the server returned HTML-formatted 400 errors instead of the expected JSON responses.
Impact: Every retry attempt after the first request failed with 400 Bad Request, so the retry mechanism behaved unexpectedly and the server's original error response was lost.
Additional Issues: The existing retry logic wraps non-200 status codes into an error and returns it as a non-nil error, so the caller no longer has access to the HTTP response body, which may contain critical troubleshooting information about the server error. For 400 and 500 errors in particular (i.e., non-network errors), the server typically responds with helpful error and status information. Currently, only the status code is returned in err, and the response is nil. Wrapping the status code as an error is also inconsistent with the standard Go http.Client behavior, which returns a nil error together with the response and leaves it to the caller to handle the status as they see fit.
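For comparison, the standard http.Client contract can be observed with net/http/httptest: http.Get returns a nil error for a 404, and the JSON body stays readable. This is a minimal sketch; fetch and newErrorServer are hypothetical helpers, and the payload is a shortened, illustrative error body.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetch issues a GET and returns the status code and body. Following the
// standard http.Client contract, a non-2xx status is NOT an error: err is
// only non-nil for transport-level failures.
func fetch(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err // network/transport failure, no response to inspect
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

// newErrorServer starts a test server that always answers 404 with a JSON body.
func newErrorServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusNotFound)
		fmt.Fprint(w, `{"errors":[{"code":"model_not_supported"}],"status_code":404}`)
	}))
}

func main() {
	srv := newErrorServer()
	defer srv.Close()

	status, body, err := fetch(srv.URL)
	fmt.Println(status, err == nil)
	fmt.Println(body)
}
```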

Code Changes

1. Enhanced Retry Mechanism (`pkg/models/retry.go`)

Refactored the retry logic to handle request bodies correctly and to return a nil error with a non-nil response for non-200 responses that are not network errors.
Added req.GetBody support: Enables automatic recreation of the request body for retries (i.e., in prepareRequest)
New retry configuration options:
- WithRetryIfV2(): Response-based retry conditions (vs legacy error-based)
- WithOnRetryV2(): Enhanced retry callbacks with response access
- WithReturnHTTPStatusAsErr(): Controls legacy behavior of converting HTTP status to errors
- Retry(): Now expects a request parameter for better control over the request (e.g., the ability to prepare the request body for a retry)
Improved error handling: Proper separation of network errors vs HTTP response errors
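A minimal sketch of the two ideas in this section, assuming nothing about the library's actual Retry signature: each attempt rebuilds the body from req.GetBody, and a RetryIfFuncV2-style predicate decides on the raw response and error. doWithRetry and runDemo are hypothetical names; the demo server fails the first attempt with 503 and records every body it receives.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// RetryIfFuncV2 has the shape proposed in the PR: it decides on the raw
// response and error instead of a status code coerced into an error.
type RetryIfFuncV2 func(*http.Response, error) bool

// doWithRetry is a hypothetical helper, not the library's Retry. Each attempt
// clones req and rebuilds its body from req.GetBody, so retries resend the
// payload. A non-retryable result is returned as-is, with its body unread.
func doWithRetry(client *http.Client, req *http.Request, attempts int,
	backoff time.Duration, retryIf RetryIfFuncV2) (*http.Response, error) {
	if attempts < 1 {
		attempts = 1
	}
	for n := 0; ; n++ {
		attempt := req.Clone(req.Context())
		if req.GetBody != nil {
			body, err := req.GetBody()
			if err != nil {
				return nil, err
			}
			attempt.Body = body
		}
		resp, err := client.Do(attempt)
		if n == attempts-1 || !retryIf(resp, err) {
			return resp, err
		}
		if resp != nil {
			io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
			resp.Body.Close()
		}
		time.Sleep(backoff)
	}
}

// runDemo posts a JSON payload to a test server that fails the first attempt
// with 503, and returns the final status plus every body the server received.
func runDemo() (int, []string, error) {
	received := make(chan string, 10)
	var calls atomic.Int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		received <- string(b)
		if calls.Add(1) == 1 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))

	req, err := http.NewRequest(http.MethodPost, srv.URL, bytes.NewReader([]byte(`{"input":"hi"}`)))
	if err != nil {
		srv.Close()
		return 0, nil, err
	}
	retryOn5xx := func(resp *http.Response, err error) bool {
		return err != nil || resp.StatusCode >= 500
	}
	resp, err := doWithRetry(srv.Client(), req, 3, 10*time.Millisecond, retryOn5xx)
	status := 0
	if resp != nil {
		status = resp.StatusCode
		resp.Body.Close()
	}
	srv.Close() // blocks until in-flight handlers finish, so all sends are done
	close(received)
	var bodies []string
	for b := range received {
		bodies = append(bodies, b)
	}
	return status, bodies, err
}

func main() {
	status, bodies, err := runDemo()
	fmt.Println(status, bodies, err)
}
```

Because the predicate sees the *http.Response, a non-retryable 400 or 404 is handed back with its body unread, which is the behavior this PR argues for.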

2. Fixed Request Body Handling (`pkg/models/generate.go`, `pkg/models/embedding.go`)

Replaced bytes.NewBuffer() with bytes.NewReader(): Prevents buffer consumption issues
Enhanced error handling: Better HTTP status code detection and error reporting
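The root cause can be reproduced without a network: reading req.Body twice drains it, while req.GetBody returns a fresh reader each time. Note that http.NewRequest populates GetBody for *bytes.Reader, and also for *bytes.Buffer and *strings.Reader, so it is the retry path calling GetBody that restores the payload. The helper names and the example.invalid URL are placeholders for this sketch.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// readRawBodyTwice reads req.Body directly twice; the second read is empty,
// which is what a retry that reuses the same body would send.
func readRawBodyTwice(payload []byte) (string, string, error) {
	req, err := http.NewRequest(http.MethodPost, "http://example.invalid/", bytes.NewReader(payload))
	if err != nil {
		return "", "", err
	}
	first, _ := io.ReadAll(req.Body)
	second, _ := io.ReadAll(req.Body)
	return string(first), string(second), nil
}

// readBodyTwice obtains the body via req.GetBody twice, as a retry loop
// would before two separate attempts; both reads return the full payload.
func readBodyTwice(payload []byte) (string, string, error) {
	req, err := http.NewRequest(http.MethodPost, "http://example.invalid/", bytes.NewReader(payload))
	if err != nil {
		return "", "", err
	}
	var out [2]string
	for i := range out {
		rc, err := req.GetBody()
		if err != nil {
			return "", "", err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return "", "", err
		}
		out[i] = string(b)
	}
	return out[0], out[1], nil
}

func main() {
	x, y, _ := readRawBodyTwice([]byte(`{"input":"hi"}`))
	fmt.Printf("req.Body: %q then %q\n", x, y)
	a, b, _ := readBodyTwice([]byte(`{"input":"hi"}`))
	fmt.Printf("GetBody:  %q then %q\n", a, b)
}
```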

3. Added Test Coverage (`pkg/internal/tests/models/generate_test.go`)

Added invalid parameter tests: Validate proper error handling for bad requests
Added invalid model tests: Test both new and legacy retry behaviors
Enhanced error validation: Parse and validate JSON error responses
Improved test organization: Better constants and helper functions

4. Development Tools Enhancement (`Makefile`)

Added fmt-check target: Check code formatting without modifying files
Enhanced build pipeline: Added proper dependencies between targets
Improved structure: Better organization and documentation

Migration Notes

No breaking changes: All existing code continues to work unchanged
Optional enhancements: New retry options available for advanced use cases
Legacy support: Old retry behavior maintained through configuration flags

Files Modified

pkg/models/retry.go - Core retry mechanism improvements
pkg/models/generate.go - Request body fix for text generation
pkg/models/embedding.go - Request body fix for embeddings
pkg/internal/tests/models/generate_test.go - Enhanced test coverage
Makefile - Added formatting check capabilities

Externally Visible Outcome

GenerateText err - Before fix

GenerateText err - After fix

{"errors":[{"code":"model_not_supported","message":"Model 'meta-llama/llama-3-70b-instruct-foo-test' is not supported","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai#text-generation"}],"trace":"3a8e0587e089f3d55d8f5d16f4f90d7c","status_code":404}
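The JSON above can be decoded into a small struct for logging or test assertions, in the spirit of the "Parse and validate JSON error responses" test item. The type and function names here are hypothetical; only the JSON keys come from the sample response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// APIErrorResponse mirrors the keys of the sample error payload.
type APIErrorResponse struct {
	Errors []struct {
		Code     string `json:"code"`
		Message  string `json:"message"`
		MoreInfo string `json:"more_info"`
	} `json:"errors"`
	Trace      string `json:"trace"`
	StatusCode int    `json:"status_code"`
}

// parseAPIError decodes a response body into an APIErrorResponse.
func parseAPIError(body []byte) (*APIErrorResponse, error) {
	var e APIErrorResponse
	if err := json.Unmarshal(body, &e); err != nil {
		return nil, fmt.Errorf("decoding error body: %w", err)
	}
	return &e, nil
}

const sample = `{"errors":[{"code":"model_not_supported","message":"Model 'meta-llama/llama-3-70b-instruct-foo-test' is not supported","more_info":"https://cloud.ibm.com/apidocs/watsonx-ai#text-generation"}],"trace":"3a8e0587e089f3d55d8f5d16f4f90d7c","status_code":404}`

func main() {
	e, err := parseAPIError([]byte(sample))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(e.StatusCode, e.Errors[0].Code) // 404 model_not_supported
}
```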

Discussion Topics

It would simplify the retry.go logic if we could make breaking changes and document them. Although the Retry function and RetryConfig are exported, it seems likely that consumers have not altered the default behavior, so we could update the code and document a migration path. The migration in that case would be minimal: GenerateText and EmbedDocuments still return an error as before, but the error carries richer details.
The current PR is more complex than it needs to be because it preserves backwards compatibility in the Retry logic by maintaining both v1 and v2 logic and switching between them via the returnHTTPStatusAsErr flag.

paymanblu · 2025-07-25T23:17:45Z

This initial draft is meant for discussion purposes. I would be happy to discuss and revise the naming of the feature flags in this PR, or to incorporate any other suggestions.

paymanblu

I am leaving some notes for the maintainers to make the thought process easier to follow.

paymanblu · 2025-07-25T23:18:53Z

README.md

 ```go
 result, _ := client.GenerateText(
-  "meta-llama/llama-3-1-8b-instruct",
+  "meta-llama/llama-3-3-70b-instruct",


Note: the original models no longer seem to be available by default.

paymanblu · 2025-07-25T23:20:48Z

pkg/internal/tests/models/retry_test.go

+	elapsedTime := endTime.Sub(startTime)
+	expectedMinimumTime := backoffTime * time.Duration(expectedRetries)
+
+	if err != nil {


Note: The new behavior is that the error will be nil and the response will be non-nil.

This will give the user the opportunity to react to the response and use the response contents for troubleshooting purposes by logging/printing them to the end user.

paymanblu · 2025-07-25T23:22:07Z

pkg/models/retry.go


-		if !opts.retryIf(err) {
-			return nil, err
+		shouldRetry = retryConfig.retryIf(retryIfV1Err) || retryConfig.retryIfV2(resp, err)


Note: we call both the legacy and new retryIf for backwards compatibility purposes

paymanblu · 2025-07-25T23:22:35Z

pkg/models/retry.go


-		lastErr = err
-		opts.onRetry(n+1, err)
+		retryConfig.onRetry(n+1, statusAsErr)


Note: we call both the legacy and new onRetry for backwards compatibility purposes

paymanblu · 2025-07-25T23:23:43Z

pkg/models/retry.go

 type HttpClient struct {
-	httpClient *http.Client
+	httpClient  *http.Client
+	retryConfig *RetryConfig


Note: we add the RetryConfig to the struct as a way to influence the Client retry logic from the outside.

paymanblu · 2025-07-25T23:26:06Z

pkg/models/retry.go

 type RetryIfFunc func(error) bool

+// RetryIfFuncV2 determines whether a retry should be attempted based on the response.
+type RetryIfFuncV2 func(*http.Response, error) bool


Note: this RetryIfFuncV2 variant lets the user decide what to do based on the original response and the error. We don't prematurely coerce the status into an error, so the user keeps control over the original response and error and can make a better-informed decision about when to retry.

christopherallis · 2025-10-22T18:40:17Z

Thanks @paymanblu for this contribution. Could you help get the DCO check to pass? Just run git commit -s and push the changes.

Handle retries without returning errors for non-network errors

48919b6

paymanblu mentioned this pull request Jul 26, 2025

Client Retry logic causes requests to not return response body #23

Open

paymanblu and others added 3 commits July 27, 2025 12:14

Fix empty request body on retry attempts. Add more tests

59b888f

cleanup redundant GetBody in embedding

17626a8

Merge branch 'main' into fix-error-logic-with-default-fallback-to-legacy

ba9bbad

Merge branch 'main' into fix-error-logic-with-default-fallback-to-legacy

f49a1e4

christopherallis self-requested a review January 7, 2026 14:40

paymanblu wants to merge 5 commits into IBM:main from paymanblu:fix-error-logic-with-default-fallback-to-legacy


3 participants