Skip to content

Add client retry guidance#2954

Merged
JordonPhillips merged 3 commits intomainfrom
client-guidance-retries
Feb 25, 2026
Merged

Add client retry guidance#2954
JordonPhillips merged 3 commits intomainfrom
client-guidance-retries

Conversation

@JordonPhillips
Copy link
Contributor

Notably I don't concretely define any retry strategies as AWS uses them. Should I? I started writing such a section, but a lot of it comes down to specifics of the service and tuning of the parameters.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@JordonPhillips JordonPhillips requested a review from a team as a code owner February 2, 2026 20:03
Copy link
Contributor

@ianbotsf ianbotsf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notably I don't concretely define any retry strategies as AWS uses them. Should I? I started writing such a section, but a lot of it comes down to specifics of the service and tuning of the parameters.

Yes, I think we should give an overview of the standard retry mode (but probably not adaptive retry mode). Fortunately, you already talk about the relevant concepts in Why is a retry system recommended? so it's just tying it all together.

Comment on lines 20 to 37

These sorts of events are called **retry storms**, and are often the result of
poorly managed retry behavior. A retry loop with no delays between attempts is
most likely to contribute to a retry storm, but a simple delay between attempts
can be just as bad because it can result in spikes of requests from the same
system.

Instead of a fixed delay, using **exponential backoff** to produce delays that
are longer each time balances the desire to get a quick success with the desire
to give the service more time to recover. Adding some randomness to that delay
(known as **jitter**) can result in a smoother request load. This strategy,
called **exponential backoff with jitter**, is relatively common but it isn't
perfect.

There is no perfect retry implementation. Strategies will inevitably improve
over time as the scale of systems grows and new cascading failure conditions are
observed. However, the right interface reflecting the problem domain can make
sure the right extension points are available for future expansion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: These paragraphs discuss retry strategies and so I don't believe they belong under the header "Why is a retry system recommended?". I suggest a new section for these.

Comment on lines 217 to 220
## Example request loop

The following is a simplified example of what it looks like to use a
`RetryStrategy` to implement a retryable request loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This example doesn't include any of the retryability features you discussed in the previous section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an implementation of the part of the request pipeline that uses retry strategies. It is only concerned with creating the parameters and using the RetryStrategy it is not concerned with what the RetryStrategy does under the hood, including examining error metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps your standard retry mode example will tie it all together more strongly but the ordering of these sections feels confusing. In one section we introduce the retry strategy API, then the next section discusses possible implementation details without connecting them to that API, then the following section shows how to use the strategy public API. At the very least we should reorder these sections so that we finish talking about the retry strategy API before introducing orthogonal concepts.

@JordonPhillips JordonPhillips force-pushed the client-guidance-retries branch from 69b2ec9 to b23a9aa Compare February 11, 2026 17:16
@github-actions
Copy link
Contributor

This pull request does not contain a staged changelog entry. To create one, use the ./.changes/new-change command. For example:

./.changes/new-change --pull-requests "#2954" --type feature --description "Add client retry guidance"

Make sure that the description is appropriate for a changelog entry and that the proper feature type is used. See ./.changes/README or run ./.changes/new-change -h for more information.

Copy link
Contributor

@ianbotsf ianbotsf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like the flow of these sections and the example retry strategy at the end really ties things all together. Just a few minor things left.

Comment on lines 100 to 102
An initial retry token should be acquired at the beginning of a request, before
the first attempt is made. If an initial token cannot be acquired, the client
should still make an attempt.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Why should the client still make an attempt if an initial token cannot be fetched? Doesn't that indicate the retry strategy thinks we should not make the initial attempt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This system is about managing retry behavior, not about gating access to the service on the client side.

On a technical level, the retry strategy may not be able to recover if no attempts are being made at all. I'll make this a bit more clear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case, why do we even need an initial token?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still useful for tracking request information. Some strategies might want to track success rate, for example. Or they can impose an initial delay.

@JordonPhillips JordonPhillips merged commit 40ddb4a into main Feb 25, 2026
16 checks passed
@JordonPhillips JordonPhillips deleted the client-guidance-retries branch February 25, 2026 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants