Conversation
ianbotsf
left a comment
There was a problem hiding this comment.
Notably I don't concretely define any retry strategies as AWS uses them. Should I? I started writing such a section, but a lot of it comes down to specifics of the service and tuning of the parameters.
Yes, I think we should give an overview of the standard retry mode (but probably not adaptive retry mode). Fortunately, you already talk about the relevant concepts in Why is a retry system recommended? so it's just tying it all together.
|
|
||
| These sorts of events are called **retry storms**, and are often the result of | ||
| poorly managed retry behavior. A retry loop with no delays between attempts is | ||
| most likely to contribute to a retry storm, but a simple delay between attempts | ||
| can be just as bad because it can result in spikes of requests from the same | ||
| system. | ||
|
|
||
| Instead of a fixed delay, using **exponential backoff** to produce delays that | ||
| are longer each time balances the desire to get a quick success with the desire | ||
| to give the service more time to recover. Adding some randomness to that delay | ||
| (known as **jitter**) can result in a smoother request load. This strategy, | ||
| called **exponential backoff with jitter**, is relatively common but it isn't | ||
| perfect. | ||
|
|
||
| There is no perfect retry implementation. Strategies will inevitably improve | ||
| over time as the scale of systems grows and new cascading failure conditions are | ||
| observed. However, the right interface reflecting the problem domain can make | ||
| sure the right extension points are available for future expansion. |
There was a problem hiding this comment.
Style: These paragraphs discuss retry strategies and so I don't believe they belong under the header "Why is a retry system recommended?". I suggest a new section for these.
| ## Example request loop | ||
|
|
||
| The following is a simplified example of what it looks like to use a | ||
| `RetryStrategy` to implement a retryable request loop. |
There was a problem hiding this comment.
Nit: This example doesn't include any of the retryability features you discussed in the previous section.
There was a problem hiding this comment.
This is an implementation of the part of the request pipeline that uses retry strategies. It is only concerned with creating the parameters and using the RetryStrategy it is not concerned with what the RetryStrategy does under the hood, including examining error metadata.
There was a problem hiding this comment.
Perhaps your standard retry mode example will tie it all together more strongly but the ordering of these sections feels confusing. In one section we introduce the retry strategy API, then the next section discusses possible implementation details without connecting them to that API, then the following section shows how to use the strategy public API. At the very least we should reorder these sections so that we finish talking about the retry strategy API before introducing orthogonal concepts.
69b2ec9 to
b23a9aa
Compare
|
This pull request does not contain a staged changelog entry. To create one, use the Make sure that the description is appropriate for a changelog entry and that the proper feature type is used. See |
ianbotsf
left a comment
There was a problem hiding this comment.
Nice, I like the flow of these sections and the example retry strategy at the end really ties things all together. Just a few minor things left.
| An initial retry token should be acquired at the beginning of a request, before | ||
| the first attempt is made. If an initial token cannot be acquired, the client | ||
| should still make an attempt. |
There was a problem hiding this comment.
Question: Why should the client still make an attempt if an initial token cannot be fetched? Doesn't that indicate the retry strategy thinks we should not make the initial attempt?
There was a problem hiding this comment.
This system is about managing retry behavior, not about gating access to the service on the client side.
On a technical level, the retry strategy may not be able to recover if no attempts are being made at all. I'll make this a bit more clear.
There was a problem hiding this comment.
If that's the case, why do we even need an initial token?
There was a problem hiding this comment.
It's still useful for tracking request information. Some strategies might want to track success rate, for example. Or they can impose an initial delay.
Notably I don't concretely define any retry strategies as AWS uses them. Should I? I started writing such a section, but a lot of it comes down to specifics of the service and tuning of the parameters.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.