DRIVERS-3326: clarify retry behavior when errors with NoWritesPerformed are encountered#1878
DRIVERS-3326: clarify retry behavior when errors with NoWritesPerformed are encountered#1878
Conversation
| pool exception originating from the driver) or the error is labeled "NoWritesPerformed", the most recently | ||
| encountered error that does not contain a "NoWritesPerformed" label MUST be returned instead. | ||
| - If all server errors are labeled "NoWritesPerformed", then the first error should be raised. | ||
| - Otherwise, return the most recently encountered error. |
There was a problem hiding this comment.
What is the "otherwise" case that this bullet is actually addressing?
Combining the logic in the first two bullets (let me know if I'm mistaken here), drivers should return the most recent error without a "NoWritesPerformed" label, or fall back to the first error (if everything had "NoWritesPerformed").
Note: it's not clear to me if "NoWritesPerformed" is only added by the server for its responses or something a driver might tack onto a client-side error (that supports labels). I assume it's only a server error, as this spec never talks about adding it.
IIUC, a comprehensive test for this spec change requires multiple attempts to be made (i.e. CSOT or backpressure). Therefore, any proxy that modifies the error response between the server and client is insufficient. Ignoring the CSOT/backpressure requirement for a moment, couldn't you forgo a proxy by staying multiple fail points with slightly different error codes and a "NoWritesPerformed" as needed? Kind of unfortunate that failCommand gives us no control over the error message, as that would probably be the easiest thing to tweak. |
|
@prestonvasquez I'm adding you as a review in case you have time; I think you wrote added |
jmikola
left a comment
There was a problem hiding this comment.
Suggest removing an unrelated change to a previous changelog entry. The actual content from the PR LGTM.
- `prosetest` -> `prose tests`
This PR clarifies the behavior when multiple errors with NoWritesPerformed are encountered (potentially possible when CSOT is enabled, or in the future with backpressure).
I wanted to add testing, but I ran into some difficulties that led me to believe it probably isn't worth adding test infrastructure just for this test.
To get around this, I wrote a simple JS proxy that added an incrementing counter to the payload of each response, so that the error a user receives has an identifier. This worked, but when I went to test this, I ran into another issue: to test this scenario, we need multiple retryable writes (CSOT or backpressure). Backpressure hasn't merged yet, so we can't rely on that feature. When I tried using CSOT, I could not actually reproduce this scenario because every CSOT error manifested as a timeoutMS error (which does not have a NoWritesPerformed error label).
One option is to make this PR depend on the client backpressure work and to put my proxy into drivers-evergreen-tools. This still might be more trouble than it's worth, because this is an additional CI variant for drivers, just for one test. The proxy must:
Ultimately, I decided the testing here is more trouble than it's worth. Happy to reconsider if reviewers feel differently though.
Please complete the following before merging:
clusters).