Description
Describe the bug
Useful context:
- https://datatracker.ietf.org/doc/html/rfc7540#section-6.8
- https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html#response-code-000
The issue
When concurrently invoking a Lambda function through aws-sdk-lambda, DispatchFailure errors are returned to user code at seemingly regular intervals.
HTTP/2 GOAWAY errors not retried, causing DispatchFailure under sustained load
Summary
When making sustained high-concurrency requests to AWS APIs using the AWS SDK for Rust, the SDK intermittently fails with DispatchFailure errors caused by HTTP/2 GOAWAY frames.
These errors are not retried automatically by the SDK; instead, the failures propagate to application code.
```text
DispatchFailure(DispatchFailure { source: ConnectorError { kind: Io,
source: hyper_util::client::legacy::Error(SendRequest,
hyper::Error(Http2, Error { kind: GoAway(b"", NO_ERROR, Remote) })),
connection: Unknown } })
```
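The error above is nested several layers deep (ConnectorError, then hyper_util, then hyper, then the HTTP/2 GoAway), and matching on `SdkError::DispatchFailure` alone cannot tell a GOAWAY apart from other I/O failures. A minimal std-only sketch of walking an error's `source()` chain and downcasting each layer follows. `GoAwayStandIn` and `WrapperStandIn` are hypothetical stand-ins, not the real hyper or h2 types; with the real stack one would presumably search for `h2::Error` and check `is_go_away()`, assuming each wrapper exposes its inner error through `source()`.

```rust
use std::error::Error;
use std::fmt;

/// Walk an error's `source()` chain, outermost layer first, and return the
/// first layer whose concrete type is `T`.
fn find_in_chain<'a, T: Error + 'static>(err: &'a (dyn Error + 'static)) -> Option<&'a T> {
    let mut current: Option<&'a (dyn Error + 'static)> = Some(err);
    while let Some(layer) = current {
        if let Some(found) = layer.downcast_ref::<T>() {
            return Some(found);
        }
        current = layer.source();
    }
    None
}

/// Hypothetical stand-in for the innermost HTTP/2 GOAWAY error.
#[derive(Debug)]
struct GoAwayStandIn {
    reason: &'static str,
}

impl fmt::Display for GoAwayStandIn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "GOAWAY received ({})", self.reason)
    }
}

impl Error for GoAwayStandIn {}

/// Hypothetical stand-in for an outer wrapper such as a connector error.
#[derive(Debug)]
struct WrapperStandIn {
    inner: GoAwayStandIn,
}

impl fmt::Display for WrapperStandIn {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "dispatch failure")
    }
}

impl Error for WrapperStandIn {
    // Expose the wrapped error so callers can keep walking the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.inner)
    }
}
```

Downcasting by type is sturdier than matching on `Debug` strings, whose format is not a stable contract.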
To Reproduce
- aws-sdk-lambda: 1.113.0
- aws-config: 1.8.12
- aws-smithy-runtime: 1.9.8
- aws-smithy-http-client: 1.1.6 (so this occurs even after #4145)
- Rust: 1.85 (2024 edition)
- Platform: Linux x86_64 and Firecracker VM (when making calls from AWS Lambda)
The quickest way to encounter the error seems to be to repeatedly send batches of concurrent requests:
```rust
use anyhow::Result;
use aws_config::BehaviorVersion;
use aws_sdk_lambda::Client;
use aws_smithy_runtime_api::client::result::SdkError;
use aws_smithy_types::Blob;
use futures_util::future::join_all;

const FUNCTION_ARN: &str = "YOUR_LAMBDA_ARN";
const PAYLOAD: &str = "[]";
const BATCH_SIZE: usize = 100;
const MAX_BATCHES: usize = 1000;

/// Check if an SDK error is a dispatch failure (e.g., HTTP/2 GOAWAY).
fn is_dispatch_failure<E, R>(err: &SdkError<E, R>) -> bool {
    matches!(err, SdkError::DispatchFailure(_))
}

#[tokio::main]
async fn main() -> Result<()> {
    let config = aws_config::load_defaults(BehaviorVersion::latest()).await;
    let client = Client::new(&config);

    for batch in 0..MAX_BATCHES {
        // Fire BATCH_SIZE concurrent invocations; the cloned clients share
        // the same underlying HTTP client.
        let mut tasks = Vec::with_capacity(BATCH_SIZE);
        for _ in 0..BATCH_SIZE {
            let client = client.clone();
            tasks.push(tokio::spawn(async move {
                client
                    .invoke()
                    .function_name(FUNCTION_ARN)
                    .payload(Blob::new(PAYLOAD))
                    .send()
                    .await
            }));
        }

        // Wait for the whole batch, then stop at the first failed request.
        // `res.unwrap()` only panics if a spawned task itself panicked.
        let results = join_all(tasks).await;
        for res in results {
            match res.unwrap() {
                Ok(_) => {}
                Err(e) => {
                    eprintln!("failure at batch {}: {:?}", batch, e);
                    assert!(is_dispatch_failure(&e));
                    return Err(e.into());
                }
            }
        }

        if batch % 10 == 0 {
            eprintln!("Completed batch {}", batch);
        }
    }
    Ok(())
}
```
Clues
When batching, we fairly consistently hit the error at batch index 99. What a suspicious number!
With a batch size of 100, batch index 99 is the 100th batch, so we are hitting the error around request 10,000.
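The arithmetic behind the 10,000 estimate, as a small Rust sketch. It assumes every batch has exactly `BATCH_SIZE` requests, batches run strictly one after another, and all requests travel over a single HTTP/2 connection; the helper names are hypothetical.

```rust
/// One-based request numbers covered by a zero-based batch index, assuming
/// every batch contains exactly `batch_size` requests and batches run back to back.
fn batch_request_range(batch_index: u64, batch_size: u64) -> (u64, u64) {
    let first = batch_index * batch_size + 1;
    let last = (batch_index + 1) * batch_size;
    (first, last)
}

/// Zero-based batch index containing a one-based request number (must be >= 1).
fn batch_containing(request_number: u64, batch_size: u64) -> u64 {
    (request_number - 1) / batch_size
}
```

Under those assumptions, `batch_request_range(99, 100)` is `(9_901, 10_000)` and `batch_containing(10_001, 100)` is `100`. A per-connection cap at 10,000 would therefore be expected to surface at the end of batch 99 or the start of batch 100, depending on how the concurrent requests in the batch are interleaved when the cap is reached.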
Speculation
If I had to guess, I'd say these API requests pass through a load balancer configured to close the connection
after 10,000 requests.
Potential Fixes (assuming speculation is correct)
- Remove the stream ID cap on the load balancer serving the API (allow more than 10,000 requests per connection).
- Or, account for the limitation within the AWS client libraries: either handle the GOAWAY gracefully, or explicitly
open a new connection at request number 10,001.
The first option seems ideal, but I imagine it would involve some bureaucratic challenges.
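The second option (opening a new connection at request 10,001) boils down to a per-connection request counter. The sketch below models only that counting logic; `RotatingConnection` and its `generation` field are hypothetical illustrations, not an SDK or hyper API, and the cap of 10,000 comes from the speculation above rather than from documented behavior.

```rust
/// Hypothetical client-side guard: counts requests sent on the current
/// connection and switches to a fresh one once `cap` requests have been sent.
/// `generation` stands in for "which connection the request should use".
#[derive(Debug)]
struct RotatingConnection {
    cap: u32,
    sent_on_current: u32,
    generation: u32,
}

impl RotatingConnection {
    fn new(cap: u32) -> Self {
        Self { cap, sent_on_current: 0, generation: 0 }
    }

    /// Returns the connection generation the next request should use.
    fn next_request(&mut self) -> u32 {
        if self.sent_on_current == self.cap {
            // Budget exhausted: move to a new connection instead of
            // waiting for the server's GOAWAY.
            self.generation += 1;
            self.sent_on_current = 0;
        }
        self.sent_on_current += 1;
        self.generation
    }
}
```

With a cap of 10,000, requests 1 through 10,000 use generation 0 and request 10,001 uses generation 1. In practice one would likely rotate somewhat before the cap, since requests already in flight on the old connection also count toward it.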
Workaround
In the meantime, if you are another user encountering this error, you might want to retry:
```rust
use aws_smithy_runtime_api::client::result::SdkError;

fn do_retry<E, R>(err: &SdkError<E, R>) -> bool {
    match err {
        // `d.is_io()` covers the initial error;
        // `d.is_user()` covers the fallout: other requests on the same
        // connection also fail when the connection drops.
        SdkError::DispatchFailure(d) => d.is_io() || d.is_user(),
        _ => false,
    }
}
```
I find that this error can be retried immediately, with no delay, and the request succeeds the second time.
However, retrying here may invoke the same Lambda function more than once, since a request that failed this way might already have been processed.
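A minimal synchronous sketch of the immediate-retry workaround, with a bounded number of attempts so a persistent failure still surfaces. `retry_immediately` is a hypothetical helper: the real `invoke().send()` is async, so an actual version would await the operation inside the loop, and `do_retry` above would play the role of `should_retry`. Keeping `max_attempts` small also bounds how many duplicate invocations a retry can cause.

```rust
/// Call `op` at least once and up to `max_attempts` times, retrying
/// immediately (no backoff) whenever `should_retry` classifies the error
/// as retryable. Non-retryable errors are returned right away.
fn retry_immediately<T, E>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, E>,
    should_retry: impl Fn(&E) -> bool,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retryable and attempts remain: go around again immediately.
            Err(e) if attempt < max_attempts && should_retry(&e) => attempt += 1,
            // Out of attempts, or not retryable: surface the error.
            Err(e) => return Err(e),
        }
    }
}
```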
Side-note
If the load balancer kills the connection after sending the HTTP/2 GOAWAY,
as is the default on AWS load balancers,
then we might be wasting paid customer Lambda invocations. I also wonder about in-progress
requests whose results are never reported back to the user. Normally, already-established HTTP/2 streams
are expected to stay alive after a GOAWAY; GOAWAY only prevents the creation of new streams.
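RFC 7540 makes the GOAWAY semantics concrete: the frame carries a last-stream-id, streams at or below it might still complete (§6.8), and requests on higher-numbered streams were not processed and are safe to retry (§8.1.4). The sketch below encodes that rule together with the odd-numbered IDs clients use for their streams (§5.1.1). It assumes the client allocates odd IDs consecutively, and a `last_stream_id` of 19,999 (the 10,000th client stream) is a hypothetical value matching the speculation above; the captured error does not show the frame's actual last-stream-id.

```rust
/// What RFC 7540 lets a client conclude about one of its streams after
/// receiving a GOAWAY frame carrying `last_stream_id`.
#[derive(Debug, PartialEq)]
enum StreamFate {
    /// ID <= last_stream_id: the server might have processed it, or might still do so.
    MayHaveBeenProcessed,
    /// ID > last_stream_id: the server did not process it, so it is safe
    /// to retry on a new connection.
    NotProcessed,
}

fn stream_fate(stream_id: u32, last_stream_id: u32) -> StreamFate {
    if stream_id <= last_stream_id {
        StreamFate::MayHaveBeenProcessed
    } else {
        StreamFate::NotProcessed
    }
}

/// Client-initiated streams use odd IDs. Assuming consecutive allocation
/// (1, 3, 5, ...), the n-th request (one-based, n >= 1) on a connection
/// gets stream ID 2n - 1.
fn nth_client_stream_id(n: u32) -> u32 {
    2 * n - 1
}
```

Only requests in the `NotProcessed` bucket can be retried without risking a duplicate Lambda invocation; those in `MayHaveBeenProcessed` are exactly the in-progress streams that should be allowed to finish.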
Regression Issue
Expected Behavior
Current Behavior
Reproduction Steps
Possible Solution
No response
Additional Information/Context
No response
Version
├── aws-config v1.8.12
│ ├── aws-credential-types v1.2.11
│ │ ├── aws-smithy-async v1.2.8
│ │ ├── aws-smithy-runtime-api v1.11.0
│ │ │ ├── aws-smithy-async v1.2.8 (*)
│ │ │ ├── aws-smithy-types v1.4.0
│ │ ├── aws-smithy-types v1.4.0 (*)
│ ├── aws-runtime v1.5.18
│ │ ├── aws-credential-types v1.2.11 (*)
│ │ ├── aws-sigv4 v1.3.7
│ │ │ ├── aws-credential-types v1.2.11 (*)
│ │ │ ├── aws-smithy-eventstream v0.60.14
│ │ │ │ ├── aws-smithy-types v1.4.0 (*)
│ │ │ ├── aws-smithy-http v0.62.6
│ │ │ │ ├── aws-smithy-eventstream v0.60.14 (*)
│ │ │ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ │ │ ├── aws-smithy-types v1.4.0 (*)
│ │ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ │ ├── aws-smithy-types v1.4.0 (*)
│ │ ├── aws-smithy-async v1.2.8 (*)
│ │ ├── aws-smithy-eventstream v0.60.14 (*)
│ │ ├── aws-smithy-http v0.62.6 (*)
│ │ ├── aws-smithy-runtime v1.9.8
│ │ │ ├── aws-smithy-async v1.2.8 (*)
│ │ │ ├── aws-smithy-http v0.62.6 (*)
│ │ │ ├── aws-smithy-http-client v1.1.6
│ │ │ │ ├── aws-smithy-async v1.2.8 (*)
│ │ │ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ │ │ ├── aws-smithy-types v1.4.0 (*)
│ │ │ │ │ │ ├── aws-lc-rs v1.15.3
│ │ │ │ │ │ │ ├── aws-lc-sys v0.36.0
│ │ │ │ │ │ │ ├── aws-lc-rs v1.15.3 (*)
│ │ │ ├── aws-smithy-observability v0.2.0
│ │ │ │ └── aws-smithy-runtime-api v1.11.0 (*)
│ │ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ │ ├── aws-smithy-types v1.4.0 (*)
│ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ ├── aws-smithy-types v1.4.0 (*)
│ │ ├── aws-types v1.3.11
│ │ │ ├── aws-credential-types v1.2.11 (*)
│ │ │ ├── aws-smithy-async v1.2.8 (*)
│ │ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ │ ├── aws-smithy-types v1.4.0 (*)
│ ├── aws-sdk-sso v1.92.0
│ │ ├── aws-credential-types v1.2.11 (*)
│ │ ├── aws-runtime v1.5.18 (*)
│ │ ├── aws-smithy-async v1.2.8 (*)
│ │ ├── aws-smithy-http v0.62.6 (*)
│ │ ├── aws-smithy-json v0.61.9
│ │ │ └── aws-smithy-types v1.4.0 (*)
│ │ ├── aws-smithy-observability v0.2.0 (*)
│ │ ├── aws-smithy-runtime v1.9.8 (*)
│ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ ├── aws-smithy-types v1.4.0 (*)
│ │ ├── aws-types v1.3.11 (*)
│ ├── aws-sdk-ssooidc v1.94.0
│ │ ├── aws-credential-types v1.2.11 (*)
│ │ ├── aws-runtime v1.5.18 (*)
│ │ ├── aws-smithy-async v1.2.8 (*)
│ │ ├── aws-smithy-http v0.62.6 (*)
│ │ ├── aws-smithy-json v0.61.9 (*)
│ │ ├── aws-smithy-observability v0.2.0 (*)
│ │ ├── aws-smithy-runtime v1.9.8 (*)
│ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ ├── aws-smithy-types v1.4.0 (*)
│ │ ├── aws-types v1.3.11 (*)
│ ├── aws-sdk-sts v1.96.0
│ │ ├── aws-credential-types v1.2.11 (*)
│ │ ├── aws-runtime v1.5.18 (*)
│ │ ├── aws-smithy-async v1.2.8 (*)
│ │ ├── aws-smithy-http v0.62.6 (*)
│ │ ├── aws-smithy-json v0.61.9 (*)
│ │ ├── aws-smithy-observability v0.2.0 (*)
│ │ ├── aws-smithy-query v0.60.9
│ │ │ ├── aws-smithy-types v1.4.0 (*)
│ │ ├── aws-smithy-runtime v1.9.8 (*)
│ │ ├── aws-smithy-runtime-api v1.11.0 (*)
│ │ ├── aws-smithy-types v1.4.0 (*)
│ │ ├── aws-smithy-xml v0.60.13
│ │ ├── aws-types v1.3.11 (*)
│ ├── aws-smithy-async v1.2.8 (*)
│ ├── aws-smithy-http v0.62.6 (*)
│ ├── aws-smithy-json v0.61.9 (*)
│ ├── aws-smithy-runtime v1.9.8 (*)
│ ├── aws-smithy-runtime-api v1.11.0 (*)
│ ├── aws-smithy-types v1.4.0 (*)
│ ├── aws-types v1.3.11 (*)
├── aws-credential-types v1.2.11 (*)
├── aws-sdk-lambda v1.113.0
│ ├── aws-credential-types v1.2.11 (*)
│ ├── aws-runtime v1.5.18 (*)
│ ├── aws-smithy-async v1.2.8 (*)
│ ├── aws-smithy-eventstream v0.60.14 (*)
│ ├── aws-smithy-http v0.62.6 (*)
│ ├── aws-smithy-json v0.61.9 (*)
│ ├── aws-smithy-observability v0.2.0 (*)
│ ├── aws-smithy-runtime v1.9.8 (*)
│ ├── aws-smithy-runtime-api v1.11.0 (*)
│ ├── aws-smithy-types v1.4.0 (*)
│ ├── aws-types v1.3.11 (*)
├── aws-sdk-sts v1.96.0 (*)
├── aws-smithy-http-client v1.1.6 (*)
├── aws-smithy-runtime v1.9.8 (*)
├── aws-smithy-runtime-api v1.11.0 (*)
├── aws-smithy-types v1.4.0 (*)
Environment details (OS name and version, etc.)
Firecracker VM and recent Linux x86_64
Logs
No response