-
Notifications
You must be signed in to change notification settings - Fork 4.6k
stats: process RPC stats/tracing in health and ORCA producers only if a handler is configured #8874
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8874 +/- ##
==========================================
- Coverage 83.13% 82.48% -0.66%
==========================================
Files 414 414
Lines 32821 32824 +3
==========================================
- Hits 27287 27075 -212
- Misses 4098 4104 +6
- Partials 1436 1645 +209
🚀 New features to boost your workflow:
|
|
Curious, how did you run into this? Is there an issue for this? |
Can you please elaborate on this? My understanding is that when the I think it would be good to open an issue that documents all the details and the approach taken (and why this is the preferred approach, if there are other approached available at all). Thanks. |
| } | ||
|
|
||
| s, err := as.transport.NewStream(as.ctx, as.callHdr) | ||
| s, err := as.transport.NewStream(as.ctx, as.callHdr, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this how the health and ORCA streams get a nil stats handler? We want to eliminate this duplication between the regular client stream and the non-retry client stream at some point. We tried to prioritize that work for Q1, but it didn't happen. This has been a maintenance burden for a while now, and a source of hard to find bugs.
|
Just did a rebase with master to fast forward my commit and address the vet-proto failure. |
dde0677 to
734cd3c
Compare
|
Just FYI, no action needed. I was considering whether we'd need to pass grpc-go/stats/opentelemetry/client_metrics.go Lines 192 to 199 in d7b3f93
Consequently, this change should also eliminate the error log from the OTel metrics handler as well. As before, per-call metrics will not be recorded for RPCs initiated by producers. |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors how RPC stats and tracing are handled to resolve an issue with repetitive error logging for health and ORCA producers. The change introduces a per-stream stats.Handler, moving away from a transport-level handler for RPC events. This allows streams like those for health checks, which don't have stats configured, to bypass stats processing, thus preventing the erroneous log messages. The implementation is clean and effective: the stats.Handler is now part of the ClientStream struct and is passed during stream creation. The NewStream function signature has been updated across the transport layer, and call sites have been correctly modified. For regular RPCs, the existing stats handler is plumbed through, while for internal streams like health checks, nil is passed, correctly disabling stats for them. The changes are logical and well-executed. This is a good improvement to reduce log noise and make stats handling more precise.
easwars
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add a test that verifies that the offending log is not seen for health/ORCA streams?
| headerChanClosed uint32 // set when headerChan is closed. Used to avoid closing headerChan multiple times. | ||
| bytesReceived atomic.Bool // indicates whether any bytes have been received on this stream | ||
| unprocessed atomic.Bool // set if the server sends a refused stream or GOAWAY including this stream | ||
| statsHandler stats.Handler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add a short trailing comment explaining when this can be nil? Thanks.
| } | ||
|
|
||
| s, err := as.transport.NewStream(as.ctx, as.callHdr) | ||
| s, err := as.transport.NewStream(as.ctx, as.callHdr, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please add a comment here as to why we are setting it to nil here, so that future readers dont have to second guess themselves.
|
|
||
| // NewStream creates a Stream for an RPC. | ||
| NewStream(ctx context.Context, callHdr *CallHdr) (*ClientStream, error) | ||
| NewStream(ctx context.Context, callHdr *CallHdr, handler stats.Handler) (*ClientStream, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that a stats.Handler field is stored in the transport now. Is it required anymore at all given that the stats handler is passed at stream creation time?
This resolves an issue where below log statement keeps printing repeatedly:
"ERROR: [otel-plugin] ctx passed into client side stats handler metrics event handling has no client attempt data present".
When new stream is created for health/orca producers, stats and tracing is not setup. However, this fact is ignored during RPC and an error logs is printed to denote that stats cannot be handled. We will enable stream to have its own reference to the stats handler and only process per RPC implementation when it is present (like in case of regular data streams).
Internal issue: b/385685802
RELEASE NOTES: