-
Notifications
You must be signed in to change notification settings - Fork 4.6k
experimental/stats: Expose Telemetry Label Callback #8877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
09e6612 to
6dda7c7
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8877 +/- ##
==========================================
+ Coverage 83.15% 83.25% +0.10%
==========================================
Files 414 415 +1
Lines 32751 32825 +74
==========================================
+ Hits 27235 27330 +95
+ Misses 4096 4076 -20
+ Partials 1420 1419 -1
🚀 New features to boost your workflow:
|
| } | ||
|
|
||
| estats.ExecuteTelemetryLabelCallback(info.Ctx, "grpc.lb.locality", xdsinternal.LocalityString(lID)) | ||
| estats.ExecuteTelemetryLabelCallback(info.Ctx, "grpc.lb.backend_service", d.clusterName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this solves your use case, and this looks good. But this forces the LB policy to know about the callbacks and can get fragmented over a period of time when labels are updated from multiple places.
Can we instead centralize this? When the central function to update telemetry state is called, it should be responsible for triggering any registered callbacks. This decouples the LB logic from the telemetry hook logic and ensures consistency across the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that centralizing this would be better, but I was struggling with how we would support it without a much more complex change / something that doesn't interfere with the existing internal API since these are only currently set if the open telemetry labels are initialized (which requires otel instrumentation)
I wonder if we should encapsulate the telemetry label collection into it's own component and then inject it here. That way we establish the pattern for label collection in a single place and all future cases would just inject the same struct? I think a less invasive approach could be to just use a single function...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a slightly different approach in my change set that basically moves the update to the existing internal/stats package and accepts the full map to merge. This felt like the least intrusive way to make the change and it side-steps the Key/Value pairs as variadic arguments to the execute function.
Let me know what you think!
experimental/stats/telemetry_test.go
Outdated
| tracker[key] = value | ||
| }, | ||
| additionalLabels: map[string]string{"grpc.lb.backend_service": "grpc.lb.backend_service_other_val"}, | ||
| wantLabels: map[string]string{"grpc.lb.backend_service": "grpc.lb.backend_service_other_val", "grpc.lb.locality_val": "grpc.lb.locality_val"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should the key be "grpc.lb.locality" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed fe8c628
| // TestTelemetryLabels tests registering a callback function with the context and | ||
| // the effects of executing the callback on a local label state tracker. Each test | ||
| // case constructs a new context with the provided callback registered. | ||
| func (s) TestTelemetryLabels(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some end to end tests which test for back end service label values in different scenarios, could you modify them to validate that your scenario works end to end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in fe8c628
experimental/stats/telemetry.go
Outdated
| // key and value. If no callback is registered it does nothing. | ||
| // | ||
| // If the registered callback panics it will be swallowed and logged | ||
| func ExecuteTelemetryLabelCallback(ctx context.Context, key, value string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would it make more sense to have
func ExecuteTelemetryLabelCallback(ctx context.Context, kvs ...string)
so that we can batch multiple key/value callbacks in one loop and have the defer only once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's a good suggestion
I wonder if there's value in encoding the data into a struct to make it a touch more extensible later on. Something along these lines
type Label struct {
Key string
Value string
}
func ExecuteTelemetryLabelCallback(ctx context.Context, labels ...Label) {
// ...unpack and callback
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the other comment I took a slightly different approach when looking at how we merge the labels into the stats label key by passing an updates map
fe8c628
Let me know if this feels more ergonomic
| type telemetryLabelCallbackKey struct{} | ||
|
|
||
| // WithTelemetryLabelCallback registers a callback function that is executed whenever | ||
| // telemetry labels will be updated. This does _not_ require opentelemetry instrumentation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should may be add a warning in the doc string that this callback is intended to execute in the rpc hotpath and users need to be careful of the performance impact when eventually ExecuteTelemetryLabelCallback is invoked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in fe8c628
mbissa
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, sending for second review.
Fixes #8682
Expose a new experimental API for registering a telemetry label callback function.
Some clients may not be instrumented with opentelemetry which restricts valuable information from being propagated to stats handlers. This gives clients the ability to collect otel labels by registering a label callback on the context and collecting the information themselves in their stats handlers.
RELEASE NOTES: