Skip to content

Conversation

@seth-epps
Copy link

Fixes #8682

Expose a new experimental API for registering a telemetry label callback function.

Some clients may not be instrumented with opentelemetry which restricts valuable information from being propagated to stats handlers. This gives clients the ability to collect otel labels by registering a label callback on the context and collecting the information themselves in their stats handlers.

RELEASE NOTES:

  • experimental/stats: Expose Telemetry Label Callback

@seth-epps seth-epps force-pushed the i-8682/expose-recorder branch from 09e6612 to 6dda7c7 Compare February 2, 2026 22:27
@arjan-bal arjan-bal requested a review from mbissa February 3, 2026 07:44
@arjan-bal arjan-bal added the Type: Feature New features or improvements in behavior label Feb 3, 2026
@arjan-bal arjan-bal added this to the 1.80 Release milestone Feb 3, 2026
@codecov
Copy link

codecov bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.25%. Comparing base (49e224f) to head (fe8c628).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
experimental/stats/telemetry.go 83.33% 1 Missing and 1 partial ⚠️
internal/xds/balancer/clusterimpl/picker.go 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8877      +/-   ##
==========================================
+ Coverage   83.15%   83.25%   +0.10%     
==========================================
  Files         414      415       +1     
  Lines       32751    32825      +74     
==========================================
+ Hits        27235    27330      +95     
+ Misses       4096     4076      -20     
+ Partials     1420     1419       -1     
Files with missing lines Coverage Δ
internal/stats/labels.go 92.30% <100.00%> (-7.70%) ⬇️
internal/xds/balancer/clusterimpl/picker.go 76.82% <87.50%> (-18.30%) ⬇️
experimental/stats/telemetry.go 83.33% <83.33%> (ø)

... and 25 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

}

estats.ExecuteTelemetryLabelCallback(info.Ctx, "grpc.lb.locality", xdsinternal.LocalityString(lID))
estats.ExecuteTelemetryLabelCallback(info.Ctx, "grpc.lb.backend_service", d.clusterName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this solves your use case, and this looks good. But this forces the LB policy to know about the callbacks and can get fragmented over a period of time when labels are updated from multiple places.

Can we instead centralize this? When the central function to update telemetry state is called, it should be responsible for triggering any registered callbacks. This decouples the LB logic from the telemetry hook logic and ensures consistency across the library.

Copy link
Author

@seth-epps seth-epps Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that centralizing this would be better, but I was struggling with how we would support it without a much more complex change / something that doesn't interfere with the existing internal API since these are only currently set if the open telemetry labels are initialized (which requires otel instrumentation)

I wonder if we should encapsulate the telemetry label collection into it's own component and then inject it here. That way we establish the pattern for label collection in a single place and all future cases would just inject the same struct? I think a less invasive approach could be to just use a single function...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a slightly different approach in my change set that basically moves the update to the existing internal/stats package and accepts the full map to merge. This felt like the least intrusive way to make the change and it side-steps the Key/Value pairs as variadic arguments to the execute function.

fe8c628

Let me know what you think!

tracker[key] = value
},
additionalLabels: map[string]string{"grpc.lb.backend_service": "grpc.lb.backend_service_other_val"},
wantLabels: map[string]string{"grpc.lb.backend_service": "grpc.lb.backend_service_other_val", "grpc.lb.locality_val": "grpc.lb.locality_val"},
Copy link
Contributor

@mbissa mbissa Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should the key be "grpc.lb.locality" ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed fe8c628

// TestTelemetryLabels tests registering a callback function with the context and
// the effects of executing the callback on a local label state tracker. Each test
// case constructs a new context with the provided callback registered.
func (s) TestTelemetryLabels(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some end to end tests which test for back end service label values in different scenarios, could you modify them to validate that your scenario works end to end.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in fe8c628

// key and value. If no callback is registered it does nothing.
//
// If the registered callback panics it will be swallowed and logged
func ExecuteTelemetryLabelCallback(ctx context.Context, key, value string) {
Copy link
Contributor

@mbissa mbissa Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it make more sense to have
func ExecuteTelemetryLabelCallback(ctx context.Context, kvs ...string)
so that we can batch multiple key/value callbacks in one loop and have the defer only once?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good suggestion

I wonder if there's value in encoding the data into a struct to make it a touch more extensible later on. Something along these lines

type Label struct {
	Key string
	Value string
}

func ExecuteTelemetryLabelCallback(ctx context.Context, labels ...Label) {
	// ...unpack and callback
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the other comment I took a slightly different approach when looking at how we merge the labels into the stats label key by passing an updates map
fe8c628

Let me know if this feels more ergonomic

type telemetryLabelCallbackKey struct{}

// WithTelemetryLabelCallback registers a callback function that is executed whenever
// telemetry labels will be updated. This does _not_ require opentelemetry instrumentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should may be add a warning in the doc string that this callback is intended to execute in the rpc hotpath and users need to be careful of the performance impact when eventually ExecuteTelemetryLabelCallback is invoked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in fe8c628

@mbissa mbissa assigned seth-epps and unassigned mbissa Feb 5, 2026
Copy link
Contributor

@mbissa mbissa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sending for second review.

@mbissa mbissa requested review from arjan-bal and dfawley February 6, 2026 08:37
@mbissa mbissa assigned arjan-bal and dfawley and unassigned seth-epps Feb 6, 2026
@arjan-bal arjan-bal assigned mbissa and unassigned dfawley and arjan-bal Feb 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Type: Feature New features or improvements in behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose the labels package to use with custom stats handler

4 participants