Skip to content
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
59 changes: 59 additions & 0 deletions experimental/stats/telemetry.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
/*
*
* Copyright 2026 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

package stats

import (
"context"

"google.golang.org/grpc/grpclog"
)

// LabelCallback is a function that is executed when telemetry
// label keys are updated
type LabelCallback func(string, string)

type telemetryLabelCallbackKey struct{}

// WithTelemetryLabelCallback registers a callback function that is executed whenever
// telemetry labels will be updated. This does _not_ require opentelemetry instrumentation
// to be configured on the client or server.
func WithTelemetryLabelCallback(ctx context.Context, callback LabelCallback) context.Context {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithTelemetryLabelCallback currently overwrites any existing callback in the context. We should change it so that it supports chaining (i.e., fetching the existing callback and returning a new one that calls both) to ensure compatibility across multiple stats handlers

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 29bc885

if callback == nil {
return ctx
}
return context.WithValue(ctx, telemetryLabelCallbackKey{}, callback)
}

// ExecuteTelemetryLabelCallback runs the registered callback on the context with the provided
// key and value. If no callback is registered it does nothing.
//
// If the registered callback panics it will be swallowed and logged
func ExecuteTelemetryLabelCallback(ctx context.Context, key, value string) {
Copy link
Contributor

@mbissa mbissa Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would it make more sense to have
func ExecuteTelemetryLabelCallback(ctx context.Context, kvs ...string)
so that we can batch multiple key/value callbacks in one loop and have the defer only once?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good suggestion

I wonder if there's value in encoding the data into a struct to make it a touch more extensible later on. Something along these lines

type Label struct {
	Key string
	Value string
}

func ExecuteTelemetryLabelCallback(ctx context.Context, labels ...Label) {
	// ...unpack and callback
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the other comment I took a slightly different approach when looking at how we merge the labels into the stats label key by passing an updates map
fe8c628

Let me know if this feels more ergonomic

if ctx == nil {
return
}
if callback, ok := ctx.Value(telemetryLabelCallbackKey{}).(LabelCallback); ok {
defer func() {
if r := recover(); r != nil {
grpclog.Component("experimental-stats").Warningf("LabelCallback panicked: %v", r)
}
}()
callback(key, value)
}
}
87 changes: 87 additions & 0 deletions experimental/stats/telemetry_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
/*
*
* Copyright 2026 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/

package stats

import (
"context"
"testing"
"time"

"github.com/google/go-cmp/cmp"
)

// TestTelemetryLabels tests registering a callback function with the context and
// the effects of executing the callback on a local label state tracker. Each test
// case constructs a new context with the provided callback registered.
func (s) TestTelemetryLabels(t *testing.T) {
commonLabelValues := map[string]string{"grpc.lb.backend_service": "grpc.lb.backend_service_val", "grpc.lb.locality_val": "grpc.lb.locality_val"}
tracker := map[string]string{}

tests := map[string]struct {
callback func(string, string)
additionalLabels map[string]string
wantLabels map[string]string
}{
"NilCallback": {
callback: nil,
wantLabels: map[string]string{},
},
"NoOPCallback": {
callback: func(string, string) {},
wantLabels: map[string]string{},
},
"PanicCallback": {
callback: func(string, string) { panic("intentional panic") },
wantLabels: map[string]string{},
},
"MutatingCallback": {
callback: func(key, value string) {
tracker[key] = value
},
wantLabels: map[string]string{"grpc.lb.backend_service": "grpc.lb.backend_service_val", "grpc.lb.locality_val": "grpc.lb.locality_val"},
},
"OverrideLabelsWithCallback": {
callback: func(key, value string) {
tracker[key] = value
},
additionalLabels: map[string]string{"grpc.lb.backend_service": "grpc.lb.backend_service_other_val"},
wantLabels: map[string]string{"grpc.lb.backend_service": "grpc.lb.backend_service_other_val", "grpc.lb.locality_val": "grpc.lb.locality_val"},
Copy link
Contributor

@mbissa mbissa Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should the key be "grpc.lb.locality" ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed fe8c628

},
}
for name, test := range tests {
t.Run(name, func(t *testing.T) {
// resest the tracker at the end of every test
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
t.Cleanup(func() {
tracker = map[string]string{}
cancel()
})
ctx = WithTelemetryLabelCallback(ctx, test.callback)
for k, v := range commonLabelValues {
ExecuteTelemetryLabelCallback(ctx, k, v)
}
for k, v := range test.additionalLabels {
ExecuteTelemetryLabelCallback(ctx, k, v)
}
if diff := cmp.Diff(tracker, test.wantLabels); diff != "" {
t.Fatalf("tracked labels did not match expcted values (-got, +want): %v", diff)
}
})
}
}
3 changes: 3 additions & 0 deletions internal/xds/balancer/clusterimpl/picker.go
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ import (
"google.golang.org/grpc/balancer"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/connectivity"
estats "google.golang.org/grpc/experimental/stats"
"google.golang.org/grpc/internal/stats"
"google.golang.org/grpc/internal/wrr"
xdsinternal "google.golang.org/grpc/internal/xds"
Expand Down Expand Up @@ -164,6 +165,8 @@ func (d *picker) Pick(info balancer.PickInfo) (balancer.PickResult, error) {
return pr, err
}

estats.ExecuteTelemetryLabelCallback(info.Ctx, "grpc.lb.locality", xdsinternal.LocalityString(lID))
estats.ExecuteTelemetryLabelCallback(info.Ctx, "grpc.lb.backend_service", d.clusterName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this solves your use case, and this looks good. But this forces the LB policy to know about the callbacks and can get fragmented over a period of time when labels are updated from multiple places.

Can we instead centralize this? When the central function to update telemetry state is called, it should be responsible for triggering any registered callbacks. This decouples the LB logic from the telemetry hook logic and ensures consistency across the library.

Copy link
Author

@seth-epps seth-epps Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that centralizing this would be better, but I was struggling with how we would support it without a much more complex change / something that doesn't interfere with the existing internal API since these are only currently set if the open telemetry labels are initialized (which requires otel instrumentation)

I wonder if we should encapsulate the telemetry label collection into it's own component and then inject it here. That way we establish the pattern for label collection in a single place and all future cases would just inject the same struct? I think a less invasive approach could be to just use a single function...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a slightly different approach in my change set that basically moves the update to the existing internal/stats package and accepts the full map to merge. This felt like the least intrusive way to make the change and it side-steps the Key/Value pairs as variadic arguments to the execute function.

fe8c628

Let me know what you think!

if labels != nil {
labels["grpc.lb.locality"] = xdsinternal.LocalityString(lID)
labels["grpc.lb.backend_service"] = d.clusterName
Expand Down