-
Notifications
You must be signed in to change notification settings - Fork 2.8k
Implement GetOperations for Badger v2 #7948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement GetOperations for Badger v2 #7948
Conversation
Signed-off-by: Sudhanshu-NITR <sudhanshu.kadam.99@gmail.com>
|
Related Documentation No published documentation to review for changes on this repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements the GetOperations method for the Badger v2 storage backend, which is part of the ongoing effort to upgrade Badger storage to the v2 API (issue #7937). The implementation reads operation names from Badger's operation index using efficient prefix scanning.
Changes:
- Implemented
GetOperationsmethod in the Badger v2 Reader to retrieve operation names for a given service - Added comprehensive unit tests covering duplicate handling, multiple services, and various scenarios
- Added goroutine leak detection test infrastructure
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| internal/storage/v2/badger/tracestore/reader.go | Implements the GetOperations method using prefix scanning on operation index keys (0x82), plus stub implementations for other Reader interface methods |
| internal/storage/v2/badger/tracestore/reader_test.go | Unit tests verifying correct operation retrieval, deduplication, and multi-service scenarios |
| internal/storage/v2/badger/tracestore/package_test.go | Adds goroutine leak detection using testutils.VerifyGoLeaks |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| require.NoError(t, err) | ||
|
|
||
| // Verify GetOperations for "service-a" |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment references "3. Verify GetOperations" but there's no explicit comment for step 3. The comment numbering should be consistent throughout the test. Consider adding a comment "// 3. Verify GetOperations for 'service-a'" above line 62 to maintain clarity.
| // Verify GetOperations for "service-a" | |
| // 3. Verify GetOperations for "service-a" |
| func TestGetOperations(t *testing.T) { | ||
| // 1. Setup Badger with a temporary directory (standard pattern) | ||
| opts := badger.DefaultOptions(t.TempDir()).WithLoggingLevel(badger.ERROR) | ||
| db, err := badger.Open(opts) | ||
| require.NoError(t, err) | ||
|
|
||
| // Ensure DB is closed after test | ||
| t.Cleanup(func() { | ||
| require.NoError(t, db.Close()) | ||
| }) | ||
|
|
||
| reader := NewReader(db) | ||
|
|
||
| // 2. Insert Dummy data | ||
| err = db.Update(func(txn *badger.Txn) error { | ||
| // Helper to create a key in the exact format: | ||
| // [0x82][ServiceName][Timestamp(8)][TraceID(16)] | ||
| createKey := func(service, operation string, id uint64) []byte { | ||
| key := make([]byte, 1+len(service)+len(operation)+8+16) | ||
| key[0] = operationNameIndexKey | ||
| n := 1 | ||
| n += copy(key[n:], service) | ||
| n += copy(key[n:], operation) | ||
|
|
||
| // Add dummy timestamp (8 bytes) | ||
| binary.BigEndian.PutUint64(key[n:], 123456) | ||
| n += 8 | ||
|
|
||
| // Add dummy trace ID (16 bytes) | ||
| binary.BigEndian.PutUint64(key[n:], 0) // High | ||
| binary.BigEndian.PutUint64(key[n+8:], id) // Low | ||
| return key | ||
| } | ||
|
|
||
| // Add two operations for "service-a" and one for "service-b" | ||
| require.NoError(t, txn.Set(createKey("service-a", "op-1", 1), []byte{})) | ||
| require.NoError(t, txn.Set(createKey("service-a", "op-1", 2), []byte{})) // Duplicate op name, different trace | ||
| require.NoError(t, txn.Set(createKey("service-a", "op-2", 3), []byte{})) | ||
| require.NoError(t, txn.Set(createKey("service-b", "op-3", 4), []byte{})) | ||
| return nil | ||
| }) | ||
|
|
||
| require.NoError(t, err) | ||
|
|
||
| // Verify GetOperations for "service-a" | ||
| ops, err := reader.GetOperations(context.Background(), | ||
| tracestore.OperationQueryParams{ | ||
| ServiceName: "service-a", | ||
| }) | ||
| require.NoError(t, err) | ||
|
|
||
| // We expect 2 unique operations: "op-1" and "op-2" | ||
| assert.Len(t, ops, 2) | ||
| assert.Contains(t, []string{ops[0].Name, ops[1].Name}, "op-1") | ||
| assert.Contains(t, []string{ops[0].Name, ops[1].Name}, "op-2") | ||
|
|
||
| // 4. Verify GetOperations for "service-b" | ||
| ops, err = reader.GetOperations(context.Background(), | ||
| tracestore.OperationQueryParams{ | ||
| ServiceName: "service-b", | ||
| }) | ||
| require.NoError(t, err) | ||
| assert.Len(t, ops, 1) | ||
| assert.Equal(t, "op-3", ops[0].Name) | ||
| } |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test doesn't cover the edge case where an empty or non-existent service name is queried. Consider adding a test case that verifies GetOperations returns an empty slice (not an error) when querying for a service that doesn't exist, which appears to be the expected behavior based on the v1 implementation (internal/storage/v1/badger/spanstore/cache.go:90).
| defer it.Close() | ||
|
|
||
| uniqueOps := make(map[string]struct{}) | ||
| for it.Seek(prefix); it.ValidForPrefix((prefix)); it.Next() { |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extra parentheses around "prefix" in "it.ValidForPrefix((prefix))" are unnecessary. This should be "it.ValidForPrefix(prefix)" for consistency with the rest of the codebase (see internal/storage/v1/badger/spanstore/reader.go:128, 184, 633, 657).
| for it.Seek(prefix); it.ValidForPrefix((prefix)); it.Next() { | |
| for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() { |
| // 2. Insert Dummy data | ||
| err = db.Update(func(txn *badger.Txn) error { | ||
| // Helper to create a key in the exact format: | ||
| // [0x82][ServiceName][Timestamp(8)][TraceID(16)] |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 34 says the key format is "[0x82][ServiceName][Timestamp(8)][TraceID(16)]", but this is incomplete. According to the writer implementation (internal/storage/v1/badger/spanstore/writer.go:73), the actual format is "[0x82][ServiceName+OperationName][Timestamp(8)][TraceID(16)]". The comment should be updated to "[0x82][ServiceName][OperationName][Timestamp(8)][TraceID(16)]" to accurately reflect what the test is creating.
| // [0x82][ServiceName][Timestamp(8)][TraceID(16)] | |
| // [0x82][ServiceName][OperationName][Timestamp(8)][TraceID(16)] |
| // Create the search prefix: [0x82][ServiceName] | ||
| prefix := make([]byte, len(query.ServiceName)+1) | ||
| prefix[0] = operationNameIndexKey | ||
| copy(prefix[1:], query.ServiceName) | ||
|
|
||
| it := txn.NewIterator(badger.DefaultIteratorOptions) | ||
| defer it.Close() | ||
|
|
||
| uniqueOps := make(map[string]struct{}) | ||
| for it.Seek(prefix); it.ValidForPrefix((prefix)); it.Next() { |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iterator is using DefaultIteratorOptions, which prefetches values by default. However, this implementation only reads keys and never accesses the values (which are empty for index entries anyway). For better performance, consider setting "opts.PrefetchValues = false" as done in similar code in the v1 implementation (internal/storage/v1/badger/spanstore/reader.go:586). This would reduce unnecessary I/O operations.
| // Create the search prefix: [0x82][ServiceName] | |
| prefix := make([]byte, len(query.ServiceName)+1) | |
| prefix[0] = operationNameIndexKey | |
| copy(prefix[1:], query.ServiceName) | |
| it := txn.NewIterator(badger.DefaultIteratorOptions) | |
| defer it.Close() | |
| uniqueOps := make(map[string]struct{}) | |
| for it.Seek(prefix); it.ValidForPrefix((prefix)); it.Next() { | |
| // Create the search prefix: [0x82][ServiceName] | |
| prefix := make([]byte, len(query.ServiceName)+1) | |
| prefix[0] = operationNameIndexKey | |
| copy(prefix[1:], query.ServiceName) | |
| opts := badger.DefaultIteratorOptions | |
| opts.PrefetchValues = false | |
| it := txn.NewIterator(opts) | |
| defer it.Close() | |
| uniqueOps := make(map[string]struct{}) | |
| for it.Seek(prefix); it.ValidForPrefix((prefix)); it.Next() { |
| operations = append(operations, tracestore.Operation{ | ||
| Name: opName, | ||
| }) |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Operation struct includes a SpanKind field that is not being populated. According to the API definition in tracestore.Operation (reader.go:128-131) and implementations in other backends (e.g., memory, clickhouse), the SpanKind field should be set based on the span data. However, Badger v1 stores operation names as concatenated "ServiceName+OperationName" without span kind information in the key, which means span kind cannot be extracted from the index key alone.
This is a known limitation in Badger v1 (see the TODO comment at internal/storage/v1/badger/spanstore/cache.go:103-104). Since this v2 implementation is directly reading from the same key format as v1, it inherits this limitation.
For now, leaving SpanKind empty is acceptable since:
- The v1 implementation doesn't support it either
- The OperationQueryParams.SpanKind filter parameter is also not being used, which is consistent with not storing span kind
- This matches the current behavior of the v1adapter
However, this should be documented, and the SpanKind query parameter should ideally be validated to ensure it's not used (or a warning should be logged if provided).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct! span kind should be present in v2
| } | ||
|
|
||
| // GetOperations returns all operation names for a given service. | ||
| func (r *Reader) GetOperations(_ context.Context, query tracestore.OperationQueryParams) ([]tracestore.Operation, error) { |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OperationQueryParams.SpanKind parameter is being ignored in this implementation. While this is consistent with the v1 Badger implementation (which doesn't store span kind in the operation index), the query parameter should be validated. If a user provides a non-empty SpanKind, the implementation should either:
- Return an error indicating that span kind filtering is not supported
- Log a warning and ignore the parameter
- Document this limitation in the function comment
Looking at other v2 implementations (memory, clickhouse), they do support SpanKind filtering, so silently ignoring it here could lead to confusion where queries work differently depending on the backend.
| assert.Contains(t, []string{ops[0].Name, ops[1].Name}, "op-1") | ||
| assert.Contains(t, []string{ops[0].Name, ops[1].Name}, "op-2") | ||
|
|
||
| // 4. Verify GetOperations for "service-b" |
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says "4. Verify GetOperations for "service-b"" but this appears to be step 3 of the test (after setup, insert data, and verify service-a). The numbering should be corrected to "3." to match the actual test flow.
| // 4. Verify GetOperations for "service-b" | |
| // 3. Verify GetOperations for "service-b" |
Signed-off-by: Sudhanshu-NITR <sudhanshu.kadam.99@gmail.com>
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com> Signed-off-by: Sudhanshu Umesh Kadam <145262601+Sudhanshu-NITR@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // GetOperations returns all operation names for a given service. | ||
|
|
Copilot
AI
Feb 1, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing the extra blank line between the function comment and the function declaration to maintain consistency with Go conventions. Most functions in the codebase don't have this blank line.
| @@ -0,0 +1,14 @@ | |||
| // Copyright (c) 2025 The Jaeger Authors. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Time travel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, good catch! I'm still mentally in 2025. I will update the copyright headers code to 2026. 😅
|
|
||
| var operations []tracestore.Operation | ||
|
|
||
| err := r.db.View(func(txn *badger.Txn) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already implementation of this logic in v1. This is not how we approach code migration, by rewriting it and introducing unknown bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. My goal was to provide a native v2 implementation, but I agree duplication is bad. Could you clarify if you prefer I expose the v1 CacheStore logic to be reusable here, or should v2 structurally wrap the v1 implementation for now? I want to align with the migration path you have in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way we upgraded Elasticsearch implementation is we refactored its internal "core storage layer" to work on a dbmodel that was independent of v1/v2 Storage API. Both v1 and v2 ES implementations then shared that core storage layer. Refactoring was surgical, without large code changes or code movements, which makes the code reviewable with high confidence (while copy & paste is not).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, Thanks for explaining the dbmodel pattern—I agree that having a shared core layer is much cleaner and safer than duplicating the logic.
I'll pause on this implementation and start looking into extracting the CacheStore/operations logic from v1 into a shared layer that both versions can use.
Thanks again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @yurishkuro,
I've analyzed the codebase and I'd like to propose the following refactoring plan to achieve the same result for Badger (clean separation without duplication):
- Create Shared Core (internal/storage/badger/spanstore) I will create a new package to hold the storage logic that is independent of the V1/V2 API.
- cache.go: Move the CacheStore logic here (currently in v1).
- writer.go (or core.go): Move the logic for creating badger.Entry (Key formatting, Indexing, Encoding) here. It will operate on model.Span (Since Badger stores data in V1/Proto format natively).
- Refactor V1 (internal/storage/v1/badger)
- Update the V1 Writer/Reader to import and use this new Shared Core.
- Remove the duplicated logic from V1 files.
- Implement V2 (internal/storage/v2/badger)
- Writer: A thin adapter that accepts OTEL ptrace.Traces, converts them to model.Span (using existing adapters), and passes them to the Shared Core Writer.
- Reader: Uses the Shared Cache for metadata and Shared Core for fetching spans, converting model.Span back to OTEL format for the response.
Does this plan align with your expectations for the migration? If so, I'll proceed with extracting the core logic first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the elasticsearch approach is gonna work for badger because we don't have anything like dbmodel.Span in badger. For v1, spans were directly stored in db without changing to any database model. Therefore I think we need to directly refactor model.Span to ptrace.Traces in v1 and then move the reader/writer to v2.
Which problem is this PR solving?
Description of the changes
testutils.VerifyGoLeaks).How was this change tested?
t.TempDir()).make lint(passed with 0 issues).make fmt.Checklist
jaeger:make lint testjaeger-ui:npm run lintandnpm run test