feat: Add SQL Query Summary - Tokenizer, Parser, Cache#1918

Open
hannahramadan wants to merge 39 commits into open-telemetry:main from hannahramadan:db_query_summary

hannahramadan commented Jan 6, 2026

This PR adds the framework for a query summary generator, including a tokenizer, parser, and cache. It is meant to generate the db.query.summary attribute for database spans.

Query Pipeline: SQL Query → Cache Lookup → Tokenizer → FSM Parser → Summary

Fast Path & Caching

  • Tiered Processing: High-frequency CRUD queries (e.g., SELECT * FROM table) use pre-compiled regex to bypass tokenization.
  • LRU Cache: Thread-safe caching (default size: 1000) prevents redundant processing of identical SQL fingerprints.
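
To make the two ideas above concrete, here is a minimal sketch of a regex fast path and a mutex-guarded LRU cache. It is not the PR's code: `SIMPLE_SELECT`, `fast_path_summary`, and `SummaryLruCache` are hypothetical names, and the sketch assumes the cache is keyed by the raw SQL string.

```ruby
# Hypothetical sketch; none of these names come from the PR.

# Precompiled pattern for one very common query shape.
SIMPLE_SELECT = /\A\s*SELECT\s+\*\s+FROM\s+([A-Za-z_][A-Za-z0-9_.]*)\s*;?\s*\z/i

# Returns a summary for the simple shape, or nil so the caller can
# fall through to the tokenizer and parser.
def fast_path_summary(sql)
  match = SIMPLE_SELECT.match(sql)
  match && "SELECT #{match[1]}"
end

# Small LRU cache. Ruby hashes keep insertion order, so the first
# key is always the least recently used one.
class SummaryLruCache
  def initialize(max_size = 1000)
    @max_size = max_size
    @store = {}
    @mutex = Mutex.new
  end

  # Returns the cached value for key, or computes it with the block.
  def fetch(key)
    @mutex.synchronize do
      if @store.key?(key)
        value = @store.delete(key)
        @store[key] = value # re-insert to mark as most recently used
        return value
      end
    end

    value = yield # computed outside the lock
    @mutex.synchronize do
      @store.delete(key)
      @store[key] = value
      @store.shift while @store.size > @max_size # evict oldest entries
    end
    value
  end

  def size
    @mutex.synchronize { @store.size }
  end
end
```

Computing the value outside the lock keeps slow parses from blocking other threads, at the cost of occasionally computing the same summary twice under contention.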

Tokenizer

  • Breaks SQL into typed tokens (:keyword, :identifier, :operator) using StringScanner.
  • Filters out sensitive data such as string literals, comments, and numeric values.
  • Optimization: Uses a hash-based keyword lookup and frequency-ordered scanning to minimize
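
As an illustration of the tokenizer bullets above, the following is a rough sketch built on Ruby's standard `StringScanner`. The `tokenize` method, the `KEYWORDS` table, and the exact patterns are assumptions made for the example and are far smaller than what a real SQL tokenizer needs.

```ruby
require 'strscan'

# Hypothetical keyword table; a hash gives constant-time lookups.
KEYWORDS = %w[SELECT INSERT UPDATE DELETE FROM INTO JOIN WHERE UNION ON SET]
           .each_with_object({}) { |word, table| table[word] = true }
           .freeze

# Returns an array of [type, value] pairs. String literals, numbers,
# and comments are skipped, so they never reach the summary.
def tokenize(sql)
  scanner = StringScanner.new(sql)
  tokens = []

  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif scanner.skip(/--[^\n]*/) || scanner.skip(%r{/\*.*?\*/}m)
      next # line and block comments
    elsif scanner.skip(/'(?:[^']|'')*'/)
      next # single-quoted string literal
    elsif scanner.skip(/\d+(?:\.\d+)?/)
      next # numeric literal
    elsif (word = scanner.scan(/[A-Za-z_][A-Za-z0-9_]*/))
      type = KEYWORDS.key?(word.upcase) ? :keyword : :identifier
      tokens << [type, word]
    else
      tokens << [:operator, scanner.getch]
    end
  end

  tokens
end
```

For example, `tokenize("SELECT name FROM users WHERE id = 42")` yields keyword and identifier tokens for `SELECT`, `name`, `FROM`, `users`, `WHERE`, and `id`, plus an operator token for `=`; the literal `42` is dropped.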

Parser

  • Three states: PARSING_STATE (verb hunting), EXPECT_COLLECTION_STATE (table gathering), and DDL_BODY_STATE (skipping the internal logic of procedures and triggers).
  • Manages implicit aliases, JOIN lists, UNION consolidation, and IF EXISTS DDL patterns.
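
The state machine described above might look roughly like the sketch below. It reuses the state names from the description, but the symbol values, the `VERBS` and `COLLECTION_MARKERS` lists, and the `summarize_tokens` method are assumptions. DDL_BODY_STATE, comma-separated table lists, and UNION handling are left out to keep the example short.

```ruby
# Simplified two-state sketch; the real parser has more states and rules.
PARSING_STATE = :parsing
EXPECT_COLLECTION_STATE = :expect_collection

VERBS = %w[SELECT INSERT UPDATE DELETE].freeze
# Keywords after which the next identifier is a table name.
COLLECTION_MARKERS = %w[FROM INTO JOIN UPDATE].freeze

# tokens is an array of [type, value] pairs, e.g. [:keyword, 'SELECT'].
def summarize_tokens(tokens)
  state = PARSING_STATE
  parts = []

  tokens.each do |type, value|
    if state == EXPECT_COLLECTION_STATE && type == :identifier
      parts << value          # table name
      state = PARSING_STATE   # a following identifier is an alias, ignored
    elsif type == :keyword
      upper = value.upcase
      parts << upper if VERBS.include?(upper)
      state = COLLECTION_MARKERS.include?(upper) ? EXPECT_COLLECTION_STATE : PARSING_STATE
    end
    # Operators and stray identifiers do not change state.
  end

  parts.join(' ')
end

tokens = [
  [:keyword, 'SELECT'], [:operator, '*'], [:keyword, 'FROM'],
  [:identifier, 'users'], [:identifier, 'u'], [:keyword, 'JOIN'],
  [:identifier, 'profiles'], [:identifier, 'p'], [:keyword, 'ON'],
  [:identifier, 'u'], [:operator, '.'], [:identifier, 'id']
]
puts summarize_tokens(tokens) # prints "SELECT users profiles"
```

Returning to PARSING_STATE right after a table name is what lets implicit aliases such as `users u` pass through without being mistaken for tables.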

Summary Generation

  • Consolidates operations and table names: "SELECT users profiles".
  • Truncates the summary at 255 characters.
  • Automatically removes duplicate table names in complex joins or unions.
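
A small sketch of the final assembly step, assuming the parser hands over one operation and a list of table names. `build_summary` and `MAX_SUMMARY_LENGTH` are hypothetical names; only the 255-character limit and the deduplication come from the description above.

```ruby
MAX_SUMMARY_LENGTH = 255

# Joins the operation with its unique table names (first occurrence wins)
# and cuts the result off at the maximum length.
def build_summary(operation, tables)
  ([operation] + tables.uniq).join(' ')[0, MAX_SUMMARY_LENGTH]
end

puts build_summary('SELECT', %w[users profiles users]) # prints "SELECT users profiles"
```

Note that a hard cut at 255 characters can end in the middle of a table name; whether the PR trims to a word boundary is not stated in the description.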

Resources:

hannahramadan changed the title from "Add SQL Query Summary - Tokenizer, Parser, Cache" to "Feat: Add SQL Query Summary - Tokenizer, Parser, Cache" on Jan 6, 2026
    Parser.build_summary_from_tokens(tokens)
  end
rescue StandardError
  'UNKNOWN'
Member Author

db.query.summary should be used as a database span name, but if the summary is not available, then the name should be {db.operation.name} {target}. We should keep that in mind when changing db instrumentation. If the return value of the summary is UNKNOWN, then we have more work to do to create the span name.
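
One way the fallback described in this comment could look in instrumentation code is sketched below. The `span_name` helper and its keyword arguments are hypothetical and not part of this PR; the sketch assumes the caller already knows the operation name and target.

```ruby
# Hypothetical helper: prefer the summary, otherwise fall back to
# "{db.operation.name} {target}" built from whatever is available.
def span_name(summary:, operation_name: nil, target: nil)
  usable = summary && !summary.empty? && summary != 'UNKNOWN'
  return summary if usable

  fallback = [operation_name, target].compact.join(' ')
  fallback.empty? ? nil : fallback # nil leaves the final choice to the caller
end

span_name(summary: 'SELECT users')                                        # "SELECT users"
span_name(summary: 'UNKNOWN', operation_name: 'SELECT', target: 'users')  # "SELECT users"
```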

hannahramadan marked this pull request as ready for review January 17, 2026 00:24
hannahramadan changed the title from "Feat: Add SQL Query Summary - Tokenizer, Parser, Cache" to "feat: Add SQL Query Summary - Tokenizer, Parser, Cache" on Jan 20, 2026