Skip to content

remove query comments #560#749

Draft
dev-lew wants to merge 9 commits intopgdogdev:mainfrom
dev-lew:dev-lew/Remove-query-comments-#560
Draft

remove query comments #560#749
dev-lew wants to merge 9 commits intopgdogdev:mainfrom
dev-lew:dev-lew/Remove-query-comments-#560

Conversation

@dev-lew
Copy link

@dev-lew dev-lew commented Feb 4, 2026

This PR addresses #560

It adds the remove_comments function, which will remove all comments from a query string except those specified by 0 or more regexes.

If there is a cache miss, we do another cache lookup after removing comments except for those used for pgdog metadata (pgdog_shard, etc) in the hope that we can increase cache hit rate.

Two associated helper functions were added as well.

One minor change was made to existing code in comments.rs to export the compiled regexes.

dev-lew added 9 commits January 30, 2026 15:16
The previous approach is incorrect because we need a
str representing the query instead of just a ScanResult.

This approach reconstructs the query with string slicing by using
the indices where comments are present as returned by the scanner.
This version will accept a list of regexs to keep
Add ability to keep pgdog metadata comments anywhere in the string
Not preserving these would mean queries with comments would
never match their commentless counterparts
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


dev-lew seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

.iter()
.any(|st| st.token == Token::CComment as i32 || st.token == Token::SqlComment as i32))
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible that this function is over-generalized and may not need the except parameter.

}
}

if comment::has_comments(query.query(), ctx.sharding_schema.query_parser_engine)? {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is here because we might as well short circuit if there are no comments. There is no need to tokenize the query and iterate through it again for no reason.

let start = st.start as usize;
let end = st.end as usize;

out.push_str(&query[cursor..start]);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code involving cursor keeps non token characters (between and at the end of all tokens) like spaces to preserve the original query. We don't want to do anything extra like normalize it or something.


pub fn has_comments(query: &str, engine: QueryParserEngine) -> Result<bool, Error> {
let result = match engine {
QueryParserEngine::PgQueryProtobuf => scan(query),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small concern I have here is we are calling scan multiple times. It's actually a pretty expensive call, because we have to go all the way to the Postgres parser, so we try to minimize it as much as possible (i.e. that's why we have the cache). It would be great if we could do this all inside fn comment, somehow, calling scan only once. Maybe we need to move comment detection a little higher in the call stack?

@codecov
Copy link

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 86.95652% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...dog/src/frontend/router/parser/cache/cache_impl.rs 62.50% 6 Missing ⚠️
pgdog/src/frontend/router/parser/comment.rs 94.33% 3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants