fix(docScoring): Add boost for nativeRank score#1221
Conversation
Summary of ChangesHello @naSim087, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the document scoring mechanism by introducing application-specific boosting for nativeRank scores. This change allows for more precise control over search relevance, ensuring that documents from particular applications receive appropriate prioritization when queries are scoped. The modification addresses the need for search to function correctly when YQL clauses are specific to individual applications. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughThis PR introduces app-specific ranking boosts across four Vespa schema files. Each schema adds a boost query parameter, app-detection helper function, and conditional multiplier in combined_nativeRank to apply boost when the app type matches (Slack, Google Calendar, Google Drive, or Gmail respectively). Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to boost document scores based on the application source, which is a good improvement for scoped searches. The implementation is consistent across the different Vespa schemas. My review focuses on improving the code's readability and conciseness. I've suggested simplifying boolean expressions and improving the formatting of complex ranking functions for better maintainability. These changes are minor but will enhance the overall code quality.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/vespa/schemas/file.sd (1)
143-150: Clarify handling of unsetquery(fileBoost)inn()functionThe
query(fileBoost) doubleinput (line 149) lacks a default value and is used directly in a multiplicative factor inn()(line 191). WhenfileBoostis not provided in a query and has no default, Vespa does not supply an automatic numeric default, leaving its behavior in expressions undefined. This can distort ranking for google-drive docs whenis_file == 1and the boost parameter is omitted.Consider:
- Declaring a neutral default:
query(fileBoost) double: 1.0- Or guarding the expression:
if(is_file == 1, if(query(fileBoost) != 0, query(fileBoost), 1.0), 1)Verify that all query paths targeting this schema explicitly set
fileBoostor rely on the default you choose.
♻️ Duplicate comments (3)
server/vespa/schemas/mail.sd (1)
140-147: Same default‑boost concern forquery(mailBoost)as infile.sdAs with
query(fileBoost)inserver/vespa/schemas/file.sd,query(mailBoost)(Line 146) is multiplied directly intocombined_nativeRankwhenis_mail == 1. IfmailBoostis not explicitly set on all relevant queries, Vespa’s default0.0(please confirm) will zero out the lexical component forgmailmails on those paths.Please:
- Either give
mailBoosta neutral default (e.g.,query(mailBoost) double: 1.0) or guard against0.0in the expression, and- Verify that all callers that scope YQL to this schema are setting
ranking.features.query(mailBoost)as expected.server/vespa/schemas/chat_message.sd (1)
132-137: Ensurequery(slackBoost)is always set or given a neutral defaultSame pattern as
fileBoost/mailBoost:query(slackBoost)(Line 136) is multiplied intocombined_nativeRankwhenis_slack == 1. If any query path that hits this schema does not setranking.features.query(slackBoost), Vespa’s default value (typically0.0, please verify) will zero out the nativeRank component for Slack messages on those paths.To avoid accidental regressions:
- Either declare a default:
query(slackBoost) double: 1.0, or- Guard inside the expression so an unset/zero boost falls back to
1.0.And confirm all Slack-scoped YQL clauses are sending the intended
slackBoost.server/vespa/schemas/event.sd (1)
159-165:query(eventBoost)should be neutral by default or always suppliedIn line with the other schemas,
query(eventBoost)(Line 164) multipliescombined_nativeRankwhenis_event == 1. IfeventBoostis omitted on some event queries, Vespa’s default0.0(please verify) will zero out the lexical component forgoogle-calendarevents on those paths.Please either:
- Provide a default
: 1.0in the input declaration, or- Add a guard in the expression to treat
0.0/unset as1.0,and confirm all event‑specific YQL clauses set
eventBoostas intended.
🧹 Nitpick comments (3)
server/vespa/schemas/file.sd (1)
187-193:is_fileboost wiring looks correct; consider exposing for observabilityThe
is_file()helper and the updatedcombined_nativeRankexpression correctly gate thefileBoostmultiplier toapp == "google-drive", keeping other apps’ scores unchanged.Two optional improvements:
- If you often debug ranking, add
is_filetomatch-featuresin the relevant rank profiles to verify app detection at query time.- If you expect the same boost to apply to
attachmentRanktraffic, consider mirroring the boost logic intocombined_nativeRank_imageto avoid divergent behavior between text‑only and image‑augmented ranking.These are optional and don’t block the current change.
server/vespa/schemas/mail.sd (1)
173-201: Clarify whethermailBoostshould also apply to intent search pathIn
combined_nativeRank(Lines 195-200), the boost is only applied in the non‑intent branch:if(query(is_intent_search) == 1.0, simplePeopleRank, ((...) / if(matchedFieldCount == 0, 1, matchedFieldCount)) * if(is_mail == 1, query(mailBoost), 1) )If the goal is to uniformly boost Gmail mails whenever
app == "gmail", you may want to applymailBoostto both branches, e.g.:if(is_mail == 1, query(mailBoost), 1) * if(query(is_intent_search) == 1.0, simplePeopleRank, (nativeRank(subject) + nativeRank(chunks) + peopleRank) / if(matchedFieldCount == 0, 1, matchedFieldCount) )or wrap each branch separately. If the current asymmetry is intentional (e.g., you don’t want boosts on intent flows), consider adding a brief comment to document that.
Also optional: expose
is_mailinmatch-featuresif you need to debug app detection in ranking logs.server/vespa/schemas/event.sd (1)
199-221: Event boost wiring is consistent; consider exposingis_eventif you need debuggingThe
is_event()helper and updatedcombined_nativeRank:function is_event() { expression: if(attribute(app) == "google-calendar", 1, 0) } ... ( ( nativeRank(name) + nativeRank(description) + nativeRank(url) ) / if(matchedFieldCount == 0, 1, matchedFieldCount) ) + (META_FIELDS_DECAY * (nativeRank(attachmentFilenames) + nativeRank(attendeesNames))) ) * if(is_event == 1, query(eventBoost), 1)correctly apply the
eventBoostfactor to both the primary and metadata components of the nativeRank score, restricted toapp == "google-calendar". Non‑Calendar events keep their previous behavior.Optional: if you need to inspect how often the boost is active, add
is_eventtomatch-featuresin the relevant rank profiles.Overall, the ranking logic change here looks good.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
server/vespa/schemas/chat_message.sd(4 hunks)server/vespa/schemas/event.sd(3 hunks)server/vespa/schemas/file.sd(2 hunks)server/vespa/schemas/mail.sd(3 hunks)
🔇 Additional comments (1)
server/vespa/schemas/chat_message.sd (1)
162-175: Slack gating and observability for the boost look solidThe new
is_slack()helper and updatedcombined_nativeRank:function is_slack() { expression: if(attribute(app) == "slack", 1, 0) } ... ( nativeRank(text) + nativeRank(username) + nativeRank(name) ) / if(matchedFieldCount == 0, 1, matchedFieldCount) * if(is_slack == 1, query(slackBoost), 1)cleanly constrain the
slackBoostmultiplier to Slack messages only, without altering behavior for other apps. Addingis_slacktomatch-featuresin bothdefault_nativeanddefault_aiis also a nice touch for debugging ranking behavior per app.No issues here from a ranking or syntax perspective.
Also applies to: 199-207, 253-260
| nativeRank(username) + | ||
| nativeRank(name) | ||
| ) / if(matchedFieldCount == 0, 1, matchedFieldCount) | ||
| ) * if(is_slack == 1, query(slackBoost), 1) |
There was a problem hiding this comment.
Could we add a detailed comment explaining why this boost is necessary?
Description
add boost based on app , to ensure the search works correctly when we scoped each yql clause to its own app
Testing
Additional Notes
Summary by CodeRabbit
Improvements