
fix(docScoring): Add boost for nativeRank score#1221

Open
naSim087 wants to merge 1 commit into main from fix/docScoring


naSim087 (Contributor) commented Nov 18, 2025

Description

Add an app-based boost so that search ranks correctly when each YQL clause is scoped to its own app.

Testing

Additional Notes

Summary by CodeRabbit

Improvements

  • Enhanced search result ranking with app-specific boosts for Slack messages, Calendar events, Drive files, and Gmail emails, so that the most relevant results surface across different content sources.

gemini-code-assist (Contributor)

Summary of Changes

Hello @naSim087, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the document scoring mechanism by introducing application-specific boosting for nativeRank scores. This change allows for more precise control over search relevance, ensuring that documents from particular applications receive appropriate prioritization when queries are scoped. The modification addresses the need for search to function correctly when YQL clauses are specific to individual applications.

Highlights

  • Application-Specific Boosting: Introduced the ability to boost nativeRank scores based on the originating application (e.g., Slack, Google Calendar, Google Drive, Gmail).
  • New Query Parameters: Added 'slackBoost', 'eventBoost', 'fileBoost', and 'mailBoost' as query inputs to control the magnitude of the application-specific boosts.
  • Application Identification Functions: Implemented 'is_slack()', 'is_event()', 'is_file()', and 'is_mail()' functions to programmatically identify the application associated with a document.
  • Modified Ranking Expressions: Updated the 'combined_nativeRank()' expressions across 'chat_message.sd', 'event.sd', 'file.sd', and 'mail.sd' to incorporate the new boosting logic.

coderabbitai bot (Contributor) commented Nov 18, 2025

Walkthrough

This PR introduces app-specific ranking boosts across four Vespa schema files. Each schema adds a boost query parameter, an app-detection helper function, and a conditional multiplier in combined_nativeRank that applies the boost when the document's app matches (Slack, Google Calendar, Google Drive, or Gmail, respectively).

Changes

Cohort: App-specific ranking boosts
File(s): server/vespa/schemas/chat_message.sd, server/vespa/schemas/event.sd, server/vespa/schemas/file.sd, server/vespa/schemas/mail.sd
Summary: Each schema adds (1) a boost input parameter to rank-profile initial (slackBoost, eventBoost, fileBoost, mailBoost), (2) an app-detection function (is_slack, is_event, is_file, is_mail, checking attribute(app) for "slack", "google-calendar", "google-drive", and "gmail" respectively), and (3) a conditional multiplier in combined_nativeRank that applies the boost when the function returns 1. chat_message.sd additionally exposes is_slack as a match-feature.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Highly repetitive pattern: Same structural changes applied consistently across 4 independent schema files.
  • Low logic complexity: Simple conditional multiplication (if(is_X == 1, query(xBoost), 1)) with straightforward app detection.
  • Potential areas for attention: Verify that each app detection condition (attribute value checks) is correctly matched to the intended app, and ensure consistency of function naming and parameter conventions across all four files.

Suggested reviewers

  • zereraz
  • shivamashtikar
  • kalpadhwaryu
  • junaid-shirur

Poem

🐰 Four schemas bloom with boosts so fine,
Each app now ranked in its own design,
Slack hops, Gmail glows, and files take flight,
Calendar events shine ever bright! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'fix(docScoring): Add boost for nativeRank score' accurately describes the main change: adding boost mechanisms to native rank scoring across multiple schema files.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a mechanism to boost document scores based on the application source, which is a good improvement for scoped searches. The implementation is consistent across the different Vespa schemas. My review focuses on improving the code's readability and conciseness. I've suggested simplifying boolean expressions and improving the formatting of complex ranking functions for better maintainability. These changes are minor but will enhance the overall code quality.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/vespa/schemas/file.sd (1)

143-150: Clarify handling of unset query(fileBoost) in combined_nativeRank()

The query(fileBoost) double input (line 149) has no declared default and is used directly as a multiplicative factor in combined_nativeRank() (line 191). If a query does not provide fileBoost, the factor takes whatever value Vespa uses for unset query inputs (possibly 0.0; please verify), which can distort or zero out ranking for google-drive docs when is_file == 1.

Consider:

  • Declaring a neutral default: query(fileBoost) double: 1.0
  • Or guarding the expression: if(is_file == 1, if(query(fileBoost) != 0, query(fileBoost), 1.0), 1)

Verify that all query paths targeting this schema explicitly set fileBoost or rely on the default you choose.
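A minimal sketch of the first option, assuming fileBoost is declared in the `initial` rank profile as the walkthrough describes; the profile's other inputs and functions are omitted here:

```
rank-profile initial {
    inputs {
        # Neutral default: queries that omit fileBoost leave
        # google-drive scores unchanged (multiplier of 1.0)
        query(fileBoost) double: 1.0
    }
}
```

With this default, callers only need to send fileBoost on the query paths where they actually want to change the weighting.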

♻️ Duplicate comments (3)
server/vespa/schemas/mail.sd (1)

140-147: Same default‑boost concern for query(mailBoost) as in file.sd

As with query(fileBoost) in server/vespa/schemas/file.sd, query(mailBoost) (Line 146) is multiplied directly into combined_nativeRank when is_mail == 1. If mailBoost is not explicitly set on all relevant queries, Vespa’s default 0.0 (please confirm) will zero out the lexical component for gmail mails on those paths.

Please:

  • Either give mailBoost a neutral default (e.g., query(mailBoost) double: 1.0) or guard against 0.0 in the expression, and
  • Verify that all callers that scope YQL to this schema are setting ranking.features.query(mailBoost) as expected.
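The second option, guarding inside the expression, could look like this sketch; `mail_boost` is a hypothetical helper name and is not part of the PR:

```
# Hypothetical helper: 1 for non-gmail docs; for gmail docs, the supplied
# mailBoost, or 1.0 when mailBoost is 0.0 (possibly the unset value; verify)
function mail_boost() {
    expression: if(is_mail == 1, if(query(mailBoost) == 0, 1.0, query(mailBoost)), 1)
}
```

combined_nativeRank would then multiply by mail_boost instead of the inline if(is_mail == 1, query(mailBoost), 1). The trade-off is that an explicit mailBoost of 0.0 can no longer be used to suppress Gmail results; the declared-default approach keeps that possible.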
server/vespa/schemas/chat_message.sd (1)

132-137: Ensure query(slackBoost) is always set or given a neutral default

Same pattern as fileBoost/mailBoost: query(slackBoost) (Line 136) is multiplied into combined_nativeRank when is_slack == 1. If any query path that hits this schema does not set ranking.features.query(slackBoost), Vespa’s default value (typically 0.0, please verify) will zero out the nativeRank component for Slack messages on those paths.

To avoid accidental regressions:

  • Either declare a default: query(slackBoost) double: 1.0, or
  • Guard inside the expression so an unset/zero boost falls back to 1.0.

And confirm all Slack-scoped YQL clauses are sending the intended slackBoost.

server/vespa/schemas/event.sd (1)

159-165: query(eventBoost) should be neutral by default or always supplied

In line with the other schemas, query(eventBoost) (Line 164) multiplies combined_nativeRank when is_event == 1. If eventBoost is omitted on some event queries, Vespa’s default 0.0 (please verify) will zero out the lexical component for google-calendar events on those paths.

Please either:

  • Provide a neutral default in the input declaration (query(eventBoost) double: 1.0), or
  • Add a guard in the expression to treat 0.0/unset as 1.0,

and confirm all event‑specific YQL clauses set eventBoost as intended.

🧹 Nitpick comments (3)
server/vespa/schemas/file.sd (1)

187-193: is_file boost wiring looks correct; consider exposing for observability

The is_file() helper and the updated combined_nativeRank expression correctly gate the fileBoost multiplier to app == "google-drive", keeping other apps’ scores unchanged.

Two optional improvements:

  • If you often debug ranking, add is_file to match-features in the relevant rank profiles to verify app detection at query time.
  • If you expect the same boost to apply to attachmentRank traffic, consider mirroring the boost logic into combined_nativeRank_image to avoid divergent behavior between text‑only and image‑augmented ranking.

These are optional and don’t block the current change.
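For the first suggestion, a sketch of the addition; it assumes an existing rank profile in file.sd serves these queries (its name and other contents are not shown):

```
# Inside the existing rank profile used for file queries:
match-features {
    is_file
}
```

Each hit then carries is_file (1 for app == "google-drive", 0 otherwise) in its match features, which makes it easy to confirm when the fileBoost multiplier was active.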

server/vespa/schemas/mail.sd (1)

173-201: Clarify whether mailBoost should also apply to intent search path

In combined_nativeRank (Lines 195-200), the boost is only applied in the non‑intent branch:

if(query(is_intent_search) == 1.0,
  simplePeopleRank,
  ((...) / if(matchedFieldCount == 0, 1, matchedFieldCount)) * if(is_mail == 1, query(mailBoost), 1)
)

If the goal is to uniformly boost Gmail mails whenever app == "gmail", you may want to apply mailBoost to both branches, e.g.:

if(is_mail == 1, query(mailBoost), 1) *
if(query(is_intent_search) == 1.0,
  simplePeopleRank,
  (nativeRank(subject) + nativeRank(chunks) + peopleRank) / if(matchedFieldCount == 0, 1, matchedFieldCount)
)

or wrap each branch separately. If the current asymmetry is intentional (e.g., you don’t want boosts on intent flows), consider adding a brief comment to document that.
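The "wrap each branch separately" alternative might look like the sketch below. It uses the reviewer's simplified non-intent branch (nativeRank(subject), nativeRank(chunks), peopleRank) rather than the full expression elided above:

```
function combined_nativeRank() {
    expression {
        if(query(is_intent_search) == 1.0,
            simplePeopleRank * if(is_mail == 1, query(mailBoost), 1),
            ((nativeRank(subject) + nativeRank(chunks) + peopleRank)
                / if(matchedFieldCount == 0, 1, matchedFieldCount))
                * if(is_mail == 1, query(mailBoost), 1)
        )
    }
}
```

This is numerically the same as factoring the multiplier out of the if, but it keeps each branch's factor independently editable if intent and non-intent paths should later be boosted differently.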

Also optional: expose is_mail in match-features if you need to debug app detection in ranking logs.

server/vespa/schemas/event.sd (1)

199-221: Event boost wiring is consistent; consider exposing is_event if you need debugging

The is_event() helper and updated combined_nativeRank:

function is_event() {
  expression: if(attribute(app) == "google-calendar", 1, 0)
}

...

(
  (
    (
      nativeRank(name) + nativeRank(description) + nativeRank(url)
    ) / if(matchedFieldCount == 0, 1, matchedFieldCount)
  )
  +
  (META_FIELDS_DECAY * (nativeRank(attachmentFilenames) + nativeRank(attendeesNames)))
) * if(is_event == 1, query(eventBoost), 1)

correctly apply the eventBoost factor to both the primary and metadata components of the nativeRank score, restricted to app == "google-calendar". Non‑Calendar events keep their previous behavior.

Optional: if you need to inspect how often the boost is active, add is_event to match-features in the relevant rank profiles.

Overall, the ranking logic change here looks good.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3219a85 and 9d4084c.

📒 Files selected for processing (4)
  • server/vespa/schemas/chat_message.sd (4 hunks)
  • server/vespa/schemas/event.sd (3 hunks)
  • server/vespa/schemas/file.sd (2 hunks)
  • server/vespa/schemas/mail.sd (3 hunks)
🔇 Additional comments (1)
server/vespa/schemas/chat_message.sd (1)

162-175: Slack gating and observability for the boost look solid

The new is_slack() helper and updated combined_nativeRank:

function is_slack() {
  expression: if(attribute(app) == "slack", 1, 0)
}

...

(
  nativeRank(text) + nativeRank(username) + nativeRank(name)
) / if(matchedFieldCount == 0, 1, matchedFieldCount)
* if(is_slack == 1, query(slackBoost), 1)

cleanly constrain the slackBoost multiplier to Slack messages only, without altering behavior for other apps. Adding is_slack to match-features in both default_native and default_ai is also a nice touch for debugging ranking behavior per app.

No issues here from a ranking or syntax perspective.

Also applies to: 199-207, 253-260

nativeRank(username) +
nativeRank(name)
) / if(matchedFieldCount == 0, 1, matchedFieldCount)
) * if(is_slack == 1, query(slackBoost), 1)
Collaborator left a review comment on these lines:

Could we add a detailed comment explaining why this boost is necessary?
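One possible shape for that comment, sketched on the chat_message.sd function. The rationale text is an interpretation of the PR description ("scoped each yql clause to its own app"), not the author's wording, and the enclosing rank profile is omitted:

```
# Why slackBoost exists: each YQL clause is scoped to a single app, so the
# caller can tune how Slack results rank relative to other apps' results
# by passing query(slackBoost). is_slack gates the factor, so documents
# from any other app keep a multiplier of 1.
function combined_nativeRank() {
    expression {
        (
            (
                nativeRank(text) +
                nativeRank(username) +
                nativeRank(name)
            ) / if(matchedFieldCount == 0, 1, matchedFieldCount)
        ) * if(is_slack == 1, query(slackBoost), 1)
    }
}
```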


Labels

None yet

Projects

None yet


3 participants
