fix(docScoring): Add boost for nativeRank score by naSim087 · Pull Request #1221 · xynehq/xyne

naSim087 · 2025-11-18T09:46:35Z

Description

Add a boost based on app, to ensure search works correctly when each YQL clause is scoped to its own app.

Testing

Additional Notes

Summary by CodeRabbit

Improvements

Enhanced search result ranking with intelligent content-type recognition for Slack, Calendar events, Files, and Emails. Improved relevance through app-specific optimization to surface the most pertinent results across different content sources.

gemini-code-assist · 2025-11-18T09:46:51Z

Summary of Changes

Hello @naSim087, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the document scoring mechanism by introducing application-specific boosting for nativeRank scores. This change allows for more precise control over search relevance, ensuring that documents from particular applications receive appropriate prioritization when queries are scoped. The modification addresses the need for search to function correctly when YQL clauses are specific to individual applications.

Highlights

Application-Specific Boosting: Introduced the ability to boost nativeRank scores based on the originating application (e.g., Slack, Google Calendar, Google Drive, Gmail).
New Query Parameters: Added 'slackBoost', 'eventBoost', 'fileBoost', and 'mailBoost' as query inputs to control the magnitude of the application-specific boosts.
Application Identification Functions: Implemented 'is_slack()', 'is_event()', 'is_file()', and 'is_mail()' functions to programmatically identify the application associated with a document.
Modified Ranking Expressions: Updated the 'combined_nativeRank()' expressions across 'chat_message.sd', 'event.sd', 'file.sd', and 'mail.sd' to incorporate the new boosting logic.
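
As a rough illustration of the highlights above, the gating can be modeled in plain Python. This is a sketch of the arithmetic only, not Vespa rank-profile syntax; the names `is_app` and `boosted_native_rank` and the sample numbers are hypothetical and do not appear in the PR.

```python
def is_app(doc_app: str, target_app: str) -> int:
    """Model of the is_slack()/is_event()/is_file()/is_mail() helpers: 1 on match, else 0."""
    return 1 if doc_app == target_app else 0


def boosted_native_rank(native_rank: float, doc_app: str, target_app: str, boost: float) -> float:
    """Multiply the lexical score by the boost only when the document's app matches."""
    multiplier = boost if is_app(doc_app, target_app) == 1 else 1.0
    return native_rank * multiplier


if __name__ == "__main__":
    # A Slack document under a Slack-scoped clause receives the boost.
    print(boosted_native_rank(0.5, "slack", "slack", 2.0))
    # A Gmail document evaluated with the same expression keeps its score.
    print(boosted_native_rank(0.5, "gmail", "slack", 2.0))
```

Keeping the non-matching case at a multiplier of 1 is what leaves other apps' scores untouched.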

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai · 2025-11-18T09:47:06Z

Walkthrough

This PR introduces app-specific ranking boosts across four Vespa schema files. Each schema adds a boost query parameter, app-detection helper function, and conditional multiplier in combined_nativeRank to apply boost when the app type matches (Slack, Google Calendar, Google Drive, or Gmail respectively).

Changes

Cohort / File(s)	Summary
App-specific ranking boosts `server/vespa/schemas/chat_message.sd`, `server/vespa/schemas/event.sd`, `server/vespa/schemas/file.sd`, `server/vespa/schemas/mail.sd`	Each schema adds: (1) a boost input parameter to rank-profile initial (slackBoost, eventBoost, fileBoost, mailBoost), (2) an app-detection function (is_slack, is_event, is_file, is_mail checking for "slack", "google-calendar", "google-drive", "gmail" respectively), (3) a conditional multiplier in combined_nativeRank that applies the boost when the function returns 1, and (4) the detection function exposed as a match-feature.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Highly repetitive pattern: Same structural changes applied consistently across 4 independent schema files.
Low logic complexity: Simple conditional multiplication (if(is_X == 1, query(xBoost), 1)) with straightforward app detection.
Potential areas for attention: Verify that each app detection condition (attribute value checks) is correctly matched to the intended app, and ensure consistency of function naming and parameter conventions across all four files.

Possibly related PRs

fix(search): Dynamic average of nativeRank scores using matched field detection #384: Modifies the combined_nativeRank function across multiple schema files, suggesting related ranking enhancements.
feat : robust intent classification #636: Updates mail.sd ranking logic in rank-profile initial, potentially addressing the same area of functionality.

Suggested reviewers

zereraz
shivamashtikar
kalpadhwaryu
junaid-shirur

Poem

🐰 Four schemas bloom with boosts so fine,
Each app now ranked in its own design,
Slack hops, Gmail glows, and files take flight,
Calendar events shine ever bright! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(docScoring): Add boost for nativeRank score' accurately describes the main change: adding boost mechanisms to native rank scoring across multiple schema files.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


gemini-code-assist

Code Review

This pull request introduces a mechanism to boost document scores based on the application source, which is a good improvement for scoped searches. The implementation is consistent across the different Vespa schemas. My review focuses on improving the code's readability and conciseness. I've suggested simplifying boolean expressions and improving the formatting of complex ranking functions for better maintainability. These changes are minor but will enhance the overall code quality.

server/vespa/schemas/chat_message.sd

server/vespa/schemas/event.sd

server/vespa/schemas/file.sd

server/vespa/schemas/mail.sd

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/vespa/schemas/file.sd (1)

143-150: Clarify handling of unset query(fileBoost) in combined_nativeRank()

The query(fileBoost) double input (line 149) lacks a default value and is used directly as a multiplicative factor in combined_nativeRank() (line 191). When fileBoost is not provided in a query and has no default, Vespa does not supply an automatic numeric default, leaving its behavior in expressions undefined. This can distort ranking for google-drive docs when is_file == 1 and the boost parameter is omitted.

Consider:

Declaring a neutral default: query(fileBoost) double: 1.0

Or guarding the expression: if(is_file == 1, if(query(fileBoost) != 0, query(fileBoost), 1.0), 1)

Verify that all query paths targeting this schema explicitly set fileBoost or rely on the default you choose.
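
To make the concern concrete, here is a small Python model of the two suggested mitigations. It assumes, purely for illustration, that an omitted boost evaluates to 0.0; that behavior is exactly what this review asks to be verified and is not confirmed here. The helper names `resolve_boost`, `guarded_boost`, and `file_score` are hypothetical.

```python
from typing import Optional


def resolve_boost(supplied: Optional[float], default: float) -> float:
    """Return the boost sent with the query, or the declared default when it is omitted."""
    return default if supplied is None else supplied


def guarded_boost(boost: float) -> float:
    """Expression-level guard: treat a zero boost as the neutral value 1.0."""
    return boost if boost != 0 else 1.0


def file_score(native_rank: float, is_file: int, boost: float) -> float:
    """Apply the boost only to documents flagged as google-drive files."""
    return native_rank * (boost if is_file == 1 else 1.0)


if __name__ == "__main__":
    # ASSUMPTION: an unset input behaves like 0.0, which wipes out the lexical score.
    print(file_score(0.5, 1, resolve_boost(None, 0.0)))
    # Option 1: a neutral default of 1.0 leaves the score unchanged.
    print(file_score(0.5, 1, resolve_boost(None, 1.0)))
    # Option 2: guard inside the expression so 0.0 falls back to 1.0.
    print(file_score(0.5, 1, guarded_boost(resolve_boost(None, 0.0))))
```

One trade-off of the guard is that a caller can no longer pass 0 deliberately to suppress an app's lexical score, so a declared default may be the less surprising choice.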

♻️ Duplicate comments (3)

server/vespa/schemas/mail.sd (1)

140-147: Same default‑boost concern for query(mailBoost) as in file.sd

As with query(fileBoost) in server/vespa/schemas/file.sd, query(mailBoost) (Line 146) is multiplied directly into combined_nativeRank when is_mail == 1. If mailBoost is not explicitly set on all relevant queries, Vespa’s default 0.0 (please confirm) will zero out the lexical component for gmail mails on those paths.

Please:

Either give mailBoost a neutral default (e.g., query(mailBoost) double: 1.0) or guard against 0.0 in the expression, and

Verify that all callers that scope YQL to this schema are setting ranking.features.query(mailBoost) as expected.

server/vespa/schemas/chat_message.sd (1)

132-137: Ensure query(slackBoost) is always set or given a neutral default

Same pattern as fileBoost/mailBoost: query(slackBoost) (Line 136) is multiplied into combined_nativeRank when is_slack == 1. If any query path that hits this schema does not set ranking.features.query(slackBoost), Vespa’s default value (typically 0.0, please verify) will zero out the nativeRank component for Slack messages on those paths.

To avoid accidental regressions:

Either declare a default: query(slackBoost) double: 1.0, or

Guard inside the expression so an unset/zero boost falls back to 1.0.

And confirm all Slack-scoped YQL clauses are sending the intended slackBoost.

server/vespa/schemas/event.sd (1)

159-165: query(eventBoost) should be neutral by default or always supplied

In line with the other schemas, query(eventBoost) (Line 164) multiplies combined_nativeRank when is_event == 1. If eventBoost is omitted on some event queries, Vespa’s default 0.0 (please verify) will zero out the lexical component for google-calendar events on those paths.

Please either:

Provide a default : 1.0 in the input declaration, or

Add a guard in the expression to treat 0.0/unset as 1.0,

and confirm all event‑specific YQL clauses set eventBoost as intended.

🧹 Nitpick comments (3)

server/vespa/schemas/file.sd (1)

187-193: is_file boost wiring looks correct; consider exposing for observability

The is_file() helper and the updated combined_nativeRank expression correctly gate the fileBoost multiplier to app == "google-drive", keeping other apps’ scores unchanged.

Two optional improvements:

If you often debug ranking, add is_file to match-features in the relevant rank profiles to verify app detection at query time.

If you expect the same boost to apply to attachmentRank traffic, consider mirroring the boost logic into combined_nativeRank_image to avoid divergent behavior between text‑only and image‑augmented ranking.

These are optional and don’t block the current change.

server/vespa/schemas/mail.sd (1)

173-201: Clarify whether mailBoost should also apply to intent search path

In combined_nativeRank (Lines 195-200), the boost is only applied in the non‑intent branch:
if(query(is_intent_search) == 1.0,
  simplePeopleRank,
  ((...) / if(matchedFieldCount == 0, 1, matchedFieldCount)) * if(is_mail == 1, query(mailBoost), 1)
)
If the goal is to uniformly boost Gmail mails whenever app == "gmail", you may want to apply mailBoost to both branches, e.g.:
if(is_mail == 1, query(mailBoost), 1) *
if(query(is_intent_search) == 1.0,
  simplePeopleRank,
  (nativeRank(subject) + nativeRank(chunks) + peopleRank) / if(matchedFieldCount == 0, 1, matchedFieldCount)
)
or wrap each branch separately. If the current asymmetry is intentional (e.g., you don’t want boosts on intent flows), consider adding a brief comment to document that.
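
The difference between the two placements can be sketched in Python as follows. This models only the branching, with the averaged nativeRank term collapsed into a single number; the function names and sample values are hypothetical.

```python
def mail_rank_current(is_intent_search: bool, simple_people_rank: float,
                      averaged_native_rank: float, is_mail: int, mail_boost: float) -> float:
    """Boost applied only on the non-intent branch, as in the expression under review."""
    boost = mail_boost if is_mail == 1 else 1.0
    if is_intent_search:
        return simple_people_rank
    return averaged_native_rank * boost


def mail_rank_uniform(is_intent_search: bool, simple_people_rank: float,
                      averaged_native_rank: float, is_mail: int, mail_boost: float) -> float:
    """Boost hoisted outside the branch so that both paths are scaled."""
    boost = mail_boost if is_mail == 1 else 1.0
    base = simple_people_rank if is_intent_search else averaged_native_rank
    return boost * base


if __name__ == "__main__":
    # Intent search on a Gmail document: only the uniform variant scales the score.
    print(mail_rank_current(True, 0.5, 0.25, 1, 2.0))
    print(mail_rank_uniform(True, 0.5, 0.25, 1, 2.0))
```

Both variants agree on the non-intent path, so the choice only affects how intent searches rank Gmail documents against other apps.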

Also optional: expose is_mail in match-features if you need to debug app detection in ranking logs.

server/vespa/schemas/event.sd (1)

199-221: Event boost wiring is consistent; consider exposing is_event if you need debugging

The is_event() helper and updated combined_nativeRank:
function is_event() {
  expression: if(attribute(app) == "google-calendar", 1, 0)
}

...

(
  (
    nativeRank(name) + nativeRank(description) + nativeRank(url)
  ) / if(matchedFieldCount == 0, 1, matchedFieldCount)
)
+
(META_FIELDS_DECAY * (nativeRank(attachmentFilenames) + nativeRank(attendeesNames)))
) * if(is_event == 1, query(eventBoost), 1)
correctly apply the eventBoost factor to both the primary and metadata components of the nativeRank score, restricted to app == "google-calendar". Non‑Calendar events keep their previous behavior.

Optional: if you need to inspect how often the boost is active, add is_event to match-features in the relevant rank profiles.

Overall, the ranking logic change here looks good.
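
For readers who want to check the arithmetic, the event expression quoted above can be mirrored in Python. This is a sketch under assumptions: the value given to `META_FIELDS_DECAY` is a placeholder (the real constant lives in the schema and is not shown in this review), and the per-field nativeRank scores are passed in as plain numbers.

```python
META_FIELDS_DECAY = 0.5  # ASSUMPTION: placeholder value, not taken from event.sd


def event_native_rank(name: float, description: float, url: float,
                      attachment_filenames: float, attendees_names: float,
                      matched_field_count: int, is_event: int, event_boost: float) -> float:
    """Average the primary fields, add decayed metadata fields, then apply the gated boost."""
    # Guard against division by zero, as if(matchedFieldCount == 0, 1, matchedFieldCount) does.
    divisor = matched_field_count if matched_field_count != 0 else 1
    primary = (name + description + url) / divisor
    metadata = META_FIELDS_DECAY * (attachment_filenames + attendees_names)
    boost = event_boost if is_event == 1 else 1.0
    # The boost multiplies the sum, so it scales both components.
    return (primary + metadata) * boost


if __name__ == "__main__":
    print(event_native_rank(0.5, 0.25, 0.25, 0.5, 0.5, 2, 1, 2.0))
    print(event_native_rank(0.5, 0.25, 0.25, 0.5, 0.5, 2, 0, 2.0))
```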

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3219a85 and 9d4084c.

📒 Files selected for processing (4)

server/vespa/schemas/chat_message.sd (4 hunks)
server/vespa/schemas/event.sd (3 hunks)
server/vespa/schemas/file.sd (2 hunks)
server/vespa/schemas/mail.sd (3 hunks)

🔇 Additional comments (1)

server/vespa/schemas/chat_message.sd (1)

162-175: Slack gating and observability for the boost look solid

The new is_slack() helper and updated combined_nativeRank:
function is_slack() {
  expression: if(attribute(app) == "slack", 1, 0)
}

...

(
  nativeRank(text) + nativeRank(username) + nativeRank(name)
) / if(matchedFieldCount == 0, 1, matchedFieldCount)
* if(is_slack == 1, query(slackBoost), 1)
cleanly constrain the slackBoost multiplier to Slack messages only, without altering behavior for other apps. Adding is_slack to match-features in both default_native and default_ai is also a nice touch for debugging ranking behavior per app.

No issues here from a ranking or syntax perspective.

Also applies to: 199-207, 253-260

junaid-shirur · 2025-11-18T11:55:14Z

server/vespa/schemas/chat_message.sd

+            nativeRank(username) + 
+            nativeRank(name)
+          ) / if(matchedFieldCount == 0, 1, matchedFieldCount)
+        ) * if(is_slack == 1, query(slackBoost), 1)


Could we add a detailed comment explaining why this boost is necessary?

fix(docScoring): Add boost for nativeRank score

9d4084c

naSim087 requested review from devesh-juspay, junaid-shirur, kalpadhwaryu, shivamashtikar and zereraz as code owners November 18, 2025 09:46


naSim087 wants to merge 1 commit into main from fix/docScoring
