
normalize rank to be 0..1 from database searches #3555

Merged
mouse-reeve merged 2 commits into bookwyrm-social:main from ilkka-ollakka:tweak/normalize_search_rank
Apr 26, 2025


@ilkka-ollakka
Contributor

Description

This puts local search ranks on a 0..1 scale, the same range we assume connector confidence values to be in.

I'm not sure whether this change is desirable: it might filter out some results that were previously found, because min_confidence, which is applied as a filtering criterion, will now most likely have a larger effect.

The normalization value 32 (which divides the rank by itself + 1) is documented in
https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING
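As a rough illustration of what normalization flag 32 does, here is a plain-Python sketch of the mapping rank / (rank + 1). The function name `normalize_rank` is hypothetical and this is not BookWyrm's code; in the PR the normalization is done by PostgreSQL itself when computing the rank.

```python
def normalize_rank(rank: float) -> float:
    """Map a non-negative ts_rank score into the range [0, 1).

    Mirrors PostgreSQL's ranking normalization flag 32 (rank / (rank + 1)).
    Hypothetical helper for illustration only.
    """
    if rank < 0:
        raise ValueError("ts_rank scores are expected to be non-negative")
    return rank / (rank + 1)


print(normalize_rank(0.0), normalize_rank(1.0))  # prints "0.0 0.5"
```

Because the denominator is always larger than the numerator, the result approaches 1 but never reaches it, however large the raw score is.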

  • Related Issue #
  • Closes #

What type of Pull Request is this?

  • Bug Fix
  • Enhancement
  • Plumbing / Internals / Dependencies
  • Refactor

Does this PR change settings or dependencies, or break something?

  • This PR changes or adds default settings, configuration, or .env values
  • This PR changes or adds dependencies
  • This PR introduces other breaking changes

Details of breaking or configuration changes (if any of above checked)

Documentation

  • New or amended documentation will be required if this PR is merged
  • I have created a matching pull request in the Documentation repository
  • I intend to create a matching pull request in the Documentation repository after this PR is merged

Tests

  • My changes do not need new tests
  • All tests I have added are passing
  • I have written tests but need help to make them pass
  • I have not written tests and need help to write them

@ilkka-ollakka
Contributor Author

pytest seems to fail on the importer job checks. This is most likely a timing issue: some other item was also in the queue at the same time, so the index task was not the first one.

@ilkka-ollakka
Contributor Author

It seems I'm unable to reproduce the issue locally :/

@mouse-reeve
Member

I don't think this issue is related to your code; it seems like an intermittent test failure.

@ilkka-ollakka
Contributor Author

I added a commit that makes the GitHub Action rerun any failed tests, just to rule out timing issues in the future, and of course now nothing fails anymore ;)

@ilkka-ollakka
Contributor Author

I can split the workflow commit into a separate PR if anyone finds that useful and doesn't want to review the normalization changes yet.

@mouse-reeve
Member

A separate commit would be great; I can also re-run the workflows if that would be helpful.

@ilkka-ollakka
Contributor Author

I split it out into #3559 and will remove the commit from this PR.

The index value is not strictly required to be 0; what matters more is that the titles are correct.
ilkka-ollakka force-pushed the tweak/normalize_search_rank branch from 7424a59 to aa25564 on April 26, 2025 at 18:33
@mouse-reeve
Member

This seems like a sensible change. My understanding from reading the docs is that it will keep the search ranking from prioritizing long titles/author strings over ones that are closer matches; is that right? I've been trying this locally but haven't found any good combinations of books and queries that show the differences. Do you have suggestions?

@ilkka-ollakka
Contributor Author

This mainly scales the match confidence values so they are always below 1: a very high score (for example 10000) gets a confidence of 0.9999, and a score of 10 gets a confidence of 0.9090.... So it shouldn't change the search ranking or ordering; it just scales the values to a known range.
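The claim that ordering is unchanged follows from rank / (rank + 1) being strictly increasing for non-negative ranks. A small sketch to check it, using made-up scores (the book names and values below are hypothetical, not taken from a real search):

```python
def normalize_rank(rank: float) -> float:
    # rank / (rank + 1), as with PostgreSQL normalization flag 32
    return rank / (rank + 1)


# Hypothetical raw ts_rank scores for a few local search hits.
raw_scores = {"book_a": 10000.0, "book_b": 10.0, "book_c": 0.5}
normalized = {book: normalize_rank(score) for book, score in raw_scores.items()}

# Sort hits by raw score and by normalized score, highest first.
by_raw = sorted(raw_scores, key=raw_scores.get, reverse=True)
by_normalized = sorted(normalized, key=normalized.get, reverse=True)

# x / (x + 1) is strictly increasing for x >= 0, so both orderings match.
assert by_raw == by_normalized
print(round(normalized["book_a"], 4), round(normalized["book_b"], 4))  # 0.9999 0.9091
```

Only the absolute values change, which is what makes a fixed threshold such as min_confidence meaningful across sources.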

I haven't looked extensively for good examples yet, but I can check whether I can spot any.

Mainly I did this so that min_confidence would also take effect on local searches, since the queries already have the filtering in place.
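To show why a shared 0..1 scale matters for filtering, here is a minimal sketch that applies one min_confidence threshold to both local and connector results. The `SearchResult` class, `filter_by_confidence` helper, scores, and threshold are all hypothetical; BookWyrm's actual filtering lives in its search queries.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Hypothetical shape of a search hit; not BookWyrm's actual model.
    title: str
    confidence: float


def normalize_rank(rank: float) -> float:
    # rank / (rank + 1), as with PostgreSQL normalization flag 32
    return rank / (rank + 1)


def filter_by_confidence(results, min_confidence):
    # Keep only hits whose confidence meets the threshold.
    return [r for r in results if r.confidence >= min_confidence]


# Local hits start as raw ranks and are normalized into 0..1.
local = [
    SearchResult(title, normalize_rank(rank))
    for title, rank in [("Exact title", 4.0), ("Partial match", 0.05)]
]
# Connector results are assumed to already report confidence in 0..1.
remote = [SearchResult("Connector hit", 0.8)]

kept = filter_by_confidence(local + remote, min_confidence=0.5)
print([r.title for r in kept])  # ['Exact title', 'Connector hit']
```

With both sources on the same scale, a single threshold applies consistently; as noted in the description, hits with low normalized scores (here "Partial match", about 0.048) can now be filtered out where they previously might not have been.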

@mouse-reeve
Member

I see! That makes more sense -- I fully support normalizing the rank values.

mouse-reeve merged commit 0627abe into bookwyrm-social:main on Apr 26, 2025 (10 checks passed)
ilkka-ollakka deleted the tweak/normalize_search_rank branch on April 26, 2025 at 18:59
hughrun added the plumbing label (PR for internal processes or background jobs) on Aug 22, 2025