Normalize rank to be 0..1 from database searches #3555
mouse-reeve merged 2 commits into bookwyrm-social:main
Conversation
pytest seems to fail on the importer job checks. Most likely it is a timing issue: some other items were in the queue at the same time, so the index was not the first one.
It seems that I'm unable to reproduce the issue locally :/
I don't think this issue is related to your code; it seems like an intermittent test failure.
I added a commit so that the GitHub Action reruns any failed tests, just to rule out timing issues in the future, and of course now things don't fail anymore ;)
I can split the workflow commit into a separate PR if anyone finds it useful and doesn't want to review the normalization changes yet.
A separate commit would be great. I can also re-run the workflows if that would be helpful.
I split it out into #3559 and I'll remove the commit from this PR.
This puts the local search ranks on a scale of 0..1, the same range we assume connector confidence to be in. The value 32 is documented in https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING
The index value is not strictly required to be 0; what matters more is that the titles are correct.
7424a59 to aa25564
This seems like a sensible change -- my understanding from reading the docs is that it will keep the search ranking from prioritizing long titles/author strings over ones that are closer matches, is that right? I've been trying this locally but haven't figured out any good combinations of books and queries to see the differences. Do you have suggestions?
This mainly scales the match confidence values so that they are always under 1: a really high score (10000, for example) gets a confidence of 0.9999, and a score of 10 gets a confidence of 0.9090... So it shouldn't change the search ranking/ordering; it just scales the values to a known range. I haven't yet looked extensively for good examples, but I can see if I can spot any. Mainly I did this so that min_confidence would have an effect on local searches too, as the queries already have the filtering in place.
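The scaling described above matches what the PostgreSQL docs give for normalization flag 32, which divides the rank by itself + 1. A minimal sketch of that arithmetic in plain Python follows; the function name `normalize_rank` and the sample ranks are made up for illustration and are not taken from the PR's code:

```python
def normalize_rank(rank: float) -> float:
    """Map a non-negative raw rank into [0, 1) as rank / (rank + 1).

    Hypothetical helper: it mirrors the arithmetic PostgreSQL documents
    for normalization flag 32, but it is not code from this PR.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return rank / (rank + 1)


# Illustrative raw ranks, including the two values mentioned in the comment.
raw_ranks = [10, 0.1, 10000, 1]
normalized = [normalize_rank(r) for r in raw_ranks]

# The mapping is strictly increasing, so ordering by the normalized value
# yields the same order as ordering by the raw rank.
order_by_raw = sorted(range(len(raw_ranks)), key=lambda i: raw_ranks[i], reverse=True)
order_by_normalized = sorted(
    range(len(raw_ranks)), key=lambda i: normalized[i], reverse=True
)
same_order = order_by_raw == order_by_normalized
```

Because the function is strictly increasing, only the scale of the values changes, not the relative order of the results.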
I see! That makes more sense -- I fully support normalizing the rank values.
Description
This puts the local search ranks on a scale of 0..1, the same range we assume connector confidence to be in.
I'm not sure if this is a desired change, as it might rule out some results that were previously found: min_confidence, which is given as a filtering criterion, most likely carries more weight now.
The value 32 is documented in
https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING
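The concern in the description, that min_confidence may now exclude results that were previously returned, can be illustrated with a small sketch. It assumes the filter keeps results whose score is at least min_confidence; the threshold, titles, and raw ranks below are invented for illustration and do not come from BookWyrm:

```python
def normalize_rank(rank: float) -> float:
    """Hypothetical helper: rank / (rank + 1), as with normalization flag 32."""
    return rank / (rank + 1)


def filter_by_confidence(scores: dict, min_confidence: float) -> list:
    """Keep titles whose score is at least min_confidence (assumed filter)."""
    return [title for title, score in scores.items() if score >= min_confidence]


MIN_CONFIDENCE = 0.3  # illustrative threshold, not a project default

# Invented raw ranks for three made-up search results.
raw_scores = {"Close match": 2.5, "Partial match": 0.4, "Weak match": 0.05}
normalized_scores = {t: normalize_rank(r) for t, r in raw_scores.items()}

kept_with_raw = filter_by_confidence(raw_scores, MIN_CONFIDENCE)
kept_with_normalized = filter_by_confidence(normalized_scores, MIN_CONFIDENCE)
```

For any non-negative rank, rank / (rank + 1) is never larger than the rank itself, so the same threshold can only keep the same results or fewer after normalization. In this made-up data, "Partial match" passes the filter on its raw rank (0.4) but not on its normalized value (about 0.286).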