Closed as not planned
Bug: search() returns non-deterministic results for identical queries
Version: openviking 0.1.17, Python 3.12, Ubuntu 24.04
Description:
Calling client.search(query) multiple times with the same query, on the same client instance, against the same unmodified dataset, returns completely different result sets each time.
Five consecutive calls yield an average pairwise Jaccard similarity of only 0.11, and no URI appears in all five result sets.
Reproduction:
import openviking as ov
import os, time

os.environ['OPENVIKING_CONFIG_FILE'] = 'ov.conf'
c = ov.SyncOpenViking(path='./data')
c.initialize()
time.sleep(3)

# Query means "How do I view Docker container logs"
query = "Docker 容器日志怎么看"
for run in range(5):
    r = c.search(query)
    uris = [x.uri for x in r.resources]
    print(f"Run {run+1}: {uris[:4]}")
    time.sleep(0.5)
c.close()

Actual result (same client, same query, no restart):
Run 1: ['viking://resources/A/A.md']
Run 2: ['viking://resources/B/B.md', 'viking://resources/C/C.md', ...]
Run 3: ['viking://resources/D/D.md', 'viking://resources/E/E.md', ...]
Run 4: ['viking://resources/F/F.md', 'viking://resources/G/G.md', ...]
Run 5: ['viking://resources/H/H.md', 'viking://resources/I/I.md', ...]
- Average Jaccard similarity across pairs: 0.11
- URIs common to all 5 runs: 0
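For reference, the two figures above can be computed from the per-run URI lists roughly as follows. This is a minimal sketch: the helper names `jaccard` and `consistency_metrics` are hypothetical (not part of openviking), and the sample `runs` data is an illustrative placeholder, not the actual output from this report.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two URI collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consistency_metrics(runs):
    """Return (average pairwise Jaccard, set of URIs present in every run)."""
    pairs = list(combinations(runs, 2))
    avg = sum(jaccard(x, y) for x, y in pairs) / len(pairs) if pairs else 1.0
    common = set.intersection(*(set(r) for r in runs)) if runs else set()
    return avg, common


# Placeholder data for illustration only, not the results reported above.
runs = [
    ["viking://resources/A/A.md", "viking://resources/B/B.md"],
    ["viking://resources/B/B.md", "viking://resources/C/C.md"],
    ["viking://resources/C/C.md", "viking://resources/D/D.md"],
]
avg, common = consistency_metrics(runs)
print(f"Average pairwise Jaccard: {avg:.2f}")
print(f"URIs common to all runs: {len(common)}")
```

In the reproduction script, `runs` would be built by appending the full `uris` list from each iteration instead of printing only the first four entries.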
Expected result:
Same client + same query + same data → identical (or near-identical) result set.
Environment details:
- Embedding model: BAAI/bge-m3 (1024 dim, via OpenAI-compatible API)
- Dataset: 38 resources (markdown files)
- No data modifications between calls
- find() exhibits the same behavior
Impact:
Downstream logic that depends on search consistency (coverage scoring, dedup, caching) becomes unreliable. Short queries (e.g. 3-character abbreviations) are affected more severely.