[Bug]: search()/find() returns non-deterministic results for identical queries #204

@ponsde

Bug: search() returns non-deterministic results for identical queries

Version: openviking 0.1.17, Python 3.12, Ubuntu 24.04

Description:
Calling client.search(query) multiple times with the same query, on the same client instance, against the same unmodified dataset, returns completely different result sets each time.

5 consecutive calls yield an average Jaccard similarity of only 0.11, with zero URIs appearing in all 5 results.

Reproduction:

import openviking as ov
import os, time

os.environ['OPENVIKING_CONFIG_FILE'] = 'ov.conf'

c = ov.SyncOpenViking(path='./data')
c.initialize()
time.sleep(3)

query = "Docker 容器日志怎么看"  # Chinese for "How do I view Docker container logs?"
for run in range(5):
    r = c.search(query)
    uris = [x.uri for x in r.resources]
    print(f"Run {run+1}: {uris[:4]}")
    time.sleep(0.5)

c.close()

Actual result (same client, same query, no restart):

Run 1: ['viking://resources/A/A.md']
Run 2: ['viking://resources/B/B.md', 'viking://resources/C/C.md', ...]
Run 3: ['viking://resources/D/D.md', 'viking://resources/E/E.md', ...]
Run 4: ['viking://resources/F/F.md', 'viking://resources/G/G.md', ...]
Run 5: ['viking://resources/H/H.md', 'viking://resources/I/I.md', ...]
  • Average Jaccard similarity across pairs: 0.11
  • URIs common to all 5 runs: 0
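The report does not say how the two figures above were computed. A minimal sketch of one plausible way, assuming the per-run URI lists have been collected into a list of lists, is shown here. The helper names `jaccard` and `stability_report` and the sample URIs are illustrative only and are not part of openviking or of the actual results.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two URI collections."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty result sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def stability_report(runs):
    """Return (average pairwise Jaccard, URIs present in every run)."""
    sets = [set(r) for r in runs]
    pairs = list(combinations(sets, 2))
    avg = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
    common = set.intersection(*sets) if sets else set()
    return avg, common


# Placeholder URI lists for illustration; in the reproduction script these
# would be the `uris` lists gathered inside the loop, one per run.
runs = [
    ["viking://resources/A/A.md", "viking://resources/B/B.md"],
    ["viking://resources/B/B.md", "viking://resources/C/C.md"],
    ["viking://resources/D/D.md"],
]

avg, common = stability_report(runs)
print(f"Average Jaccard similarity across pairs: {avg:.2f}")
print(f"URIs common to all runs: {len(common)}")
```

A fully deterministic search would give an average of 1.0 and a common set equal to every run's result set, so any lower value quantifies the drift between calls.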

Expected result:
Same client + same query + same data → identical (or near-identical) result set.

Environment details:

  • Embedding model: BAAI/bge-m3 (1024 dim, via OpenAI-compatible API)
  • Dataset: 38 resources (markdown files)
  • No data modifications between calls
  • find() exhibits the same behavior

Impact:
Downstream logic that depends on search consistency (coverage scoring, dedup, caching) becomes unreliable. Short queries (e.g. 3-character abbreviations) are affected more severely.
