Open
Description
The current implementation of milvus_search() is inefficient because it opens a new connection to the Milvus database for every search request and disconnects immediately after returning results. This "connect-per-request" pattern adds connection-setup latency to every call, which is especially costly in agentic RAG workflows, where multiple tool calls often occur in a single user turn.
Current Behavior
In server/app.py (and similarly in server-https/app.py), the milvus_search function follows this pattern:
def milvus_search(query: str, top_k: int = 5) -> Dict[str, Any]:
    try:
        # CONNECTS EVERY TIME
        connections.connect(alias="default", host=MILVUS_HOST, port=MILVUS_PORT)
        # ... performs search ...
    finally:
        # DISCONNECTS EVERY TIME
        connections.disconnect(alias="default")
Problem & Impact
- High Latency: Establishing a TCP/gRPC connection to Milvus takes time. Doing this 5-10 times for a complex agentic turn adds significant overhead.
- Resource Exhaustion: Rapidly opening and closing connections can lead to port exhaustion or unnecessary load on the Milvus server.
- Scalability: This pattern prevents the application from scaling efficiently under concurrent load.
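To put a number on the latency concern, a small timing helper can be wrapped around any search call. This is a minimal sketch using only the standard library; `mean_latency` is a hypothetical helper name, not something that exists in the repository.

```python
import time
from typing import Callable


def mean_latency(fn: Callable[[], object], n: int = 10) -> float:
    """Return the mean wall-clock time in seconds per call of fn over n calls."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n
```

Running something like `mean_latency(lambda: milvus_search("test query"))` before and after the refactor would give a rough comparison. It measures the whole call, so it includes search time as well as connection setup.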
Proposed Solution
Refactor the Milvus connection logic to use a persistent connection strategy.
- Global/Lifecycle Initialization: Initialize the Milvus connection once when the application starts.
  - For server/app.py (WebSocket): Initialize in main() before starting the server loop.
  - For server-https/app.py (FastAPI): Use the lifespan context manager or a startup event handler.
- Reuse Connection: The milvus_search() function should reuse the existing global connection alias ("default") instead of creating a new one.
- Graceful Shutdown: Ensure connections.disconnect("default") is called only when the application is shutting down.
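One way the lifecycle approach could look for the FastAPI server is sketched below. `make_lifespan` is a hypothetical helper, and the connect and disconnect steps are passed in as plain callables so the sketch runs without pymilvus or FastAPI installed. In the real app they would presumably wrap `connections.connect(alias="default", host=MILVUS_HOST, port=MILVUS_PORT)` and `connections.disconnect("default")`.

```python
from contextlib import asynccontextmanager
from typing import AsyncIterator, Callable


def make_lifespan(connect: Callable[[], None], disconnect: Callable[[], None]):
    """Build a lifespan context manager that holds one connection for the app's lifetime."""

    @asynccontextmanager
    async def lifespan(app: object) -> AsyncIterator[None]:
        connect()  # runs once, before the application starts serving requests
        try:
            yield  # the application handles requests while suspended here
        finally:
            disconnect()  # runs once, when the application shuts down

    return lifespan


# Assumed wiring in server-https/app.py (not executed in this sketch):
#   from fastapi import FastAPI
#   from pymilvus import connections
#   lifespan = make_lifespan(
#       lambda: connections.connect(alias="default", host=MILVUS_HOST, port=MILVUS_PORT),
#       lambda: connections.disconnect("default"),
#   )
#   app = FastAPI(lifespan=lifespan)
```

With this in place, milvus_search() would drop its own connect and disconnect calls and rely on the "default" alias already being registered. The WebSocket server could follow the same shape by calling the connect step in main() and the disconnect step in a `finally` block around the server loop.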
Technical Details
- Library: pymilvus
- Affected Files:
  - server/app.py
  - server-https/app.py
Acceptance Criteria
- Milvus connection is successfully established only once at application startup.
- milvus_search() successfully executes queries using the persistent connection.
- Connection is properly closed on application shutdown.
- Latency for milvus_search() calls is reduced (can be verified with simple logging/benchmarking).
- Server handles intermittent connection drops gracefully (optional optimization: add reconnection logic).
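For the optional reconnection criterion, one possible shape is a small retry wrapper around the search call. This is a sketch under assumptions: `with_reconnect` is a hypothetical helper, and which exception types actually signal a dropped Milvus connection would need to be checked against pymilvus before narrowing `retry_on`.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_reconnect(
    operation: Callable[[], T],
    reconnect: Callable[[], None],
    retries: int = 1,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Run operation; if it raises one of retry_on, reconnect and try again.

    After `retries` failed retries, the last exception is re-raised.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except retry_on:
            if attempt >= retries:
                raise  # out of retries: surface the original error
            attempt += 1
            reconnect()  # re-establish the persistent connection, then loop
```

Inside milvus_search(), the search itself would be passed as `operation`, and `reconnect` would disconnect and reconnect the "default" alias. Keeping `retries` small avoids masking a Milvus outage behind long retry loops.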