Implement connection pooling for Milvus #28

@haroon0x

Description

The current implementation of milvus_search() is inefficient because it opens a new connection to the Milvus database for every search request and disconnects as soon as the results are returned. This "connect-per-request" pattern adds connection setup latency to every call. That matters for Agentic RAG workflows, where several tool calls often occur in a single user turn.

Current Behavior

In server/app.py (and similarly in server-https/app.py), the milvus_search function follows this pattern:

from typing import Any, Dict

from pymilvus import connections


def milvus_search(query: str, top_k: int = 5) -> Dict[str, Any]:
    try:
        # CONNECTS EVERY TIME
        connections.connect(alias="default", host=MILVUS_HOST, port=MILVUS_PORT)
        # ... performs search ...
    finally:
        # DISCONNECTS EVERY TIME
        connections.disconnect(alias="default")

Problem & Impact

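  • High Latency: Establishing a gRPC connection to Milvus (a TCP handshake plus channel setup, and TLS if enabled) takes time on every call. Doing this 5-10 times in a complex agentic turn multiplies that overhead.
  • Resource Exhaustion: Rapidly opening and closing connections can exhaust ephemeral ports (closed client sockets linger in TIME_WAIT) and puts unnecessary load on the Milvus server.
  • Scalability: Every concurrent request pays the connection cost. Because all requests share the global "default" alias, one request's disconnect can also close the connection that another in-flight request is still using.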
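
To quantify the latency claim before and after the change, a small timing harness can wrap the existing connect-search-disconnect path. The sketch below is hypothetical: time_calls and connect_per_request are illustrative names, not existing code. The connect, search, and disconnect steps are passed in as callables, so the harness itself does not depend on pymilvus.

```python
import statistics
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def time_calls(fn: Callable[[], object], n: int = 10) -> Dict[str, float]:
    """Call fn() n times and return latency statistics in milliseconds."""
    if n < 1:
        raise ValueError("n must be at least 1")
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


def connect_per_request(
    connect: Callable[[], None],
    disconnect: Callable[[], None],
    search: Callable[[], T],
) -> Callable[[], T]:
    """Model the current pattern: connect, search, then always disconnect."""

    def call() -> T:
        connect()
        try:
            return search()
        finally:
            disconnect()  # mirrors the finally block in milvus_search()

    return call
```

With pymilvus installed, one could pass lambda: connections.connect(alias="default", host=MILVUS_HOST, port=MILVUS_PORT) and lambda: connections.disconnect(alias="default") as the connect and disconnect steps. Comparing that result with timing the search alone on an already-open connection gives the per-call overhead the refactor is meant to remove.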

Proposed Solution

Refactor the Milvus connection logic to use a persistent connection strategy.

  1. Global/Lifecycle Initialization: Initialize the Milvus connection once when the application starts.
    • For server/app.py (WebSocket): Initialize in main() before starting the server loop.
    • For server-https/app.py (FastAPI): Use the lifespan context manager. This is FastAPI's recommended approach; @app.on_event("startup") handlers still work but are deprecated.
  2. Reuse Connection: The milvus_search() function should reuse the existing global connection alias ("default") instead of creating a new one.
  3. Graceful Shutdown: Ensure connections.disconnect("default") is called only when the application is shutting down.
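
The three steps above could look roughly like the following sketch. It assumes the pymilvus connections API already used in app.py. connect_milvus, disconnect_milvus, make_lifespan, and run_websocket_server are hypothetical names, and serve_forever stands in for whatever loop server/app.py runs today.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable, Union

MILVUS_ALIAS = "default"  # the alias milvus_search() already uses


def connect_milvus(host: str, port: Union[str, int]) -> None:
    """Step 1: open the shared pymilvus connection once, at startup."""
    from pymilvus import connections  # lazy import keeps this module importable without pymilvus

    connections.connect(alias=MILVUS_ALIAS, host=host, port=port)


def disconnect_milvus() -> None:
    """Step 3: close the shared connection, only at shutdown."""
    from pymilvus import connections

    connections.disconnect(alias=MILVUS_ALIAS)


def make_lifespan(connect: Callable[[], None], disconnect: Callable[[], None]):
    """Wrap connect/disconnect in an async context manager.

    The result matches the shape FastAPI accepts via FastAPI(lifespan=...),
    and the same object can wrap the WebSocket server loop.
    """

    @asynccontextmanager
    async def lifespan(app: Any = None) -> AsyncIterator[None]:
        connect()  # blocking call, acceptable once at startup
        try:
            yield  # requests are served here; milvus_search() reuses the alias
        finally:
            disconnect()  # runs on normal shutdown and when serving raises

    return lifespan


async def _serve_with_milvus(lifespan, serve_forever: Callable[[], Awaitable[None]]) -> None:
    async with lifespan():
        await serve_forever()


def run_websocket_server(lifespan, serve_forever: Callable[[], Awaitable[None]]) -> None:
    """Sketch of main() in server/app.py: the connection lives as long as the server loop.

    serve_forever is a hypothetical stand-in for the existing WebSocket server loop.
    """
    asyncio.run(_serve_with_milvus(lifespan, serve_forever))
```

In server-https/app.py this might be wired as app = FastAPI(lifespan=make_lifespan(lambda: connect_milvus(MILVUS_HOST, MILVUS_PORT), disconnect_milvus)). milvus_search() then simply drops its connect/disconnect lines. pymilvus objects such as Collection(...) take a using argument that defaults to "default", so they resolve the shared connection. Passing connect and disconnect in as callables keeps the lifecycle logic testable without a running Milvus.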

Technical Details

  • Library: pymilvus
  • Affected Files:
    • server/app.py
    • server-https/app.py

Acceptance Criteria

  • Milvus connection is successfully established only once at application startup.
  • milvus_search() successfully executes queries using the persistent connection.
  • Connection is properly closed on application shutdown.
  • Latency for milvus_search() calls is reduced (can be verified with simple logging/benchmarking).
  • Server handles intermittent connection drops gracefully (optional optimization: add reconnection logic).
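
For the optional reconnection criterion, one possible approach is to retry a failed search after re-establishing the shared connection. This is a hedged sketch: with_reconnect is a hypothetical helper. The reconnect callable is whatever the app uses to reopen the "default" alias, for example connections.disconnect(alias="default") followed by connections.connect(...) with the original host and port.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_reconnect(
    call: Callable[[], T],
    reconnect: Callable[[], None],
    retries: int = 1,
    backoff_s: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Run call(); on failure, reconnect and try again up to `retries` times."""
    attempt = 0
    while True:
        try:
            return call()
        except retry_on:
            if attempt >= retries:
                raise  # out of retries: surface the original error to the caller
            attempt += 1
            time.sleep(backoff_s * attempt)  # simple linear backoff between attempts
            reconnect()  # if reconnect() itself fails, that error propagates
```

Retrying is reasonable here because a vector search is a read-only operation, so repeating it has no side effects. The default retry_on=(Exception,) is deliberately broad for the sketch. In practice it would be narrowed to the connection-related pymilvus exceptions the server actually observes, so that malformed queries fail immediately instead of triggering a reconnect.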

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
