🚀 Response Optimization Guide

This guide explains how the Dataproc MCP Server's response optimization system cuts token usage while keeping the complete response data accessible.

Overview

The response optimization system automatically reduces token usage by 60-96% and stores the complete data in Qdrant for later access. Responses become faster and cheaper, and no information is lost.

How It Works

1. Intelligent Response Filtering

The system analyzes API responses and extracts only the most essential information:

// Original response (7,651 tokens)
{
  "clusters": [
    {
      "clusterName": "analytics-cluster-prod",
      "status": { "state": "RUNNING", "stateStartTime": "2024-01-01T10:00:00Z" },
      "config": {
        "masterConfig": { "numInstances": 1, "machineTypeUri": "n1-standard-4" },
        "workerConfig": { "numInstances": 4, "machineTypeUri": "n1-standard-4" },
        // ... hundreds more lines of configuration
      }
    }
  ]
}

// Optimized response (292 tokens - 96.2% reduction)
"Found 3 clusters in my-project-123/us-central1:

 analytics-cluster-prod (RUNNING) - n1-standard-4, 5 nodes
 data-pipeline-dev (RUNNING) - n1-standard-2, 3 nodes  
 ml-training-cluster (CREATING) - n1-highmem-8, 10 nodes

💾 Full details stored: dataproc://responses/clusters/list/abc123
📊 Token reduction: 96.2% (7,651 → 292 tokens)"
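
The summary lines in the optimized response can be produced along these lines. This is a minimal sketch, not the server's actual filter: the `Cluster` and `InstanceGroup` types and both function names are hypothetical and cover only the fields visible in the example, and the node count is assumed to be master plus worker instances.

```typescript
// Hypothetical types covering only the fields shown in the example response.
interface InstanceGroup {
  numInstances: number;
  machineTypeUri: string;
}

interface Cluster {
  clusterName: string;
  status: { state: string };
  config: { masterConfig: InstanceGroup; workerConfig: InstanceGroup };
}

// One line per cluster: name, state, master machine type, total node count.
function summarizeCluster(cluster: Cluster): string {
  const { masterConfig, workerConfig } = cluster.config;
  const nodes = masterConfig.numInstances + workerConfig.numInstances;
  return `${cluster.clusterName} (${cluster.status.state}) - ${masterConfig.machineTypeUri}, ${nodes} nodes`;
}

// Header line, a blank line, then one summary line per cluster.
function summarizeClusters(clusters: Cluster[], location: string): string {
  const noun = clusters.length === 1 ? "cluster" : "clusters";
  const header = `Found ${clusters.length} ${noun} in ${location}:`;
  return [header, "", ...clusters.map(summarizeCluster)].join("\n");
}

const prodCluster: Cluster = {
  clusterName: "analytics-cluster-prod",
  status: { state: "RUNNING" },
  config: {
    masterConfig: { numInstances: 1, machineTypeUri: "n1-standard-4" },
    workerConfig: { numInstances: 4, machineTypeUri: "n1-standard-4" },
  },
};

const clusterSummary = summarizeClusters([prodCluster], "my-project-123/us-central1");
```

With one master and four workers, the sketch reports 5 nodes for `analytics-cluster-prod`, matching the example line above.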

2. Automatic Qdrant Storage

Complete data is automatically stored in Qdrant vector database:

// Storage process
1. Response received from Google Cloud API
2. Full data stored in Qdrant with unique ID
3. Optimized summary generated
4. Resource URI provided for full data access
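
The four steps above can be sketched as a single function. This is an illustration under stated assumptions: an in-memory `Map` stands in for the Qdrant collection, and `optimizeResponse`, `nextResponseId`, and the `resp-N` ID scheme are hypothetical names, not the server's API.

```typescript
// In-memory stand-in for the Qdrant collection (assumption: the real server
// would upsert a point into Qdrant instead).
const responseStore = new Map<string, unknown>();

interface OptimizedResponse {
  summary: string;
  resourceUri: string;
}

// Hypothetical ID scheme, used only to keep the sketch deterministic.
let responseCounter = 0;
function nextResponseId(): string {
  responseCounter += 1;
  return `resp-${responseCounter}`;
}

function optimizeResponse(
  tool: string,
  operation: string,
  fullData: unknown, // step 1: response received from the API
  summarize: (data: unknown) => string,
): OptimizedResponse {
  const resourceUri = `dataproc://responses/${tool}/${operation}/${nextResponseId()}`;
  responseStore.set(resourceUri, fullData); // step 2: store the full data
  const summary = summarize(fullData); // step 3: generate the summary
  return { summary, resourceUri }; // step 4: hand back the resource URI
}

const optimized = optimizeResponse(
  "clusters",
  "list",
  { clusters: [{ clusterName: "analytics-cluster-prod" }] },
  (data) => `Found ${(data as { clusters: unknown[] }).clusters.length} cluster(s)`,
);
```

Storing before summarizing means the full data is already retrievable by the time the caller sees the URI.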

3. Resource URI Access

Access complete data anytime via resource URIs:

# Resource URI format
dataproc://responses/{tool}/{operation}/{unique-id}

# Examples
dataproc://responses/clusters/list/abc123
dataproc://responses/clusters/get/def456
dataproc://responses/jobs/active/ghi789
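
A resource URI in this format can be split into its parts with a small parser. The sketch below assumes each segment contains no `/`; `parseResponseUri` and `ResponseUri` are hypothetical names introduced for illustration.

```typescript
interface ResponseUri {
  tool: string;
  operation: string;
  id: string;
}

// Returns the three path segments, or null when the URI does not match
// dataproc://responses/{tool}/{operation}/{unique-id}.
function parseResponseUri(uri: string): ResponseUri | null {
  const match = /^dataproc:\/\/responses\/([^/]+)\/([^/]+)\/([^/]+)$/.exec(uri);
  if (match === null) return null;
  const [, tool, operation, id] = match;
  return { tool, operation, id };
}

const parsedJobsUri = parseResponseUri("dataproc://responses/jobs/active/ghi789");
const rejectedUri = parseResponseUri("https://example.com/not-a-response");
```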

Performance Results

Token Reduction by Tool

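| Tool | Before (tokens) | After (tokens) | Reduction | Example Use Case |
|------|-----------------|----------------|-----------|------------------|
| list_clusters | 7,651 | 292 | 96.2% | Quick cluster overview |
| get_cluster | 553 | 199 | 64.0% | Cluster status check |
| check_active_jobs | 1,626 | 316 | 80.6% | Job monitoring |
| get_job_status | 445 | 110 | 75.3% | Job progress tracking |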
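
The reduction column follows from the before and after token counts as (before - after) / before. A short check of that arithmetic, with `reductionPercent` as a hypothetical helper name:

```typescript
// Percentage of tokens removed, rounded to one decimal place.
function reductionPercent(before: number, after: number): string {
  return (((before - after) / before) * 100).toFixed(1);
}

const reductions = {
  list_clusters: reductionPercent(7651, 292),
  get_cluster: reductionPercent(553, 199),
  check_active_jobs: reductionPercent(1626, 316),
  get_job_status: reductionPercent(445, 110),
};
```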

Processing Performance

  • Average Processing Time: 9.95ms
  • Memory Usage: <1MB per operation
  • Storage Efficiency: 99.9% compression ratio
  • Qdrant Startup Time: ~2 seconds (auto-managed)

Configuration

Environment Variables

# Core optimization settings
RESPONSE_OPTIMIZATION_ENABLED=true
RESPONSE_TOKEN_LIMIT=500

# Qdrant configuration
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=dataproc_responses
QDRANT_AUTO_START=true

# Performance tuning
QDRANT_VECTOR_SIZE=384
QDRANT_DISTANCE_METRIC=Cosine
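
One way to read these variables into a typed configuration object is sketched below. The defaults are assumed to be the values listed above, which the guide does not confirm, and `loadOptimizationConfig` is a hypothetical name. The environment is passed in as a plain record, so `process.env` can be supplied in Node.

```typescript
interface OptimizationConfig {
  enabled: boolean;
  tokenLimit: number;
  qdrantUrl: string;
  collection: string;
  autoStart: boolean;
  vectorSize: number;
  distance: string;
}

type Env = Record<string, string | undefined>;

// Parse an integer variable, falling back when it is unset or not a number.
function intVar(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Treat only the string "true" (any case) as true; unset keeps the fallback.
function boolVar(value: string | undefined, fallback: boolean): boolean {
  return value === undefined ? fallback : value.toLowerCase() === "true";
}

// Defaults mirror the values shown above (an assumption, not confirmed behavior).
function loadOptimizationConfig(env: Env): OptimizationConfig {
  return {
    enabled: boolVar(env.RESPONSE_OPTIMIZATION_ENABLED, true),
    tokenLimit: intVar(env.RESPONSE_TOKEN_LIMIT, 500),
    qdrantUrl: env.QDRANT_URL ?? "http://localhost:6333",
    collection: env.QDRANT_COLLECTION ?? "dataproc_responses",
    autoStart: boolVar(env.QDRANT_AUTO_START, true),
    vectorSize: intVar(env.QDRANT_VECTOR_SIZE, 384),
    distance: env.QDRANT_DISTANCE_METRIC ?? "Cosine",
  };
}

const defaultConfig = loadOptimizationConfig({});
const tunedConfig = loadOptimizationConfig({
  RESPONSE_OPTIMIZATION_ENABLED: "false",
  RESPONSE_TOKEN_LIMIT: "300",
});
```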

Configuration File

Create config/response-filter.json:

{
  "enabled": true,
  "tokenLimit": 500,
  "tools": {
    "list_clusters": {
      "enabled": true,
      "maxClusters": 10,
      "includeFields": ["name", "status", "machineType", "nodeCount"]
    },
    "get_cluster": {
      "enabled": true,
      "includeFields": ["name", "status", "config.masterConfig", "config.workerConfig"]
    },
    "check_active_jobs": {
      "enabled": true,
      "maxJobs": 20,
      "includeFields": ["jobId", "status", "clusterName", "startTime"]
    }
  },
  "qdrant": {
    "url": "http://localhost:6333",
    "collection": "dataproc_responses",
    "autoStart": true
  }
}
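
The `includeFields` entries use dotted paths such as `config.masterConfig`. A minimal sketch of resolving them against a response object follows. It assumes a dot separates nested keys and that missing paths are silently skipped; `getPath` and `pickFields` are hypothetical helpers, not the server's implementation.

```typescript
type JsonObject = Record<string, unknown>;

// Walk a dotted path ("config.masterConfig") through nested objects.
function getPath(source: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (current, key) =>
      current !== null && typeof current === "object"
        ? (current as JsonObject)[key]
        : undefined,
    source,
  );
}

// Keep only the listed paths; the result is keyed by the path string.
function pickFields(source: JsonObject, includeFields: string[]): JsonObject {
  const picked: JsonObject = {};
  for (const path of includeFields) {
    const value = getPath(source, path);
    if (value !== undefined) picked[path] = value;
  }
  return picked;
}

const filteredCluster = pickFields(
  {
    status: "RUNNING",
    config: {
      masterConfig: { numInstances: 1 },
      softwareConfig: { imageVersion: "2.1" },
    },
  },
  ["status", "config.masterConfig", "missing.field"],
);
```

Here `config.softwareConfig` is dropped because it is not listed, and `missing.field` is skipped because it does not resolve.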

Usage Examples

Basic Usage (Optimized Responses)

// Default behavior - optimized responses
const response = await mcpClient.callTool("list_clusters", {
  filter: "status.state=RUNNING"
});

// Response includes:
// - Concise cluster summary
// - Resource URI for full data
// - Token reduction metrics

Verbose Mode (Full Responses)

// Get full response when needed
const response = await mcpClient.callTool("list_clusters", {
  filter: "status.state=RUNNING",
  verbose: true  // Disable optimization
});

// Response includes:
// - Complete cluster configurations
// - All metadata and properties
// - No token reduction

Accessing Stored Data

// Access full data via resource URI
const fullData = await mcpClient.readResource(
  "dataproc://responses/clusters/list/abc123"
);

// Returns complete original response
console.log(fullData.clusters[0].config.masterConfig);

Qdrant Setup and Management

Automatic Setup (Recommended)

The server automatically manages Qdrant:

# Qdrant starts automatically when needed
npm start

# Check Qdrant status
curl http://localhost:6333/healthz

Manual Setup

# Option 1: Docker (Recommended)
docker run --name qdrant -p 6333:6333 qdrant/qdrant

# Option 2: Binary installation
wget https://github.com/qdrant/qdrant/releases/latest/download/qdrant-x86_64-unknown-linux-gnu.tar.gz
tar xzf qdrant-x86_64-unknown-linux-gnu.tar.gz
./qdrant

# Option 3: Cloud deployment
# Use Qdrant Cloud or deploy to your cloud provider

Collection Management

# Create collection manually (auto-created by default)
curl -X PUT http://localhost:6333/collections/dataproc_responses \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {
      "size": 384,
      "distance": "Cosine"
    }
  }'

# Check collection info
curl http://localhost:6333/collections/dataproc_responses

Troubleshooting

Common Issues

1. Qdrant Connection Failed

Symptoms:

  • Responses fall back to verbose mode
  • Warning: "Qdrant storage unavailable"

Solutions:

# Check Qdrant status
curl http://localhost:6333/healthz

# Restart Qdrant (assumes a Docker container named "qdrant")
docker restart qdrant

# Check configuration
echo $QDRANT_URL
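
The fallback described in the symptoms can be pictured as a try/catch around the storage step. This is a sketch of the pattern only: `respondWithFallback` and the `storeFull` callback are hypothetical, and the real server may structure this differently.

```typescript
interface ToolResponse {
  body: string;
  optimized: boolean;
  warning?: string;
}

// storeFull is assumed to persist the data and resolve to its resource URI.
async function respondWithFallback(
  fullData: unknown,
  summarize: (data: unknown) => string,
  storeFull: (data: unknown) => Promise<string>,
): Promise<ToolResponse> {
  try {
    const uri = await storeFull(fullData);
    return {
      body: `${summarize(fullData)}\nFull details stored: ${uri}`,
      optimized: true,
    };
  } catch {
    // Storage failed, so return everything rather than a summary whose
    // full data could not be retrieved later.
    return {
      body: JSON.stringify(fullData),
      optimized: false,
      warning: "Qdrant storage unavailable",
    };
  }
}
```

Returning the verbose response on failure keeps the guarantee that no data is lost, at the cost of the token savings.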

2. High Memory Usage

Symptoms:

  • Server memory usage increases over time
  • Slow response times

Solutions:

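# Reduce vector size (must match the embedding model's output dimension;
# an existing collection has to be recreated with the new size)
export QDRANT_VECTOR_SIZE=256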

# Enable collection cleanup
export QDRANT_CLEANUP_ENABLED=true
export QDRANT_MAX_POINTS=10000

3. Token Reduction Not Working

Symptoms:

  • Responses are still verbose
  • No resource URIs provided

Solutions:

# Check optimization is enabled
export RESPONSE_OPTIMIZATION_ENABLED=true

# Verify token limit
export RESPONSE_TOKEN_LIMIT=500

# Check tool-specific settings
cat config/response-filter.json

Debug Mode

Enable detailed logging:

export LOG_LEVEL=debug
export RESPONSE_OPTIMIZATION_DEBUG=true

# Start server with debug output
npm start

Performance Monitoring

# Check optimization metrics
curl http://localhost:3000/metrics

# Monitor Qdrant performance
curl http://localhost:6333/metrics

Best Practices

1. Token Limit Configuration

# Conservative (high quality summaries)
RESPONSE_TOKEN_LIMIT=300

# Balanced (recommended)
RESPONSE_TOKEN_LIMIT=500

# Aggressive (maximum reduction)
RESPONSE_TOKEN_LIMIT=200
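
To see how a token limit could shape a summary, the sketch below keeps summary lines until a budget is used up. Everything here is an assumption for illustration: the guide does not say how the server counts tokens, so `estimateTokens` uses a rough four-characters-per-token heuristic, and `fitToTokenLimit` is a hypothetical helper.

```typescript
// Rough heuristic (assumption): about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep lines in order until the next one would exceed the budget, then
// append a short note saying how many lines were left out. The note itself
// is not counted against the budget in this sketch.
function fitToTokenLimit(lines: string[], tokenLimit: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const line of lines) {
    const cost = estimateTokens(line);
    if (used + cost > tokenLimit) break;
    kept.push(line);
    used += cost;
  }
  const omitted = lines.length - kept.length;
  return omitted > 0 ? [...kept, `... and ${omitted} more`] : kept;
}

const sampleLines = ["aaaaaaaa", "bbbbbbbb", "cccccccc"]; // 2 estimated tokens each
const tightFit = fitToTokenLimit(sampleLines, 4);
const looseFit = fitToTokenLimit(sampleLines, 10);
```

A lower limit drops more lines from the summary, which is the trade-off the three settings above express.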

2. Qdrant Maintenance

# Regular cleanup (weekly)
curl -X POST http://localhost:6333/collections/dataproc_responses/points/delete \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "must": [
        {
          "key": "timestamp",
          "range": {
            "lt": "2024-01-01T00:00:00Z"
          }
        }
      ]
    }
  }'

# Backup collection (POST creates a snapshot; GET only lists existing ones)
curl -X POST http://localhost:6333/collections/dataproc_responses/snapshots
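
For a recurring cleanup, the cutoff date in the delete request would be computed rather than hard-coded. The sketch below builds such a request body. It assumes stored points carry a `timestamp` payload field, as the request above implies, and `buildCleanupBody` is a hypothetical helper.

```typescript
// Build the body for POST /collections/{name}/points/delete that removes
// points whose "timestamp" payload is older than the retention window.
function buildCleanupBody(now: Date, retentionDays: number) {
  const cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  return {
    filter: {
      must: [{ key: "timestamp", range: { lt: cutoff.toISOString() } }],
    },
  };
}

const weeklyCleanup = buildCleanupBody(new Date("2024-01-08T00:00:00Z"), 7);
```

The resulting object can be serialized with `JSON.stringify` and sent as the `-d` payload.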

3. Production Deployment

# docker-compose.yml
version: '3.8'
services:
  dataproc-mcp:
    image: dataproc-mcp-server:latest
    environment:
      - RESPONSE_OPTIMIZATION_ENABLED=true
      - QDRANT_URL=http://qdrant:6333
    depends_on:
      - qdrant

  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage

volumes:
  qdrant_data:

Advanced Features

1. Semantic Search

Search stored responses by content:

// Search for clusters with specific configurations
const results = await qdrantClient.search("dataproc_responses", {
  vector: await embedText("high memory clusters"),
  limit: 10,
  filter: {
    must: [
      { key: "tool", match: { value: "list_clusters" } }
    ]
  }
});
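
The search above ranks stored responses by cosine similarity to the query vector after applying the payload filter. An in-memory sketch of that ranking may help show what the call computes. The `StoredPoint` type and `searchLocal` function are hypothetical; Qdrant performs this server-side with an index.

```typescript
// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface StoredPoint {
  id: string;
  tool: string;
  vector: number[];
}

// Filter by tool, score each point against the query, return the best matches.
function searchLocal(points: StoredPoint[], query: number[], tool: string, limit: number) {
  return points
    .filter((point) => point.tool === tool)
    .map((point) => ({ id: point.id, score: cosineSimilarity(point.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const hits = searchLocal(
  [
    { id: "a", tool: "list_clusters", vector: [1, 0] },
    { id: "b", tool: "list_clusters", vector: [0, 1] },
    { id: "c", tool: "get_cluster", vector: [1, 0] },
  ],
  [1, 0],
  "list_clusters",
  2,
);
```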

2. Custom Optimization Rules

{
  "customRules": {
    "list_clusters": {
      "priority": ["name", "status", "nodeCount"],
      "exclude": ["labels", "metadata"],
      "maxItems": 15
    },
    "get_cluster": {
      "priority": ["status", "config.masterConfig", "config.workerConfig"],
      "exclude": ["config.softwareConfig.properties"],
      "includeMetrics": true
    }
  }
}
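
How such rules might be applied is sketched below for two of the keys, `exclude` and `maxItems`. The semantics are assumed from the key names, since the guide does not define them: `exclude` removes dotted paths and `maxItems` truncates the list. `priority` is left out, and `applyRule` and `deletePath` are hypothetical helpers.

```typescript
type Item = Record<string, unknown>;

interface OptimizationRule {
  exclude?: string[];
  maxItems?: number;
}

// Remove the value at a dotted path, doing nothing if the path is absent.
function deletePath(target: Item, path: string): void {
  const keys = path.split(".");
  let current: unknown = target;
  for (const key of keys.slice(0, -1)) {
    if (current === null || typeof current !== "object") return;
    current = (current as Item)[key];
  }
  if (current !== null && typeof current === "object") {
    delete (current as Item)[keys[keys.length - 1]];
  }
}

// Truncate to maxItems, then strip excluded paths from a deep copy of each
// item so the original response stays intact for storage.
function applyRule(items: Item[], rule: OptimizationRule): Item[] {
  const limited = rule.maxItems === undefined ? items : items.slice(0, rule.maxItems);
  return limited.map((item) => {
    const copy = JSON.parse(JSON.stringify(item)) as Item;
    for (const path of rule.exclude ?? []) deletePath(copy, path);
    return copy;
  });
}

const originalItems: Item[] = [
  { name: "a", labels: { env: "prod" }, config: { softwareConfig: { properties: { p: 1 }, version: "2" } } },
  { name: "b" },
  { name: "c" },
];

const ruledItems = applyRule(originalItems, {
  exclude: ["labels", "config.softwareConfig.properties"],
  maxItems: 2,
});
```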

3. Response Caching

# Enable response caching
export RESPONSE_CACHE_ENABLED=true
export RESPONSE_CACHE_TTL=300  # 5 minutes

// Cache hit example
const response = await mcpClient.callTool("list_clusters", {});
// Subsequent calls return cached optimized response
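
A time-to-live cache of this kind can be sketched in a few lines. This is an illustration, not the server's cache: `TtlCache` and `cacheKey` are hypothetical names, keying by tool name plus serialized arguments is an assumption, and the clock is injectable so expiry can be exercised without waiting.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlSeconds: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now;
  }

  // Return the cached value, or undefined if it is absent or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Assumed key scheme: tool name plus its serialized arguments.
function cacheKey(tool: string, args: Record<string, unknown>): string {
  return `${tool}:${JSON.stringify(args)}`;
}
```

A cache built with `new TtlCache<string>(300)` would match the five-minute TTL configured above.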

Migration Guide

From Verbose to Optimized

  1. Enable optimization gradually:
# Start with high token limit
export RESPONSE_TOKEN_LIMIT=1000

# Gradually reduce
export RESPONSE_TOKEN_LIMIT=500
export RESPONSE_TOKEN_LIMIT=300
  2. Test critical workflows:
# Test with verbose mode first
npm run test:optimization:verbose

# Then test optimized mode
npm run test:optimization:default
  3. Monitor performance:
# Track token usage
npm run benchmark:tokens

# Monitor response times
npm run benchmark:performance

Support

Getting Help

Contributing

Help improve response optimization:

  1. Performance Testing: Run benchmarks and report results
  2. Optimization Rules: Suggest better filtering strategies
  3. Qdrant Integration: Improve storage and retrieval
  4. Documentation: Enhance guides and examples

🚀 Achieve 60-96% token reduction while maintaining full data access!