This guide explains how the Dataproc MCP Server's response optimization system works, providing dramatic token reductions while maintaining full data accessibility.
The response optimization system automatically reduces token usage by 60-96% while storing complete data in Qdrant for later access. This provides faster responses, lower costs, and better user experience without losing any information.
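At a high level, the pattern is: summarize the raw API response, persist the full payload, and hand back a resource URI. A minimal sketch of that flow — note that `optimizeResponse`, the `summarize` callback, and the `store` callback are illustrative names, not the server's actual API:

```typescript
import { createHash } from "node:crypto";

interface OptimizedResponse {
  summary: string;        // concise text returned to the client
  resourceUri: string;    // points at the full payload in storage
  tokenReduction: number; // percentage saved
}

// Rough token estimate: ~4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function optimizeResponse(
  tool: string,
  operation: string,
  raw: unknown,
  summarize: (raw: unknown) => string,
  store: (id: string, raw: unknown) => void
): OptimizedResponse {
  const json = JSON.stringify(raw);
  // Derive a short unique ID from the payload itself.
  const id = createHash("sha256").update(json).digest("hex").slice(0, 6);
  store(id, raw); // full payload kept for later retrieval
  const summary = summarize(raw);
  const before = estimateTokens(json);
  const after = estimateTokens(summary);
  return {
    summary,
    resourceUri: `dataproc://responses/${tool}/${operation}/${id}`,
    tokenReduction: Math.round(((before - after) / before) * 1000) / 10,
  };
}
```

The key design point is that optimization is lossless: the summary drops detail, but the stored payload retains everything.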
The system analyzes API responses and extracts only the most essential information:
```jsonc
// Original response (7,651 tokens)
{
  "clusters": [
    {
      "clusterName": "analytics-cluster-prod",
      "status": { "state": "RUNNING", "stateStartTime": "2024-01-01T10:00:00Z" },
      "config": {
        "masterConfig": { "numInstances": 1, "machineTypeUri": "n1-standard-4" },
        "workerConfig": { "numInstances": 4, "machineTypeUri": "n1-standard-4" }
        // ... hundreds more lines of configuration
      }
    }
  ]
}
```

```text
// Optimized response (292 tokens - 96.2% reduction)
Found 3 clusters in my-project-123/us-central1:
• analytics-cluster-prod (RUNNING) - n1-standard-4, 5 nodes
• data-pipeline-dev (RUNNING) - n1-standard-2, 3 nodes
• ml-training-cluster (CREATING) - n1-highmem-8, 10 nodes

💾 Full details stored: dataproc://responses/clusters/list/abc123
📊 Token reduction: 96.2% (7,651 → 292 tokens)
```

Complete data is automatically stored in the Qdrant vector database:
The storage process:

1. Response received from Google Cloud API
2. Full data stored in Qdrant with unique ID
3. Optimized summary generated
4. Resource URI provided for full data access

Access complete data anytime via resource URIs:
```text
# Resource URI format
dataproc://responses/{tool}/{operation}/{unique-id}

# Examples
dataproc://responses/clusters/list/abc123
dataproc://responses/clusters/get/def456
dataproc://responses/jobs/active/ghi789
```

| Tool | Before (tokens) | After (tokens) | Reduction | Example Use Case |
|---|---|---|---|---|
| list_clusters | 7,651 | 292 | 96.2% | Quick cluster overview |
| get_cluster | 553 | 199 | 64.0% | Cluster status check |
| check_active_jobs | 1,626 | 316 | 80.6% | Job monitoring |
| get_job_status | 445 | 110 | 75.3% | Job progress tracking |
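Clients can split the resource URIs above back into their components. A small hypothetical parser for the `dataproc://responses/{tool}/{operation}/{unique-id}` scheme (the helper name is illustrative, not part of the server API):

```typescript
interface ResponseRef {
  tool: string;      // e.g. "clusters"
  operation: string; // e.g. "list"
  id: string;        // e.g. "abc123"
}

// Parse a dataproc://responses/... URI into its three path segments.
function parseResponseUri(uri: string): ResponseRef {
  const match = /^dataproc:\/\/responses\/([^/]+)\/([^/]+)\/([^/]+)$/.exec(uri);
  if (!match) throw new Error(`Not a response URI: ${uri}`);
  const [, tool, operation, id] = match;
  return { tool, operation, id };
}
```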
- Average Processing Time: 9.95ms
- Memory Usage: <1MB per operation
- Storage Efficiency: 99.9% compression ratio
- Qdrant Startup Time: ~2 seconds (auto-managed)
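The reduction percentages in the table above are straightforward arithmetic on the before/after token counts, which makes them easy to verify (illustrative helper, not part of the server API):

```typescript
// Percentage reduction, rounded to one decimal place.
function reductionPct(before: number, after: number): number {
  return Math.round(((before - after) / before) * 1000) / 10;
}

// Matches the table rows:
// reductionPct(7651, 292) -> 96.2  (list_clusters)
// reductionPct(553, 199)  -> 64    (get_cluster)
// reductionPct(1626, 316) -> 80.6  (check_active_jobs)
// reductionPct(445, 110)  -> 75.3  (get_job_status)
```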
```bash
# Core optimization settings
RESPONSE_OPTIMIZATION_ENABLED=true
RESPONSE_TOKEN_LIMIT=500

# Qdrant configuration
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION=dataproc_responses
QDRANT_AUTO_START=true

# Performance tuning
QDRANT_VECTOR_SIZE=384
QDRANT_DISTANCE_METRIC=Cosine
```

Create `config/response-filter.json`:
```json
{
  "enabled": true,
  "tokenLimit": 500,
  "tools": {
    "list_clusters": {
      "enabled": true,
      "maxClusters": 10,
      "includeFields": ["name", "status", "machineType", "nodeCount"]
    },
    "get_cluster": {
      "enabled": true,
      "includeFields": ["name", "status", "config.masterConfig", "config.workerConfig"]
    },
    "check_active_jobs": {
      "enabled": true,
      "maxJobs": 20,
      "includeFields": ["jobId", "status", "clusterName", "startTime"]
    }
  },
  "qdrant": {
    "url": "http://localhost:6333",
    "collection": "dataproc_responses",
    "autoStart": true
  }
}
```

```typescript
// Default behavior - optimized responses
const response = await mcpClient.callTool("list_clusters", {
  filter: "status.state=RUNNING"
});

// Response includes:
// - Concise cluster summary
// - Resource URI for full data
// - Token reduction metrics
```

```typescript
// Get full response when needed
const response = await mcpClient.callTool("list_clusters", {
  filter: "status.state=RUNNING",
  verbose: true // Disable optimization
});

// Response includes:
// - Complete cluster configurations
// - All metadata and properties
// - No token reduction
```

```typescript
// Access full data via resource URI
const fullData = await mcpClient.readResource(
  "dataproc://responses/clusters/list/abc123"
);

// Returns complete original response
console.log(fullData.clusters[0].config.masterConfig);
```

The server automatically manages Qdrant:
```bash
# Qdrant starts automatically when needed
npm start

# Check Qdrant status
curl http://localhost:6333/health
```

```bash
# Option 1: Docker (Recommended)
docker run -p 6333:6333 qdrant/qdrant

# Option 2: Binary installation
wget https://github.com/qdrant/qdrant/releases/latest/download/qdrant-x86_64-unknown-linux-gnu.tar.gz
tar xzf qdrant-x86_64-unknown-linux-gnu.tar.gz
./qdrant

# Option 3: Cloud deployment
# Use Qdrant Cloud or deploy to your cloud provider
```

```bash
# Create collection manually (auto-created by default)
curl -X PUT http://localhost:6333/collections/dataproc_responses \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": {
      "size": 384,
      "distance": "Cosine"
    }
  }'

# Check collection info
curl http://localhost:6333/collections/dataproc_responses
```

Symptoms:
- Responses fall back to verbose mode
- Warning: "Qdrant storage unavailable"
Solutions:

```bash
# Check Qdrant status
curl http://localhost:6333/health

# Restart Qdrant
docker restart qdrant

# Check configuration
echo $QDRANT_URL
```

Symptoms:
- Server memory usage increases over time
- Slow response times
Solutions:

```bash
# Reduce vector size
export QDRANT_VECTOR_SIZE=256

# Enable collection cleanup
export QDRANT_CLEANUP_ENABLED=true
export QDRANT_MAX_POINTS=10000
```

Symptoms:
- Responses are still verbose
- No resource URIs provided
Solutions:

```bash
# Check optimization is enabled
export RESPONSE_OPTIMIZATION_ENABLED=true

# Verify token limit
export RESPONSE_TOKEN_LIMIT=500

# Check tool-specific settings
cat config/response-filter.json
```

Enable detailed logging:
```bash
export LOG_LEVEL=debug
export RESPONSE_OPTIMIZATION_DEBUG=true

# Start server with debug output
npm start
```

```bash
# Check optimization metrics
curl http://localhost:3000/metrics

# Monitor Qdrant performance
curl http://localhost:6333/metrics
```

Choose a token limit that balances summary quality against reduction:

```bash
# Conservative (high quality summaries)
RESPONSE_TOKEN_LIMIT=1000

# Balanced (recommended)
RESPONSE_TOKEN_LIMIT=500

# Aggressive (maximum reduction)
RESPONSE_TOKEN_LIMIT=200
```

```bash
# Regular cleanup (weekly)
curl -X POST http://localhost:6333/collections/dataproc_responses/points/delete \
  -H "Content-Type: application/json" \
  -d '{
    "filter": {
      "must": [
        {
          "range": {
            "timestamp": {
              "lt": "2024-01-01T00:00:00Z"
            }
          }
        }
      ]
    }
  }'

# Back up the collection (create a snapshot)
curl -X POST http://localhost:6333/collections/dataproc_responses/snapshots
```

```yaml
# docker-compose.yml
version: '3.8'
services:
  dataproc-mcp:
    image: dataproc-mcp-server:latest
    environment:
      - RESPONSE_OPTIMIZATION_ENABLED=true
      - QDRANT_URL=http://qdrant:6333
    depends_on:
      - qdrant
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
    volumes:
      - qdrant_data:/qdrant/storage
volumes:
  qdrant_data:
```

Search stored responses by content:
```typescript
// Search for clusters with specific configurations
const results = await qdrantClient.search("dataproc_responses", {
  vector: await embedText("high memory clusters"),
  limit: 10,
  filter: {
    must: [
      { key: "tool", match: { value: "list_clusters" } }
    ]
  }
});
```

Per-tool filtering rules can prioritize or exclude specific fields:

```json
{
  "customRules": {
    "list_clusters": {
      "priority": ["name", "status", "nodeCount"],
      "exclude": ["labels", "metadata"],
      "maxItems": 15
    },
    "get_cluster": {
      "priority": ["status", "config.masterConfig", "config.workerConfig"],
      "exclude": ["config.softwareConfig.properties"],
      "includeMetrics": true
    }
  }
}
```

```bash
# Enable response caching
export RESPONSE_CACHE_ENABLED=true
export RESPONSE_CACHE_TTL=300  # 5 minutes
```

```typescript
// Cache hit example
const response = await mcpClient.callTool("list_clusters", {});
// Subsequent calls return cached optimized response
```

- Enable optimization gradually:
```bash
# Start with high token limit
export RESPONSE_TOKEN_LIMIT=1000

# Gradually reduce
export RESPONSE_TOKEN_LIMIT=500
export RESPONSE_TOKEN_LIMIT=300
```

- Test critical workflows:
```bash
# Test with verbose mode first
npm run test:optimization:verbose

# Then test optimized mode
npm run test:optimization:default
```

- Monitor performance:
```bash
# Track token usage
npm run benchmark:tokens

# Monitor response times
npm run benchmark:performance
```

- GitHub Issues: Report optimization issues
- Documentation: Complete optimization docs
- Performance: Benchmark results
Help improve response optimization:
- Performance Testing: Run benchmarks and report results
- Optimization Rules: Suggest better filtering strategies
- Qdrant Integration: Improve storage and retrieval
- Documentation: Enhance guides and examples
🚀 Achieve 60-96% token reduction while maintaining full data access!