Dataproc MCP Server Configuration Guide

This guide explains how to configure the Dataproc MCP server for different use cases, including the default parameter management system.

✨ New: Default Parameter Management

The MCP server can now inject common parameters (such as projectId and region) automatically when a tool call does not provide them, so most calls need fewer arguments.

Quick Setup: Default Parameters

Create config/default-params.json in your MCP server directory:

{
  "defaultEnvironment": "production",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {
      "environment": "production",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1"
      }
    },
    {
      "environment": "development",
      "parameters": {
        "projectId": "your-dev-project-id",
        "region": "us-west1"
      }
    }
  ]
}
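To make the file's structure concrete, here is a minimal Python sketch of how per-environment defaults could be resolved from it. The helper name `resolve_defaults`, the rule that an environment's value wins over a parameter's `defaultValue`, and the error handling are assumptions for illustration, not the server's actual implementation.

```python
import json
from typing import Optional


def resolve_defaults(config: dict, environment: Optional[str] = None) -> dict:
    """Return the default parameters for one environment (hypothetical helper)."""
    env_name = environment or config["defaultEnvironment"]

    # Start from the schema-level defaultValue entries.
    resolved = {
        p["name"]: p["defaultValue"]
        for p in config.get("parameters", [])
        if "defaultValue" in p
    }

    # Overlay the selected environment's values (assumed to win over defaultValue).
    for env in config.get("environments", []):
        if env["environment"] == env_name:
            resolved.update(env.get("parameters", {}))
            break
    else:
        raise KeyError(f"unknown environment: {env_name}")

    # Every parameter marked as required must end up with a value.
    missing = [
        p["name"]
        for p in config.get("parameters", [])
        if p.get("required") and p["name"] not in resolved
    ]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return resolved


# Same content as the config/default-params.json example above.
example = json.loads("""
{
  "defaultEnvironment": "production",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {"environment": "production",
     "parameters": {"projectId": "your-project-id", "region": "us-central1"}},
    {"environment": "development",
     "parameters": {"projectId": "your-dev-project-id", "region": "us-west1"}}
  ]
}
""")

print(resolve_defaults(example))                 # uses defaultEnvironment
print(resolve_defaults(example, "development"))  # explicit environment
```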

Benefits of Default Parameters

🎯 Simplified Tool Usage: Call get_job_status with just jobId instead of projectId, region, and jobId
🔄 Backward Compatibility: Still accepts explicit parameters when provided
🌍 Multi-Environment Support: Different defaults per environment
📊 Resource Integration: Default configuration accessible via dataproc://config/defaults resource

Configuration Hierarchy (Priority Order)

Explicit Tool Parameters (highest priority)
Default Parameter Configuration (config/default-params.json)
MCP Environment Variables (from global MCP settings)
Built-in Defaults (lowest priority)
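The priority order above amounts to a layered lookup in which the first source that defines a parameter wins. The following Python sketch illustrates the idea with `collections.ChainMap`; the function name and the sample values for the environment and built-in layers are hypothetical, and the server's real resolution logic may differ.

```python
from collections import ChainMap


def resolve_parameters(explicit: dict, file_defaults: dict,
                       mcp_env: dict, builtin: dict) -> dict:
    """Merge the four sources; earlier arguments take priority (illustrative only)."""
    # ChainMap searches its mappings left to right, so the first hit wins.
    return dict(ChainMap(explicit, file_defaults, mcp_env, builtin))


# Hypothetical values for each layer, lowest priority first.
builtin = {"region": "us-central1"}
mcp_env = {"projectId": "env-project"}
file_defaults = {"projectId": "your-project-id", "region": "us-central1"}

# Only jobId is passed explicitly; projectId and region come from the defaults file.
print(resolve_parameters({"jobId": "my-job-id"}, file_defaults, mcp_env, builtin))

# An explicit region overrides every lower layer.
print(resolve_parameters({"jobId": "my-job-id", "region": "us-west1"},
                         file_defaults, mcp_env, builtin))
```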

Global MCP Configuration (Recommended)

For most users, configuring the MCP server once in the global MCP settings is the simplest approach.

Global MCP Settings Example

{
  "dataproc-server1": {
    "command": "npx",
    "args": [
      "@dipseth/dataproc-mcp-server"
    ],
    "disabled": false,
    "timeout": 60,
    "alwaysAllow": [
      "start_dataproc_cluster",
      "create_cluster_from_yaml",
      "create_cluster_from_profile",
      "list_clusters",
      "list_tracked_clusters",
      "list_profiles",
      "get_profile",
      "get_cluster",
      "submit_hive_query",
      "get_query_status",
      "get_query_results",
      "delete_cluster",
      "submit_dataproc_job",
      "get_job_status",
      "get_job_results",
      "get_zeppelin_url"
    ],
    "env": {
      "LOG_LEVEL": "error"
    }
  }
}

Optional: Custom Profile Directory

If you want to use a different profile directory, add this to the env section:

"env": {
  "LOG_LEVEL": "error",
  "MCP_CONFIG": "{\"profileManager\":{\"rootConfigPath\":\"/path/to/your/profiles\"}}"
}
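The MCP_CONFIG value is itself a JSON document stored inside a JSON string, which is why its quotes are escaped. Writing that by hand is error-prone, so one option is to generate it. The Python sketch below serializes the configuration twice, once for the inner object and once for the surrounding settings fragment; the variable names are illustrative.

```python
import json

# Configuration to pass through MCP_CONFIG (placeholder path from the example above).
mcp_config = {"profileManager": {"rootConfigPath": "/path/to/your/profiles"}}

env = {
    "LOG_LEVEL": "error",
    # First serialization: the config object becomes a compact JSON string.
    "MCP_CONFIG": json.dumps(mcp_config, separators=(",", ":")),
}

# Second serialization: embedding that string in the settings escapes its quotes.
fragment = json.dumps({"env": env}, indent=2)
print(fragment)

# Round trip: decoding twice should give back the original object.
decoded = json.loads(json.loads(fragment)["env"]["MCP_CONFIG"])
print(decoded == mcp_config)
```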

Default Configuration

The MCP server uses these defaults:

Profile Directory: ./profiles (relative to MCP server directory)
State File: ./state/dataproc-state.json
Profile Scan Interval: 5 minutes
State Save Interval: 1 minute
Authentication: Environment-independent service account impersonation
Default Parameters: Loaded from config/default-params.json (if exists)
Default Environment: production (configurable)

Project-Specific Configuration (Optional)

Only create project-specific configurations when you need to override defaults for a specific project.

When to Use Project-Specific Config

Different service account per project
Custom profile directories per project
Different state file locations

Creating Project-Specific Config

Create the config directory manually (only when needed):
```
mkdir -p /path/to/your/project/config
```

Create server.json in that directory with your overrides:

{
  "authentication": {
    "impersonateServiceAccount": "project-specific-sa@your-project.iam.gserviceaccount.com",
    "fallbackKeyPath": "/absolute/path/to/source-service-account-key.json",
    "preferImpersonation": true,
    "useApplicationDefaultFallback": false
  },
  "profileManager": {
    "rootConfigPath": "./custom-profiles"
  }
}
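Before relying on an override file like this, it can help to check the authentication block against this guide's own recommendations (specify fallbackKeyPath when impersonating, use an absolute key path, and leave useApplicationDefaultFallback off unless needed). The following Python sketch does that; `check_authentication` is a hypothetical helper and is not part of the MCP server.

```python
import os.path


def check_authentication(auth: dict) -> list:
    """Return warnings for an "authentication" block (hypothetical helper)."""
    warnings = []
    key_path = auth.get("fallbackKeyPath")

    # Impersonation without a fallback key depends on the ambient environment.
    if auth.get("impersonateServiceAccount") and not key_path:
        warnings.append("impersonateServiceAccount is set but fallbackKeyPath is missing")

    # A relative key path would resolve differently depending on the working directory.
    if key_path and not os.path.isabs(key_path):
        warnings.append("fallbackKeyPath should be an absolute path")

    # This guide advises enabling this only when environment fallbacks are needed.
    if auth.get("useApplicationDefaultFallback"):
        warnings.append("useApplicationDefaultFallback is enabled")

    return warnings


auth = {
    "impersonateServiceAccount": "project-specific-sa@your-project.iam.gserviceaccount.com",
    "fallbackKeyPath": "/absolute/path/to/source-service-account-key.json",
    "preferImpersonation": True,
    "useApplicationDefaultFallback": False,
}
print(check_authentication(auth))
```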

Update your MCP settings to point to the project:

"env": {
  "MCP_CONFIG": "{\"profileManager\":{\"rootConfigPath\":\"/path/to/your/project/custom-profiles\"}}"
}

Profile Management

Default Profiles

The MCP server includes default profiles in ./profiles/:

profiles/
├── development/
│   └── small.yaml
└── production/
    ├── cool-idea-promotions.yaml
    └── high-memory/
        └── analysis.yaml

Using Profiles Across Projects

Option 1: Use Default Profiles (Recommended)

Keep your common profiles in the MCP server's ./profiles/ directory
All projects can access these profiles
No need to copy profiles to each project

Option 2: Project-Specific Profiles

Create profiles in your project directory
Configure MCP to point to that directory
Useful when profiles contain project-specific configurations

Environment-Independent Authentication

For detailed information on environment-independent authentication and service account impersonation, refer to the Authentication Implementation Guide.

Best Practices

✅ Recommended Approach

Configure default parameters in config/default-params.json for improved user experience
Use environment-independent authentication with service account impersonation
Use global MCP configuration for most settings
Keep common profiles in the MCP server's ./profiles/ directory
Only create project-specific configs when you need different service accounts or custom settings
Don't auto-create directories - create them manually when needed
Always specify fallbackKeyPath for impersonation to ensure environment independence

✅ Authentication Best Practices

For detailed authentication best practices, refer to the Authentication Implementation Guide.

❌ Avoid

Don't rely on environment variables like GOOGLE_APPLICATION_CREDENTIALS
Don't copy profiles to every project - use the centralized profiles
Don't create unnecessary config directories - use defaults when possible
Don't use complex configuration hierarchies - keep it simple
Don't enable useApplicationDefaultFallback unless you specifically need environment variable fallbacks

Troubleshooting

Profile Not Found

If you get "Profile not found" errors:

Check that the profile exists in the configured directory
Verify the value of the MCP_CONFIG environment variable (if used)
Use the list_profiles tool to see the available profiles
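To check the first point outside the server, a short script can list the YAML files under the profile directory. This Python sketch assumes that a profile is identified by its path relative to the profile root without the .yaml extension; that naming rule and the function name `find_profiles` are assumptions, so treat the list_profiles tool as the authoritative source.

```python
from pathlib import Path


def find_profiles(root: str) -> list:
    """List profile files under root as relative paths without the .yaml suffix."""
    base = Path(root)
    if not base.is_dir():
        # A missing directory is the most common cause of "Profile not found".
        return []
    return sorted(
        p.relative_to(base).with_suffix("").as_posix()
        for p in base.rglob("*.yaml")
    )


# Default profile directory, relative to the MCP server directory.
available = find_profiles("./profiles")
print(available)
```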

Configuration Issues

Check log level: Set LOG_LEVEL=debug to see configuration loading
Verify paths: Ensure profile paths are correct
Test with defaults: Remove custom configs to test with defaults

Migration from Old System

If you were using the old auto-creating system:

Remove auto-created directories: Delete empty configs/ directories
Consolidate profiles: Move profiles to the central ./profiles/ directory
Simplify MCP settings: Remove unnecessary MCP_CONFIG overrides
Test with defaults: Verify everything works with the simplified setup

Examples

Simple Global Setup

{
  "dataproc-server1": {
    "command": "node",
    "args": ["/path/to/dataproc-server/build/index.js"],
    "disabled": false,
    "timeout": 60,
    "alwaysAllow": ["*"]
  }
}

Multi-Project Setup

{
  "dataproc-server-project-a": {
    "command": "node",
    "args": ["/path/to/dataproc-server/build/index.js"],
    "env": {
      "MCP_CONFIG": "{\"authentication\":{\"impersonateServiceAccount\":\"project-a-sa@project-a.iam.gserviceaccount.com\"}}"
    }
  },
  "dataproc-server-project-b": {
    "command": "node",
    "args": ["/path/to/dataproc-server/build/index.js"],
    "env": {
      "MCP_CONFIG": "{\"authentication\":{\"impersonateServiceAccount\":\"project-b-sa@project-b.iam.gserviceaccount.com\"}}"
    }
  }
}

Default Parameter Examples

Basic Default Parameters Setup

{
  "defaultEnvironment": "production",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {
      "environment": "production",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1"
      }
    }
  ]
}

Multi-Environment Setup

{
  "defaultEnvironment": "production",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"},
    {"name": "zone", "type": "string", "required": false, "defaultValue": "us-central1-a"}
  ],
  "environments": [
    {
      "environment": "development",
      "parameters": {
        "projectId": "dev-project-123",
        "region": "us-west1",
        "zone": "us-west1-a"
      }
    },
    {
      "environment": "staging",
      "parameters": {
        "projectId": "staging-project-456",
        "region": "us-central1",
        "zone": "us-central1-b"
      }
    },
    {
      "environment": "production",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1",
        "zone": "us-central1-a"
      }
    }
  ]
}

Usage Examples

Before (required explicit parameters):

{
  "projectId": "your-project-id",
  "region": "us-central1",
  "jobId": "my-job-id"
}

After (with defaults configured):

{
  "jobId": "my-job-id"
}

Override defaults when needed:

{
  "projectId": "different-project",
  "region": "us-west1",
  "jobId": "my-job-id"
}

Knowledge Base and Semantic Search Configuration

Qdrant Vector Database Setup (Optional)

The MCP server supports optional semantic search capabilities through Qdrant integration:

# Start Qdrant vector database
docker run -p 6334:6333 qdrant/qdrant

# Verify connection
curl http://localhost:6334/healthz

Response Filter Configuration

Configure semantic search in config/response-filter.json:

{
  "qdrant": {
    "url": "http://localhost:6334",
    "collectionName": "dataproc_knowledge",
    "vectorSize": 384,
    "distance": "Cosine"
  },
  "tokenLimits": {
    "list_clusters": 500,
    "get_cluster": 300,
    "default": 400
  },
  "extractionRules": {
    "list_clusters": {
      "maxClusters": 10,
      "essentialFields": ["clusterName", "status", "machineType"],
      "summaryFormat": "table"
    }
  }
}
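As a rough illustration of what these settings might control, the Python sketch below looks up a per-tool token limit with a fallback to the "default" entry and trims a cluster list according to the list_clusters extraction rules. The semantics are inferred from the field names only, and both function names are hypothetical; the server's actual filtering may work differently.

```python
def token_limit(config: dict, tool: str) -> int:
    """Per-tool token limit, falling back to the "default" entry (assumed semantics)."""
    limits = config["tokenLimits"]
    return limits.get(tool, limits["default"])


def summarize_clusters(config: dict, clusters: list) -> list:
    """Keep at most maxClusters entries and only the essentialFields of each."""
    rules = config["extractionRules"]["list_clusters"]
    fields = rules["essentialFields"]
    return [
        {field: cluster[field] for field in fields if field in cluster}
        for cluster in clusters[: rules["maxClusters"]]
    ]


# Relevant parts of the config/response-filter.json example above.
config = {
    "tokenLimits": {"list_clusters": 500, "get_cluster": 300, "default": 400},
    "extractionRules": {
        "list_clusters": {
            "maxClusters": 10,
            "essentialFields": ["clusterName", "status", "machineType"],
            "summaryFormat": "table",
        }
    },
}

# Made-up cluster record with one non-essential field.
clusters = [
    {"clusterName": "demo", "status": "RUNNING",
     "machineType": "n1-standard-4", "labels": {"team": "data"}},
]
print(token_limit(config, "get_cluster"))
print(token_limit(config, "submit_hive_query"))
print(summarize_clusters(config, clusters))
```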

Semantic Search Benefits

With Qdrant Enabled:

Natural language queries: "clusters with pip packages"
Intelligent data extraction and indexing
Vector similarity search with confidence scores
Enhanced filtering and discovery capabilities

Without Qdrant (Graceful Degradation):

All core functionality remains available
Standard data retrieval and management
Helpful setup guidance when semantic features are requested
No breaking changes or dependencies

Troubleshooting Qdrant Setup

# Check if Qdrant is running
docker ps | grep qdrant

# Test connection
curl http://localhost:6334/healthz

# Check collections
curl http://localhost:6334/collections

# View logs
docker logs $(docker ps -q --filter ancestor=qdrant/qdrant)

This approach keeps configuration simple while remaining flexible: with defaults in place, most tool calls need only their task-specific arguments.
