This guide explains how to configure the Dataproc MCP server for different use cases, including the new intelligent default parameter management system.
The MCP server now supports intelligent default parameters that dramatically improve user experience by automatically injecting common parameters (like projectId and region) when they're not explicitly provided.
Create `config/default-params.json` in your MCP server directory:

```json
{
  "defaultEnvironment": "production",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {
      "environment": "production",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1"
      }
    },
    {
      "environment": "development",
      "parameters": {
        "projectId": "your-dev-project-id",
        "region": "us-west1"
      }
    }
  ]
}
```

- 🎯 Simplified Tool Usage: Call `get_job_status` with just `jobId` instead of `projectId`, `region`, and `jobId`
- 🔄 Backward Compatibility: Still accepts explicit parameters when provided
- 🌍 Multi-Environment Support: Different defaults per environment
- 📊 Resource Integration: Default configuration accessible via the `dataproc://config/defaults` resource
- Explicit Tool Parameters (highest priority)
- Default Parameter Configuration (`config/default-params.json`)
- MCP Environment Variables (from global MCP settings)
- Built-in Defaults (lowest priority)
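The precedence chain above can be sketched as a simple merge where later sources override earlier ones. This is an illustrative sketch, not the server's actual implementation; the function and parameter names here are hypothetical:

```typescript
type Params = Record<string, string>;

// Merge parameter sources from lowest to highest priority;
// later spreads win, so explicit tool parameters always take precedence.
function resolveParams(
  explicit: Params,       // explicit tool parameters (highest priority)
  configDefaults: Params, // from config/default-params.json
  envDefaults: Params,    // from MCP environment variables
  builtIns: Params        // built-in defaults (lowest priority)
): Params {
  return { ...builtIns, ...envDefaults, ...configDefaults, ...explicit };
}

// Example: only jobId is explicit; projectId and region fall through
// to the configured defaults.
const resolved = resolveParams(
  { jobId: "my-job-id" },
  { projectId: "your-project-id", region: "us-central1" },
  {},
  { region: "us-central1" }
);
// resolved.projectId === "your-project-id"; resolved.region === "us-central1"
```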
For most users, configure the MCP server globally in your MCP settings. This is the cleanest approach.
```json
{
  "dataproc-server1": {
    "command": "node",
    "args": [
      "@dipseth/dataproc-mcp-server"
    ],
    "disabled": false,
    "timeout": 60,
    "alwaysAllow": [
      "start_dataproc_cluster",
      "create_cluster_from_yaml",
      "create_cluster_from_profile",
      "list_clusters",
      "list_tracked_clusters",
      "list_profiles",
      "get_profile",
      "get_cluster",
      "submit_hive_query",
      "get_query_status",
      "get_query_results",
      "delete_cluster",
      "submit_dataproc_job",
      "get_job_status",
      "get_job_results",
      "get_zeppelin_url"
    ],
    "env": {
      "LOG_LEVEL": "error"
    }
  }
}
```

If you want to use a different profile directory, add this to the `env` section:
"env": {
"LOG_LEVEL": "error",
"MCP_CONFIG": "{\"profileManager\":{\"rootConfigPath\":\"/path/to/your/profiles\"}}"
}The MCP server uses these defaults:
- Profile Directory: `./profiles` (relative to the MCP server directory)
- State File: `./state/dataproc-state.json`
- Profile Scan Interval: 5 minutes
- State Save Interval: 1 minute
- Authentication: Environment-independent service account impersonation
- Default Parameters: Loaded from `config/default-params.json` (if it exists)
- Default Environment: `production` (configurable)
Only create project-specific configurations when you need to override the defaults for a specific project, for example:
- Different service account per project
- Custom profile directories per project
- Different state file locations
- Create the config directory manually (only when needed):

  ```shell
  mkdir -p /path/to/your/project/config
  ```

- Create `server.json` with your overrides:

  ```json
  {
    "authentication": {
      "impersonateServiceAccount": "project-specific-sa@your-project.iam.gserviceaccount.com",
      "fallbackKeyPath": "/absolute/path/to/source-service-account-key.json",
      "preferImpersonation": true,
      "useApplicationDefaultFallback": false
    },
    "profileManager": {
      "rootConfigPath": "./custom-profiles"
    }
  }
  ```

- Update your MCP settings to point to the project:

  ```json
  "env": {
    "MCP_CONFIG": "{\"profileManager\":{\"rootConfigPath\":\"/path/to/your/project/custom-profiles\"}}"
  }
  ```
The MCP server includes default profiles in `./profiles/`:

```
profiles/
├── development/
│   └── small.yaml
└── production/
    ├── cool-idea-promotions.yaml
    └── high-memory/
        └── analysis.yaml
```
Option 1: Use Default Profiles (Recommended)
- Keep your common profiles in the MCP server's `./profiles/` directory
- All projects can access these profiles
- No need to copy profiles to each project
Option 2: Project-Specific Profiles
- Create profiles in your project directory
- Configure MCP to point to that directory
- Useful when profiles contain project-specific configurations
For detailed information on environment-independent authentication and service account impersonation, refer to the Authentication Implementation Guide.
- Configure default parameters in `config/default-params.json` for improved user experience
- Use environment-independent authentication with service account impersonation
- Use global MCP configuration for most settings
- Keep common profiles in the MCP server's `./profiles/` directory
- Only create project-specific configs when you need different service accounts or custom settings
- Don't auto-create directories - create them manually when needed
- Always specify `fallbackKeyPath` for impersonation to ensure environment independence
For detailed authentication best practices, refer to the Authentication Implementation Guide.
- Don't rely on environment variables like `GOOGLE_APPLICATION_CREDENTIALS`
- Don't copy profiles to every project; use the centralized profiles
- Don't create unnecessary config directories - use defaults when possible
- Don't use complex configuration hierarchies - keep it simple
- Don't enable `useApplicationDefaultFallback` unless you specifically need environment-variable fallbacks
If you get "Profile not found" errors:
- Check the profile exists in the configured directory
- Verify the MCP_CONFIG environment variable (if used)
- Use the `list_profiles` tool to see available profiles
- Check log level: Set `LOG_LEVEL=debug` to see configuration loading
- Verify paths: Ensure profile paths are correct
- Test with defaults: Remove custom configs to test with defaults
If you were using the old auto-creating system:
- Remove auto-created directories: Delete empty `configs/` directories
- Consolidate profiles: Move profiles to the central `./profiles/` directory
- Simplify MCP settings: Remove unnecessary `MCP_CONFIG` overrides
- Test with defaults: Verify everything works with the simplified setup
A single-server MCP configuration:

```json
{
  "dataproc-server1": {
    "command": "node",
    "args": ["/path/to/dataproc-server/build/index.js"],
    "disabled": false,
    "timeout": 60,
    "alwaysAllow": ["*"]
  }
}
```

A multi-project configuration with a different service account per project:

```json
{
  "dataproc-server-project-a": {
    "command": "node",
    "args": ["/path/to/dataproc-server/build/index.js"],
    "env": {
      "MCP_CONFIG": "{\"authentication\":{\"impersonateServiceAccount\":\"project-a-sa@project-a.iam.gserviceaccount.com\"}}"
    }
  },
  "dataproc-server-project-b": {
    "command": "node",
    "args": ["/path/to/dataproc-server/build/index.js"],
    "env": {
      "MCP_CONFIG": "{\"authentication\":{\"impersonateServiceAccount\":\"project-b-sa@project-b.iam.gserviceaccount.com\"}}"
    }
  }
}
```

A minimal single-environment `config/default-params.json`:

```json
{
  "defaultEnvironment": "production",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {
      "environment": "production",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1"
      }
    }
  ]
}
```

A multi-environment `config/default-params.json`:

```json
{
  "defaultEnvironment": "production",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"},
    {"name": "zone", "type": "string", "required": false, "defaultValue": "us-central1-a"}
  ],
  "environments": [
    {
      "environment": "development",
      "parameters": {
        "projectId": "dev-project-123",
        "region": "us-west1",
        "zone": "us-west1-a"
      }
    },
    {
      "environment": "staging",
      "parameters": {
        "projectId": "staging-project-456",
        "region": "us-central1",
        "zone": "us-central1-b"
      }
    },
    {
      "environment": "production",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1",
        "zone": "us-central1-a"
      }
    }
  ]
}
```

Before (required explicit parameters):

```json
{
  "projectId": "your-project-id",
  "region": "us-central1",
  "jobId": "my-job-id"
}
```

After (with defaults configured):

```json
{
  "jobId": "my-job-id"
}
```

Override defaults when needed:

```json
{
  "projectId": "different-project",
  "region": "us-west1",
  "jobId": "my-job-id"
}
```

The MCP server supports optional semantic search capabilities through Qdrant integration:
```shell
# Start Qdrant vector database
docker run -p 6334:6333 qdrant/qdrant

# Verify connection
curl http://localhost:6334/health
```

Configure semantic search in `config/response-filter.json`:

```json
{
  "qdrant": {
    "url": "http://localhost:6334",
    "collectionName": "dataproc_knowledge",
    "vectorSize": 384,
    "distance": "Cosine"
  },
  "tokenLimits": {
    "list_clusters": 500,
    "get_cluster": 300,
    "default": 400
  },
  "extractionRules": {
    "list_clusters": {
      "maxClusters": 10,
      "essentialFields": ["clusterName", "status", "machineType"],
      "summaryFormat": "table"
    }
  }
}
```
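The `extractionRules` section above caps result size and keeps only essential fields. The following is an illustrative sketch of that filtering idea; the field names come from the config above, but the function itself is hypothetical, not the server's actual filter:

```typescript
interface Cluster {
  clusterName: string;
  status: string;
  machineType: string;
  [key: string]: unknown; // clusters may carry many more fields
}

// Rules mirroring the "list_clusters" entry in response-filter.json.
const rules = {
  maxClusters: 10,
  essentialFields: ["clusterName", "status", "machineType"] as const,
};

// Trim a cluster list to the configured cap and drop non-essential fields.
function extractClusters(clusters: Cluster[]) {
  return clusters.slice(0, rules.maxClusters).map((c) =>
    Object.fromEntries(
      rules.essentialFields.map((f) => [f, c[f]] as [string, unknown])
    )
  );
}

const sample: Cluster[] = [
  {
    clusterName: "etl-prod",
    status: "RUNNING",
    machineType: "n1-standard-4",
    labels: { team: "data" }, // extra field the filter should drop
  },
];
const filtered = extractClusters(sample);
// filtered[0] keeps only clusterName, status, and machineType
```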
- Natural language queries: "clusters with pip packages"
- Intelligent data extraction and indexing
- Vector similarity search with confidence scores
- Enhanced filtering and discovery capabilities
Without Qdrant (Graceful Degradation):
- All core functionality remains available
- Standard data retrieval and management
- Helpful setup guidance when semantic features are requested
- No breaking changes or dependencies
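The graceful-degradation behavior can be pictured as a guard around the semantic path: when no Qdrant client is configured, the call still succeeds and returns setup guidance instead of failing. The client type and function names below are illustrative, not the server's actual API:

```typescript
// Minimal stand-in for a Qdrant-backed search client.
type SearchClient = { search: (query: string) => Promise<string[]> };

async function semanticSearch(
  query: string,
  client?: SearchClient
): Promise<{ results: string[]; note?: string }> {
  if (!client) {
    // Core functionality elsewhere is unaffected; only semantic
    // search is skipped, with guidance on how to enable it.
    return {
      results: [],
      note: "Qdrant not configured; run `docker run -p 6334:6333 qdrant/qdrant` to enable semantic search.",
    };
  }
  return { results: await client.search(query) };
}

// Without Qdrant: no error, just setup guidance.
semanticSearch("clusters with pip packages").then((r) => console.log(r.note));
```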
```shell
# Check if Qdrant is running
docker ps | grep qdrant

# Test connection
curl http://localhost:6334/health

# Check collections
curl http://localhost:6334/collections

# View logs
docker logs $(docker ps -q --filter ancestor=qdrant/qdrant)
```

This approach keeps configuration simple while providing flexibility and a dramatically improved user experience.