---
layout: default
title: Quick Start Guide
description: Get up and running with the Dataproc MCP Server in just 5 minutes
permalink: /QUICK_START/
---

Get up and running with the Dataproc MCP Server in just 5 minutes!
- Node.js 18+ - Download here
- Google Cloud Project with Dataproc API enabled
- Authentication - Service account key or gcloud CLI
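Before installing, you can confirm the Node.js requirement from the list above with a quick check (a minimal sketch; it degrades gracefully if `node` is not on your PATH):

```shell
# Check that the installed Node.js meets the 18+ requirement
if command -v node >/dev/null 2>&1; then
  major=$(node --version | sed 's/^v//' | cut -d. -f1)
  [ "$major" -ge 18 ] && echo "Node.js OK (v$major)" || echo "Node.js too old: need 18+"
else
  echo "Node.js not found: install Node.js 18+ first"
fi
```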
```shell
# Install globally for easy access
npm install -g @dataproc/mcp-server

# Or install locally in your project
npm install @dataproc/mcp-server
```

Run the interactive setup:

```shell
dataproc-mcp --setup

# This will create:
# - config/server.json (server configuration)
# - config/default-params.json (default parameters)
# - profiles/ (cluster profile directory)
```

For detailed authentication setup, refer to the Authentication Implementation Guide.
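After setup completes, a quick way to confirm the generated files are in place (a small sketch; run it from the directory where you ran `--setup`):

```shell
# Confirm the files and directories created by `dataproc-mcp --setup`
for f in config/server.json config/default-params.json; do
  [ -f "$f" ] && echo "found: $f" || echo "missing: $f"
done
[ -d profiles ] && echo "found: profiles/" || echo "missing: profiles/"
```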
Edit `config/default-params.json`:

```json
{
  "defaultEnvironment": "development",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {
      "environment": "development",
      "parameters": {
        "projectId": "your-project-id",
        "region": "us-central1"
      }
    }
  ]
}
```

For enhanced natural language queries (optional):
```shell
# Install and start Qdrant vector database
docker run -p 6334:6333 qdrant/qdrant

# Verify Qdrant is running
curl http://localhost:6334/health
```

Benefits of Semantic Search:
- Natural language cluster queries: "show me clusters with pip packages"
- Intelligent data extraction and filtering
- Enhanced search capabilities with confidence scoring
Note: This is completely optional - all core functionality works without Qdrant.
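Before starting the server, you can sanity-check the `default-params.json` structure from the configuration step. A standalone sketch (it writes the example shown earlier to a temp file so it runs anywhere; point the script at your real file in practice):

```shell
# Validate that defaultEnvironment matches an entry in environments
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "defaultEnvironment": "development",
  "parameters": [
    {"name": "projectId", "type": "string", "required": true},
    {"name": "region", "type": "string", "required": true, "defaultValue": "us-central1"}
  ],
  "environments": [
    {"environment": "development", "parameters": {"projectId": "your-project-id", "region": "us-central1"}}
  ]
}
EOF
python3 - "$cfg" <<'PY'
import json, sys

cfg = json.load(open(sys.argv[1]))
envs = {e["environment"] for e in cfg["environments"]}
assert cfg["defaultEnvironment"] in envs, "defaultEnvironment has no matching environment"
print("config OK")
PY
```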
```shell
# Start the MCP server
dataproc-mcp

# Or run directly with Node.js
node /path/to/dataproc-mcp/build/index.js
```

NEW: Full Claude.ai compatibility is now available!
For Claude.ai web app integration, see our dedicated guides:
- Complete Claude.ai Integration Guide - Detailed setup with troubleshooting
Key Features:
- ✅ All 22 MCP tools available in Claude.ai
- ✅ HTTPS tunneling with Cloudflare
- ✅ OAuth authentication with GitHub
- ✅ Secure WebSocket connections
Add to your Claude Desktop configuration:

File: `~/Library/Application Support/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": [
        "@dipseth/dataproc-mcp-server@latest"
      ],
      "env": {
        "LOG_LEVEL": "info",
        "DATAPROC_CONFIG_PATH": "/path/to/your/config/server.json"
      }
    }
  }
}
```

Add to your Roo MCP settings:
File: `.roo/mcp.json`

```json
{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": [
        "@dipseth/dataproc-mcp-server@latest"
      ],
      "env": {
        "LOG_LEVEL": "info",
        "DATAPROC_CONFIG_PATH": "/path/to/your/config/server.json"
      },
      "alwaysAllow": []
    }
  }
}
```

Once connected, try these commands in your MCP client:
- What Dataproc tools are available?
- Create a small Dataproc cluster named "test-cluster" in my project
- Show me all my Dataproc clusters
- Submit a Spark job to process data from gs://my-bucket/data.csv
- Cancel the job with ID "my-long-running-job-12345"
- Check the status of job "my-job-67890"
- Show me clusters with machine learning packages installed
- Find clusters using high-memory configurations
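Both client configurations above share the same shape; here is a standalone sketch that checks such a file parses and registers the `dataproc` server (it writes the example to a temp file so it runs anywhere; substitute your real `claude_desktop_config.json` or `.roo/mcp.json` path):

```shell
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
  "mcpServers": {
    "dataproc": {
      "command": "npx",
      "args": ["@dipseth/dataproc-mcp-server@latest"],
      "env": {
        "LOG_LEVEL": "info",
        "DATAPROC_CONFIG_PATH": "/path/to/your/config/server.json"
      }
    }
  }
}
EOF
python3 - "$cfg" <<'PY'
import json, sys

cfg = json.load(open(sys.argv[1]))
srv = cfg["mcpServers"]["dataproc"]
assert srv.get("command"), "mcpServers.dataproc.command is missing"
print("dataproc server registered:", srv["command"], *srv.get("args", []))
PY
```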
Create a custom cluster profile in `profiles/my-cluster.yaml`:

```yaml
my-project-dev-cluster:
  region: us-central1
  tags:
    - development
    - testing
  labels:
    environment: dev
    team: data-engineering
  cluster_config:
    master_config:
      num_instances: 1
      machine_type_uri: n1-standard-4
      disk_config:
        boot_disk_type: pd-standard
        boot_disk_size_gb: 100
    worker_config:
      num_instances: 2
      machine_type_uri: n1-standard-4
      disk_config:
        boot_disk_type: pd-standard
        boot_disk_size_gb: 100
      is_preemptible: true  # Cost savings for dev
    software_config:
      image_version: 2.1.1-debian10
      optional_components:
        - JUPYTER
      properties:
        dataproc:dataproc.allow.zero.workers: "true"
    lifecycle_config:
      idle_delete_ttl:
        seconds: 1800  # 30 minutes
```

Check that the server starts correctly:
```shell
dataproc-mcp --test

# Verify authentication
dataproc-mcp --verify-auth

# List available profiles
dataproc-mcp --list-profiles
```

Run a comprehensive health check:
```shell
npm run pre-flight  # If installed from source

# Or basic connectivity test
curl -X POST http://localhost:3000/health  # If running as HTTP server
```

Check your credentials:
```shell
gcloud auth list
gcloud config list project

# Verify service account permissions
gcloud projects get-iam-policy YOUR_PROJECT_ID
```

Enable the required APIs:
```shell
gcloud services enable dataproc.googleapis.com
gcloud services enable compute.googleapis.com
gcloud services enable storage.googleapis.com
```

Check network connectivity:
```shell
ping google.com

# Verify firewall rules
gcloud compute firewall-rules list
```

- Check the logs: Look for error messages in the console output
- Verify configuration: Ensure all required fields are filled
- Test authentication: Run `gcloud auth application-default print-access-token`
- Check permissions: Verify your service account has the Dataproc Admin role
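Part of the checklist above can be automated. A small sketch that reports whether the expected local tooling (`node`, `npm`, `gcloud`) is present before you dig into deeper issues:

```shell
# Report which of the required CLIs are installed
for cmd in node npm gcloud; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok: $cmd ($("$cmd" --version 2>/dev/null | head -n 1))"
  else
    echo "missing: $cmd"
  fi
done
```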
- API Reference - Complete tool documentation
- Configuration Examples - Real-world setups
- Security Guide - Best practices
- Testing Guide - Testing and debugging information
- Multi-environment setup for dev/staging/production
- Custom cluster profiles for different workloads
- Automated job scheduling with cron-like syntax
- Performance monitoring and alerting
- Cost optimization with preemptible instances
- GitHub Issues - Bug reports and feature requests
- Community Support - Community Q&A
- Contributing Guide - How to contribute
Your Dataproc MCP Server is now configured and ready to use. Start by creating your first cluster and exploring the available tools through your MCP client.
Happy data processing! 🚀
Need help? Check our testing guide or open an issue.