BrowserControl

Give your AI agent real browser superpowers.
Vision-first browser automation for any MCP-compatible AI agent.

Quick Start • How It Works • Tools • Configuration • Examples • Contributing

Ever wished Claude or Gemini could actually browse the web? Not just fetch URLs, but truly see, click, type, and interact with any website like a human?

BrowserControl is an MCP server that gives your AI agent full browser access with a vision-first approach—no CSS selectors, no XPath, no guessing. Just point at numbers.

✨ What Makes This Different

❌ Traditional Approach

"Find the button with class 'btn-primary'
that contains 'Submit' and is inside
form#contact-form..."

Parse complex DOM structures
Guess at CSS selectors
No JavaScript support
No login persistence
No debugging tools

✅ BrowserControl

"click(7)"

See the rendered page with numbered elements
Just say "click 5" or "type in 3"
Full dynamic JavaScript support
Persistent sessions across restarts
Complete DevTools access

🎯 The Secret: Set of Marks (SoM)

Every screenshot comes annotated with numbered red boxes on interactive elements:

Found 15 interactive elements:
  [1] button - Sign In
  [2] input - Search...
  [3] a - Products
  [4] a - Pricing
  [5] button - Get Started
  ...

Your agent sees the numbers and simply calls click(1) to sign in. No CSS selectors. No XPath. No guessing.
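The numbering idea itself is simple. Below is a minimal sketch, assuming the server already has a list of detected interactive elements with bounding boxes. It assigns 1-based marks in detection order and renders the text listing shown above. The `Element` class and both function names are hypothetical illustrations, not BrowserControl's internal API.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One detected interactive element (hypothetical shape)."""
    tag: str      # e.g. "button", "input", "a"
    label: str    # visible text, placeholder, or accessible name
    x: float      # bounding box left edge, in CSS pixels
    y: float      # bounding box top edge
    width: float
    height: float


def build_mark_map(elements: list[Element]) -> dict[int, Element]:
    """Assign 1-based mark numbers in detection order."""
    return {i: el for i, el in enumerate(elements, start=1)}


def describe_marks(marks: dict[int, Element]) -> str:
    """Render the element listing that accompanies the annotated screenshot."""
    lines = [f"Found {len(marks)} interactive elements:"]
    lines += [f"  [{i}] {el.tag} - {el.label}" for i, el in sorted(marks.items())]
    return "\n".join(lines)
```

Because the agent only ever sees `[n]`, the number is the whole interface: the bounding boxes stay on the server side and are used when a `click(n)` arrives.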

🚀 Quick Start

Installation

# Using pip
pip install browsercontrol

# Or with uv (recommended for faster installs)
uv add browsercontrol

# Chromium is auto-installed on first run—no extra steps needed!

Run the Server

# Using the CLI
browsercontrol

# Or as a Python module
python -m browsercontrol

# Or with FastMCP
fastmcp run browsercontrol.server:mcp

Connect to Your AI Agent

BrowserControl works with any MCP-compatible AI agent or IDE. Choose your platform:

Claude Desktop

Add to your Claude configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "browsercontrol": {
      "command": "browsercontrol"
    }
  }
}

Restart Claude Desktop, then ask:

"Go to GitHub and star the browsercontrol repo"
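If the config file already lists other MCP servers, pasting the JSON above over it would drop them. The sketch below merges the `browsercontrol` entry into whatever is already there. It assumes the per-OS paths listed above; the helper names are hypothetical, and you may want to back up the file before running it.

```python
import json
import os
import sys
from pathlib import Path


def claude_config_path() -> Path:
    """Return the Claude Desktop config path listed above for the current OS."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    return Path.home() / ".config" / "Claude" / "claude_desktop_config.json"


def add_browsercontrol(config: dict) -> dict:
    """Register browsercontrol while keeping any servers already configured."""
    servers = config.setdefault("mcpServers", {})
    servers["browsercontrol"] = {"command": "browsercontrol"}
    return config


def update_config_file(path: Path) -> None:
    """Merge the entry into the file at `path`, creating the file if needed."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    add_browsercontrol(config)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

Call `update_config_file(claude_config_path())` once, then restart Claude Desktop. Running it again is harmless, since the entry is simply overwritten with the same value.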

Gemini CLI / Google AI Studio

If using the Gemini CLI or Google AI Studio with MCP support:

# Set up MCP configuration
export MCP_SERVERS='{"browsercontrol": {"command": "browsercontrol"}}'

# Or add the same "mcpServers" entry to the Gemini CLI settings file (~/.gemini/settings.json)

For Google AI Studio, configure in the MCP settings panel.

🔧 Cline (VS Code Extension)

Install the Cline extension
Open Cline settings (gear icon)
Navigate to "MCP Servers"
Add a new server:

{
  "browsercontrol": {
    "command": "browsercontrol"
  }
}

🤖 Continue.dev (VS Code/JetBrains)

Add to your Continue configuration (~/.continue/config.json):

{
  "mcpServers": [
    {
      "name": "browsercontrol",
      "command": "browsercontrol"
    }
  ]
}

🎯 Cursor IDE

Open Cursor Settings
Navigate to "Features" → "Model Context Protocol"
Add server configuration:

{
  "browsercontrol": {
    "command": "browsercontrol"
  }
}

🔌 Zed Editor

Add to your Zed settings (~/.config/zed/settings.json):

{
  "context_servers": {
    "browsercontrol": {
      "command": {
        "path": "browsercontrol"
      }
    }
  }
}

🐍 Custom Python Integration

Use the MCP Python SDK to integrate BrowserControl into your own agent:

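import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch BrowserControl as a subprocess that speaks MCP over stdio
server_params = StdioServerParameters(
    command="browsercontrol",
    args=[],
)


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Perform the MCP initialization handshake
            await session.initialize()

            # List available tools
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call a tool
            result = await session.call_tool("navigate_to", {
                "url": "https://github.com"
            })
            print(result.content)


# `async with` is only valid inside a coroutine, so drive it with asyncio
asyncio.run(main())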

🚀 Using with uv or pipx

If the browsercontrol command is not on your PATH (for example, because you manage it with uv or pipx), let uvx launch it:

{
  "mcpServers": {
    "browsercontrol": {
      "command": "uvx",
      "args": ["browsercontrol"]
    }
  }
}

Or with pipx:

{
  "mcpServers": {
    "browsercontrol": {
      "command": "pipx",
      "args": ["run", "browsercontrol"]
    }
  }
}

🔧 Advanced Configuration

You can pass environment variables to customize BrowserControl:

{
  "mcpServers": {
    "browsercontrol": {
      "command": "browsercontrol",
      "env": {
        "BROWSER_HEADLESS": "false",
        "BROWSER_VIEWPORT_WIDTH": "1920",
        "BROWSER_VIEWPORT_HEIGHT": "1080",
        "LOG_LEVEL": "DEBUG"
      }
    }
  }
}

See Configuration for all available options.

🥊 Head-to-Head Comparison

Feature	BrowserControl	Playwright MCP	Stagehand	Browser-Use	AgentQL
Vision-First (SoM)	✅ Numbered boxes	❌ Text tree	⚠️ AI vision	⚠️ AI vision	❌ Selectors
Multi-Tab Support	✅ Full control	⚠️ Implicit	⚠️ Implicit	⚠️ Basic	❌ None
Cookie Management	✅ Direct tools	⚠️ JS only	⚠️ JS only	⚠️ Basic	❌ None
File Uploads	✅ Native tool	⚠️ Manual	❌ No	❌ No	❌ No
Developer Tools	✅ 8 tools	❌ None	❌ None	❌ None	❌ None
Session Recording	✅ Built-in	⚠️ Manual	❌ None	❌ None	❌ None
Persistent Sessions	✅ Automatic	⚠️ Manual	❌ None	❌ None	❌ None
Token Efficiency	✅ Tiny IDs	⚠️ Large tree	❌ Full images	❌ Full images	⚠️ Query results
100% Local/Offline	✅ Yes	✅ Yes	❌ Needs LLM API	❌ Needs LLM API	❌ Cloud only
Monthly Cost (1k actions)	$0	$0	~$30-50	~$20-40	~$50+

💪 Key Advantages

1. Multi-Tab Orchestration

Unlike other tools that get "lost" when a new window opens:

list_tabs() — See every open page, title, and URL
switch_tab(index) — Multitask between different sites
create_tab(url) — Open references or parallel workflows

2. Session & Cookie Management

Stop fighting with login forms. Inject or inspect session state directly:

set_cookie() — Log in instantly by injecting an auth token
get_cookies() — Debug session issues or export state
clear_cookies() — Fresh start without clearing the whole profile

3. Reliable File Uploads

Most AI agents fail when they hit an <input type="file"> element. BrowserControl uses native browser engine hooks instead:

upload_file(id, path) — Just point at the button and the local file

4. Developer Tools Suite

Debug like a pro with tools no one else provides:

get_console_logs()      # See browser errors
get_network_requests()  # Monitor API calls
get_page_errors()       # Catch JS exceptions
run_in_console(code)    # Debug in real-time
inspect_element(5)      # Get computed styles
get_page_performance()  # Core Web Vitals
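When an agent or script post-processes DevTools output, it usually wants errors first. This sketch assumes `get_console_logs()` returns lines shaped like `[LEVEL] message`, as in the "Debug a Web App" example further down; that format is an assumption, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape: "[LEVEL] message", as in the "Debug a Web App" example.
LOG_LINE = re.compile(r"^\s*\[(?P<level>[A-Z]+)\]\s*(?P<message>.*)$")


def parse_console_logs(text: str) -> list[tuple[str, str]]:
    """Split raw console output into (level, message) pairs, skipping other lines."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            entries.append((match["level"], match["message"]))
    return entries


def summarize(text: str) -> Counter:
    """Count entries per level, e.g. to decide whether to inspect network requests next."""
    return Counter(level for level, _ in parse_console_logs(text))
```

Filtering with `[msg for level, msg in parse_console_logs(logs) if level == "ERROR"]` gives the short error list an agent would report back.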

5. Session Recording

start_recording()  →  Browse around  →  stop_recording()
                                              ↓
                               session_20260202.zip
                         (View with Playwright trace viewer)

6. Dynamic Viewport Control

Test responsive designs or emulate mobile screens on the fly:

set_viewport(width, height) — Change resolution without restarting
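A responsive check is just a repeated resize-then-capture cycle. The sketch below builds that plan as (tool name, arguments) pairs using the viewport sizes that appear elsewhere in this README. It assumes the tool arguments are named after the parameters in the tools tables (`width`, `height`, `annotate`, `full_page`); the preset names and `responsive_check_plan` are hypothetical.

```python
# Sizes taken from this README: the default viewport, the 1080p example,
# and the mobile emulation example.
VIEWPORTS = {
    "desktop-default": (1280, 720),
    "desktop-1080p": (1920, 1080),
    "mobile": (375, 812),
}


def responsive_check_plan(presets: dict[str, tuple[int, int]]) -> list[tuple[str, dict]]:
    """Build (tool, arguments) pairs: resize, then take an annotated screenshot."""
    plan = []
    for name, (width, height) in presets.items():
        if width <= 0 or height <= 0:
            raise ValueError(f"{name}: viewport dimensions must be positive")
        plan.append(("set_viewport", {"width": width, "height": height}))
        plan.append(("screenshot", {"annotate": True, "full_page": False}))
    return plan
```

Each pair can be passed to `session.call_tool(name, arguments)` from the Custom Python Integration example.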

7. True Persistence

What Persists	BrowserControl	Others
Cookies	✅	❌
localStorage	✅	❌
Session tokens	✅	❌
Login state	✅	❌
Browser history	✅	❌

Result: Log in once, stay logged in across sessions.

🛠️ Available Tools

Navigation

Tool	Description
`navigate_to(url)`	Go to a URL
`go_back()`	Navigate back
`go_forward()`	Navigate forward
`refresh_page()`	Reload the page
`scroll(direction, amount)`	Scroll up/down/left/right

Interaction

Tool	Description
`click(element_id)`	Click element by number
`click_at(x, y)`	Click at coordinates
`type_text(element_id, text)`	Type into input field
`press_key(key)`	Press keyboard key (Enter, Tab, etc.)
`hover(element_id)`	Hover over element
`scroll_to_element(element_id)`	Scroll element into view
`upload_file(element_id, path)`	Upload a file to an input
`wait(seconds)`	Wait for page loading

Tab Management

Tool	Description
`create_tab(url)`	Open a new browser tab
`switch_tab(index)`	Switch to a tab by its index
`close_tab(index)`	Close a specific tab
`list_tabs()`	List all open tabs and URLs

Forms

Tool	Description
`select_option(element_id, option)`	Select dropdown option
`check_checkbox(element_id)`	Toggle checkbox
`upload_file(element_id, file_path)`	Upload file to input

Content Extraction

Tool	Description
`get_page_content()`	Get page as markdown
`get_text(element_id)`	Get element text
`get_page_info()`	Get URL and title
`run_javascript(script)`	Execute JavaScript
`screenshot(annotate, full_page)`	Take screenshot

Developer Tools

Tool	Description
`get_console_logs()`	Browser console output
`get_network_requests()`	API calls and responses
`get_page_errors()`	JavaScript errors
`run_in_console(code)`	Execute JS in console
`inspect_element(id)`	Element styles/properties
`get_cookies()`	List browser cookies
`set_cookie(name, value, ...)`	Set a cookie
`delete_cookie(name)`	Remove a cookie
`clear_cookies()`	Clear all cookies
`set_viewport(width, height)`	Change window size
`get_page_performance()`	Load times, Web Vitals

Recording

Tool	Description
`start_recording()`	Begin session recording
`stop_recording()`	Save recording
`take_snapshot()`	Save screenshot + HTML
`list_recordings()`	View saved sessions

⚙️ Configuration

Configure via environment variables:

Variable	Default	Description
`BROWSER_HEADLESS`	`true`	Run without visible window
`BROWSER_VIEWPORT_WIDTH`	`1280`	Viewport width in pixels
`BROWSER_VIEWPORT_HEIGHT`	`720`	Viewport height in pixels
`BROWSER_TIMEOUT`	`30000`	Navigation timeout (ms)
`BROWSER_USER_DATA_DIR`	`~/.browsercontrol/user_data`	Browser profile path
`BROWSER_EXTENSION_PATH`	—	Path to browser extension
`LOG_LEVEL`	`INFO`	Logging verbosity

Examples:

# Run with visible browser (for debugging)
BROWSER_HEADLESS=false browsercontrol

# Mobile viewport emulation
BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol

# Verbose logging
LOG_LEVEL=DEBUG browsercontrol
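The table above maps naturally onto a typed settings object. This is a sketch that mirrors the documented variables and defaults; it is not BrowserControl's own `config.py`, and the `BrowserSettings` name, the accepted boolean spellings, and the upper-casing of `LOG_LEVEL` are assumptions.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Treat common truthy spellings as True (assumed, not documented)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class BrowserSettings:
    headless: bool = True
    viewport_width: int = 1280
    viewport_height: int = 720
    timeout_ms: int = 30000
    user_data_dir: Path = Path("~/.browsercontrol/user_data").expanduser()
    extension_path: Optional[Path] = None
    log_level: str = "INFO"

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "BrowserSettings":
        """Read the documented variables, falling back to the table's defaults."""
        extension = env.get("BROWSER_EXTENSION_PATH")
        return cls(
            headless=_as_bool(env.get("BROWSER_HEADLESS", "true")),
            viewport_width=int(env.get("BROWSER_VIEWPORT_WIDTH", "1280")),
            viewport_height=int(env.get("BROWSER_VIEWPORT_HEIGHT", "720")),
            timeout_ms=int(env.get("BROWSER_TIMEOUT", "30000")),
            user_data_dir=Path(
                env.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")
            ).expanduser(),
            extension_path=Path(extension).expanduser() if extension else None,
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
        )
```

Passing an explicit mapping instead of `os.environ` keeps the parsing easy to test, for example `BrowserSettings.from_env({"BROWSER_HEADLESS": "false"})`.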

📖 Examples

Web Research

You: "Go to Wikipedia and find out when Python was created"

Claude: *navigates to wikipedia.org*
        *types "Python programming language" in search*
        *clicks search result*
        *reads the page*

        "Python was created by Guido van Rossum and first released in 1991."

Debug a Web App

You: "Go to my app at localhost:3000 and check for any errors"

Claude: *navigates to localhost:3000*
        *calls get_console_logs()*

        "I found 2 errors:
         [ERROR] Uncaught TypeError: Cannot read property 'map' of undefined
         [ERROR] Failed to load resource: 404 /api/users"

        *calls get_network_requests()*

        "The /api/users endpoint returned 404. Check your API route."

Automated Testing with Recording

You: "Test the login flow on my site. Start recording."

Claude: *calls start_recording()*
        *navigates to site*
        *types email in element 3*
        *types password in element 4*
        *clicks Login button (element 5)*
        *verifies successful redirect*
        *calls stop_recording()*

        "Login test passed! Recording saved to session_20260202.zip"

Form Automation

You: "Fill out the contact form on example.com"

Claude: *navigates to example.com/contact*
        *types name in element 2*
        *types email in element 3*
        *types message in element 4*
        *clicks Submit (element 5)*

        "Form submitted successfully!"
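Under the hood, each step in a conversation like this is an MCP `tools/call` request, a JSON-RPC 2.0 message. The sketch below shows the envelopes for the form example, using the tool and parameter names from the tools tables. The element numbers follow the example and the field values are placeholders; `tool_calls` is a hypothetical helper.

```python
import json
from itertools import count


def tool_calls(steps: list[tuple[str, dict]]) -> list[dict]:
    """Wrap (tool, arguments) pairs as MCP JSON-RPC "tools/call" requests."""
    ids = count(1)
    return [
        {
            "jsonrpc": "2.0",
            "id": next(ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": args},
        }
        for name, args in steps
    ]


# Element numbers follow the example above; the text values are placeholders.
form_steps = [
    ("navigate_to", {"url": "https://example.com/contact"}),
    ("type_text", {"element_id": 2, "text": "Ada Lovelace"}),
    ("type_text", {"element_id": 3, "text": "ada@example.com"}),
    ("type_text", {"element_id": 4, "text": "Hello from BrowserControl"}),
    ("click", {"element_id": 5}),
]

for request in tool_calls(form_steps):
    print(json.dumps(request))
```

A real client must complete the `initialize` handshake first, and it should read element numbers from each returned screenshot rather than hard-coding them. The MCP Python SDK's `session.call_tool` builds these envelopes for you.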

🏗️ Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────┐
│   AI Agent      │────▶│  BrowserControl  │────▶│   Browser   │
│ (Claude/Gemini) │◀────│   MCP Server     │◀────│ (Chromium)  │
└─────────────────┘     └──────────────────┘     └─────────────┘
        │                        │                      │
        │   "click(5)"           │   mouse.click()      │
        │◀───────────────────────│◀─────────────────────│
        │   [annotated           │   [screenshot +      │
        │    screenshot]         │    element map]      │

How It Works

1. AI sends command: click(5)
2. Server finds element: looks up element #5 from the last screenshot
3. Browser acts: clicks at the element's coordinates
4. Capture state: takes a new screenshot and detects elements
5. Annotate: draws numbered boxes on interactive elements
6. Return to AI: sends the annotated image and element list
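The cycle can be sketched with the browser replaced by two stub callables, one that clicks at coordinates and one that re-detects element boxes. This assumes the click lands at the center of the element's bounding box; `PageState` and `handle_click` are hypothetical names, and a real server would call Playwright's `mouse.click()` and also return the annotated image.

```python
from dataclasses import dataclass, field
from typing import Callable

Box = tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class PageState:
    """Element map from the most recent annotated screenshot."""
    marks: dict[int, Box] = field(default_factory=dict)


def handle_click(
    state: PageState,
    element_id: int,  # step 1: the number the AI sent
    mouse_click: Callable[[float, float], None],
    detect_elements: Callable[[], list[Box]],
) -> str:
    # Step 2: resolve the number against the last screenshot's map
    if element_id not in state.marks:
        return f"Element [{element_id}] not found; take a new screenshot first."
    x, y, w, h = state.marks[element_id]
    # Step 3: act at the center of the element's box
    mouse_click(x + w / 2, y + h / 2)
    # Steps 4-5: re-detect and renumber; the old numbers are now stale
    state.marks = {i: box for i, box in enumerate(detect_elements(), start=1)}
    # Step 6: report back to the AI
    return f"Clicked [{element_id}]; {len(state.marks)} elements now marked."
```

Replacing the whole map after every action is what makes numbers safe to use: an ID is only ever interpreted against the screenshot the agent actually saw last.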

📁 Project Structure

browsercontrol/
├── __init__.py          # Package exports
├── __main__.py          # CLI entry point
├── server.py            # MCP server setup
├── browser.py           # BrowserManager with SoM
├── config.py            # Environment configuration
└── tools/
    ├── navigation.py    # Navigation tools
    ├── interaction.py   # Click, type, hover tools
    ├── forms.py         # Form handling tools
    ├── content.py       # Content extraction tools
    ├── devtools.py      # Developer tools
    ├── recording.py     # Session recording tools
    └── tabs.py          # Tab management tools

🔧 Troubleshooting

"Missing X server" Error

Set BROWSER_HEADLESS=true (the default), or, if you need a visible browser on a machine without a display, run it under xvfb:

xvfb-run browsercontrol

Browser Not Starting

Chromium auto-installs on first run. If it fails, install manually:

python -m playwright install chromium

Session Not Persisting

Check that BROWSER_USER_DATA_DIR is writable:

ls -la ~/.browsercontrol/
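The same check can be scripted. This sketch resolves the profile path the way the configuration table describes (`BROWSER_USER_DATA_DIR`, defaulting to `~/.browsercontrol/user_data`) and tests the directory, or its nearest existing parent if it has not been created yet. `profile_dir_writable` is a hypothetical helper.

```python
import os
from pathlib import Path


def profile_dir_writable(path: Path) -> bool:
    """True if `path` (or, if missing, its nearest existing parent) is a writable directory."""
    candidate = path
    while not candidate.exists():
        if candidate.parent == candidate:  # reached the top without finding anything
            return False
        candidate = candidate.parent
    return candidate.is_dir() and os.access(candidate, os.W_OK | os.X_OK)


profile = Path(
    os.environ.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")
).expanduser()
print(f"{profile}: {'writable' if profile_dir_writable(profile) else 'NOT writable'}")
```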

Connection Refused

Ensure no other instance is running:

pkill -f browsercontrol
browsercontrol

View Session Recordings

Open recordings in the Playwright trace viewer:

npx playwright show-trace ~/.browsercontrol/recordings/session.zip

🤝 Contributing

Contributions are welcome! Check out our Contributing Guide for details.

Ideas for contributions:

# Clone and install
git clone https://github.com/adityasasidhar/browsercontrol
cd browsercontrol
uv sync

# Run tests
uv run pytest

# Run in development
uv run fastmcp dev browsercontrol/server.py

📄 License

MIT License — Use it however you want.

🙏 Acknowledgments

Vision-first approach inspired by Google's Antigravity IDE
Built with FastMCP and Playwright
Thanks to the MCP community for making AI-tool integration accessible

Built for AI agents that need to see the web.

⭐ Star on GitHub • 🐛 Report Bug • 💡 Request Feature
