Give your AI agent real browser superpowers.
Vision-first browser automation for any MCP-compatible AI agent.
Quick Start • How It Works • Tools • Configuration • Examples • Contributing
Ever wished Claude or Gemini could actually browse the web? Not just fetch URLs, but truly see, click, type, and interact with any website like a human?
BrowserControl is an MCP server that gives your AI agent full browser access with a vision-first approach—no CSS selectors, no XPath, no guessing. Just point at numbers.
Every screenshot comes annotated with numbered red boxes on interactive elements:
Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products
[4] a - Pricing
[5] button - Get Started
Your agent sees the numbers and simply calls click(1) to sign in. No CSS selectors. No XPath. No guessing.
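To make the numbered-element idea concrete, here is a minimal sketch of how an agent-side script might turn the text listing above into a lookup table. The line format is taken from the sample output; the `Element` type and `parse_elements` helper are hypothetical names used for illustration and are not part of BrowserControl.

```python
import re
from typing import NamedTuple


class Element(NamedTuple):
    """One interactive element from the listing (hypothetical type)."""
    tag: str
    label: str


# Matches lines such as "[1] button - Sign In" (format assumed from the sample).
LINE_RE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+-\s+(.*)$")


def parse_elements(listing: str) -> dict[int, Element]:
    """Map each element number to its tag and label, skipping other lines."""
    elements: dict[int, Element] = {}
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            number, tag, label = match.groups()
            elements[int(number)] = Element(tag, label.strip())
    return elements


listing = """Found 15 interactive elements:
[1] button - Sign In
[2] input - Search...
[3] a - Products
"""
elements = parse_elements(listing)
print(elements[1])
```

With a table like this, deciding which number to pass to click() is a dictionary lookup rather than a selector search.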
# Using pip
pip install browsercontrol
# Or with uv (recommended for faster installs)
uv add browsercontrol
# Chromium is auto-installed on first run; no extra steps needed!
# Using the CLI
browsercontrol
# Or as a Python module
python -m browsercontrol
# Or with FastMCP
fastmcp run browsercontrol.server:mcp
BrowserControl works with any MCP-compatible AI agent or IDE. Choose your platform:
Claude Desktop
Add to your Claude configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"browsercontrol": {
"command": "browsercontrol"
}
}
}
Restart Claude Desktop, then ask:
"Go to GitHub and star the browsercontrol repo"
Gemini CLI / Google AI Studio
If using the Gemini CLI or Google AI Studio with MCP support:
# Set up MCP configuration
export MCP_SERVERS='{"browsercontrol": {"command": "browsercontrol"}}'
# Or add to your Gemini config file
For Google AI Studio, configure the server in the MCP settings panel.
🔧 Cline (VS Code Extension)
- Install the Cline extension
- Open Cline settings (gear icon)
- Navigate to "MCP Servers"
- Add a new server:
{
"browsercontrol": {
"command": "browsercontrol"
}
}
🤖 Continue.dev (VS Code/JetBrains)
Add to your Continue configuration (~/.continue/config.json):
{
"mcpServers": [
{
"name": "browsercontrol",
"command": "browsercontrol"
}
]
}
🎯 Cursor IDE
- Open Cursor Settings
- Navigate to "Features" → "Model Context Protocol"
- Add server configuration:
{
"browsercontrol": {
"command": "browsercontrol"
}
}
🔌 Zed Editor
Add to your Zed settings (~/.config/zed/settings.json):
{
"context_servers": {
"browsercontrol": {
"command": {
"path": "browsercontrol"
}
}
}
}
🐍 Custom Python Integration
Use the MCP Python SDK to integrate BrowserControl into your own agent:
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch BrowserControl as a subprocess and talk to it over stdio
server_params = StdioServerParameters(
    command="browsercontrol",
    args=[],
)


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the MCP session
            await session.initialize()
            # List available tools
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Call a tool
            result = await session.call_tool("navigate_to", {
                "url": "https://github.com"
            })
            print(result.content)


asyncio.run(main())
🚀 Using with uv or pipx
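If you use uv or pipx, have the client launch BrowserControl through the tool's runner instead of calling the browsercontrol command directly. With uvx: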
{
"mcpServers": {
"browsercontrol": {
"command": "uvx",
"args": ["browsercontrol"]
}
}
}
Or with pipx:
{
"mcpServers": {
"browsercontrol": {
"command": "pipx",
"args": ["run", "browsercontrol"]
}
}
}
🔧 Advanced Configuration
You can pass environment variables to customize BrowserControl:
{
"mcpServers": {
"browsercontrol": {
"command": "browsercontrol",
"env": {
"BROWSER_HEADLESS": "false",
"BROWSER_VIEWPORT_WIDTH": "1920",
"BROWSER_VIEWPORT_HEIGHT": "1080",
"LOG_LEVEL": "DEBUG"
}
}
}
}
See Configuration for all available options.
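Most of the client snippets above share one shape: an entry under "mcpServers" with a command and, optionally, args and env. If your config file already lists other servers, editing it by hand risks overwriting them. The sketch below merges a BrowserControl entry into an existing file. The `add_server` helper is a hypothetical name, and the demo writes to a temporary file rather than your real configuration.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def add_server(
    config_path: Path,
    name: str,
    command: str,
    args: Optional[list] = None,
    env: Optional[dict] = None,
) -> dict:
    """Add or update one entry under "mcpServers", keeping existing entries."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    entry = {"command": command}
    if args:
        entry["args"] = list(args)
    if env:
        entry["env"] = dict(env)
    config.setdefault("mcpServers", {})[name] = entry
    config_path.write_text(json.dumps(config, indent=2))
    return config


# Demo against a throwaway file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "claude_desktop_config.json"
    config_path.write_text(
        json.dumps({"mcpServers": {"other": {"command": "other-server"}}})
    )
    merged = add_server(
        config_path,
        "browsercontrol",
        "browsercontrol",
        env={"BROWSER_HEADLESS": "false", "LOG_LEVEL": "DEBUG"},
    )
    on_disk = json.loads(config_path.read_text())

print(json.dumps(merged, indent=2))
```

Environment values are written as strings because that is how the JSON examples above express them.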
| Feature | BrowserControl | Playwright MCP | Stagehand | Browser-Use | AgentQL |
|---|---|---|---|---|---|
| Vision-First (SoM) | ✅ Numbered boxes | ❌ Text tree | ❌ Selectors | | |
| Multi-Tab Support | ✅ Full control | ❌ None | | | |
| Cookie Management | ✅ Direct tools | ❌ None | | | |
| File Uploads | ✅ Native tool | ❌ No | ❌ No | ❌ No | |
| Developer Tools | ✅ 8 tools | ❌ None | ❌ None | ❌ None | ❌ None |
| Session Recording | ✅ Built-in | ❌ None | ❌ None | ❌ None | |
| Persistent Sessions | ✅ Automatic | ❌ None | ❌ None | ❌ None | |
| Token Efficiency | ✅ Tiny IDs | ❌ Full images | ❌ Full images | | |
| 100% Local/Offline | ✅ Yes | ✅ Yes | ❌ Needs LLM API | ❌ Needs LLM API | ❌ Cloud only |
| Monthly Cost (1k actions) | $0 | $0 | ~$30-50 | ~$20-40 | ~$50+ |
Unlike other tools that get "lost" when a new window opens:
- list_tabs(): See every open page, title, and URL
- switch_tab(index): Multitask between different sites
- create_tab(url): Open references or parallel workflows
Stop fighting with login forms. Inject or inspect session state directly:
- set_cookie(): Log in instantly by injecting an auth token
- get_cookies(): Debug session issues or export state
- clear_cookies(): Fresh start without clearing the whole profile
Most AI agents fail when they hit an <input type="file">. BrowserControl uses native browser engine hooks:
- upload_file(id, path): Just point at the button and the local file
Debug like a pro with tools no one else provides:
get_console_logs() # See browser errors
get_network_requests() # Monitor API calls
get_page_errors() # Catch JS exceptions
run_in_console(code) # Debug in real-time
inspect_element(5) # Get computed styles
get_page_performance() # Core Web Vitals
start_recording() → Browse around → stop_recording()
↓
session_20260202.zip
(View with Playwright trace viewer)
Test responsive designs or emulate mobile screens on the fly:
- set_viewport(width, height): Change resolution without restarting
| What Persists | BrowserControl | Others |
|---|---|---|
| Cookies | ✅ | ❌ |
| localStorage | ✅ | ❌ |
| Session tokens | ✅ | ❌ |
| Login state | ✅ | ❌ |
| Browser history | ✅ | ❌ |
Result: Log in once, stay logged in across sessions.
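Since BrowserControl is built with Playwright, persistence of this kind is typically obtained from a persistent browser context tied to a profile directory. The sketch below illustrates that mechanism under this assumption; it is not BrowserControl's actual code. `launch_persistent_context` is Playwright's own API and the environment variable comes from the Configuration table, while the two helper names are hypothetical.

```python
import os
from pathlib import Path


def resolve_user_data_dir(env: dict[str, str]) -> Path:
    """Return the profile directory, using the documented default if unset."""
    raw = env.get("BROWSER_USER_DATA_DIR", "~/.browsercontrol/user_data")
    return Path(raw).expanduser()


async def open_persistent_page(url: str) -> str:
    """Open a page in a profile-backed context and return its title."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.async_api import async_playwright

    profile = resolve_user_data_dir(dict(os.environ))
    profile.mkdir(parents=True, exist_ok=True)
    async with async_playwright() as p:
        # Cookies and localStorage are stored under `profile`, so they are
        # still available the next time the same directory is used.
        context = await p.chromium.launch_persistent_context(
            str(profile), headless=True
        )
        page = await context.new_page()
        await page.goto(url)
        title = await page.title()
        await context.close()
        return title
```

To try it, call `asyncio.run(open_persistent_page("https://example.com"))` twice; state written to the profile during the first run is visible to the second.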
| Tool | Description |
|---|---|
| navigate_to(url) | Go to a URL |
| go_back() | Navigate back |
| go_forward() | Navigate forward |
| refresh_page() | Reload the page |
| scroll(direction, amount) | Scroll up/down/left/right |
| Tool | Description |
|---|---|
| click(element_id) | Click element by number |
| click_at(x, y) | Click at coordinates |
| type_text(element_id, text) | Type into input field |
| press_key(key) | Press keyboard key (Enter, Tab, etc.) |
| hover(element_id) | Hover over element |
| scroll_to_element(element_id) | Scroll element into view |
| upload_file(element_id, path) | Upload a file to an input |
| wait(seconds) | Wait for page loading |
| Tool | Description |
|---|---|
| create_tab(url) | Open a new browser tab |
| switch_tab(index) | Switch to a tab by its index |
| close_tab(index) | Close a specific tab |
| list_tabs() | List all open tabs and URLs |
| Tool | Description |
|---|---|
| select_option(element_id, option) | Select dropdown option |
| check_checkbox(element_id) | Toggle checkbox |
| upload_file(element_id, file_path) | Upload file to input |
| Tool | Description |
|---|---|
| get_page_content() | Get page as markdown |
| get_text(element_id) | Get element text |
| get_page_info() | Get URL and title |
| run_javascript(script) | Execute JavaScript |
| screenshot(annotate, full_page) | Take screenshot |
| Tool | Description |
|---|---|
| get_console_logs() | Browser console output |
| get_network_requests() | API calls and responses |
| get_page_errors() | JavaScript errors |
| run_in_console(code) | Execute JS in console |
| inspect_element(id) | Element styles/properties |
| get_cookies() | List browser cookies |
| set_cookie(name, value, ...) | Set a cookie |
| delete_cookie(name) | Remove a cookie |
| clear_cookies() | Clear all cookies |
| set_viewport(width, height) | Change window size |
| get_page_performance() | Load times, Web Vitals |
| Tool | Description |
|---|---|
| start_recording() | Begin session recording |
| stop_recording() | Save recording |
| take_snapshot() | Save screenshot + HTML |
| list_recordings() | View saved sessions |
Configure via environment variables:
| Variable | Default | Description |
|---|---|---|
| BROWSER_HEADLESS | true | Run without visible window |
| BROWSER_VIEWPORT_WIDTH | 1280 | Viewport width in pixels |
| BROWSER_VIEWPORT_HEIGHT | 720 | Viewport height in pixels |
| BROWSER_TIMEOUT | 30000 | Navigation timeout (ms) |
| BROWSER_USER_DATA_DIR | ~/.browsercontrol/user_data | Browser profile path |
| BROWSER_EXTENSION_PATH | (none) | Path to browser extension |
| LOG_LEVEL | INFO | Logging verbosity |
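As an illustration of how settings like these are usually consumed, here is a minimal sketch that reads the variables from the table into a typed object and falls back to the listed defaults. The `BrowserConfig` class and `load_config` function are hypothetical and may differ from what config.py actually does. The case-insensitive handling of "true" and "false" is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrowserConfig:
    """Settings mirroring the table above (hypothetical container)."""
    headless: bool = True
    viewport_width: int = 1280
    viewport_height: int = 720
    timeout_ms: int = 30000
    log_level: str = "INFO"


def load_config(env: dict[str, str]) -> BrowserConfig:
    """Build a config from environment-style strings, falling back to defaults."""
    d = BrowserConfig()
    return BrowserConfig(
        # Assumption: only the string "true" (any case) enables headless mode.
        headless=env.get("BROWSER_HEADLESS", str(d.headless)).lower() == "true",
        viewport_width=int(env.get("BROWSER_VIEWPORT_WIDTH", d.viewport_width)),
        viewport_height=int(env.get("BROWSER_VIEWPORT_HEIGHT", d.viewport_height)),
        timeout_ms=int(env.get("BROWSER_TIMEOUT", d.timeout_ms)),
        log_level=env.get("LOG_LEVEL", d.log_level).upper(),
    )


mobile = load_config(
    {"BROWSER_VIEWPORT_WIDTH": "375", "BROWSER_VIEWPORT_HEIGHT": "812"}
)
print(mobile)
```

In real use you would pass `dict(os.environ)`; taking the mapping as a parameter keeps the function easy to test.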
Examples:
# Run with visible browser (for debugging)
BROWSER_HEADLESS=false browsercontrol
# Mobile viewport emulation
BROWSER_VIEWPORT_WIDTH=375 BROWSER_VIEWPORT_HEIGHT=812 browsercontrol
# Verbose logging
LOG_LEVEL=DEBUG browsercontrol
You: "Go to Wikipedia and find out when Python was created"
Claude: *navigates to wikipedia.org*
*types "Python programming language" in search*
*clicks search result*
*reads the page*
"Python was created by Guido van Rossum and first released in 1991."
You: "Go to my app at localhost:3000 and check for any errors"
Claude: *navigates to localhost:3000*
*calls get_console_logs()*
"I found 2 errors:
[ERROR] Uncaught TypeError: Cannot read property 'map' of undefined
[ERROR] Failed to load resource: 404 /api/users"
*calls get_network_requests()*
"The /api/users endpoint returned 404. Check your API route."
You: "Test the login flow on my site. Start recording."
Claude: *calls start_recording()*
*navigates to site*
*types email in element 3*
*types password in element 4*
*clicks Login button (element 5)*
*verifies successful redirect*
*calls stop_recording()*
"Login test passed! Recording saved to session_20260202.zip"
You: "Fill out the contact form on example.com"
Claude: *navigates to example.com/contact*
*types name in element 2*
*types email in element 3*
*types message in element 4*
*clicks Submit (element 5)*
"Form submitted successfully!"
┌─────────────────┐ ┌──────────────────┐ ┌─────────────┐
│ AI Agent │────▶│ BrowserControl │────▶│ Browser │
│ (Claude/Gemini) │◀────│ MCP Server │◀────│ (Chromium) │
└─────────────────┘ └──────────────────┘ └─────────────┘
│ │ │
│ "click(5)" │ mouse.click() │
│◀───────────────────────│◀─────────────────────│
│ [annotated │ [screenshot + │
│ screenshot] │ element map] │
- AI sends command: click(5)
- Server finds element: looks up element #5 from the last screenshot
- Browser acts: clicks at the element's coordinates
- Capture state: takes a new screenshot and detects elements
- Annotate: draws numbered boxes on interactive elements
- Return to AI: sends the annotated image and element list
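The lookup and click steps above can be sketched as a small element map that remembers the boxes from the most recent capture and converts an ID into a click point. This is an illustration only: the `Box` and `ElementMap` names, the bounding-box layout, and the choice to click the box center are assumptions, since the text says only that the browser clicks at the element's coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box of an annotated element, in viewport pixels (assumed layout)."""
    x: float
    y: float
    width: float
    height: float

    def center(self) -> tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


class ElementMap:
    """Remembers the boxes from the most recent screenshot."""

    def __init__(self) -> None:
        self._boxes: dict[int, Box] = {}

    def update(self, boxes: list[Box]) -> None:
        # IDs are reassigned on every capture, starting at 1.
        self._boxes = {i: box for i, box in enumerate(boxes, start=1)}

    def click_point(self, element_id: int) -> tuple[float, float]:
        if element_id not in self._boxes:
            raise KeyError(f"element {element_id} is not in the last screenshot")
        return self._boxes[element_id].center()


element_map = ElementMap()
element_map.update([Box(10, 10, 80, 30), Box(100, 10, 200, 30)])
print(element_map.click_point(2))  # (200.0, 25.0)
```

Because the map is rebuilt on every capture, an ID is only meaningful relative to the latest screenshot, which is why each action returns a fresh annotated image.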
browsercontrol/
├── __init__.py # Package exports
├── __main__.py # CLI entry point
├── server.py # MCP server setup
├── browser.py # BrowserManager with SoM
├── config.py # Environment configuration
└── tools/
├── navigation.py # Navigation tools
├── interaction.py # Click, type, hover tools
├── forms.py # Form handling tools
├── content.py # Content extraction tools
├── devtools.py # Developer tools
├── recording.py # Session recording tools
└── tabs.py # Tab management tools
"Missing X server" Error
Set BROWSER_HEADLESS=true or run with xvfb:
xvfb-run browsercontrol
Browser Not Starting
Chromium auto-installs on first run. If it fails, install manually:
python -m playwright install chromium
Session Not Persisting
Check that BROWSER_USER_DATA_DIR is writable:
ls -la ~/.browsercontrol/
Connection Refused
Ensure no other instance is running:
pkill -f browsercontrol
browsercontrol
View Session Recordings
Open recordings in the Playwright trace viewer:
npx playwright show-trace ~/.browsercontrol/recordings/session.zip
Contributions are welcome! Check out our Contributing Guide for details.
Ideas for contributions:
- Firefox/WebKit support
- DOM diffing (detect changes)
- Accessibility audit tools
- Mobile emulation presets
- Cookie import/export files
# Clone and install
git clone https://github.com/adityasasidhar/browsercontrol
cd browsercontrol
uv sync
# Run tests
uv run pytest
# Run in development
uv run fastmcp dev browsercontrol/server.py
MIT License. Use it however you want.
- Vision-first approach inspired by Google's AntiGravity IDE
- Built with FastMCP and Playwright
- Thanks to the MCP community for making AI-tool integration accessible
Built for AI agents that need to see the web.
