A natural language browser automation agent powered by ConnectOnion and Playwright.
# Clone the repository
git clone https://github.com/openonion/browser-agent.git
cd browser-agent
# Run the setup script (installs everything)
./setup.sh
# Test it - provide a natural language command
python cli.py run "Go to news.ycombinator.com and find the top story"That's it! The agent will open a browser, perform the task, and report back.
# 1. Install Python dependencies
pip install -r requirements.txt
# 2. Initialize ConnectOnion project
co init
# 3. Install Playwright browsers
playwright install
# 4. Authenticate with ConnectOnion
co authThe setup.sh script automatically:
- Installs all Python dependencies from
requirements.txt - Initializes the ConnectOnion project (creates
.co/directory) - Installs Playwright browsers (Chrome, Firefox, etc.)
- Sets up authentication (creates
.envwith your API key)
The agent.py module now exports a pre-configured agent instance.
from agent import agent
# Just give it natural language commands
result = agent.input("Go to google.com and search for AI news")
print(result)- 🌐 Natural language browser control - Just describe what you want
- 📸 Automatic screenshots - Capture any page state
- 🔍 Smart element finding - No CSS selectors needed
- 📝 Form automation - Fill and submit forms intelligently
- 🎯 Multi-step workflows - Complex automation sequences
- 🔐 Chrome profile support - Use your cookies, sessions, and login states
- 🖼️ Vision support - LLM can see and analyze screenshots automatically
- 🧠 Deep Research Mode - Spawn sub-agents for exhaustive research tasks
For complex information gathering tasks, the agent automatically spawns a specialized sub-agent that shares the browser session but is optimized for exhaustive research.
Simply ask for a research task:
python cli.py run "Deep research 'ConnectOnion' and find the top 3 competitors"browser-agent/
├── cli.py # Command Line Interface (CLI) entry point
├── agent.py # Agent configuration and initialization
├── main.py # HTTP/WebSocket host entry point
├── tools/ # Shared browser tools
│ ├── __init__.py
│ ├── web_automation.py # Browser automation implementation
│ └── scroll_strategies.py # Scrolling logic
├── agents/ # Sub-agents
│ ├── __init__.py
│ └── deep_research.py # Deep research specialist
├── prompts/ # System prompts
│ ├── browser_agent.md # Main agent personality
│ └── deep_research.md # Research sub-agent prompt
├── requirements.txt # Python dependencies
├── setup.sh # Automated setup script
├── tests/ # Test suite
│ ├── test_all.py
│ └── ...
├── screenshots/ # Auto-generated screenshots
├── chromium_automation_profile/ # Chrome profile copy
├── .co/ # ConnectOnion project config
├── .env # API keys
└── README.md # This file
- Natural Language Input: You describe what you want in plain English
- AI Planning: The agent understands and plans the browser actions
- Tool Execution: Playwright performs the actual browser control
- Result Reporting: Agent reports what was done at each step
The browser agent uses the image_result_formatter plugin to automatically convert screenshots to vision format. When a tool returns a base64-encoded screenshot, the plugin:
- Detects the base64 image data
- Converts it to multimodal message format
- Allows the LLM to see and analyze the screenshot visually
🖼️ Formatted 'take_screenshot' result as image
This enables powerful visual workflows:
- Visual verification - LLM can confirm if actions succeeded by seeing the page
- Content extraction - Read text, identify elements from screenshots
- Error detection - Spot visual problems like missing buttons or error messages
- Automatic analysis - LLM describes what it sees in the screenshot
from connectonion import Agent
from connectonion.useful_plugins import image_result_formatter
from tools.web_automation import WebAutomation
web = WebAutomation()
agent = Agent(
name="browser",
tools=web,
plugins=[image_result_formatter] # Auto-format screenshots for vision
)
agent.input("Go to example.com, take a screenshot, and describe what you see")
# The LLM will actually SEE the screenshot and describe:
# "I can see a simple webpage with the heading 'Example Domain' and
# some descriptive text about this domain being used in examples..."See tests/test_image_plugin.py for a working demo.
# See examples/ folder for more
# python examples/demo_image_plugin.pyBy default, the agent uses your Chrome profile data (cookies, sessions, logins). This means:
- ✅ Stay logged in - Access sites where you're already authenticated
- ✅ No conflicts - Your regular Chrome can stay open while agent runs
- ✅ Fast - First run copies profile (~50s), subsequent runs are instant
- ✅ Private - Profile copy stored locally in
chromium_automation_profile/(gitignored)
On first run, after a manual login, the agent copies essential Chrome profile data to ./chromium_automation_profile/:
- Cookies and sessions
- Saved passwords (encrypted)
- Bookmarks and history
- Extensions (skips cache for speed)
Subsequent runs reuse this copy, so startup is fast.
To use a fresh browser without your Chrome data:
# In agent.py
web = WebAutomation(profile_path=None) # or pass headless=True/Falsepython tests/test_all.py