Website | Guides | API Docs | Chat
The fastest web crawler written in Rust. Primitives for data curation workloads at scale.
- Fast by Design: Concurrent crawling with streaming responses at scale
- Flexible Rendering: HTTP, Chrome DevTools Protocol (CDP), or WebDriver (Selenium/remote browsers)
- Production Ready: Battle-tested with anti-bot mitigation, caching, and distributed crawling
- Concurrent & streaming crawls
- Decentralized crawling for horizontal scaling
- Caching (memory, disk, or hybrid)
- Proxy support with rotation
- Cron job scheduling
- Chrome DevTools Protocol (CDP) for local Chrome
- WebDriver support for Selenium Grid, remote browsers, and cross-browser testing
- AI-powered automation workflows
- Web challenge solving (deterministic + AI built-in)
- HTML transformations
- CSS/XPath scraping with spider_utils
- Smart mode for JS-rendered content detection
- Anti-bot mitigation
- Ad blocking
- Firewall
- Blacklisting, whitelisting, and depth budgeting (see the sketch after this list)
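
Most of these behaviors are configured on the `Website` builder. Below is a minimal sketch of depth budgeting, blacklisting, and proxy rotation; the budget values, blacklist URL, and proxy address are placeholder assumptions to adapt to your target site, and some options may sit behind feature flags depending on your spider version:

```rust
use spider::hashbrown::HashMap;
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        // Page budgets per path prefix; "*" caps the crawl as a whole.
        .with_budget(Some(HashMap::from([("*", 300), ("/docs", 50)])))
        // Skip URLs matching these entries (placeholder pattern).
        .with_blacklist_url(Some(vec!["https://spider.cloud/login".into()]))
        // Placeholder proxy; pass several URLs to rotate between them.
        .with_proxies(Some(vec!["http://proxy.example:8080".into()]))
        .build()
        .unwrap();

    website.crawl().await;
}
```

With the `smart` feature enabled, `crawl_smart` takes the plain HTTP path first and falls back to a headless browser only when a page appears to need JavaScript rendering.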
- spider_agent - Concurrent-safe multimodal agent for web automation and research
- Multiple LLM providers (OpenAI, OpenAI-compatible APIs)
- Multiple search providers (Serper, Brave, Bing, Tavily)
- HTML extraction and research synthesis
The fastest way to get started is with Spider Cloud - no infrastructure to manage. Pay-per-use at $1/GB data transfer, designed to keep crawling costs low.
For local development:
```toml
[dependencies]
spider = "2"
```

Process pages as they're crawled with real-time subscriptions:

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");
    let mut rx = website.subscribe(0).unwrap();

    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    website.unsubscribe();
}
```
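Pages stream over a broadcast channel while the crawl is still in flight, so processing overlaps with fetching; calling `unsubscribe` cleans the channel up once the crawl finishes.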
Render JavaScript-heavy pages with stealth mode and request interception:

```toml
[dependencies]
spider = { version = "2", features = ["chrome"] }
```

```rust
use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}
```
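Stealth mode reduces headless-browser fingerprinting, and interception lets the crawler short-circuit requests it does not need, which generally makes rendering faster and cheaper.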
Connect to remote browsers, Selenium Grid, or any W3C WebDriver-compatible service:

```toml
[dependencies]
spider = { version = "2", features = ["webdriver"] }
```

```rust
use spider::features::webdriver_common::{WebDriverBrowser, WebDriverConfig};
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        .with_webdriver(
            WebDriverConfig::new()
                .with_server_url("http://localhost:4444")
                .with_browser(WebDriverBrowser::Chrome)
                .with_headless(true),
        )
        .build()
        .unwrap();

    website.crawl().await;
}
```

| Method | Best For |
|---|---|
| Spider Cloud | Production workloads, no setup required |
| spider | Rust applications |
| spider_agent | AI-powered web automation and research |
| spider_cli | Command-line usage |
| spider-nodejs | Node.js projects |
| spider-py | Python projects |
- Examples - Code samples for common use cases
- Benchmarks - Performance comparisons
- Changelog - Version history
See CONTRIBUTING.