Spider


Website | Guides | API Docs | Chat

The fastest web crawler written in Rust. Primitives for data curation workloads at scale.

Why Spider?

  • Fast by Design: Concurrent crawling with streaming responses at scale
  • Flexible Rendering: HTTP, Chrome DevTools Protocol (CDP), or WebDriver (Selenium/remote browsers)
  • Production Ready: Battle-tested with anti-bot mitigation, caching, and distributed crawling

Features

Core

  • Concurrent & streaming crawls
  • Decentralized crawling for horizontal scaling
  • Caching (memory, disk, or hybrid)
  • Proxy support with rotation
  • Cron job scheduling
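
As a rough sketch of combining these options, the example below configures a crawl with rotating proxies and a page limit. It assumes the with_proxies and with_limit builder methods on Website; check the API docs for the exact signatures in your spider version.

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Assumed builder options; consult the API docs for your version.
    let mut website = Website::new("https://spider.cloud")
        // Rotate across the listed proxies while crawling.
        .with_proxies(Some(vec!["http://localhost:8888".into()]))
        // Stop after 500 pages.
        .with_limit(500)
        .build()
        .unwrap();

    website.crawl().await;
}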

Browser Automation

  • Chrome DevTools Protocol (CDP) for local Chrome
  • WebDriver support for Selenium Grid, remote browsers, and cross-browser testing
  • AI-powered automation workflows
  • Web challenge solving (deterministic + AI built-in)

Data Processing

Security & Control

AI Agent

  • spider_agent - Concurrent-safe multimodal agent for web automation and research
  • Multiple LLM providers (OpenAI, OpenAI-compatible APIs)
  • Multiple search providers (Serper, Brave, Bing, Tavily)
  • HTML extraction and research synthesis

Quick Start

The fastest way to get started is with Spider Cloud - no infrastructure to manage. Pay-per-use at $1/GB data transfer, designed to keep crawling costs low.

For local development:

[dependencies]
spider = "2"
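
For a quick sanity check, a minimal crawl can print every URL discovered once the run completes; get_links returns the set of links collected during the crawl.

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");
    website.crawl().await;

    // Print every URL discovered during the crawl.
    for link in website.get_links() {
        println!("{}", link.as_ref());
    }
}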

Streaming Pages

Process pages as they're crawled with real-time subscriptions:

use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud");
    // Subscribe before crawling; 0 uses the default channel capacity.
    let mut rx = website.subscribe(0).unwrap();

    // Handle pages concurrently as the crawler streams them in.
    tokio::spawn(async move {
        while let Ok(page) = rx.recv().await {
            println!("- {}", page.get_url());
        }
    });

    website.crawl().await;
    // Drop the channel so the receiver task ends when the crawl completes.
    website.unsubscribe();
}
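
The argument to subscribe is the broadcast channel capacity (0 falls back to a default), and unsubscribe drops the channel so the spawned receiver task exits once the crawl finishes.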

Chrome (CDP)

Render JavaScript-heavy pages with stealth mode and request interception:

[dependencies]
spider = { version = "2", features = ["chrome"] }

use spider::features::chrome_common::RequestInterceptConfiguration;
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        // Intercept network requests so unneeded resources can be skipped.
        .with_chrome_intercept(RequestInterceptConfiguration::new(true))
        // Reduce the headless-browser fingerprint.
        .with_stealth(true)
        .build()
        .unwrap();

    website.crawl().await;
}

WebDriver (Selenium Grid)

Connect to remote browsers, Selenium Grid, or any W3C WebDriver-compatible service:

[dependencies]
spider = { version = "2", features = ["webdriver"] }

use spider::features::webdriver_common::{WebDriverConfig, WebDriverBrowser};
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    let mut website = Website::new("https://spider.cloud")
        .with_webdriver(
            // Point at any W3C WebDriver endpoint, e.g. a local Selenium Grid.
            WebDriverConfig::new()
                .with_server_url("http://localhost:4444")
                .with_browser(WebDriverBrowser::Chrome)
                .with_headless(true),
        )
        .build()
        .unwrap();

    website.crawl().await;
}

Get Spider

Method          Best For
Spider Cloud    Production workloads, no setup required
spider          Rust applications
spider_agent    AI-powered web automation and research
spider_cli      Command-line usage
spider-nodejs   Node.js projects
spider-py       Python projects

Resources

License

MIT

Contributing

See CONTRIBUTING.