Releases: spider-rs/spider
v2.45.20
What's New
Relevance Gate for Remote Multimodal Crawling
Added a relevance_gate config that instructs the LLM to return a "relevant": true|false field in its JSON response. When a page is deemed irrelevant, its wildcard budget credit is refunded so the crawler discovers more relevant content.
New config fields:
- relevance_gate: bool - enables the feature
- relevance_prompt: Option<String> - optional custom relevance criteria
How it works:
- When enabled, the system prompt instructs the LLM to include "relevant": true|false
- If the model returns false, a budget credit is atomically accumulated
- Credits are drained in the crawl loop to restore the wildcard budget
- Default fallback is true (assume relevant) if the model omits the field
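The credit flow above can be sketched with a plain atomic counter. This is a minimal illustration and not the crate's implementation: `RelevanceCredits`, `record`, `drain`, and `restore_budget` are hypothetical names, and the budget is assumed to be a simple `u32`.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical stand-in for a shared relevance credit counter.
pub struct RelevanceCredits {
    credits: AtomicU32,
}

impl RelevanceCredits {
    pub fn new() -> Self {
        Self { credits: AtomicU32::new(0) }
    }

    /// Record the model's verdict for one page. A missing field is treated
    /// as relevant, so only an explicit `false` earns a credit.
    pub fn record(&self, relevant: Option<bool>) {
        if !relevant.unwrap_or(true) {
            self.credits.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Take all pending credits, leaving the counter at zero.
    pub fn drain(&self) -> u32 {
        self.credits.swap(0, Ordering::Relaxed)
    }
}

/// Hypothetical crawl-loop step: hand drained credits back to the budget.
pub fn restore_budget(remaining: u32, gate: &RelevanceCredits) -> u32 {
    remaining.saturating_add(gate.drain())
}
```

Using `swap(0, ..)` to drain means a credit added concurrently is either included in this drain or left for the next one, never lost.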
Example:
let cfgs = RemoteMultimodalConfigs::new(api_url, model)
    .with_relevance_gate(Some("Only pages about Rust programming".into()));
Full Changelog
- feat(agent): add relevance_gate and relevance_prompt to RemoteMultimodalConfig
- feat(agent): add atomic relevance_credits counter to RemoteMultimodalConfigs
- feat(agent): add relevant: Option<bool> to AutomationResult and AutomationResults
- feat(agent): extend system prompt and extraction with relevance gate instructions
- feat(spider): add restore_wildcard_budget() for budget refund
- feat(spider): drain relevance credits in crawl loop dequeue
v2.44.13
What's New
- Spider Cloud integration (spider_cloud feature) - optional proxy rotation, anti-bot bypass, and intelligent fallback via spider.cloud
  - Modes: Proxy, Api, Unblocker, Fallback, Smart
  - Smart mode auto-detects Cloudflare challenges, CAPTCHAs, and bot protection then retries via /unblocker
- S3 skills loading (skills_s3 feature) - load agent skills from S3-compatible storage (AWS, MinIO, R2)
- CLI: --spider-cloud-key and --spider-cloud-mode flags
Crates
- spider v2.44.13
- spider_agent v2.44.13
- spider_cli v2.44.13
- spider_utils v2.44.13
- spider_worker v2.44.13
spider v2.43.20
Changes
- fix(spider): Fix doctest and update chromey for adblock compatibility
- fix(search): Use reqwest::Client directly for cache feature compatibility
- chore(spider): Update spider_agent dependency to 0.4
spider_agent Integration
The agent feature now uses spider_agent v0.4.0, which includes:
- Smart caching with size-aware LRU eviction
- High-performance chain execution with parallel step support
- Batch processing for multiple items
- Prefetch management for predictive page loading
- Smart model routing based on task complexity
Full Changelog
spider_agent v0.4.0
Performance Optimizations
This release adds several performance optimizations for automation workflows:
Smart Caching
- SmartCache: Size-aware LRU cache with automatic cleanup
  - Bounded memory usage with configurable limits
  - TTL-based expiration
  - Automatic cleanup on memory pressure
  - Statistics tracking (hits, misses, evictions)
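To make the caching behaviour concrete, here is a minimal sketch of a size-aware LRU cache with TTL expiry and hit/miss/eviction counters. It is not SmartCache itself: `SizedLru` and all of its fields and methods are hypothetical, values are plain strings sized by byte length, and memory-pressure cleanup is left out.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

struct Entry {
    value: String,
    inserted: Instant,
}

/// Hypothetical size-bounded LRU cache; the front of `order` is least recently used.
pub struct SizedLru {
    max_bytes: usize,
    used_bytes: usize,
    ttl: Duration,
    entries: HashMap<String, Entry>,
    order: VecDeque<String>,
    pub hits: u64,
    pub misses: u64,
    pub evictions: u64,
}

impl SizedLru {
    pub fn new(max_bytes: usize, ttl: Duration) -> Self {
        Self {
            max_bytes,
            used_bytes: 0,
            ttl,
            entries: HashMap::new(),
            order: VecDeque::new(),
            hits: 0,
            misses: 0,
            evictions: 0,
        }
    }

    fn remove(&mut self, key: &str) {
        if let Some(entry) = self.entries.remove(key) {
            self.used_bytes -= entry.value.len();
            self.order.retain(|k| k != key);
        }
    }

    pub fn insert(&mut self, key: &str, value: String) {
        self.remove(key); // drop any previous value for this key
        if value.len() > self.max_bytes {
            return; // larger than the whole budget: never cached
        }
        // Evict least recently used entries until the new value fits.
        while self.used_bytes + value.len() > self.max_bytes {
            match self.order.front().cloned() {
                Some(oldest) => {
                    self.remove(&oldest);
                    self.evictions += 1;
                }
                None => break,
            }
        }
        self.used_bytes += value.len();
        self.order.push_back(key.to_string());
        self.entries
            .insert(key.to_string(), Entry { value, inserted: Instant::now() });
    }

    pub fn get(&mut self, key: &str) -> Option<String> {
        let expired = match self.entries.get(key) {
            Some(entry) => entry.inserted.elapsed() > self.ttl,
            None => {
                self.misses += 1;
                return None;
            }
        };
        if expired {
            self.remove(key);
            self.misses += 1;
            return None;
        }
        self.hits += 1;
        // Mark the key as most recently used.
        self.order.retain(|k| k != key);
        self.order.push_back(key.to_string());
        self.entries.get(key).map(|entry| entry.value.clone())
    }
}
```

The `VecDeque` scan makes each access linear in the number of keys, which keeps the sketch short; a production cache would typically use a linked structure for constant-time reordering.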
High-Performance Execution
- ChainExecutor: Parallel step execution for automation chains
  - Analyzes dependencies for optimal parallelization
  - Response caching with TTL
  - Configurable concurrency limits
  - Step timeout support
- BatchExecutor: Efficient batch processing
  - Process multiple items with configurable batch sizes
  - Parallel execution within batches
  - Index-aware processing option
- PrefetchManager: Predictive page loading
  - Prefetch URLs in the background
  - Automatic cache management
  - Concurrent prefetch limits
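One way to picture the dependency analysis is to group steps into "waves" that can each run in parallel. The sketch below is an assumption about the general technique, not ChainExecutor's actual algorithm; `plan_waves` and its index-based input are hypothetical.

```rust
/// Group steps into waves: every step in a wave depends only on steps from
/// earlier waves, so the steps inside one wave could run concurrently.
/// `deps[i]` lists the indices of the steps that step `i` depends on.
/// Returns `None` when the dependencies contain a cycle or an unknown index.
pub fn plan_waves(deps: &[Vec<usize>]) -> Option<Vec<Vec<usize>>> {
    let n = deps.len();
    let mut done = vec![false; n];
    let mut waves = Vec::new();
    let mut remaining = n;

    while remaining > 0 {
        // A step is ready once all of its dependencies have been scheduled.
        let wave: Vec<usize> = (0..n)
            .filter(|&i| !done[i] && deps[i].iter().all(|&d| d < n && done[d]))
            .collect();
        if wave.is_empty() {
            return None; // nothing is ready: cycle or dangling dependency
        }
        for &i in &wave {
            done[i] = true;
        }
        remaining -= wave.len();
        waves.push(wave);
    }
    Some(waves)
}
```

For example, with steps 0 and 1 independent, step 2 depending on both, and step 3 depending on step 2, the plan is three waves: `[0, 1]`, then `[2]`, then `[3]`. A concurrency limit would then cap how many steps of a single wave run at once.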
Smart Model Routing
- ModelRouter: Intelligent model selection based on task complexity
  - Task analysis for complexity scoring
  - User-configurable model policies
  - Cost tier constraints (Low/Medium/High)
  - Latency-aware routing
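As a rough illustration of complexity-based routing under a cost tier cap, consider the sketch below. Everything in it is hypothetical: the `complexity` heuristic, the score thresholds, and the `Model` and `route` names are assumptions made for the example, and latency-aware routing is not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CostTier {
    Low,
    Medium,
    High,
}

pub struct Model {
    pub name: &'static str,
    pub tier: CostTier,
}

/// Hypothetical complexity score in 0..=100 from word count and keywords.
pub fn complexity(task: &str) -> u32 {
    let lowered = task.to_lowercase();
    let mut score = (task.split_whitespace().count() as u32).min(50);
    for keyword in ["analyze", "compare", "multi-step", "reason"] {
        if lowered.contains(keyword) {
            score += 25;
        }
    }
    score.min(100)
}

/// Pick the most capable model whose tier does not exceed either the tier
/// the task seems to need or the caller's `max_tier` cap.
pub fn route<'a>(task: &str, models: &'a [Model], max_tier: CostTier) -> Option<&'a Model> {
    let wanted = match complexity(task) {
        0..=30 => CostTier::Low,
        31..=60 => CostTier::Medium,
        _ => CostTier::High,
    };
    let tier = wanted.min(max_tier);
    models.iter().filter(|m| m.tier <= tier).max_by_key(|m| m.tier)
}
```

A short instruction such as "click the login button" scores low and lands on the cheapest model, while a long analytical prompt is routed to a higher tier only if `max_tier` allows it.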
Other Changes
- Added MessageContent helper methods: as_text(), full_text(), is_text(), has_images()
- Default ModelPolicy now allows High tier routing
- Fixed compilation warnings
Full Changelog
v2.43.18 - Web Search Integration
Features
Web Search Integration
Add web search capabilities to Spider's RemoteMultimodalEngine with support for multiple search providers.
Supported Providers
- Serper (search_serper) - Google SERP API
- Brave (search_brave) - Privacy-focused search
- Bing (search_bing) - Microsoft Bing Web Search
- Tavily (search_tavily) - AI-optimized search
New Methods
- search() - Search the web and return structured results
- search_and_extract() - Search + fetch pages + LLM extraction
- research() - Search + extract + synthesize findings into summary
Setup
Cargo.toml
[dependencies]
spider = { version = "2.43.18", features = ["search_serper"] }
Configuration
use spider::configuration::{SearchConfig, SearchProviderType};
use spider::features::automation::RemoteMultimodalEngine;
let mut engine = RemoteMultimodalEngine::new(api_url, model, None);
engine.with_search_config(Some(
    SearchConfig::new(SearchProviderType::Serper, "your-api-key")
        // Optional: custom API endpoint
        .with_api_url("https://custom.api.com/search")
));
// Simple search
let results = engine.search("rust web crawler", None, None).await?;
// Search + extract
let data = engine.search_and_extract(
    "best rust frameworks",
    "Extract name and description",
    None,
    None,
).await?;
// Research with synthesis
use spider::features::automation::ResearchOptions;
let research = engine.research(
    "How do async runtimes work?",
    ResearchOptions::new().with_max_pages(5).with_synthesis(),
    None,
).await?;
println!("Summary: {}", research.summary.unwrap());
Custom API Endpoints
All providers support custom API URLs for self-hosted or alternative endpoints:
SearchConfig::new(SearchProviderType::Brave, "api-key")
    .with_api_url("https://my-brave-proxy.example.com/search")
Full Changelog
v2.43.13 - Advanced Agentic Automation
🤖 Advanced Agentic Automation Features
This release adds comprehensive agentic automation capabilities to spider, making it a powerful tool for autonomous web interactions.
Phase 1: Simplified Agentic APIs
- act(page, instruction) - Execute single actions with natural language
- observe(page) - Analyze page state and get structured observations
- extract_page(page, prompt, schema) - Extract structured data from pages
- AutomationMemory - In-memory state management for multi-round automation
- run_with_memory() - Stateful automation with persistent context
Phase 2: Self-Healing & Discovery
- SelectorCache - Self-healing selector cache with LRU eviction
- act_cached(page, instruction, cache) - Actions with automatic selector caching
- StructuredOutputConfig - Native JSON schema enforcement for reliable outputs
- extract_structured(page, prompt, config) - Schema-validated data extraction
- map(page, prompt) - AI-powered URL discovery and categorization
- MapResult / DiscoveredUrl - Relevance-scored URL discovery
Phase 3: Autonomous Agent Execution
- execute(page, config) - Full autonomous goal-oriented execution
- agent(page, goal) - Simple goal execution with defaults
- agent_extract(page, goal, prompt) - Goal execution with data extraction
- chain(page, steps) - Sequential action composition with conditions
- AgentConfig - Comprehensive agent configuration (max_steps, timeout, recovery, etc.)
- RecoveryStrategy - Error handling strategies (Retry, Alternative, Skip, Abort)
- ChainStep / ChainCondition - Conditional action execution
- AgentEvent - Real-time progress tracking events
- AgentResult / ChainResult - Detailed execution results with history
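The recovery strategies can be pictured as a policy applied when a step fails. The sketch below is only an illustration of that idea, not the crate's RecoveryStrategy: `Recovery`, `StepOutcome`, and `run_step` are hypothetical, the retry count is an assumed parameter, and the Alternative strategy is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    /// Retry the step up to this many additional times.
    Retry(u32),
    Skip,
    Abort,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StepOutcome {
    Done,
    Skipped,
    Aborted,
}

/// Run `step` once and apply the recovery policy if it returns an error.
pub fn run_step<F>(mut step: F, policy: Recovery) -> StepOutcome
where
    F: FnMut() -> Result<(), String>,
{
    if step().is_ok() {
        return StepOutcome::Done;
    }
    match policy {
        Recovery::Retry(extra) => {
            for _ in 0..extra {
                if step().is_ok() {
                    return StepOutcome::Done;
                }
            }
            // Retries exhausted: treat the step as a hard failure.
            StepOutcome::Aborted
        }
        Recovery::Skip => StepOutcome::Skipped,
        Recovery::Abort => StepOutcome::Aborted,
    }
}
```

A caller would continue with the next step on `Done` or `Skipped` and stop the run on `Aborted`.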
Example Usage
// Autonomous agent
let config = AgentConfig::new("Find and add the cheapest laptop to cart")
    .with_max_steps(30)
    .with_success_url("/cart")
    .with_extraction("Extract cart total");
let result = engine.execute(&page, config).await?;
// Action chaining
let steps = vec![
    ChainStep::new("click Login"),
    ChainStep::new("type email").when(ChainCondition::ElementExists("#email")),
    ChainStep::new("click Submit").then_extract("Extract any errors"),
];
let result = engine.chain(&page, steps).await?;
// Self-healing cache
let mut cache = SelectorCache::new();
engine.act_cached(&page, "click submit", &mut cache).await?;
Full Changelog
- feat(automation): add Phase 3 agentic features - autonomous agent, action chaining, error recovery
- feat(automation): add Phase 2 agentic features - selector cache, structured outputs, map API
- feat(automation): add simplified agentic APIs - act(), observe(), extract()
- feat(automation): add agentic memory for multi-round automation
v2.43.3
Bug Fix
- fix(automation): Improve best_effort_parse_json_object parsing to handle LLM responses with reasoning text before JSON code blocks
  - Find ```json blocks anywhere in response (not just at boundaries)
  - Support JSON arrays in addition to objects
  - Better fallback parsing for various LLM response formats
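The parsing strategy described above can be sketched with plain string searches. This is not the body of best_effort_parse_json_object: `extract_json_candidate` is a hypothetical helper that only locates the likely JSON text (object or array) and leaves validation to a real JSON parser.

```rust
/// Three backticks, written with escapes so the marker reads clearly in source.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Return the slice of an LLM response most likely to hold the JSON payload.
pub fn extract_json_candidate(response: &str) -> Option<&str> {
    // 1. Prefer a json-tagged fenced block anywhere in the response,
    //    even when reasoning text comes before it.
    let opener = format!("{}json", FENCE);
    if let Some(start) = response.find(opener.as_str()) {
        let body = &response[start + opener.len()..];
        if let Some(end) = body.find(FENCE) {
            return Some(body[..end].trim());
        }
    }
    // 2. Fall back to the span from the first `{` or `[` to the last
    //    matching closing character.
    let open = response.find(|c: char| c == '{' || c == '[')?;
    let closer = if response[open..].starts_with('{') { '}' } else { ']' };
    let close = response.rfind(closer)?;
    if close > open {
        Some(&response[open..=close])
    } else {
        None
    }
}
```

The fallback is deliberately loose: it can return text that is not valid JSON, so the caller should still parse the candidate and handle failure.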
Full Changelog: v2.43.2...v2.43.3
v2.43.2
New Feature: Extraction Schema Support
Add JSON Schema support for structured extraction in RemoteMultimodalEngine.
ExtractionSchema Struct
pub struct ExtractionSchema {
    pub name: String,                // Schema name (e.g., "products")
    pub description: Option<String>, // What to extract
    pub schema: String,              // JSON Schema definition
    pub strict: bool,                // Enforce strict adherence
}
Example Usage
use spider::features::automation::{RemoteMultimodalConfigs, ExtractionSchema};
let schema = ExtractionSchema::new_with_description(
    "products",
    "Extract product information",
    r#"{
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": { "type": "string" },
                        "price": { "type": "number" }
                    },
                    "required": ["name", "price"]
                }
            }
        }
    }"#,
).with_strict(true);
let mm = RemoteMultimodalConfigs::new("http://localhost:11434/v1/chat/completions", "model")
    .with_extra_ai_data(true)
    .with_extraction_schema(Some(schema));
Full Changelog: v2.43.1...v2.43.2
v2.43.1
Bug Fix
- fix(page): Add missing remote_multimodal_usage and extra_remote_multimodal_data fields to the decentralized Page struct for feature parity with the standard Page struct.
Full Changelog: v2.43.0...v2.43.1
v2.43.0
What's New
Token Usage Tracking for RemoteMultimodalEngine
The remote multimodal automation engine now tracks and returns token usage conforming to the OpenAI API format:
- AutomationUsage struct with prompt_tokens, completion_tokens, total_tokens
- Usage is accumulated across all inference rounds
- Stored on Page.remote_multimodal_usage
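Accumulating usage across rounds amounts to summing the three counters. The sketch below mirrors the field names listed above, but `UsageSketch`, its `u32` field types, and the `accumulate` method are assumptions made for illustration rather than the crate's definitions.

```rust
/// Hypothetical mirror of an OpenAI-style usage record.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct UsageSketch {
    pub prompt_tokens: u32,
    pub completion_tokens: u32,
    pub total_tokens: u32,
}

impl UsageSketch {
    /// Fold one inference round's usage into the running total.
    pub fn accumulate(&mut self, round: &UsageSketch) {
        self.prompt_tokens = self.prompt_tokens.saturating_add(round.prompt_tokens);
        self.completion_tokens = self
            .completion_tokens
            .saturating_add(round.completion_tokens);
        self.total_tokens = self.total_tokens.saturating_add(round.total_tokens);
    }
}
```

Starting from `UsageSketch::default()` and calling `accumulate` once per round yields the per-page totals.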
Extraction Support
New extraction capabilities for RemoteMultimodalEngine, similar to the OpenAI integration:
- extra_ai_data - Enable extraction mode
- extraction_prompt - Custom extraction instructions
- screenshot - Capture final screenshot
Extracted data is automatically stored on Page.extra_remote_multimodal_data as AutomationResults.
Example Usage
use spider::features::automation::RemoteMultimodalConfigs;
let mm = RemoteMultimodalConfigs::new(
    "http://localhost:11434/v1/chat/completions",
    "qwen2.5-vl",
)
.with_extra_ai_data(true)
.with_extraction_prompt(Some("Extract all product names and prices"))
.with_screenshot(true);
website.configuration.remote_multimodal = Some(Box::new(mm));
// After crawling, access on page:
for page in website.get_pages().await {
    if let Some(usage) = &page.remote_multimodal_usage {
        println!("Tokens: {:?}", usage);
    }
    if let Some(data) = &page.extra_remote_multimodal_data {
        for result in data {
            println!("Extracted: {:?}", result.content_output);
        }
    }
}
Full Changelog: v2.42.0...v2.43.0