feat: add Venice AI as first-class LLM backend #95

Draft
jesseproudman wants to merge 2 commits into nearai:main from veniceai:feat/venice-llm-provider

Conversation

@jesseproudman

Summary

  • Adds a native VeniceProvider (src/llm/venice.rs) implementing LlmProvider with full support for Venice-specific venice_parameters (web search, web scraping, system prompt control)
  • Dynamic model pricing fetched from Venice GET /models API with 1-hour TTL cache and graceful fallback to default_cost()
  • Native tool calling support (no flatten_tool_messages hack needed — Venice handles role: "tool" natively)
  • VeniceConfig with VENICE_* env vars following existing provider patterns, including input validation for VENICE_WEB_SEARCH (off/on/auto)
  • api.venice.ai added to sandbox network allowlist
  • 12 unit tests covering message conversion, parameter serialization, and cost calculation
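
For reviewers who want the configuration surface at a glance, here is a minimal sketch of what the above implies. Only api_key, the VENICE_* variable names, and the off/on/auto validation come from this PR; the remaining field names and types are assumptions.

    use secrecy::SecretString;

    // Sketch of VeniceConfig as resolved from VENICE_* env vars.
    // Field names other than `api_key` are assumptions, not copied from the diff.
    pub struct VeniceConfig {
        pub api_key: SecretString,               // VENICE_API_KEY
        pub base_url: String,                    // VENICE_BASE_URL
        pub model: String,                       // VENICE_MODEL
        pub web_search: Option<String>,          // VENICE_WEB_SEARCH: "off" | "on" | "auto"
        pub web_scraping: Option<bool>,          // VENICE_WEB_SCRAPING
        pub include_system_prompt: Option<bool>, // VENICE_INCLUDE_SYSTEM_PROMPT
    }

    // Validation described above: VENICE_WEB_SEARCH must be off/on/auto.
    fn validate_web_search(value: &str) -> Result<(), String> {
        match value {
            "off" | "on" | "auto" => Ok(()),
            other => Err(format!("VENICE_WEB_SEARCH must be off, on, or auto (got {other:?})")),
        }
    }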

Files changed

File Change
src/llm/venice.rs New provider (881 lines incl. tests)
src/config.rs LlmBackend::Venice, VeniceConfig, env var resolution
src/llm/mod.rs Module registration, create_venice_provider()
src/sandbox/config.rs api.venice.ai in default allowlist
src/setup/wizard.rs venice: None in struct literal
.env.example Document Venice env vars

Design notes

  • cost_per_token() returns default_cost() until list_models() or model_metadata() triggers the first catalog fetch (lazy initialization by design — the hot complete() path doesn't block on catalog refresh)
  • Uses std::sync::RwLock (not tokio) with documented invariant that guards are never held across .await points
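
A rough illustration of the first note, with assumed struct shapes (only the names model_catalog, cost_per_token, and default_cost appear in this PR): the synchronous cost lookup reads whatever is cached and falls back, never touching the network.

    use std::collections::HashMap;
    use std::sync::RwLock;
    use std::time::Instant;

    // Assumed shapes for illustration only.
    #[derive(Clone, Copy)]
    struct CostPerToken {
        input: f64,
        output: f64,
    }

    fn default_cost() -> CostPerToken {
        // Placeholder values for the sketch.
        CostPerToken { input: 0.0, output: 0.0 }
    }

    #[derive(Default)]
    struct ModelCatalog {
        fetched_at: Option<Instant>,
        costs: HashMap<String, CostPerToken>,
    }

    struct Provider {
        model_catalog: RwLock<ModelCatalog>,
    }

    impl Provider {
        // Hot path: synchronous read of the cached catalog, no refresh, no await.
        fn cost_per_token(&self, model: &str) -> CostPerToken {
            let catalog = self
                .model_catalog
                .read()
                .expect("model_catalog lock poisoned");
            catalog
                .costs
                .get(model)
                .copied()
                .unwrap_or_else(default_cost)
        }
    }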

Test plan

  • cargo check — compiles with Venice additions
  • cargo test venice — 12 venice module tests pass
  • cargo test — no regressions in existing tests
  • cargo clippy --all --all-features — no new warnings
  • Manual: LLM_BACKEND=venice VENICE_API_KEY=<key> cargo run — chat completion works

🤖 Generated with Claude Code

Add a native reqwest-based Venice inference API provider that supports
Venice-specific venice_parameters (web search, web scraping, thinking
control) and dynamic model pricing fetched from the /models API.

Changes:
- src/config.rs: LlmBackend::Venice variant, VeniceConfig struct,
  VENICE_* env var resolution with validation
- src/llm/venice.rs: VeniceProvider with LlmProvider impl, cached
  model catalog with 1-hour TTL, 12 unit tests
- src/llm/mod.rs: register Venice module and create_venice_provider
- src/sandbox/config.rs: add api.venice.ai to network allowlist
- src/setup/wizard.rs: add venice field to LlmConfig struct literal
- .env.example: document Venice env vars

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @jesseproudman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the application's LLM capabilities by integrating Venice AI as a new, fully-featured backend. This addition provides users with an alternative powerful model provider, complete with advanced functionalities like web interaction and precise cost management. The implementation ensures seamless configuration and robust error handling, making Venice AI a viable and performant option for various AI-driven tasks.

Highlights

  • New LLM Backend Integration: Introduced Venice AI as a first-class LLM backend, enabling users to leverage its inference capabilities alongside existing providers.
  • Venice-Specific Features Support: Added support for Venice AI's unique parameters, including web search, web scraping, and control over the system prompt, enhancing model interaction.
  • Dynamic Model Pricing: Implemented dynamic model pricing for Venice AI, fetching costs from its API with a 1-hour TTL cache and graceful fallback to default pricing.
  • Native Tool Calling: Enabled native tool calling support for Venice AI, simplifying integration by directly handling role: "tool" messages without requiring workarounds.
  • Configuration and Environment Variables: Integrated Venice AI configuration through a new VeniceConfig struct and corresponding VENICE_* environment variables, including input validation for VENICE_WEB_SEARCH.
  • Network Allowlist Update: Updated the sandbox network allowlist to include api.venice.ai, ensuring secure communication with the new backend.
Changelog
  • .env.example
    • Added example environment variables for configuring the Venice AI provider, including API key, model, base URL, web search, and web scraping settings.
  • src/config.rs
    • Added Venice variant to the LlmBackend enum.
    • Updated the FromStr implementation for LlmBackend to parse 'venice' and related aliases.
    • Updated the Display implementation for LlmBackend to output 'venice'.
    • Defined a new VeniceConfig struct to hold Venice AI specific configuration parameters like API key, base URL, model, web search, web scraping, and system prompt inclusion.
    • Added venice: Option<VeniceConfig> field to the LlmConfig struct.
    • Implemented logic within LlmConfig::from_env() to parse and validate Venice AI environment variables, including VENICE_API_KEY, VENICE_BASE_URL, VENICE_MODEL, VENICE_WEB_SEARCH, VENICE_WEB_SCRAPING, and VENICE_INCLUDE_SYSTEM_PROMPT.
  • src/llm/mod.rs
    • Updated module documentation to include Venice AI as a supported provider.
    • Declared and imported the new venice module.
    • Exported VeniceProvider for external use.
    • Added a match arm for LlmBackend::Venice in create_llm_provider to instantiate VeniceProvider.
    • Added create_venice_provider function to handle Venice AI provider initialization.
  • src/llm/venice.rs
    • Added a new file implementing the VeniceProvider struct, which conforms to the LlmProvider trait.
    • Implemented methods for complete and complete_with_tools using Venice AI's chat completions API.
    • Included logic for fetching and caching model catalog information, including dynamic pricing and context lengths, with a 1-hour TTL.
    • Provided utility functions for constructing Venice-specific API URLs and handling API keys.
    • Defined internal structs for Venice API request and response serialization/deserialization, including VeniceChatRequest, VeniceParameters, ChatCompletionMessage, ChatCompletionTool, ChatCompletionResponse, and VeniceModelsResponse.
    • Implemented message conversion logic from generic ChatMessage to Venice-specific ChatCompletionMessage format.
    • Included comprehensive unit tests for message conversion, parameter serialization, and cost calculation logic.
  • src/sandbox/config.rs
    • Added api.venice.ai to the list of default allowed network domains.
  • src/setup/wizard.rs
    • Initialized the venice field to None in the LlmConfig struct literal within the setup wizard.
Activity
  • The pull request was created by jesseproudman.
  • No human activity (comments, reviews) has been recorded on this pull request yet.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request integrates Venice AI as a first-class LLM provider. Feedback includes a critical security issue regarding insecure handling of the API key in memory by bypassing SecretString protections. Additionally, issues were identified with including full API response bodies in error messages, which can lead to information leakage and resource exhaustion; truncation using character-aware methods is recommended. Other feedback covers a potential race condition in lazy initialization, improved error handling for tool argument parsing, and consistency in the .env.example file. Addressing these points will enhance both the functionality and security of the new provider.

Comment on lines +87 to +89
fn api_key(&self) -> String {
    self.config.api_key.expose_secret().to_string()
}

security-medium medium

The VeniceProvider implementation defeats the purpose of using SecretString for the API key by converting it to a regular heap-allocated String via to_string() in the api_key() helper method. This creates a non-zeroed copy of the secret in memory that persists until dropped and can be captured in core dumps. Additionally, the Authorization header is constructed using format!, creating another unprotected copy of the secret. The header is also not marked as sensitive, which could lead to the API key being logged if reqwest debug logging is enabled.

Remediation: Avoid converting SecretString to String. Use expose_secret() directly when constructing the header. Use reqwest::header::HeaderValue::from_str and call set_sensitive(true) on the resulting value to ensure it is handled securely by the HTTP client and excluded from any debug logging.
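
One way the remediation could look; the Zeroizing wrapper is an extra precaution beyond what the comment asks for, and reqwest re-exports HeaderValue from the http crate:

    use reqwest::header::HeaderValue;
    use secrecy::{ExposeSecret, SecretString};
    use zeroize::Zeroizing;

    // Build the Authorization value without leaving a long-lived plain String
    // around, and mark it sensitive so reqwest excludes it from debug logging.
    fn bearer_header(
        api_key: &SecretString,
    ) -> Result<HeaderValue, reqwest::header::InvalidHeaderValue> {
        let bearer = Zeroizing::new(format!("Bearer {}", api_key.expose_secret()));
        let mut value = HeaderValue::from_str(&bearer)?;
        value.set_sensitive(true);
        Ok(value)
    }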

# VENICE_MODEL=zai-org-glm-5
# VENICE_BASE_URL=https://api.venice.ai/api/v1
# VENICE_WEB_SEARCH=off
# VENICE_WEB_SCRAPING=false

medium

The VENICE_INCLUDE_SYSTEM_PROMPT environment variable is supported in the configuration logic but is missing from this example file. Adding it here helps users discover all available Venice-specific options.

# VENICE_WEB_SCRAPING=false
# VENICE_INCLUDE_SYSTEM_PROMPT=false

Comment on lines +140 to +143
return Err(LlmError::RequestFailed {
    provider: "venice".to_string(),
    reason: format!("HTTP {}: {}", status, response_text),
});

medium

Including the full response body in error messages can lead to excessive memory usage and poor log readability if the API returns a large HTML error page (e.g., from a proxy or WAF). It is recommended to truncate the response text using a character-aware method to avoid panics on multi-byte characters.

References
  1. Always truncate tool output or external API responses for previews or status updates to a reasonable maximum length.
  2. When truncating a UTF-8 string, use character-aware methods to avoid panics caused by slicing in the middle of a multi-byte character.
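
A character-aware truncation helper along these lines would address both points (the constant and helper name are illustrative):

    // Cap error bodies at a fixed number of characters without slicing
    // inside a multi-byte UTF-8 sequence.
    const MAX_ERROR_BODY_CHARS: usize = 2_000;

    fn truncate_chars(s: &str, max_chars: usize) -> String {
        match s.char_indices().nth(max_chars) {
            // `idx` is a char boundary, so slicing here cannot panic.
            Some((idx, _)) => format!("{} ... [truncated]", &s[..idx]),
            None => s.to_string(),
        }
    }

    // Usage in the error path above:
    // reason: format!("HTTP {}: {}", status, truncate_chars(&response_text, MAX_ERROR_BODY_CHARS)),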

Comment on lines +212 to +222
async fn refresh_catalog_if_stale(&self) {
    let is_stale = {
        let catalog = self
            .model_catalog
            .read()
            .expect("model_catalog lock poisoned");
        match catalog.fetched_at {
            None => true,
            Some(t) => t.elapsed() > CATALOG_TTL,
        }
    };

medium

This lazy initialization pattern contains a race condition: multiple concurrent requests arriving while the catalog is stale will all pass the is_stale check and trigger redundant fetch_models() network calls. While the final update is thread-safe, you might consider a mechanism to throttle or deduplicate these initial fetch attempts.
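
A minimal sketch of one deduplication approach, assuming a tokio::sync::Mutex<()> is added to the provider; all names below are stand-ins, not code from the diff:

    use std::sync::RwLock;
    use std::time::{Duration, Instant};
    use tokio::sync::Mutex;

    const CATALOG_TTL: Duration = Duration::from_secs(3600);

    // Minimal stand-in for the provider; only the fields needed for the pattern.
    struct CatalogCache {
        fetched_at: RwLock<Option<Instant>>,
        refresh_lock: Mutex<()>, // serializes refresh attempts
    }

    impl CatalogCache {
        fn is_stale(&self) -> bool {
            match *self.fetched_at.read().expect("lock poisoned") {
                None => true,
                Some(t) => t.elapsed() > CATALOG_TTL,
            }
        }

        async fn refresh_if_stale(&self) {
            if !self.is_stale() {
                return;
            }
            // Only one task performs the network fetch; concurrent callers wait
            // here instead of issuing redundant /models requests.
            let _guard = self.refresh_lock.lock().await;
            // Re-check after acquiring the lock: another task may have refreshed.
            if !self.is_stale() {
                return;
            }
            // ... fetch /models here, then record the refresh time ...
            *self.fetched_at.write().expect("lock poisoned") = Some(Instant::now());
        }
    }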

Comment on lines +380 to +381
let arguments = serde_json::from_str(&tc.function.arguments)
    .unwrap_or(serde_json::Value::Object(Default::default()));

medium

Silently falling back to an empty object when tool arguments fail to parse can make debugging difficult. If the LLM produces malformed JSON, it's better to log a warning or propagate the error so the failure is transparent.

                let arguments = serde_json::from_str(&tc.function.arguments).unwrap_or_else(|e| {
                    tracing::warn!("Failed to parse Venice tool arguments: {}. Raw: {}", e, tc.function.arguments);
                    serde_json::Value::Object(Default::default())
                });

Add VeniceEmbeddings as a third embeddings provider alongside OpenAI
and NEAR AI, allowing users who set LLM_BACKEND=venice to use Venice
for semantic search without needing a separate OpenAI API key.

Changes:
- Add VeniceEmbeddings struct with OpenAI-compatible API support
- Add venice_api_key and venice_base_url to EmbeddingsConfig
- Add "venice" match arms in both embedding creation sites in main.rs
- Add Venice option to the setup wizard embeddings step
- Auto-enable embeddings when VENICE_API_KEY is set
- Add EMBEDDING_PROVIDER=venice to .env.example

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@tribendu

Summary

This PR adds Venice AI as a fully-featured LLM provider in the IronClaw project, including configuration support (800+ lines in src/llm/venice.rs), embeddings provider, and setup wizard integration. The implementation follows existing provider patterns, supports Venice-specific features (web search, web scraping), includes lazy-loading model catalog with pricing, and has comprehensive unit tests (15+ tests covering message conversion, parameter serialization, cost calculation).

Pros

  • Comprehensive test coverage: 15+ unit tests covering message conversion, tool calls, parameter serialization, cost calculation, and edge cases
  • Good documentation: Inline comments justify design decisions (e.g., std::sync::RwLock vs tokio::sync::RwLock)
  • Security-first: Uses secrecy::SecretString for API keys, proper error types for auth failures
  • Lazy catalog loading: Model catalog fetched only when needed with 1-hour TTL, graceful fallback on errors
  • Venice-specific features: Supports web search ("off"/"on"/"auto"), web scraping, and system prompt control
  • Complete integration: Includes embeddings provider, sandbox allowlist, setup wizard, and environment variable support
  • Consistent with codebase: Follows same patterns as NearAI/OpenAI providers
  • Cost calculation: Parses dynamic pricing from /models endpoint with per-token rate calculation

Concerns

  • Lock poisoning: Uses expect("model_catalog lock poisoned"), which panics if the lock was poisoned by an earlier panic instead of degrading gracefully
  • Request type duplication: Defines many Venice-specific request/response types (VeniceChatRequest, ChatCompletionMessage, etc.) that could be reused with OpenAI-compatible provider
  • Hardcoded timeout: 120-second timeout in src/llm/venice.rs:67 should be configurable
  • No integration tests: Only unit tests; no mock server tests for API interactions
  • Stale cache silent: Failed catalog refresh keeps stale cache without user notification - could return outdated pricing
  • Rate limiting: Logs warning but provides no user-facing guidance on rate limits or backoff strategy
  • Embeddings dimension: Hardcoded dimension mapping (1536 for small, 3072 for large) lives in the main.rs setup path (around line 118) instead of being derived from the model

Suggestions

  1. Lock safety: Replace the expect() poisoned-lock handling with a graceful fallback, or recover via PoisonError::into_inner()
  2. Request type reuse: Investigate if OpenAI-compatible provider can handle Venice's venice_parameters to reduce duplication
  3. Configurable timeout: Export timeout as config field in VeniceConfig
  4. Add integration tests: Use wiremock or similar to test actual request/response cycles, especially for /models pricing parsing
  5. Cache notifications: Log cached catalog age in warnings, add telemetry for stale cache hits
  6. Rate limit handling: Consider adding retry-after header parsing and user-facing rate limit errors with guidance
  7. Dimension mapping: Extract dimension mapping to a helper function or derive from response metadata instead of hardcoded values
  8. Request deduplication: Consider deduplicating simultaneous catalog refresh requests to reduce API load

Overall: This is a well-structured PR that thoughtfully integrates Venice AI as a first-class provider. The code quality is high, with excellent documentation and test coverage. The concerns listed are minor and mostly about robustness in edge cases. This PR is ready to merge once suggestions are considered (especially lock safety handling).

Contributor

@zmanian zmanian left a comment

Solid work adding Venice AI as a first-class provider. The implementation follows the existing provider patterns well, and the test coverage (15+ unit tests) is appreciated. A few things to address before merging:

Must fix:

  1. Lock poisoning -- expect("model_catalog lock poisoned") in the catalog cache will panic on double-panic scenarios. Replace with graceful fallback:

    match self.model_catalog.read() {
        Ok(guard) => { /* use guard */ }
        Err(poisoned) => {
            tracing::warn!("model catalog lock was poisoned, recovering");
            let guard = poisoned.into_inner();
            // use guard
        }
    }
  2. Hardcoded timeout -- The 120-second timeout should be configurable via VeniceConfig (add a timeout_secs field, default to 120). Other providers will likely want this too eventually.
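
A sketch of the shape this could take (the timeout_secs field name follows the suggestion; the helper and wiring below are illustrative):

    use std::time::Duration;

    // Default to the current 120s when the new field is unset.
    fn venice_timeout(timeout_secs: Option<u64>) -> Duration {
        Duration::from_secs(timeout_secs.unwrap_or(120))
    }

    // Usage in VeniceProvider::new (sketch):
    //     let client = reqwest::Client::builder()
    //         .timeout(venice_timeout(config.timeout_secs))
    //         .build()
    //         .unwrap_or_else(|_| reqwest::Client::new());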

Should fix:

  1. Request type duplication -- VeniceChatRequest, ChatCompletionMessage, ChatCompletionTool, etc. are very close to what an OpenAI-compatible provider would use. Consider whether the existing OpenAI-compat types could be reused with Venice-specific extensions (the venice_parameters field). This would reduce maintenance surface.

  2. Embeddings dimension mapping -- Hardcoded dimension mapping (1536/3072) in main.rs should be derived from the model or moved to a helper function rather than inlined in startup logic.
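
For point 2, a small helper keeps the magic numbers in one place (matching on substrings is just one option; the real model ids aren't in this diff):

    // Map an embedding model name to its vector dimension, with a fallback for
    // models we don't recognize.
    fn embedding_dimension(model: &str, fallback: usize) -> usize {
        if model.contains("large") {
            3072
        } else if model.contains("small") {
            1536
        } else {
            fallback
        }
    }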

Nice to have:

  1. Integration tests -- Unit tests cover serialization and cost math well, but a wiremock test for the /models catalog fetch + cache TTL behavior would catch edge cases in the HTTP layer (see the sketch after this list).

  2. Stale cache visibility -- When catalog refresh fails and stale cache is used, log the cache age so operators can tell if pricing data is hours or days old.
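
For the wiremock idea in point 1, a test could start from something like this; the /models path and response shape are guesses at Venice's schema, not taken from the diff, and wiremock plus tokio's test macro are assumed dev-dependencies:

    use wiremock::matchers::{method, path};
    use wiremock::{Mock, MockServer, ResponseTemplate};

    #[tokio::test]
    async fn models_endpoint_can_be_mocked() {
        let server = MockServer::start().await;

        Mock::given(method("GET"))
            .and(path("/api/v1/models"))
            .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
                "data": [{ "id": "some-model", "pricing": { "input": "0.5", "output": "1.5" } }]
            })))
            .mount(&server)
            .await;

        // In the real test: point the provider's base_url at server.uri(),
        // call list_models() twice, and assert via server.received_requests()
        // that only one catalog fetch went out within the TTL.
        let body = reqwest::get(format!("{}/api/v1/models", server.uri()))
            .await
            .unwrap()
            .text()
            .await
            .unwrap();
        assert!(body.contains("some-model"));
    }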

The architecture is sound and Venice-specific features (web search modes, web scraping, system prompt control) are cleanly modeled. Looking forward to the next revision.

Member

@ilblackdragon ilblackdragon left a comment

Code Review: feat: add Venice AI as first-class LLM backend

I reviewed the full diff (1205 additions, 6 deletions) across all 10 changed files. Overall this is a well-structured PR that follows the established provider patterns closely. Below is my detailed analysis.


LlmProvider Trait Implementation

The VeniceProvider correctly implements all required trait methods (complete, complete_with_tools, model_name, cost_per_token) and overrides the appropriate optional methods (list_models, model_metadata, active_model_name, set_model). The implementation mirrors the patterns established by NearAiChatProvider and OpenAiCompatibleChatProvider.

The complete_with_tools implementation handles tool calling natively (no flatten_tool_messages hack) which is the correct approach for a provider that supports role: "tool" messages.

Correctness Issues

1. Duplicate embeddings setup in src/main.rs (minor, not a regression)

There are two nearly identical blocks setting up Venice embeddings -- one around line 115 (inside the memory CLI subcommand path) and another around line 489 (in the main agent startup path). This isn't introduced by this PR (the same duplication exists for OpenAI/NearAI), but the PR propagates it. Consider extracting a helper function that maps (provider_name, config) -> Option<Arc<dyn EmbeddingProvider>> to avoid the 4x duplication. Not a blocker.

2. Missing "venice" arm in second embeddings block has fallthrough to OpenAI

In the main agent embeddings setup (around the _ => fallthrough arm at line 578 of main.rs), if someone sets EMBEDDING_PROVIDER=venice but there's a code path that doesn't match the "venice" string, it would silently fall through to trying OpenAI. The PR does add the "venice" arm correctly in both locations, so this is fine as-is. Just noting the fragile pattern.

.unwrap() / .expect() Audit

Production code:

  • .expect("model_catalog lock poisoned") and .expect("active_model lock poisoned") -- Acceptable. This is the established pattern used by NearAiProvider and NearAiChatProvider for std::sync::RwLock guards. The documented invariant (guards never held across .await) makes poisoning a genuine "should never happen" scenario.
  • .unwrap_or_else(|_| Client::new()) in VeniceProvider::new -- Acceptable. Reasonable fallback if Client::builder() fails.
  • .unwrap_or_default() on response.text().await and choice.message.content -- Acceptable. Same pattern as existing providers.
  • serde_json::from_str(&tc.function.arguments).unwrap_or(serde_json::Value::Object(Default::default())) -- Acceptable. Same pattern as nearai_chat.rs and openai_compatible_chat.rs.

Test code:

  • All .unwrap() / .expect() in #[cfg(test)] mod tests -- Fine per project rules.

No violations found.

Import Style

  • Uses crate:: imports throughout (crate::config::VeniceConfig, crate::error::LlmError, crate::llm::provider::..., crate::llm::costs). Correct.
  • use super::*; only in #[cfg(test)] mod tests -- acceptable and consistent with other providers.

Error Handling

Error handling is thorough and follows established patterns:

  • HTTP 401 maps to LlmError::AuthFailed
  • HTTP 429 maps to LlmError::RateLimited (with retry_after: None -- Venice doesn't seem to provide this header, which is fine)
  • All other non-success responses map to LlmError::RequestFailed with status and body
  • JSON parse failures map to LlmError::InvalidResponse
  • The VeniceEmbeddings provider uses the same error hierarchy (EmbeddingError::AuthFailed, HttpError, RateLimited) correctly

Design Quality

Model catalog with TTL cache -- Clean design. The lazy fetch with 1-hour TTL and graceful fallback to default_cost() is a nice touch. The cost_per_token() sync method correctly reads from the cached catalog without async overhead, and list_models()/model_metadata() trigger refresh when stale.

VeniceParameters -- Good use of #[serde(skip_serializing_if = "Option::is_none")] to keep them out of the request JSON when unconfigured. The build_venice_parameters() method correctly returns None when no Venice-specific options are set.
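
For reference, the behaviour described here corresponds to a struct shaped roughly like the sketch below; the field names are guesses at Venice's parameter names, not copied from the diff, and build_venice_parameters is written as a free function rather than a method:

    use serde::Serialize;

    // Sketch of the optional parameter block; when every field is None the
    // caller can skip attaching it to the request entirely.
    #[derive(Serialize, Default)]
    struct VeniceParameters {
        #[serde(skip_serializing_if = "Option::is_none")]
        enable_web_search: Option<String>, // "off" | "on" | "auto"
        #[serde(skip_serializing_if = "Option::is_none")]
        enable_web_scraping: Option<bool>,
        #[serde(skip_serializing_if = "Option::is_none")]
        include_venice_system_prompt: Option<bool>,
    }

    // Mirror of the described build_venice_parameters() behaviour: None when nothing is set.
    fn build_venice_parameters(p: VeniceParameters) -> Option<VeniceParameters> {
        let any_set = p.enable_web_search.is_some()
            || p.enable_web_scraping.is_some()
            || p.include_venice_system_prompt.is_some();
        any_set.then_some(p)
    }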

Config validation -- The VENICE_WEB_SEARCH validation (must be "off"/"on"/"auto") with a clear error message is solid.

Suggestions (Non-blocking)

  1. Consider tracing::warn! for tool call argument parse failures -- The line serde_json::from_str(&tc.function.arguments).unwrap_or(serde_json::Value::Object(Default::default())) silently swallows parse errors. Adding a tracing::warn! here would help debugging. (Same issue exists in existing providers, so this is a codebase-wide improvement, not specific to this PR.)

  2. The VeniceEmbeddings struct stores api_key as a plain String rather than SecretString. In the LLM provider, VeniceConfig.api_key is correctly typed as SecretString, but the embeddings provider receives it after .expose_secret() is called in main.rs. This is the same pattern used by OpenAiEmbeddings so it's consistent, but worth noting.

  3. The VeniceEmbeddings response types (VeniceEmbeddingRequest, VeniceEmbeddingResponse, VeniceEmbeddingData) are identical to the OpenAI embedding types. If Venice truly uses an OpenAI-compatible embedding endpoint, consider reusing the existing OpenAI embedding types or extracting a shared type. Not a blocker since the types are small.

Test Coverage

12 unit tests covering:

  • Message conversion (user, system, assistant, tool, assistant with tool calls)
  • Tool call argument serialization
  • Venice parameter serialization (with all fields, with none, with partial fields)
  • Cost calculation from catalog (with model, without model/fallback)
  • Embeddings config

Good coverage of the core logic. The tests correctly use #[cfg(test)] and test-only super::* imports.

Files Review Summary

  • src/llm/venice.rs: Clean, follows existing patterns
  • src/config.rs: Correct config struct, validation, env var handling
  • src/llm/mod.rs: Correct module registration and factory
  • src/main.rs: Venice embeddings wired in both code paths
  • src/workspace/embeddings.rs: Clean VeniceEmbeddings implementation
  • src/workspace/mod.rs: Correct pub use addition
  • src/sandbox/config.rs: api.venice.ai in allowlist
  • src/settings.rs: Doc comment updated
  • src/setup/wizard.rs: Venice option in setup wizard
  • .env.example: Well-documented env vars

Verdict

This is a solid, well-implemented PR that follows the project's established patterns for adding a new LLM provider. The code is clean, error handling is thorough, and tests cover the key logic paths. No blocking issues found.

Member

@ilblackdragon ilblackdragon left a comment

Approving. Clean implementation that follows the established provider patterns. No .unwrap()/.expect() violations in production code, correct crate:: imports, thorough error handling, and good test coverage. The suggestions in my detailed comment are all non-blocking.

@tribendu

Batch 2 PR Review: nearai/ironclaw PRs 111, 110, 109, 103, 95, 74

Reviewer: AI Sub-Agent
Date: February 17, 2026
Review Method: Manual diff analysis with focus on code quality, bugs, security, performance, test coverage, and Rust best practices


PR #111: Fix backwards compatibility for nearai.session_token

Summary

This PR adds backwards compatibility for the nearai session management by implementing a fallback mechanism. When nearai.session_token is not found, the system now attempts to fall back to the legacy nearai.session setting. The change is minimal and focused on the session token retrieval logic in SessionManager::renew_session(). The implementation uses a graceful fallback pattern with proper error handling and logging.

Pros

  • Graceful degradation: Implements backwards compatibility without breaking existing functionality
  • Clear error handling: Preserves original error handling for both primary and fallback paths
  • Informative logging: Adds a warning message when falling back to legacy session storage
  • Minimal change scope: Only affects session token retrieval logic, reducing risk of regression
  • Proper pattern: Uses idiomatic Rust if let Some(...) for optional value handling

Concerns

  • Technical debt: This adds maintenance burden by keeping two parallel session storage schemas
  • No migration path: There's no clear deprecation timeline or migration strategy for the legacy schema
  • Silent fallback: The warning log may be missed in production, leading to continued use of deprecated schema
  • Error propagation: The error messages remain generic - they could be more specific about whether the primary or fallback path failed

Suggestions

  • Add a migration function to convert legacy sessions to the new format periodically
  • Consider adding telemetry/metrics to track usage of the legacy fallback path
  • Document the expected lifecycle and deprecation timeline for nearai.session
  • Add a configuration option to disable the fallback after migration is complete
  • Consider making the fallback opt-in rather than silent to encourage migration

PR #110: Add env docs for local LLM providers

Summary

This PR updates the .env.example file to document environment variables for local LLM providers including Ollama, LM Studio, vLLM, and other OpenAI-compatible endpoints. The changes are purely documentation-focused, providing clear examples of environment variable configurations for users who want to use local models instead of the default NEAR AI backend.

Pros

  • User-friendly: Makes it easy for new users to configure local LLM providers
  • Comprehensive: Covers multiple backend options (Ollama, OpenAI-compatible)
  • Clear defaults: Shows default values for URLs and ports
  • Well-organized: Groups related configurations logically
  • No code changes: Documentation-only PR, zero risk of breaking functionality

Concerns

  • No validation: Environment variables are documented but there's no validation that the backend value matches the selected provider
  • Incomplete examples: Example model names are specific but may not match what users have installed
  • Missing authentication notes: Doesn't document that LLM_API_KEY is optional for local servers

Suggestions

  • Add comments explaining when LLM_API_KEY is optional vs required
  • Include a link to documentation for each provider
  • Add a note about model availability requiring local installation
  • Consider adding environment variable validation logic in a future PR
  • Document how to switch between providers at runtime
  • Add example commands to test connections (e.g., curl http://localhost:11434/api/tags)

PR #109: Normalize memory search query and update marked.js

Summary

This PR addresses two security and stability issues in the web interface. First, it adds input normalization for memory search queries to prevent excessive query lengths and invalid input types. Second, it updates the marked.js dependency to a specific version with integrity hashing for supply chain security. The changes are defensive in nature, preventing potential DoS attacks and ensuring the integrity of third-party JavaScript dependencies.

Pros

  • Input validation: Adds MEMORY_SEARCH_QUERY_MAX_LENGTH constant (100 chars) to prevent excessively long queries
  • Type safety: Normalizes queries to string type, preventing crashes from non-string input
  • Supply chain security: Uses SRI (Subresource Integrity) hashing for marked.js CDN dependency
  • Consistent application: Applies normalization throughout search functions (searchMemory, snippetAround, highlightQuery)
  • Clear error handling: Early return when normalized query is empty

Concerns

  • Arbitrary limit: 100-character limit may be too restrictive for complex semantic search queries
  • Silent truncation: Queries longer than the limit are silently truncated without user feedback
  • CDN dependency: Still relies on external CDN for marked.js despite SRI - could use bundled version
  • No rate limiting: Query length limit doesn't prevent rapid-fire spam queries
  • Client-side only: Validation happens on client side only - server should also validate

Suggestions

  • Increase the limit or make it configurable (e.g., 200-500 characters)
  • Add user feedback when query is truncated (toast message or inline warning)
  • Implement rate limiting on the /api/memory/search endpoint
  • Add server-side query validation to defend against bypassed client checks
  • Consider bundling marked.js with the application to eliminate CDN dependency
  • Add unit tests for the normalization function covering edge cases (null, undefined, empty string, very long string)

PR #103: Per-request model override for OpenAI-compatible API

Summary

This is a significant feature PR that adds per-request model override capability across the entire LLM provider ecosystem. Previously, all requests used the active model, but now clients can specify a different model per request. The changes span multiple modules: request structs now include optional model fields, providers check for request-level models before falling back to defaults, OpenAI-compatible endpoint validates model names, and comprehensive integration tests verify functionality. The PR includes migration documentation and removes the previous strict model validation that returned 404 for non-active models.

Pros

  • Well-architected: Uses Option with builder pattern (with_model()) for clean API
  • Comprehensive testing: Updated integration tests to mock model tracking and verify model override works
  • Full stack coverage: Changes propagate from API entry points through worker layer to actual providers
  • Good error handling: Validates model name length (MAX_MODEL_NAME_BYTES: 256) before processing
  • Backwards compatible: Existing behavior preserved when model field is not provided
  • Code quality: Consistent pattern across CompletionRequest and ToolCompletionRequest
  • Documentation: Updates FEATURE_PARITY.md to reflect new capability

Concerns

  • No model validation: The endpoint now accepts ANY model name without checking if it exists or is configured
  • Validation bypasses providers: Validation happens at HTTP layer, but providers don't validate model availability
  • Security implications: Users can request models they shouldn't have access to; cost tracking may break
  • Incomplete fallback: If provider doesn't support the requested model, behavior is undefined
  • Missing permissions: No RBAC or policy checking for model override capability
  • Potential for confusion: Active model concept remains but can be overridden per request

Suggestions

  • Add a method to validate if a model exists/can be used before making the request
  • Implement optional model allowlist/denylist configuration per user or API key
  • Add telemetry to track model usage patterns across different requests
  • Document the precedence: request model > active model > provider default
  • Consider adding a force_model flag that fails fast if model is unavailable
  • Add tests for error cases (non-existent model, invalid characters, empty string)
  • Update cost tracking to handle per-request model pricing differences
  • Consider adding model availability check to LlmProvider trait's set_model() method

PR #95: Add Venice AI provider and embeddings

Summary

This PR adds comprehensive support for Venice AI as both an LLM provider and an embeddings provider. It introduces a new VeniceProvider module with full API integration including Venice-specific features like web search, web scraping, and dynamic model pricing fetched from the /models endpoint. The PR also adds VeniceEmbeddings for semantic search, updates configuration handling, and integrates Venice into the setup wizard. The implementation includes extensive unit tests, proper error handling, caching of model catalogs with TTL, and follows the existing provider patterns.

Pros

  • Fully featured: Implements complete Venice API including proprietary parameters (web search, scraping, system prompt)
  • Dynamic pricing: Fetches model catalog with pricing from Venice API, updates cost tracking automatically
  • Good caching: Uses 1-hour TTL for model catalog to reduce API calls
  • Comprehensive testing: Extensive unit tests for message conversion, serialization, cost calculation
  • Type safety: Uses properly typed structures and enums for Venice-specific functionality
  • Consistent patterns: Follows existing provider patterns (NearAiProvider, etc.)
  • Error handling: Proper HTTP error mapping (401→AuthFailed, 429→RateLimited)
  • Config validation: Validates Venice-specific config values (web_search must be "off"/"on"/"auto")

Concerns

  • Large file: venice.rs is 880 lines - consider splitting into smaller modules
  • Lock choice: Uses std::sync::RwLock with comment about async safety - risk if async code is added later
  • Secret handling: API key exposed as String in multiple places despite being wrapped in SecretString in Config
  • No retry logic: Network requests don't have exponential backoff for transient failures
  • Hardcoded timeout: 120-second timeout may be too long or too short for different use cases
  • Cache misses on errors: If catalog fetch fails, stale cache is kept indefinitely
  • Embeddings dimension hardcoding: Magic numbers for dimensions (1536, 3072) scattered in code

Suggestions

  • Split venice.rs into multiple modules: provider.rs, models.rs, api.rs
  • Consider using tokio::sync::RwLock for future-proofing with async code
  • Add retry logic with exponential backoff for API requests (see the sketch after this list)
  • Make timeout configurable via VeniceConfig
  • Add metrics for cache hit/miss ratio
  • Document the security model: when are secrets exposed?
  • Extract dimension mapping to a constant or helper function
  • Add integration test that calls actual Venice API with test credentials
  • Consider adding health check endpoint for provider connectivity
  • Add support for streaming responses (if Venice API supports it)
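
A minimal sketch of the retry idea from the list above; the backoff parameters and the retry-on-429/5xx policy are illustrative, not from the PR:

    use std::time::Duration;

    // Retry transient failures (HTTP 429 and 5xx) with exponential backoff.
    async fn send_with_backoff(
        client: &reqwest::Client,
        request: reqwest::Request,
        max_attempts: u32,
    ) -> Result<reqwest::Response, reqwest::Error> {
        let mut delay = Duration::from_millis(500);
        let mut attempt = 0;
        loop {
            attempt += 1;
            // try_clone() fails for streaming bodies; in that case just send once.
            let req = match request.try_clone() {
                Some(r) => r,
                None => return client.execute(request).await,
            };
            match client.execute(req).await {
                Ok(resp)
                    if attempt < max_attempts
                        && (resp.status().as_u16() == 429 || resp.status().is_server_error()) =>
                {
                    tokio::time::sleep(delay).await;
                    delay *= 2;
                }
                other => return other,
            }
        }
    }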

PR #74: Security fix: Enhanced HTTP response size validation

Summary

This PR strengthens the HTTP tool's defense against OOM attacks by implementing two-stage response size validation. First, it checks the Content-Length header before downloading any content to immediately reject oversized responses. Second, it streams the response body with a hard size cap during download, protecting against malicious or misconfigured servers that may send incorrect or missing Content-Length headers. The implementation uses streaming with futures::StreamExt and enforces a consistent 5 MB limit across both stages.
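
A rough sketch of the two-stage pattern described above, using reqwest's streaming body (requires the crate's stream feature); the 5 MB constant matches the review text, everything else is illustrative:

    use futures_util::StreamExt;

    const MAX_RESPONSE_SIZE: usize = 5 * 1024 * 1024; // 5 MB

    async fn fetch_capped(url: &str) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
        let response = reqwest::get(url).await?;

        // Stage 1: reject early based on the declared Content-Length, if present.
        if let Some(len) = response.content_length() {
            if len as usize > MAX_RESPONSE_SIZE {
                return Err(format!("response too large: {len} bytes").into());
            }
        }

        // Stage 2: enforce the cap while streaming, in case the header was
        // missing or lied about the size.
        let mut body = Vec::new();
        let mut stream = response.bytes_stream();
        while let Some(chunk) = stream.next().await {
            let chunk = chunk?;
            if body.len() + chunk.len() > MAX_RESPONSE_SIZE {
                return Err("response exceeded size cap while streaming".into());
            }
            body.extend_from_slice(&chunk);
        }
        Ok(body)
    }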

Pros

  • Defense in depth: Two-stage validation (header check + streaming cap) protects against multiple attack vectors
  • Early rejection: Aborts request before downloading potentially malicious content
  • Streaming approach: Memory-efficient, doesn't load entire response before checking size
  • Informative logging: Warns with URL and size details when rejecting responses
  • Consistent constant: Uses same 5 MB limit as WASM HTTP wrapper
  • Well-documented: Clear comments explain the security rationale and 5 MB justification
  • Test coverage: Includes test verifying the constant value is reasonable

Concerns

  • Header spoofing: Relies on server sending accurate Content-Length - malicious servers can send small value then send large body
  • No byte limit on individual chunks: Only checks cumulative size; individual chunks could still be large
  • Error message exposure: Returns detailed size information in errors which might leak information
  • No rate limiting: An attacker could still send many small requests to exhaust resources
  • Hardcoded limit: 5 MB may be too small for some legitimate use cases (e.g., downloading large JSON datasets)

Suggestions

  • Add per-chunk size limit (e.g., 1 MB max per chunk) in addition to cumulative limit
  • Consider making MAX_RESPONSE_SIZE configurable per tool or per request
  • Add telemetry/metrics for rejected responses to detect DDoS patterns
  • Implement request rate limiting in addition to size limiting
  • Add option to override limit for authorized users/admins
  • Consider adding a timeout for the streaming operation
  • Add test cases for Content-Length header spoofing attempt
  • Document how users can work around the limit if needed (e.g., using multiple smaller requests with pagination)

Overall Recommendations

High Priority

  1. PR #103 (feat: support per-request model override in /v1/chat/completions): Address the security implications of unrestricted model override before merging; add allowlist/denylist functionality
  2. PR #74 (fix: check Content-Length before downloading HTTP response body): Consider making the size limit configurable for flexibility while maintaining security

Medium Priority

  1. PR #95 (feat: add Venice AI as first-class LLM backend): Refactor the large venice.rs file and add retry logic for network resilience
  2. PR #109 (web: add integrity check for marked CDN and cap highlight regex input): Add server-side validation to complement client-side checks
  3. PR #111 (llm: fallback to legacy nearai.session key when loading DB session): Add a migration strategy for legacy session tokens

Low Priority

  1. PR #110 (docs: add .env.example examples for Ollama and OpenAI-compatible): Add validation documentation and testing commands
  2. PR #95 (feat: add Venice AI as first-class LLM backend): Extract magic numbers and add integration tests
  3. PR #109 (web: add integrity check for marked CDN and cap highlight regex input): Consider removing the CDN dependency for marked.js

General Observations

  • All PRs follow Rust best practices and maintain code quality
  • Test coverage is generally good, though some integration tests could be expanded
  • Error handling is consistent across the codebase
  • Documentation (comments and inline docs) is clear and helpful
  • Most PRs are backwards compatible, which is good for production deployments
