feat: add Venice AI as first-class LLM backend #95
jesseproudman wants to merge 2 commits into nearai:main
Conversation
Add a native reqwest-based Venice inference API provider that supports Venice-specific venice_parameters (web search, web scraping, thinking control) and dynamic model pricing fetched from the /models API.

Changes:
- src/config.rs: LlmBackend::Venice variant, VeniceConfig struct, VENICE_* env var resolution with validation
- src/llm/venice.rs: VeniceProvider with LlmProvider impl, cached model catalog with 1-hour TTL, 12 unit tests
- src/llm/mod.rs: register Venice module and create_venice_provider
- src/sandbox/config.rs: add api.venice.ai to network allowlist
- src/setup/wizard.rs: add venice field to LlmConfig struct literal
- .env.example: document Venice env vars

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Summary of Changes

Hello @jesseproudman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request expands the application's LLM capabilities by integrating Venice AI as a new, fully-featured backend. The addition gives users an alternative model provider with support for web search and web scraping, per-model cost tracking, configuration via environment variables, and error handling that follows the existing provider patterns.

Highlights
Changelog
Activity
Code Review
This pull request integrates Venice AI as a first-class LLM provider. Feedback includes a critical security issue regarding insecure handling of the API key in memory by bypassing SecretString protections. Additionally, issues were identified with including full API response bodies in error messages, which can lead to information leakage and resource exhaustion; truncation using character-aware methods is recommended. Other feedback covers a potential race condition in lazy initialization, improved error handling for tool argument parsing, and consistency in the .env.example file. Addressing these points will enhance both the functionality and security of the new provider.
```rust
fn api_key(&self) -> String {
    self.config.api_key.expose_secret().to_string()
}
```
The VeniceProvider implementation defeats the purpose of using SecretString for the API key by converting it to a regular heap-allocated String via to_string() in the api_key() helper method. This creates a non-zeroed copy of the secret in memory that persists until dropped and can be captured in core dumps. Additionally, the Authorization header is constructed using format!, creating another unprotected copy of the secret. The header is also not marked as sensitive, which could lead to the API key being logged if reqwest debug logging is enabled.
Remediation: Avoid converting SecretString to String. Use expose_secret() directly when constructing the header. Use reqwest::header::HeaderValue::from_str and call set_sensitive(true) on the resulting value to ensure it is handled securely by the HTTP client and excluded from any debug logging.
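A minimal sketch of that remediation, assuming the provider keeps a `reqwest::Client` in `self.client` and the key in `self.config.api_key` (a `SecretString`); the surrounding request-building code is illustrative, not the PR's:

```rust
use reqwest::header::{HeaderValue, AUTHORIZATION};
use secrecy::ExposeSecret;

// The format! buffer is transient; the important part is that the header value
// is marked sensitive (so reqwest redacts it from Debug/log output) and no
// long-lived API-key String is stored on the provider.
let mut auth = HeaderValue::from_str(&format!(
    "Bearer {}",
    self.config.api_key.expose_secret()
))
.map_err(|e| LlmError::RequestFailed {
    provider: "venice".to_string(),
    reason: format!("invalid API key header: {e}"),
})?;
auth.set_sensitive(true);

let response = self
    .client
    .post(format!("{}/chat/completions", self.config.base_url))
    .header(AUTHORIZATION, auth)
    .json(&request)
    .send()
    .await?;
```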
```
# VENICE_MODEL=zai-org-glm-5
# VENICE_BASE_URL=https://api.venice.ai/api/v1
# VENICE_WEB_SEARCH=off
# VENICE_WEB_SCRAPING=false
```
```rust
return Err(LlmError::RequestFailed {
    provider: "venice".to_string(),
    reason: format!("HTTP {}: {}", status, response_text),
});
```
Including the full response body in error messages can lead to excessive memory usage and poor log readability if the API returns a large HTML error page (e.g., from a proxy or WAF). It is recommended to truncate the response text using a character-aware method to avoid panics on multi-byte characters.
References
- Always truncate tool output or external API responses for previews or status updates to a reasonable maximum length.
- When truncating a UTF-8 string, use character-aware methods to avoid panics caused by slicing in the middle of a multi-byte character.
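A minimal sketch of such a character-aware truncation helper (the name `truncate_chars` and the 500-character limit are illustrative, not from the PR):

```rust
// Truncate to at most `max_chars` characters without slicing inside a
// multi-byte UTF-8 character (a byte slice like `&s[..500]` could panic).
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}

// Applied to the error path above:
// reason: format!("HTTP {}: {}", status, truncate_chars(&response_text, 500)),
```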
```rust
async fn refresh_catalog_if_stale(&self) {
    let is_stale = {
        let catalog = self
            .model_catalog
            .read()
            .expect("model_catalog lock poisoned");
        match catalog.fetched_at {
            None => true,
            Some(t) => t.elapsed() > CATALOG_TTL,
        }
    };
```
This lazy initialization pattern contains a race condition: multiple concurrent requests arriving while the catalog is stale will all pass the is_stale check and trigger redundant fetch_models() network calls. While the final update is thread-safe, you might consider a mechanism to throttle or deduplicate these initial fetch attempts.
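One way to deduplicate those concurrent fetches is a double-checked refresh behind an async mutex. A sketch, assuming a `refresh_lock: tokio::sync::Mutex<()>` field is added and `catalog_is_stale()` / `fetch_and_store_models()` are helpers extracted for the purpose (none of these exist in the PR as written):

```rust
async fn refresh_catalog_if_stale(&self) {
    // Fast path: nothing to do if the cached catalog is still fresh.
    if !self.catalog_is_stale() {
        return;
    }
    // Only one task performs the network fetch; concurrent callers queue here.
    let _guard = self.refresh_lock.lock().await;
    // Re-check after acquiring the lock: the winner may already have refreshed.
    if self.catalog_is_stale() {
        self.fetch_and_store_models().await;
    }
}
```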
```rust
let arguments = serde_json::from_str(&tc.function.arguments)
    .unwrap_or(serde_json::Value::Object(Default::default()));
```
Silently falling back to an empty object when tool arguments fail to parse can make debugging difficult. If the LLM produces malformed JSON, it's better to log a warning or propagate the error so the failure is transparent.
```rust
let arguments = serde_json::from_str(&tc.function.arguments).unwrap_or_else(|e| {
    tracing::warn!("Failed to parse Venice tool arguments: {}. Raw: {}", e, tc.function.arguments);
    serde_json::Value::Object(Default::default())
});
```

Add VeniceEmbeddings as a third embeddings provider alongside OpenAI and NEAR AI, allowing users who set LLM_BACKEND=venice to use Venice for semantic search without needing a separate OpenAI API key.

Changes:
- Add VeniceEmbeddings struct with OpenAI-compatible API support
- Add venice_api_key and venice_base_url to EmbeddingsConfig
- Add "venice" match arms in both embedding creation sites in main.rs
- Add Venice option to the setup wizard embeddings step
- Auto-enable embeddings when VENICE_API_KEY is set
- Add EMBEDDING_PROVIDER=venice to .env.example

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
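A hedged sketch of the OpenAI-compatible embeddings call this commit describes. The wire-format structs mirror the OpenAI embeddings API and reuse the type names mentioned later in the review; the endpoint path, model id, and function shape are assumptions rather than the PR's exact code:

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize)]
struct VeniceEmbeddingRequest {
    model: String,
    input: Vec<String>,
}

#[derive(Deserialize)]
struct VeniceEmbeddingData {
    embedding: Vec<f32>,
}

#[derive(Deserialize)]
struct VeniceEmbeddingResponse {
    data: Vec<VeniceEmbeddingData>,
}

async fn embed(
    client: &reqwest::Client,
    base_url: &str,
    api_key: &str,
    texts: Vec<String>,
) -> Result<Vec<Vec<f32>>, reqwest::Error> {
    let resp: VeniceEmbeddingResponse = client
        .post(format!("{base_url}/embeddings"))
        .bearer_auth(api_key)
        .json(&VeniceEmbeddingRequest {
            // Model id is illustrative; Venice's embedding model names may differ.
            model: "text-embedding-bge-m3".to_string(),
            input: texts,
        })
        .send()
        .await?
        .error_for_status()?
        .json()
        .await?;
    Ok(resp.data.into_iter().map(|d| d.embedding).collect())
}
```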
Summary

This PR adds Venice AI as a fully-featured LLM provider in the IronClaw project, including configuration support (800+ lines in src/llm/venice.rs), an embeddings provider, and setup wizard integration. The implementation follows existing provider patterns, supports Venice-specific features (web search, web scraping), includes a lazy-loading model catalog with pricing, and has comprehensive unit tests (15+ tests covering message conversion, parameter serialization, cost calculation).

Pros

Concerns

Suggestions

Overall: This is a well-structured PR that thoughtfully integrates Venice AI as a first-class provider. The code quality is high, with excellent documentation and test coverage. The concerns listed are minor and mostly about robustness in edge cases. This PR is ready to merge once suggestions are considered (especially lock safety handling).
zmanian
left a comment
Solid work adding Venice AI as a first-class provider. The implementation follows the existing provider patterns well, and the test coverage (15+ unit tests) is appreciated. A few things to address before merging:
Must fix:

- Lock poisoning -- `expect("model_catalog lock poisoned")` in the catalog cache will panic if the lock was poisoned by an earlier panic. Replace it with a graceful fallback:

  ```rust
  match self.model_catalog.read() {
      Ok(guard) => { /* use guard */ }
      Err(poisoned) => {
          tracing::warn!("model catalog lock was poisoned, recovering");
          let guard = poisoned.into_inner();
          // use guard
      }
  }
  ```

- Hardcoded timeout -- The 120-second timeout should be configurable via `VeniceConfig` (add a `timeout_secs` field, default to 120); see the sketch below. Other providers will likely want this too eventually.
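An illustrative sketch of that suggestion, assuming `VeniceConfig` gains a `timeout_secs` field; the struct's other fields and the fallback to `Client::new()` follow the PR's description, but the exact shape is an assumption:

```rust
use std::time::Duration;

pub struct VeniceConfig {
    pub api_key: secrecy::SecretString,
    pub base_url: String,
    pub model: String,
    /// Request timeout in seconds; defaults to 120 when unset.
    pub timeout_secs: Option<u64>,
}

fn build_client(config: &VeniceConfig) -> reqwest::Client {
    reqwest::Client::builder()
        .timeout(Duration::from_secs(config.timeout_secs.unwrap_or(120)))
        .build()
        // Same fallback the PR already uses when the builder fails.
        .unwrap_or_else(|_| reqwest::Client::new())
}
```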
Should fix:

- Request type duplication -- `VeniceChatRequest`, `ChatCompletionMessage`, `ChatCompletionTool`, etc. are very close to what an OpenAI-compatible provider would use. Consider whether the existing OpenAI-compat types could be reused with Venice-specific extensions (the `venice_parameters` field). This would reduce the maintenance surface.
- Embeddings dimension mapping -- The hardcoded dimension mapping (1536/3072) in `main.rs` should be derived from the model or moved to a helper function rather than inlined in startup logic.
Nice to have:

- Integration tests -- Unit tests cover serialization and cost math well, but a `wiremock` test for the `/models` catalog fetch + cache TTL behavior would catch edge cases in the HTTP layer (a sketch follows below).
- Stale cache visibility -- When a catalog refresh fails and the stale cache is used, log the cache age so operators can tell whether pricing data is hours or days old.
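A hedged sketch of such a test. The endpoint path and JSON shape are assumptions, and a plain GET stands in for the provider call since the provider's constructor signature isn't shown here:

```rust
#[tokio::test]
async fn models_endpoint_can_be_mocked() {
    use wiremock::matchers::{method, path};
    use wiremock::{Mock, MockServer, ResponseTemplate};

    let server = MockServer::start().await;
    Mock::given(method("GET"))
        .and(path("/api/v1/models"))
        .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
            "data": [{ "id": "zai-org-glm-5" }]
        })))
        .mount(&server)
        .await;

    // In the real test, the provider's base_url would point at server.uri() and
    // list_models() would be called twice to exercise the cache TTL.
    let body: serde_json::Value = reqwest::get(format!("{}/api/v1/models", server.uri()))
        .await
        .unwrap()
        .json()
        .await
        .unwrap();
    assert_eq!(body["data"][0]["id"], "zai-org-glm-5");
}
```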
The architecture is sound and Venice-specific features (web search modes, web scraping, system prompt control) are cleanly modeled. Looking forward to the next revision.
ilblackdragon
left a comment
Code Review: feat: add Venice AI as first-class LLM backend
I reviewed the full diff (1205 additions, 6 deletions) across all 10 changed files. Overall this is a well-structured PR that follows the established provider patterns closely. Below is my detailed analysis.
LlmProvider Trait Implementation
The VeniceProvider correctly implements all required trait methods (complete, complete_with_tools, model_name, cost_per_token) and overrides the appropriate optional methods (list_models, model_metadata, active_model_name, set_model). The implementation mirrors the patterns established by NearAiChatProvider and OpenAiCompatibleChatProvider.
The complete_with_tools implementation handles tool calling natively (no flatten_tool_messages hack) which is the correct approach for a provider that supports role: "tool" messages.
Correctness Issues
1. Duplicate embeddings setup in src/main.rs (minor, not a regression)
There are two nearly identical blocks setting up Venice embeddings -- one around line 115 (inside the memory CLI subcommand path) and another around line 489 (in the main agent startup path). This isn't introduced by this PR (the same duplication exists for OpenAI/NearAI), but the PR propagates it. Consider extracting a helper function that maps (provider_name, config) -> Option<Arc<dyn EmbeddingProvider>> to avoid the 4x duplication. Not a blocker.
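An illustrative shape for that helper; the `EmbeddingsConfig` field names come from this PR's description, while the constructors and Option-typed fields are assumptions:

```rust
use std::sync::Arc;

// Illustrative only: maps an EMBEDDING_PROVIDER name plus config to a provider,
// so both call sites in main.rs can share one code path.
fn make_embeddings_provider(
    name: &str,
    cfg: &EmbeddingsConfig,
) -> Option<Arc<dyn EmbeddingProvider>> {
    match name {
        "openai" => Some(Arc::new(OpenAiEmbeddings::new(cfg.openai_api_key.clone()?))),
        "venice" => Some(Arc::new(VeniceEmbeddings::new(
            cfg.venice_api_key.clone()?,
            cfg.venice_base_url.clone(),
        ))),
        // A "nearai" arm would follow the same pattern.
        _ => None,
    }
}
```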
2. Missing "venice" arm in second embeddings block has fallthrough to OpenAI
In the main agent embeddings setup (around the _ => fallthrough arm at line 578 of main.rs), if someone sets EMBEDDING_PROVIDER=venice but there's a code path that doesn't match the "venice" string, it would silently fall through to trying OpenAI. The PR does add the "venice" arm correctly in both locations, so this is fine as-is. Just noting the fragile pattern.
.unwrap() / .expect() Audit
Production code:
- `.expect("model_catalog lock poisoned")` and `.expect("active_model lock poisoned")` -- Acceptable. This is the established pattern used by `NearAiProvider` and `NearAiChatProvider` for `std::sync::RwLock` guards. The documented invariant (guards never held across `.await`) makes poisoning a genuine "should never happen" scenario.
- `.unwrap_or_else(|_| Client::new())` in `VeniceProvider::new` -- Acceptable. Reasonable fallback if `Client::builder()` fails.
- `.unwrap_or_default()` on `response.text().await` and `choice.message.content` -- Acceptable. Same pattern as existing providers.
- `serde_json::from_str(&tc.function.arguments).unwrap_or(serde_json::Value::Object(Default::default()))` -- Acceptable. Same pattern as `nearai_chat.rs` and `openai_compatible_chat.rs`.
Test code:
- All `.unwrap()` / `.expect()` in `#[cfg(test)] mod tests` -- Fine per project rules.
No violations found.
Import Style
- Uses `crate::` imports throughout (`crate::config::VeniceConfig`, `crate::error::LlmError`, `crate::llm::provider::...`, `crate::llm::costs`). Correct.
- `use super::*;` only in `#[cfg(test)] mod tests` -- acceptable and consistent with other providers.
Error Handling
Error handling is thorough and follows established patterns:
- HTTP 401 maps to `LlmError::AuthFailed`
- HTTP 429 maps to `LlmError::RateLimited` (with `retry_after: None` -- Venice doesn't seem to provide this header, which is fine)
- All other non-success responses map to `LlmError::RequestFailed` with status and body
- JSON parse failures map to `LlmError::InvalidResponse`
- The `VeniceEmbeddings` provider uses the same error hierarchy (`EmbeddingError::AuthFailed`, `HttpError`, `RateLimited`) correctly
Design Quality
Model catalog with TTL cache -- Clean design. The lazy fetch with 1-hour TTL and graceful fallback to default_cost() is a nice touch. The cost_per_token() sync method correctly reads from the cached catalog without async overhead, and list_models()/model_metadata() trigger refresh when stale.
VeniceParameters -- Good use of #[serde(skip_serializing_if = "Option::is_none")] to keep them out of the request JSON when unconfigured. The build_venice_parameters() method correctly returns None when no Venice-specific options are set.
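A sketch of how such a struct is typically shaped; the field names here are assumptions, since only the web search, web scraping, and system-prompt toggles are named in this PR:

```rust
#[derive(serde::Serialize, Default)]
struct VeniceParameters {
    // "off" | "on" | "auto"
    #[serde(skip_serializing_if = "Option::is_none")]
    enable_web_search: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    enable_web_scraping: Option<bool>,
    #[serde(skip_serializing_if = "Option::is_none")]
    include_venice_system_prompt: Option<bool>,
}

// build_venice_parameters() would return None when every field is None, so the
// venice_parameters key is omitted from the request JSON entirely.
```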
Config validation -- The VENICE_WEB_SEARCH validation (must be "off"/"on"/"auto") with a clear error message is solid.
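A minimal sketch of that kind of validation (the function name and error text are illustrative, not the PR's):

```rust
// Accepts only the three documented VENICE_WEB_SEARCH modes.
fn validate_web_search_mode(value: &str) -> Result<String, String> {
    match value {
        "off" | "on" | "auto" => Ok(value.to_string()),
        other => Err(format!(
            "VENICE_WEB_SEARCH must be one of off/on/auto, got '{other}'"
        )),
    }
}
```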
Suggestions (Non-blocking)
- Consider `tracing::warn!` for tool call argument parse failures -- The line `serde_json::from_str(&tc.function.arguments).unwrap_or(serde_json::Value::Object(Default::default()))` silently swallows parse errors. Adding a `tracing::warn!` here would help debugging. (The same issue exists in existing providers, so this is a codebase-wide improvement, not specific to this PR.)
- The `VeniceEmbeddings` struct stores `api_key` as a plain `String` rather than `SecretString`. In the LLM provider, `VeniceConfig.api_key` is correctly typed as `SecretString`, but the embeddings provider receives it after `.expose_secret()` is called in `main.rs`. This is the same pattern used by `OpenAiEmbeddings`, so it's consistent, but worth noting.
- The `VeniceEmbeddings` response types (`VeniceEmbeddingRequest`, `VeniceEmbeddingResponse`, `VeniceEmbeddingData`) are identical to the OpenAI embedding types. If Venice truly uses an OpenAI-compatible embedding endpoint, consider reusing the existing OpenAI embedding types or extracting a shared type. Not a blocker since the types are small.
Test Coverage
12 unit tests covering:
- Message conversion (user, system, assistant, tool, assistant with tool calls)
- Tool call argument serialization
- Venice parameter serialization (with all fields, with none, with partial fields)
- Cost calculation from catalog (with model, without model/fallback)
- Embeddings config
Good coverage of the core logic. The tests correctly use #[cfg(test)] and test-only super::* imports.
Files Review Summary
| File | Status |
|---|---|
| src/llm/venice.rs | Clean, follows existing patterns |
| src/config.rs | Correct config struct, validation, env var handling |
| src/llm/mod.rs | Correct module registration and factory |
| src/main.rs | Venice embeddings wired in both code paths |
| src/workspace/embeddings.rs | Clean VeniceEmbeddings implementation |
| src/workspace/mod.rs | Correct pub use addition |
| src/sandbox/config.rs | api.venice.ai in allowlist |
| src/settings.rs | Doc comment updated |
| src/setup/wizard.rs | Venice option in setup wizard |
| .env.example | Well-documented env vars |
Verdict
This is a solid, well-implemented PR that follows the project's established patterns for adding a new LLM provider. The code is clean, error handling is thorough, and tests cover the key logic paths. No blocking issues found.
ilblackdragon
left a comment
Approving. Clean implementation that follows the established provider patterns. No .unwrap()/.expect() violations in production code, correct crate:: imports, thorough error handling, and good test coverage. The suggestions in my detailed comment are all non-blocking.
Batch 2 PR Review: nearai/ironclaw PRs 111, 110, 109, 103, 95, 74

Reviewer: AI Sub-Agent

PR #111: Fix backwards compatibility for nearai.session_token

Summary

This PR adds backwards compatibility for the nearai session management by implementing a fallback mechanism. When

Pros
Concerns
Suggestions
PR #110: Add env docs for local LLM providers

Summary

This PR updates the

Pros
Concerns
Suggestions
PR #109: Normalize memory search query and update marked.js

Summary

This PR addresses two security and stability issues in the web interface. First, it adds input normalization for memory search queries to prevent excessive query lengths and invalid input types. Second, it updates the marked.js dependency to a specific version with integrity hashing for supply chain security. The changes are defensive in nature, preventing potential DoS attacks and ensuring the integrity of third-party JavaScript dependencies.

Pros
Concerns
Suggestions
PR #103: Per-request model override for OpenAI-compatible API

Summary

This is a significant feature PR that adds per-request model override capability across the entire LLM provider ecosystem. Previously, all requests used the active model, but now clients can specify a different model per request. The changes span multiple modules: request structs now include optional

Pros
Concerns
Suggestions
PR #95: Add Venice AI provider and embeddings

Summary

This PR adds comprehensive support for Venice AI as both an LLM provider and an embeddings provider. It introduces a new

Pros
Concerns
Suggestions
PR #74: Security fix: Enhanced HTTP response size validation

Summary

This PR strengthens the HTTP tool's defense against OOM attacks by implementing two-stage response size validation. First, it checks the

Pros
Concerns
Suggestions
Overall Recommendations

High Priority
Medium Priority
Low Priority
General Observations
Summary
- `VeniceProvider` (src/llm/venice.rs) implementing `LlmProvider` with full support for Venice-specific `venice_parameters` (web search, web scraping, system prompt control)
- Dynamic model pricing fetched from the `GET /models` API with a 1-hour TTL cache and graceful fallback to `default_cost()`
- Native tool calling (no `flatten_tool_messages` hack needed — Venice handles `role: "tool"` natively)
- `VeniceConfig` with `VENICE_*` env vars following existing provider patterns, including input validation for `VENICE_WEB_SEARCH` (off/on/auto)
- `api.venice.ai` added to sandbox network allowlist

Files changed
- src/llm/venice.rs
- src/config.rs: `LlmBackend::Venice`, `VeniceConfig`, env var resolution
- src/llm/mod.rs: `create_venice_provider()`
- src/sandbox/config.rs: `api.venice.ai` in default allowlist
- src/setup/wizard.rs: `venice: None` in struct literal
- .env.example

Design notes
- `cost_per_token()` returns `default_cost()` until `list_models()` or `model_metadata()` triggers the first catalog fetch (lazy initialization by design — the hot `complete()` path doesn't block on catalog refresh)
- Model catalog and active model use `std::sync::RwLock` (not tokio) with the documented invariant that guards are never held across `.await` points
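A small sketch of that lock invariant in practice; the `complete()` signature and the `send_chat_request()` helper are assumptions for illustration:

```rust
async fn complete(&self, prompt: &str) -> Result<String, LlmError> {
    // Read the active model in a scoped block so the std::sync::RwLock guard is
    // dropped before anything is awaited, per the documented invariant.
    let model = {
        let active = self
            .active_model
            .read()
            .expect("active_model lock poisoned");
        active.clone()
    }; // guard dropped here

    self.send_chat_request(&model, prompt).await
}
```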
Test plan

- `cargo check` — compiles with Venice additions
- `cargo test venice` — 12 venice module tests pass
- `cargo test` — no regressions in existing tests
- `cargo clippy --all --all-features` — no new warnings
- `LLM_BACKEND=venice VENICE_API_KEY=<key> cargo run` — chat completion works

🤖 Generated with Claude Code