feat: add Venice AI as first-class LLM backend #95

Draft
jesseproudman wants to merge 2 commits into nearai:main from veniceai:feat/venice-llm-provider

Conversation

@jesseproudman

Summary

  • Adds a native VeniceProvider (src/llm/venice.rs) implementing LlmProvider with full support for Venice-specific venice_parameters (web search, web scraping, system prompt control)
  • Dynamic model pricing fetched from Venice GET /models API with 1-hour TTL cache and graceful fallback to default_cost()
  • Native tool calling support (no flatten_tool_messages hack needed — Venice handles role: "tool" natively)
  • VeniceConfig with VENICE_* env vars following existing provider patterns, including input validation for VENICE_WEB_SEARCH (off/on/auto)
  • api.venice.ai added to sandbox network allowlist
  • 12 unit tests covering message conversion, parameter serialization, and cost calculation
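
For reviewers who want the configuration surface at a glance, here is a minimal sketch of what the above implies. Only api_key, the VENICE_* variable names, and the off/on/auto validation come from this PR; the remaining field names and types are assumptions.

    use secrecy::SecretString;

    // Sketch of VeniceConfig as resolved from VENICE_* env vars.
    // Field names other than `api_key` are assumptions, not copied from the diff.
    pub struct VeniceConfig {
        pub api_key: SecretString,               // VENICE_API_KEY
        pub base_url: String,                    // VENICE_BASE_URL
        pub model: String,                       // VENICE_MODEL
        pub web_search: Option<String>,          // VENICE_WEB_SEARCH: "off" | "on" | "auto"
        pub web_scraping: Option<bool>,          // VENICE_WEB_SCRAPING
        pub include_system_prompt: Option<bool>, // VENICE_INCLUDE_SYSTEM_PROMPT
    }

    // Validation described above: VENICE_WEB_SEARCH must be off/on/auto.
    fn validate_web_search(value: &str) -> Result<(), String> {
        match value {
            "off" | "on" | "auto" => Ok(()),
            other => Err(format!("VENICE_WEB_SEARCH must be off, on, or auto (got {other:?})")),
        }
    }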

Files changed

File Change
src/llm/venice.rs New provider (881 lines incl. tests)
src/config.rs LlmBackend::Venice, VeniceConfig, env var resolution
src/llm/mod.rs Module registration, create_venice_provider()
src/sandbox/config.rs api.venice.ai in default allowlist
src/setup/wizard.rs venice: None in struct literal
.env.example Document Venice env vars

Design notes

  • cost_per_token() returns default_cost() until list_models() or model_metadata() triggers the first catalog fetch (lazy initialization by design — the hot complete() path doesn't block on catalog refresh)
  • Uses std::sync::RwLock (not tokio) with documented invariant that guards are never held across .await points
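
A rough illustration of the first note, with assumed struct shapes (only the names model_catalog, cost_per_token, and default_cost appear in this PR): the synchronous cost lookup reads whatever is cached and falls back, never touching the network.

    use std::collections::HashMap;
    use std::sync::RwLock;
    use std::time::Instant;

    // Assumed shapes for illustration only.
    #[derive(Clone, Copy)]
    struct CostPerToken {
        input: f64,
        output: f64,
    }

    fn default_cost() -> CostPerToken {
        // Placeholder values for the sketch.
        CostPerToken { input: 0.0, output: 0.0 }
    }

    #[derive(Default)]
    struct ModelCatalog {
        fetched_at: Option<Instant>,
        costs: HashMap<String, CostPerToken>,
    }

    struct Provider {
        model_catalog: RwLock<ModelCatalog>,
    }

    impl Provider {
        // Hot path: synchronous read of the cached catalog, no refresh, no await.
        fn cost_per_token(&self, model: &str) -> CostPerToken {
            let catalog = self
                .model_catalog
                .read()
                .expect("model_catalog lock poisoned");
            catalog
                .costs
                .get(model)
                .copied()
                .unwrap_or_else(default_cost)
        }
    }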

Test plan

  • cargo check — compiles with Venice additions
  • cargo test venice — 12 venice module tests pass
  • cargo test — no regressions in existing tests
  • cargo clippy --all --all-features — no new warnings
  • Manual: LLM_BACKEND=venice VENICE_API_KEY=<key> cargo run — chat completion works

🤖 Generated with Claude Code

Add a native reqwest-based Venice inference API provider that supports
Venice-specific venice_parameters (web search, web scraping, thinking
control) and dynamic model pricing fetched from the /models API.

Changes:
- src/config.rs: LlmBackend::Venice variant, VeniceConfig struct,
  VENICE_* env var resolution with validation
- src/llm/venice.rs: VeniceProvider with LlmProvider impl, cached
  model catalog with 1-hour TTL, 12 unit tests
- src/llm/mod.rs: register Venice module and create_venice_provider
- src/sandbox/config.rs: add api.venice.ai to network allowlist
- src/setup/wizard.rs: add venice field to LlmConfig struct literal
- .env.example: document Venice env vars

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @jesseproudman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the application's LLM capabilities by integrating Venice AI as a new, fully-featured backend. This addition provides users with an alternative powerful model provider, complete with advanced functionalities like web interaction and precise cost management. The implementation ensures seamless configuration and robust error handling, making Venice AI a viable and performant option for various AI-driven tasks.

Highlights

  • New LLM Backend Integration: Introduced Venice AI as a first-class LLM backend, enabling users to leverage its inference capabilities alongside existing providers.
  • Venice-Specific Features Support: Added support for Venice AI's unique parameters, including web search, web scraping, and control over the system prompt, enhancing model interaction.
  • Dynamic Model Pricing: Implemented dynamic model pricing for Venice AI, fetching costs from its API with a 1-hour TTL cache and graceful fallback to default pricing.
  • Native Tool Calling: Enabled native tool calling support for Venice AI, simplifying integration by directly handling role: "tool" messages without requiring workarounds.
  • Configuration and Environment Variables: Integrated Venice AI configuration through a new VeniceConfig struct and corresponding VENICE_* environment variables, including input validation for VENICE_WEB_SEARCH.
  • Network Allowlist Update: Updated the sandbox network allowlist to include api.venice.ai, ensuring secure communication with the new backend.
Changelog
  • .env.example
    • Added example environment variables for configuring the Venice AI provider, including API key, model, base URL, web search, and web scraping settings.
  • src/config.rs
    • Added Venice variant to the LlmBackend enum.
    • Updated the FromStr implementation for LlmBackend to parse 'venice' and related aliases.
    • Updated the Display implementation for LlmBackend to output 'venice'.
    • Defined a new VeniceConfig struct to hold Venice AI specific configuration parameters like API key, base URL, model, web search, web scraping, and system prompt inclusion.
    • Added venice: Option<VeniceConfig> field to the LlmConfig struct.
    • Implemented logic within LlmConfig::from_env() to parse and validate Venice AI environment variables, including VENICE_API_KEY, VENICE_BASE_URL, VENICE_MODEL, VENICE_WEB_SEARCH, VENICE_WEB_SCRAPING, and VENICE_INCLUDE_SYSTEM_PROMPT.
  • src/llm/mod.rs
    • Updated module documentation to include Venice AI as a supported provider.
    • Declared and imported the new venice module.
    • Exported VeniceProvider for external use.
    • Added a match arm for LlmBackend::Venice in create_llm_provider to instantiate VeniceProvider.
    • Added create_venice_provider function to handle Venice AI provider initialization.
  • src/llm/venice.rs
    • Added a new file implementing the VeniceProvider struct, which conforms to the LlmProvider trait.
    • Implemented methods for complete and complete_with_tools using Venice AI's chat completions API.
    • Included logic for fetching and caching model catalog information, including dynamic pricing and context lengths, with a 1-hour TTL.
    • Provided utility functions for constructing Venice-specific API URLs and handling API keys.
    • Defined internal structs for Venice API request and response serialization/deserialization, including VeniceChatRequest, VeniceParameters, ChatCompletionMessage, ChatCompletionTool, ChatCompletionResponse, and VeniceModelsResponse.
    • Implemented message conversion logic from generic ChatMessage to Venice-specific ChatCompletionMessage format.
    • Included comprehensive unit tests for message conversion, parameter serialization, and cost calculation logic.
  • src/sandbox/config.rs
    • Added api.venice.ai to the list of default allowed network domains.
  • src/setup/wizard.rs
    • Initialized the venice field to None in the LlmConfig struct literal within the setup wizard.
Activity
  • The pull request was created by jesseproudman.
  • No human activity (comments, reviews) has been recorded on this pull request yet.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request integrates Venice AI as a first-class LLM provider. Feedback includes a critical security issue regarding insecure handling of the API key in memory by bypassing SecretString protections. Additionally, issues were identified with including full API response bodies in error messages, which can lead to information leakage and resource exhaustion; truncation using character-aware methods is recommended. Other feedback covers a potential race condition in lazy initialization, improved error handling for tool argument parsing, and consistency in the .env.example file. Addressing these points will enhance both the functionality and security of the new provider.

Comment on lines +87 to +89
fn api_key(&self) -> String {
    self.config.api_key.expose_secret().to_string()
}

security-medium medium

The VeniceProvider implementation defeats the purpose of using SecretString for the API key by converting it to a regular heap-allocated String via to_string() in the api_key() helper method. This creates a non-zeroed copy of the secret in memory that persists until dropped and can be captured in core dumps. Additionally, the Authorization header is constructed using format!, creating another unprotected copy of the secret. The header is also not marked as sensitive, which could lead to the API key being logged if reqwest debug logging is enabled.

Remediation: Avoid converting SecretString to String. Use expose_secret() directly when constructing the header. Use reqwest::header::HeaderValue::from_str and call set_sensitive(true) on the resulting value to ensure it is handled securely by the HTTP client and excluded from any debug logging.
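
One way the remediation could look; the Zeroizing wrapper is an extra precaution beyond what the comment asks for, and reqwest re-exports HeaderValue from the http crate:

    use reqwest::header::HeaderValue;
    use secrecy::{ExposeSecret, SecretString};
    use zeroize::Zeroizing;

    // Build the Authorization value without leaving a long-lived plain String
    // around, and mark it sensitive so reqwest excludes it from debug logging.
    fn bearer_header(
        api_key: &SecretString,
    ) -> Result<HeaderValue, reqwest::header::InvalidHeaderValue> {
        let bearer = Zeroizing::new(format!("Bearer {}", api_key.expose_secret()));
        let mut value = HeaderValue::from_str(&bearer)?;
        value.set_sensitive(true);
        Ok(value)
    }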

# VENICE_MODEL=zai-org-glm-5
# VENICE_BASE_URL=https://api.venice.ai/api/v1
# VENICE_WEB_SEARCH=off
# VENICE_WEB_SCRAPING=false

medium

The VENICE_INCLUDE_SYSTEM_PROMPT environment variable is supported in the configuration logic but is missing from this example file. Adding it here helps users discover all available Venice-specific options.

# VENICE_WEB_SCRAPING=false
# VENICE_INCLUDE_SYSTEM_PROMPT=false

Comment on lines +140 to +143
return Err(LlmError::RequestFailed {
    provider: "venice".to_string(),
    reason: format!("HTTP {}: {}", status, response_text),
});

medium

Including the full response body in error messages can lead to excessive memory usage and poor log readability if the API returns a large HTML error page (e.g., from a proxy or WAF). It is recommended to truncate the response text using a character-aware method to avoid panics on multi-byte characters.

References
  1. Always truncate tool output or external API responses for previews or status updates to a reasonable maximum length.
  2. When truncating a UTF-8 string, use character-aware methods to avoid panics caused by slicing in the middle of a multi-byte character.
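
A character-aware truncation helper along these lines would address both points (the constant and helper name are illustrative):

    // Cap error bodies at a fixed number of characters without slicing
    // inside a multi-byte UTF-8 sequence.
    const MAX_ERROR_BODY_CHARS: usize = 2_000;

    fn truncate_chars(s: &str, max_chars: usize) -> String {
        match s.char_indices().nth(max_chars) {
            // `idx` is a char boundary, so slicing here cannot panic.
            Some((idx, _)) => format!("{} ... [truncated]", &s[..idx]),
            None => s.to_string(),
        }
    }

    // Usage in the error path above:
    // reason: format!("HTTP {}: {}", status, truncate_chars(&response_text, MAX_ERROR_BODY_CHARS)),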

Comment on lines +212 to +222
async fn refresh_catalog_if_stale(&self) {
    let is_stale = {
        let catalog = self
            .model_catalog
            .read()
            .expect("model_catalog lock poisoned");
        match catalog.fetched_at {
            None => true,
            Some(t) => t.elapsed() > CATALOG_TTL,
        }
    };

medium

This lazy initialization pattern contains a race condition: multiple concurrent requests arriving while the catalog is stale will all pass the is_stale check and trigger redundant fetch_models() network calls. While the final update is thread-safe, you might consider a mechanism to throttle or deduplicate these initial fetch attempts.
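
A minimal sketch of one deduplication approach, assuming a tokio::sync::Mutex<()> is added to the provider; all names below are stand-ins, not code from the diff:

    use std::sync::RwLock;
    use std::time::{Duration, Instant};
    use tokio::sync::Mutex;

    const CATALOG_TTL: Duration = Duration::from_secs(3600);

    // Minimal stand-in for the provider; only the fields needed for the pattern.
    struct CatalogCache {
        fetched_at: RwLock<Option<Instant>>,
        refresh_lock: Mutex<()>, // serializes refresh attempts
    }

    impl CatalogCache {
        fn is_stale(&self) -> bool {
            match *self.fetched_at.read().expect("lock poisoned") {
                None => true,
                Some(t) => t.elapsed() > CATALOG_TTL,
            }
        }

        async fn refresh_if_stale(&self) {
            if !self.is_stale() {
                return;
            }
            // Only one task performs the network fetch; concurrent callers wait
            // here instead of issuing redundant /models requests.
            let _guard = self.refresh_lock.lock().await;
            // Re-check after acquiring the lock: another task may have refreshed.
            if !self.is_stale() {
                return;
            }
            // ... fetch /models here, then record the refresh time ...
            *self.fetched_at.write().expect("lock poisoned") = Some(Instant::now());
        }
    }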

Comment on lines +380 to +381
let arguments = serde_json::from_str(&tc.function.arguments)
    .unwrap_or(serde_json::Value::Object(Default::default()));

medium

Silently falling back to an empty object when tool arguments fail to parse can make debugging difficult. If the LLM produces malformed JSON, it's better to log a warning or propagate the error so the failure is transparent.

                let arguments = serde_json::from_str(&tc.function.arguments).unwrap_or_else(|e| {
                    tracing::warn!("Failed to parse Venice tool arguments: {}. Raw: {}", e, tc.function.arguments);
                    serde_json::Value::Object(Default::default())
                });

Add VeniceEmbeddings as a third embeddings provider alongside OpenAI
and NEAR AI, allowing users who set LLM_BACKEND=venice to use Venice
for semantic search without needing a separate OpenAI API key.

Changes:
- Add VeniceEmbeddings struct with OpenAI-compatible API support
- Add venice_api_key and venice_base_url to EmbeddingsConfig
- Add "venice" match arms in both embedding creation sites in main.rs
- Add Venice option to the setup wizard embeddings step
- Auto-enable embeddings when VENICE_API_KEY is set
- Add EMBEDDING_PROVIDER=venice to .env.example

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@tribendu

Summary

This PR adds Venice AI as a fully-featured LLM provider in the IronClaw project, including configuration support (800+ lines in src/llm/venice.rs), embeddings provider, and setup wizard integration. The implementation follows existing provider patterns, supports Venice-specific features (web search, web scraping), includes lazy-loading model catalog with pricing, and has comprehensive unit tests (15+ tests covering message conversion, parameter serialization, cost calculation).

Pros

  • Comprehensive test coverage: 15+ unit tests covering message conversion, tool calls, parameter serialization, cost calculation, and edge cases
  • Good documentation: Inline comments justify design decisions (e.g., std::sync::RwLock vs tokio::sync::RwLock)
  • Security-first: Uses secrecy::SecretString for API keys, proper error types for auth failures
  • Lazy catalog loading: Model catalog fetched only when needed with 1-hour TTL, graceful fallback on errors
  • Venice-specific features: Supports web search ("off"/"on"/"auto"), web scraping, and system prompt control
  • Complete integration: Includes embeddings provider, sandbox allowlist, setup wizard, and environment variable support
  • Consistent with codebase: Follows same patterns as NearAI/OpenAI providers
  • Cost calculation: Parses dynamic pricing from /models endpoint with per-token rate calculation

Concerns

  • Lock poisoning: Uses expect("model_catalog lock poisoned"), which panics if the lock was poisoned by an earlier panic instead of degrading gracefully
  • Request type duplication: Defines many Venice-specific request/response types (VeniceChatRequest, ChatCompletionMessage, etc.) that could be reused with OpenAI-compatible provider
  • Hardcoded timeout: 120-second timeout in src/llm/venice.rs:67 should be configurable
  • No integration tests: Only unit tests; no mock server tests for API interactions
  • Stale cache silent: Failed catalog refresh keeps stale cache without user notification - could return outdated pricing
  • Rate limiting: Logs warning but provides no user-facing guidance on rate limits or backoff strategy
  • Embeddings dimension: Hardcoded dimension mapping (1536 for small, 3072 for large) lives in the main.rs setup path (around line 118) instead of being derived from the model

Suggestions

  1. Lock safety: Replace the expect() poisoned-lock handling with a graceful fallback, or recover via PoisonError::into_inner()
  2. Request type reuse: Investigate if OpenAI-compatible provider can handle Venice's venice_parameters to reduce duplication
  3. Configurable timeout: Export timeout as config field in VeniceConfig
  4. Add integration tests: Use wiremock or similar to test actual request/response cycles, especially for /models pricing parsing
  5. Cache notifications: Log cached catalog age in warnings, add telemetry for stale cache hits
  6. Rate limit handling: Consider adding retry-after header parsing and user-facing rate limit errors with guidance
  7. Dimension mapping: Extract dimension mapping to a helper function or derive from response metadata instead of hardcoded values
  8. Request deduplication: Consider deduplicating simultaneous catalog refresh requests to reduce API load

Overall: This is a well-structured PR that thoughtfully integrates Venice AI as a first-class provider. The code quality is high, with excellent documentation and test coverage. The concerns listed are minor and mostly about robustness in edge cases. This PR is ready to merge once suggestions are considered (especially lock safety handling).

Contributor

@zmanian zmanian left a comment

Solid work adding Venice AI as a first-class provider. The implementation follows the existing provider patterns well, and the test coverage (15+ unit tests) is appreciated. A few things to address before merging:

Must fix:

  1. Lock poisoning -- expect("model_catalog lock poisoned") in the catalog cache will panic on double-panic scenarios. Replace with graceful fallback:

    match self.model_catalog.read() {
        Ok(guard) => { /* use guard */ }
        Err(poisoned) => {
            tracing::warn!("model catalog lock was poisoned, recovering");
            let guard = poisoned.into_inner();
            // use guard
        }
    }
  2. Hardcoded timeout -- The 120-second timeout should be configurable via VeniceConfig (add a timeout_secs field, default to 120). Other providers will likely want this too eventually.
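
A sketch of the shape this could take (the timeout_secs field name follows the suggestion; the helper and wiring below are illustrative):

    use std::time::Duration;

    // Default to the current 120s when the new field is unset.
    fn venice_timeout(timeout_secs: Option<u64>) -> Duration {
        Duration::from_secs(timeout_secs.unwrap_or(120))
    }

    // Usage in VeniceProvider::new (sketch):
    //     let client = reqwest::Client::builder()
    //         .timeout(venice_timeout(config.timeout_secs))
    //         .build()
    //         .unwrap_or_else(|_| reqwest::Client::new());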

Should fix:

  1. Request type duplication -- VeniceChatRequest, ChatCompletionMessage, ChatCompletionTool, etc. are very close to what an OpenAI-compatible provider would use. Consider whether the existing OpenAI-compat types could be reused with Venice-specific extensions (the venice_parameters field). This would reduce maintenance surface.

  2. Embeddings dimension mapping -- Hardcoded dimension mapping (1536/3072) in main.rs should be derived from the model or moved to a helper function rather than inlined in startup logic.
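
For point 2, a small helper keeps the magic numbers in one place (matching on substrings is just one option; the real model ids aren't in this diff):

    // Map an embedding model name to its vector dimension, with a fallback for
    // models we don't recognize.
    fn embedding_dimension(model: &str, fallback: usize) -> usize {
        if model.contains("large") {
            3072
        } else if model.contains("small") {
            1536
        } else {
            fallback
        }
    }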

Nice to have:

  1. Integration tests -- Unit tests cover serialization and cost math well, but a wiremock test for the /models catalog fetch + cache TTL behavior would catch edge cases in the HTTP layer (see the sketch after this list).

  2. Stale cache visibility -- When catalog refresh fails and stale cache is used, log the cache age so operators can tell if pricing data is hours or days old.
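
For the wiremock idea in point 1, a test could start from something like this; the /models path and response shape are guesses at Venice's schema, not taken from the diff, and wiremock plus tokio's test macro are assumed dev-dependencies:

    use wiremock::matchers::{method, path};
    use wiremock::{Mock, MockServer, ResponseTemplate};

    #[tokio::test]
    async fn models_endpoint_can_be_mocked() {
        let server = MockServer::start().await;

        Mock::given(method("GET"))
            .and(path("/api/v1/models"))
            .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
                "data": [{ "id": "some-model", "pricing": { "input": "0.5", "output": "1.5" } }]
            })))
            .mount(&server)
            .await;

        // In the real test: point the provider's base_url at server.uri(),
        // call list_models() twice, and assert via server.received_requests()
        // that only one catalog fetch went out within the TTL.
        let body = reqwest::get(format!("{}/api/v1/models", server.uri()))
            .await
            .unwrap()
            .text()
            .await
            .unwrap();
        assert!(body.contains("some-model"));
    }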

The architecture is sound and Venice-specific features (web search modes, web scraping, system prompt control) are cleanly modeled. Looking forward to the next revision.

Member

@ilblackdragon ilblackdragon left a comment

Code Review: feat: add Venice AI as first-class LLM backend

I reviewed the full diff (1205 additions, 6 deletions) across all 10 changed files. Overall this is a well-structured PR that follows the established provider patterns closely. Below is my detailed analysis.


LlmProvider Trait Implementation

The VeniceProvider correctly implements all required trait methods (complete, complete_with_tools, model_name, cost_per_token) and overrides the appropriate optional methods (list_models, model_metadata, active_model_name, set_model). The implementation mirrors the patterns established by NearAiChatProvider and OpenAiCompatibleChatProvider.

The complete_with_tools implementation handles tool calling natively (no flatten_tool_messages hack) which is the correct approach for a provider that supports role: "tool" messages.

Correctness Issues

1. Duplicate embeddings setup in src/main.rs (minor, not a regression)

There are two nearly identical blocks setting up Venice embeddings -- one around line 115 (inside the memory CLI subcommand path) and another around line 489 (in the main agent startup path). This isn't introduced by this PR (the same duplication exists for OpenAI/NearAI), but the PR propagates it. Consider extracting a helper function that maps (provider_name, config) -> Option<Arc<dyn EmbeddingProvider>> to avoid the 4x duplication. Not a blocker.

2. Missing "venice" arm in second embeddings block has fallthrough to OpenAI

In the main agent embeddings setup (around the _ => fallthrough arm at line 578 of main.rs), if someone sets EMBEDDING_PROVIDER=venice but there's a code path that doesn't match the "venice" string, it would silently fall through to trying OpenAI. The PR does add the "venice" arm correctly in both locations, so this is fine as-is. Just noting the fragile pattern.

.unwrap() / .expect() Audit

Production code:

  • .expect("model_catalog lock poisoned") and .expect("active_model lock poisoned") -- Acceptable. This is the established pattern used by NearAiProvider and NearAiChatProvider for std::sync::RwLock guards. The documented invariant (guards never held across .await) makes poisoning a genuine "should never happen" scenario.
  • .unwrap_or_else(|_| Client::new()) in VeniceProvider::new -- Acceptable. Reasonable fallback if Client::builder() fails.
  • .unwrap_or_default() on response.text().await and choice.message.content -- Acceptable. Same pattern as existing providers.
  • serde_json::from_str(&tc.function.arguments).unwrap_or(serde_json::Value::Object(Default::default())) -- Acceptable. Same pattern as nearai_chat.rs and openai_compatible_chat.rs.

Test code:

  • All .unwrap() / .expect() in #[cfg(test)] mod tests -- Fine per project rules.

No violations found.

Import Style

  • Uses crate:: imports throughout (crate::config::VeniceConfig, crate::error::LlmError, crate::llm::provider::..., crate::llm::costs). Correct.
  • use super::*; only in #[cfg(test)] mod tests -- acceptable and consistent with other providers.

Error Handling

Error handling is thorough and follows established patterns:

  • HTTP 401 maps to LlmError::AuthFailed
  • HTTP 429 maps to LlmError::RateLimited (with retry_after: None -- Venice doesn't seem to provide this header, which is fine)
  • All other non-success responses map to LlmError::RequestFailed with status and body
  • JSON parse failures map to LlmError::InvalidResponse
  • The VeniceEmbeddings provider uses the same error hierarchy (EmbeddingError::AuthFailed, HttpError, RateLimited) correctly

Design Quality

Model catalog with TTL cache -- Clean design. The lazy fetch with 1-hour TTL and graceful fallback to default_cost() is a nice touch. The cost_per_token() sync method correctly reads from the cached catalog without async overhead, and list_models()/model_metadata() trigger refresh when stale.

VeniceParameters -- Good use of #[serde(skip_serializing_if = "Option::is_none")] to keep them out of the request JSON when unconfigured. The build_venice_parameters() method correctly returns None when no Venice-specific options are set.
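
For reference, the behaviour described here corresponds to a struct shaped roughly like the sketch below; the field names are guesses at Venice's parameter names, not copied from the diff, and build_venice_parameters is written as a free function rather than a method:

    use serde::Serialize;

    // Sketch of the optional parameter block; when every field is None the
    // caller can skip attaching it to the request entirely.
    #[derive(Serialize, Default)]
    struct VeniceParameters {
        #[serde(skip_serializing_if = "Option::is_none")]
        enable_web_search: Option<String>, // "off" | "on" | "auto"
        #[serde(skip_serializing_if = "Option::is_none")]
        enable_web_scraping: Option<bool>,
        #[serde(skip_serializing_if = "Option::is_none")]
        include_venice_system_prompt: Option<bool>,
    }

    // Mirror of the described build_venice_parameters() behaviour: None when nothing is set.
    fn build_venice_parameters(p: VeniceParameters) -> Option<VeniceParameters> {
        let any_set = p.enable_web_search.is_some()
            || p.enable_web_scraping.is_some()
            || p.include_venice_system_prompt.is_some();
        any_set.then_some(p)
    }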

Config validation -- The VENICE_WEB_SEARCH validation (must be "off"/"on"/"auto") with a clear error message is solid.

Suggestions (Non-blocking)

  1. Consider tracing::warn! for tool call argument parse failures -- The line serde_json::from_str(&tc.function.arguments).unwrap_or(serde_json::Value::Object(Default::default())) silently swallows parse errors. Adding a tracing::warn! here would help debugging. (Same issue exists in existing providers, so this is a codebase-wide improvement, not specific to this PR.)

  2. The VeniceEmbeddings struct stores api_key as a plain String rather than SecretString. In the LLM provider, VeniceConfig.api_key is correctly typed as SecretString, but the embeddings provider receives it after .expose_secret() is called in main.rs. This is the same pattern used by OpenAiEmbeddings so it's consistent, but worth noting.

  3. The VeniceEmbeddings response types (VeniceEmbeddingRequest, VeniceEmbeddingResponse, VeniceEmbeddingData) are identical to the OpenAI embedding types. If Venice truly uses an OpenAI-compatible embedding endpoint, consider reusing the existing OpenAI embedding types or extracting a shared type. Not a blocker since the types are small.

Test Coverage

12 unit tests covering:

  • Message conversion (user, system, assistant, tool, assistant with tool calls)
  • Tool call argument serialization
  • Venice parameter serialization (with all fields, with none, with partial fields)
  • Cost calculation from catalog (with model, without model/fallback)
  • Embeddings config

Good coverage of the core logic. The tests correctly use #[cfg(test)] and test-only super::* imports.

Files Review Summary

  • src/llm/venice.rs: Clean, follows existing patterns
  • src/config.rs: Correct config struct, validation, env var handling
  • src/llm/mod.rs: Correct module registration and factory
  • src/main.rs: Venice embeddings wired in both code paths
  • src/workspace/embeddings.rs: Clean VeniceEmbeddings implementation
  • src/workspace/mod.rs: Correct pub use addition
  • src/sandbox/config.rs: api.venice.ai in allowlist
  • src/settings.rs: Doc comment updated
  • src/setup/wizard.rs: Venice option in setup wizard
  • .env.example: Well-documented env vars

Verdict

This is a solid, well-implemented PR that follows the project's established patterns for adding a new LLM provider. The code is clean, error handling is thorough, and tests cover the key logic paths. No blocking issues found.

Member

@ilblackdragon ilblackdragon left a comment

Approving. Clean implementation that follows the established provider patterns. No .unwrap()/.expect() violations in production code, correct crate:: imports, thorough error handling, and good test coverage. The suggestions in my detailed comment are all non-blocking.

@tribendu

Batch 2 PR Review: nearai/ironclaw PRs 111, 110, 109, 103, 95, 74

Reviewer: AI Sub-Agent
Date: February 17, 2026
Review Method: Manual diff analysis with focus on code quality, bugs, security, performance, test coverage, and Rust best practices


PR #111: Fix backwards compatibility for nearai.session_token

Summary

This PR adds backwards compatibility for the nearai session management by implementing a fallback mechanism. When nearai.session_token is not found, the system now attempts to fall back to the legacy nearai.session setting. The change is minimal and focused on the session token retrieval logic in SessionManager::renew_session(). The implementation uses a graceful fallback pattern with proper error handling and logging.

Pros

  • Graceful degradation: Implements backwards compatibility without breaking existing functionality
  • Clear error handling: Preserves original error handling for both primary and fallback paths
  • Informative logging: Adds a warning message when falling back to legacy session storage
  • Minimal change scope: Only affects session token retrieval logic, reducing risk of regression
  • Proper pattern: Uses idiomatic Rust if let Some(...) for optional value handling

Concerns

  • Technical debt: This adds maintenance burden by keeping two parallel session storage schemas
  • No migration path: There's no clear deprecation timeline or migration strategy for the legacy schema
  • Silent fallback: The warning log may be missed in production, leading to continued use of deprecated schema
  • Error propagation: The error messages remain generic - they could be more specific about whether the primary or fallback path failed

Suggestions

  • Add a migration function to convert legacy sessions to the new format periodically
  • Consider adding telemetry/metrics to track usage of the legacy fallback path
  • Document the expected lifecycle and deprecation timeline for nearai.session
  • Add a configuration option to disable the fallback after migration is complete
  • Consider making the fallback opt-in rather than silent to encourage migration

PR #110: Add env docs for local LLM providers

Summary

This PR updates the .env.example file to document environment variables for local LLM providers including Ollama, LM Studio, vLLM, and other OpenAI-compatible endpoints. The changes are purely documentation-focused, providing clear examples of environment variable configurations for users who want to use local models instead of the default NEAR AI backend.

Pros

  • User-friendly: Makes it easy for new users to configure local LLM providers
  • Comprehensive: Covers multiple backend options (Ollama, OpenAI-compatible)
  • Clear defaults: Shows default values for URLs and ports
  • Well-organized: Groups related configurations logically
  • No code changes: Documentation-only PR, zero risk of breaking functionality

Concerns

  • No validation: Environment variables are documented but there's no validation that the backend value matches the selected provider
  • Incomplete examples: Example model names are specific but may not match what users have installed
  • Missing authentication notes: Doesn't document that LLM_API_KEY is optional for local servers

Suggestions

  • Add comments explaining when LLM_API_KEY is optional vs required
  • Include a link to documentation for each provider
  • Add a note about model availability requiring local installation
  • Consider adding environment variable validation logic in a future PR
  • Document how to switch between providers at runtime
  • Add example commands to test connections (e.g., curl http://localhost:11434/api/tags)

PR #109: Normalize memory search query and update marked.js

Summary

This PR addresses two security and stability issues in the web interface. First, it adds input normalization for memory search queries to prevent excessive query lengths and invalid input types. Second, it updates the marked.js dependency to a specific version with integrity hashing for supply chain security. The changes are defensive in nature, preventing potential DoS attacks and ensuring the integrity of third-party JavaScript dependencies.

Pros

  • Input validation: Adds MEMORY_SEARCH_QUERY_MAX_LENGTH constant (100 chars) to prevent excessively long queries
  • Type safety: Normalizes queries to string type, preventing crashes from non-string input
  • Supply chain security: Uses SRI (Subresource Integrity) hashing for marked.js CDN dependency
  • Consistent application: Applies normalization throughout search functions (searchMemory, snippetAround, highlightQuery)
  • Clear error handling: Early return when normalized query is empty

Concerns

  • Arbitrary limit: 100-character limit may be too restrictive for complex semantic search queries
  • Silent truncation: Queries longer than the limit are silently truncated without user feedback
  • CDN dependency: Still relies on external CDN for marked.js despite SRI - could use bundled version
  • No rate limiting: Query length limit doesn't prevent rapid-fire spam queries
  • Client-side only: Validation happens on client side only - server should also validate

Suggestions

  • Increase the limit or make it configurable (e.g., 200-500 characters)
  • Add user feedback when query is truncated (toast message or inline warning)
  • Implement rate limiting on the /api/memory/search endpoint
  • Add server-side query validation to defend against bypassed client checks
  • Consider bundling marked.js with the application to eliminate CDN dependency
  • Add unit tests for the normalization function covering edge cases (null, undefined, empty string, very long string)

PR #103: Per-request model override for OpenAI-compatible API

Summary

This is a significant feature PR that adds per-request model override capability across the entire LLM provider ecosystem. Previously, all requests used the active model, but now clients can specify a different model per request. The changes span multiple modules: request structs now include optional model fields, providers check for request-level models before falling back to defaults, OpenAI-compatible endpoint validates model names, and comprehensive integration tests verify functionality. The PR includes migration documentation and removes the previous strict model validation that returned 404 for non-active models.

Pros

  • Well-architected: Uses Option with builder pattern (with_model()) for clean API
  • Comprehensive testing: Updated integration tests to mock model tracking and verify model override works
  • Full stack coverage: Changes propagate from API entry points through worker layer to actual providers
  • Good error handling: Validates model name length (MAX_MODEL_NAME_BYTES: 256) before processing
  • Backwards compatible: Existing behavior preserved when model field is not provided
  • Code quality: Consistent pattern across CompletionRequest and ToolCompletionRequest
  • Documentation: Updates FEATURE_PARITY.md to reflect new capability

Concerns

  • No model validation: The endpoint now accepts ANY model name without checking if it exists or is configured
  • Validation bypasses providers: Validation happens at HTTP layer, but providers don't validate model availability
  • Security implications: Users can request models they shouldn't have access to; cost tracking may break
  • Incomplete fallback: If provider doesn't support the requested model, behavior is undefined
  • Missing permissions: No RBAC or policy checking for model override capability
  • Potential for confusion: Active model concept remains but can be overridden per request

Suggestions

  • Add a method to validate if a model exists/can be used before making the request
  • Implement optional model allowlist/denylist configuration per user or API key
  • Add telemetry to track model usage patterns across different requests
  • Document the precedence: request model > active model > provider default
  • Consider adding a force_model flag that fails fast if model is unavailable
  • Add tests for error cases (non-existent model, invalid characters, empty string)
  • Update cost tracking to handle per-request model pricing differences
  • Consider adding model availability check to LlmProvider trait's set_model() method

PR #95: Add Venice AI provider and embeddings

Summary

This PR adds comprehensive support for Venice AI as both an LLM provider and an embeddings provider. It introduces a new VeniceProvider module with full API integration including Venice-specific features like web search, web scraping, and dynamic model pricing fetched from the /models endpoint. The PR also adds VeniceEmbeddings for semantic search, updates configuration handling, and integrates Venice into the setup wizard. The implementation includes extensive unit tests, proper error handling, caching of model catalogs with TTL, and follows the existing provider patterns.

Pros

  • Fully featured: Implements complete Venice API including proprietary parameters (web search, scraping, system prompt)
  • Dynamic pricing: Fetches model catalog with pricing from Venice API, updates cost tracking automatically
  • Good caching: Uses 1-hour TTL for model catalog to reduce API calls
  • Comprehensive testing: Extensive unit tests for message conversion, serialization, cost calculation
  • Type safety: Uses properly typed structures and enums for Venice-specific functionality
  • Consistent patterns: Follows existing provider patterns (NearAiProvider, etc.)
  • Error handling: Proper HTTP error mapping (401→AuthFailed, 429→RateLimited)
  • Config validation: Validates Venice-specific config values (web_search must be "off"/"on"/"auto")

Concerns

  • Large file: venice.rs is 880 lines - consider splitting into smaller modules
  • Lock choice: Uses std::sync::RwLock with comment about async safety - risk if async code is added later
  • Secret handling: API key exposed as String in multiple places despite being wrapped in SecretString in Config
  • No retry logic: Network requests don't have exponential backoff for transient failures
  • Hardcoded timeout: 120-second timeout may be too long or too short for different use cases
  • Cache misses on errors: If catalog fetch fails, stale cache is kept indefinitely
  • Embeddings dimension hardcoding: Magic numbers for dimensions (1536, 3072) scattered in code

Suggestions

  • Split venice.rs into multiple modules: provider.rs, models.rs, api.rs
  • Consider using tokio::sync::RwLock for future-proofing with async code
  • Add retry logic with exponential backoff for API requests (see the sketch after this list)
  • Make timeout configurable via VeniceConfig
  • Add metrics for cache hit/miss ratio
  • Document the security model: when are secrets exposed?
  • Extract dimension mapping to a constant or helper function
  • Add integration test that calls actual Venice API with test credentials
  • Consider adding health check endpoint for provider connectivity
  • Add support for streaming responses (if Venice API supports it)
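
A minimal sketch of the retry idea from the list above; the backoff parameters and the retry-on-429/5xx policy are illustrative, not from the PR:

    use std::time::Duration;

    // Retry transient failures (HTTP 429 and 5xx) with exponential backoff.
    async fn send_with_backoff(
        client: &reqwest::Client,
        request: reqwest::Request,
        max_attempts: u32,
    ) -> Result<reqwest::Response, reqwest::Error> {
        let mut delay = Duration::from_millis(500);
        let mut attempt = 0;
        loop {
            attempt += 1;
            // try_clone() fails for streaming bodies; in that case just send once.
            let req = match request.try_clone() {
                Some(r) => r,
                None => return client.execute(request).await,
            };
            match client.execute(req).await {
                Ok(resp)
                    if attempt < max_attempts
                        && (resp.status().as_u16() == 429 || resp.status().is_server_error()) =>
                {
                    tokio::time::sleep(delay).await;
                    delay *= 2;
                }
                other => return other,
            }
        }
    }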

PR #74: Security fix: Enhanced HTTP response size validation

Summary

This PR strengthens the HTTP tool's defense against OOM attacks by implementing two-stage response size validation. First, it checks the Content-Length header before downloading any content to immediately reject oversized responses. Second, it streams the response body with a hard size cap during download, protecting against malicious or misconfigured servers that may send incorrect or missing Content-Length headers. The implementation uses streaming with futures::StreamExt and enforces a consistent 5 MB limit across both stages.
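
A rough sketch of the two-stage pattern described above, using reqwest's streaming body (requires the crate's stream feature); the 5 MB constant matches the review text, everything else is illustrative:

    use futures_util::StreamExt;

    const MAX_RESPONSE_SIZE: usize = 5 * 1024 * 1024; // 5 MB

    async fn fetch_capped(url: &str) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
        let response = reqwest::get(url).await?;

        // Stage 1: reject early based on the declared Content-Length, if present.
        if let Some(len) = response.content_length() {
            if len as usize > MAX_RESPONSE_SIZE {
                return Err(format!("response too large: {len} bytes").into());
            }
        }

        // Stage 2: enforce the cap while streaming, in case the header was
        // missing or lied about the size.
        let mut body = Vec::new();
        let mut stream = response.bytes_stream();
        while let Some(chunk) = stream.next().await {
            let chunk = chunk?;
            if body.len() + chunk.len() > MAX_RESPONSE_SIZE {
                return Err("response exceeded size cap while streaming".into());
            }
            body.extend_from_slice(&chunk);
        }
        Ok(body)
    }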

Pros

  • Defense in depth: Two-stage validation (header check + streaming cap) protects against multiple attack vectors
  • Early rejection: Aborts request before downloading potentially malicious content
  • Streaming approach: Memory-efficient, doesn't load entire response before checking size
  • Informative logging: Warns with URL and size details when rejecting responses
  • Consistent constant: Uses same 5 MB limit as WASM HTTP wrapper
  • Well-documented: Clear comments explain the security rationale and 5 MB justification
  • Test coverage: Includes test verifying the constant value is reasonable

Concerns

  • Header spoofing: Relies on server sending accurate Content-Length - malicious servers can send small value then send large body
  • No byte limit on individual chunks: Only checks cumulative size; individual chunks could still be large
  • Error message exposure: Returns detailed size information in errors which might leak information
  • No rate limiting: An attacker could still send many small requests to exhaust resources
  • Hardcoded limit: 5 MB may be too small for some legitimate use cases (e.g., downloading large JSON datasets)

Suggestions

  • Add per-chunk size limit (e.g., 1 MB max per chunk) in addition to cumulative limit
  • Consider making MAX_RESPONSE_SIZE configurable per tool or per request
  • Add telemetry/metrics for rejected responses to detect DDoS patterns
  • Implement request rate limiting in addition to size limiting
  • Add option to override limit for authorized users/admins
  • Consider adding a timeout for the streaming operation
  • Add test cases for Content-Length header spoofing attempt
  • Document how users can work around the limit if needed (e.g., using multiple smaller requests with pagination)

Overall Recommendations

High Priority

  1. PR #103 (feat: support per-request model override in /v1/chat/completions): Address the security implications of unrestricted model override before merging; add allowlist/denylist functionality
  2. PR #74 (fix: check Content-Length before downloading HTTP response body): Consider making the size limit configurable for flexibility while maintaining security

Medium Priority

  1. PR #95 (feat: add Venice AI as first-class LLM backend): Refactor the large venice.rs file and add retry logic for network resilience
  2. PR #109 (web: add integrity check for marked CDN and cap highlight regex input): Add server-side validation to complement client-side checks
  3. PR #111 (llm: fallback to legacy nearai.session key when loading DB session): Add a migration strategy for legacy session tokens

Low Priority

  1. PR #110 (docs: add .env.example examples for Ollama and OpenAI-compatible): Add validation documentation and testing commands
  2. PR #95 (feat: add Venice AI as first-class LLM backend): Extract magic numbers and add integration tests
  3. PR #109 (web: add integrity check for marked CDN and cap highlight regex input): Consider removing the CDN dependency for marked.js

General Observations

  • All PRs follow Rust best practices and maintain code quality
  • Test coverage is generally good, though some integration tests could be expanded
  • Error handling is consistent across the codebase
  • Documentation (comments and inline docs) is clear and helpful
  • Most PRs are backwards compatible, which is good for production deployments
