
Refactor LLM profile management#16

Open
enyst wants to merge 85 commits into main from agent-sdk-18-profile-manager

Conversation


@enyst enyst commented Oct 26, 2025

Summary

  • fold the LLM profile manager responsibilities into LLMRegistry
  • teach conversation persistence to expand profile references without mutating LLM instances
  • document the inline vs profile-reference behavior and cover it with updated tests

Testing

  • uv run pre-commit run --all-files
  • pytest tests/sdk/llm/test_llm_registry_profiles.py

Summary by CodeRabbit

  • New Features

    • LLM profile management system: save, load, list, and validate profiles on disk.
    • Profile-based LLM configuration with profile references for streamlined serialization.
    • Added GPT-5 Mini model profile.
  • Documentation

    • New standalone example demonstrating LLM profile creation and usage.
  • Tests

    • Comprehensive test coverage for profile management and serialization workflows.
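To make the profile lifecycle concrete, here is a minimal, SDK-free sketch of the save/load/list flow. The real `LLMRegistry.save_profile`/`load_profile`/`list_profiles` validate full `LLM` models and default to `~/.openhands/llm-profiles/*.json`; the helper names below mirror those methods but the signatures and the plain-dict payloads are illustrative assumptions, not the SDK API.

```python
import json
import tempfile
from pathlib import Path


# Stand-ins for the registry's profile lifecycle (illustrative only; the
# real methods live on LLMRegistry and validate LLM configs).
def save_profile(profile_dir: Path, name: str, config: dict) -> Path:
    profile_dir.mkdir(parents=True, exist_ok=True)
    path = profile_dir / f"{name}.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def load_profile(profile_dir: Path, name: str) -> dict:
    return json.loads((profile_dir / f"{name}.json").read_text())


def list_profiles(profile_dir: Path) -> list[str]:
    # One profile per *.json file; the file stem is the profile name.
    return sorted(p.stem for p in profile_dir.glob("*.json"))


profiles = Path(tempfile.mkdtemp()) / "llm-profiles"
save_profile(profiles, "gpt-5-mini", {"model": "gpt-5-mini", "usage_id": "agent"})
print(list_profiles(profiles))  # ['gpt-5-mini']
```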


openhands-agent and others added 30 commits October 18, 2025 16:18
…authored-by: openhands <openhands@all-hands.dev>
…sation startup

- ProfileManager manages ~/.openhands/llm-profiles/*.json (load/save/list/register)
- LocalConversation now calls ProfileManager.register_all to eagerly populate LLMRegistry

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
- embed profile lifecycle APIs into the registry
- update persistence helpers, docs, and examples to use registry
- replace profile manager tests with registry profile coverage

Co-authored-by: openhands <openhands@all-hands.dev>
- note that LLMRegistry is the unified entry point for disk and runtime profiles
- mention how to override the profile directory when embedding the SDK

Co-authored-by: openhands <openhands@all-hands.dev>
- rename payload helpers to resolve_llm_profiles/compact_llm_profiles
- update conversation state to use clearer helper names
- drop the optional agent_settings convenience module and its tests

Co-authored-by: openhands <openhands@all-hands.dev>
- replace the _transform flag with dedicated _compact/_resolve helpers
- make compact_llm_profiles/resolve_llm_profiles easier to follow by delegating to the new helpers
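The compact/resolve round trip described above can be sketched without the SDK: compacting swaps an inline LLM config for a profile reference before persisting, and resolving expands the reference back without mutating the original payload. The helper names match the PR's `compact_llm_profiles`/`resolve_llm_profiles`, but the plain-dict shapes and the module-level `PROFILES` table are assumptions for illustration only.

```python
# Hypothetical profile table; the SDK looks these up via LLMRegistry.
PROFILES = {"gpt-5-mini": {"model": "gpt-5-mini", "temperature": 0.0}}


def compact_llm_profiles(payload: dict) -> dict:
    """Replace an inline LLM config with a profile reference, if one matches."""
    llm = payload.get("llm", {})
    for name, profile in PROFILES.items():
        if llm == profile:
            return {**payload, "llm": {"profile": name}}
    return payload  # inline configs without a matching profile stay inline


def resolve_llm_profiles(payload: dict) -> dict:
    """Expand a profile reference back into an inline LLM config."""
    llm = payload.get("llm", {})
    if set(llm) == {"profile"}:
        return {**payload, "llm": dict(PROFILES[llm["profile"]])}
    return payload


state = {"llm": {"model": "gpt-5-mini", "temperature": 0.0}, "events": []}
compacted = compact_llm_profiles(state)
print(compacted["llm"])  # {'profile': 'gpt-5-mini'}
```

Because both helpers build new dicts rather than writing through to the input, the live `state` (and, in the SDK, the LLM instance) is left untouched, which is the "without mutating LLM instances" property the PR summary calls out.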

Co-authored-by: openhands <openhands@all-hands.dev>
Bring in new package layout and port LLM profile switching support.
Revert the in-progress switch_llm helpers and tests; agent-sdk-18 branch now only contains LLM profile persistence.
Example 25 now performs a read/write/delete workflow and verifies the persisted profile reference.
- move inline/profile compaction into LLM serializer/validator
- use model_dump_json context in ConversationState persistence
- add persistence settings module and cover profile reference tests
- document persistence comparison and recommendations
@enyst force-pushed the agent-sdk-18-profile-manager branch from b17a63d to fab1d57 on December 30, 2025, 00:16

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (2)
examples/llm-profiles/gpt-5-mini.json (1)

1-7: LGTM!

The profile correctly uses usage_id (the canonical field) instead of the deprecated service_id. The configuration provides a complete example for users to follow.
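For readers without the diff open, a profile of this shape might look like the following. This is an illustrative sketch, not the exact contents of `gpt-5-mini.json`; only the `usage_id` field (replacing the deprecated `service_id`) and the model name are confirmed by the review, the other fields are assumptions.

```json
{
  "model": "gpt-5-mini",
  "usage_id": "agent",
  "temperature": 0.0
}
```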

examples/01_standalone_sdk/31_llm_profiles.py (1)

1-1: Documentation required for this example.

This example file still needs corresponding documentation in the docs repository, as flagged by the CI check in a previous review.

🧹 Nitpick comments (20)
examples/02_remote_agent_server/07_convo_with_apptainer_sandboxed_server.py (2)

31-36: Consider documenting platform assumptions.

The function defaults to linux/amd64 for unrecognized architectures. Since this example targets Apptainer (Linux containers), this is likely acceptable, but consider adding a comment documenting this assumption or logging a warning for unexpected platforms.

🔎 Optional enhancement with logging
 def detect_platform():
     """Detects the correct platform string."""
     machine = platform.machine().lower()
     if "arm" in machine or "aarch64" in machine:
         return "linux/arm64"
+    # Default to amd64 for x86_64 and other architectures
     return "linux/amd64"

67-73: Consider checking the command exit code.

The workspace test command doesn't verify result.exit_code. For an example that users might copy, demonstrating proper error checking would be valuable.

🔎 Suggested enhancement
 result = workspace.execute_command(
     "echo 'Hello from sandboxed environment!' && pwd"
 )
 logger.info(
     f"Command '{result.command}' completed with exit code {result.exit_code}"
 )
+if result.exit_code != 0:
+    logger.error(f"Command failed: {result.stderr}")
+else:
+    logger.info(f"Output: {result.stdout}")
-logger.info(f"Output: {result.stdout}")
openhands-tools/openhands/tools/browser_use/server.py (1)

63-66: Clarify the dual field name handling.

The code checks for both "key" and "name" fields when retrieving storage item keys. While this defensive approach may handle different data formats, it's unclear why both field names are necessary.

Consider adding a comment explaining the rationale, or if one format is standard (e.g., from _cdp_get_storage_state), simplify to use only that field name.

🔎 Example clarification comment
 # Set localStorage
 for item in origin_data.get("localStorage", []):
+    # Handle both 'key' (CDP format) and 'name' (legacy format)
     key = item.get("key") or item.get("name")

Also applies to: 80-83

openhands-tools/openhands/tools/browser_use/definition.py (2)

63-84: Catch specific exceptions instead of bare Exception.

The bare except Exception on line 83 makes debugging difficult when screenshot saving fails. Consider catching specific exceptions like OSError, ValueError, or IOError and optionally logging the error before returning None.

🔎 Proposed fix
         return str(file_path)
-    except Exception:
+    except (OSError, ValueError) as e:
+        from openhands.sdk import get_logger
+        logger = get_logger(__name__)
+        logger.debug(f"Failed to save screenshot: {e}")
         return None

582-668: LGTM!

The storage tool definitions are well-structured with appropriate annotations:

  • BrowserGetStorageTool: Correctly marked as read-only and idempotent
  • BrowserSetStorageTool: Correctly marked as destructive and non-idempotent

The descriptions clearly explain the purpose and usage of each tool.

Note: Line 630 has an unused # noqa: E501 directive that can be removed per static analysis.

openhands-tools/openhands/tools/preset/gemini.py (1)

91-92: Docstring is incomplete—missing TerminalTool and TaskTrackerTool.

The updated docstring only lists the Gemini-style file editing tools but omits other tools returned by the function. The agent actually includes TerminalTool and TaskTrackerTool (and optionally BrowserToolSet), which are visible in the get_gemini_tools() implementation at lines 66-77.

Consider either:

  1. Listing all tools in the docstring for completeness, or
  2. Clarifying that these are the key Gemini-specific tools (as opposed to TerminalTool/TaskTrackerTool which are common across presets)
Suggested docstring improvements

Option 1: List all tools

-    """Get an agent with gemini-style tools: read_file, write_file, edit,
-    list_directory."""
+    """Get an agent with gemini-style tools.
+    
+    Includes: terminal, read_file, write_file, edit, list_directory, 
+    task_tracker, and optionally browser tools.
+    """

Option 2: Emphasize the Gemini-specific difference

-    """Get an agent with gemini-style tools: read_file, write_file, edit,
-    list_directory."""
+    """Get an agent with gemini-style file editing tools.
+    
+    Uses read_file, write_file, edit, list_directory instead of the 
+    standard claude-style file_editor tool.
+    """
examples/04_llm_specific_tools/02_gemini_file_tools.py (1)

43-44: Consider documenting the workspace behavior.

Using os.getcwd() as the workspace means FACTS.txt will be created in the directory where the script is executed. While this works for a simple example, consider adding a comment to make this explicit, or use a temporary directory to avoid polluting the execution directory.

Optional: Use a temporary directory
 import os
+import tempfile

 from openhands.sdk import LLM, Agent, Conversation, Tool
 from openhands.tools.gemini import GEMINI_FILE_TOOLS
 from openhands.tools.terminal import TerminalTool


 # Route logs in their own directory for easy tracing
 _log_dir = "logs/gemini"
 os.makedirs(_log_dir, exist_ok=True)

 llm = LLM(
     model=os.getenv("LLM_MODEL", "gemini/gemini-3-pro-preview"),
     api_key=os.getenv("LLM_API_KEY"),
     base_url=os.getenv("LLM_BASE_URL", None),
     log_completions=True,
     log_completions_folder=_log_dir,
 )

 agent = Agent(
     llm=llm,
     tools=[
         Tool(name=TerminalTool.name),
         *GEMINI_FILE_TOOLS,  # Instead of Tool(name=FileEditorTool.name)
     ],
 )

-cwd = os.getcwd()
-conversation = Conversation(agent=agent, workspace=cwd)
+# Use a temporary directory to avoid polluting the execution directory
+with tempfile.TemporaryDirectory() as tmpdir:
+    conversation = Conversation(agent=agent, workspace=tmpdir)

-# Ask the agent to create a file, then delete it afterwards
-conversation.send_message("Write 3 facts about the current project into FACTS.txt.")
-conversation.run()
+    # Ask the agent to create a file, then delete it afterwards
+    conversation.send_message("Write 3 facts about the current project into FACTS.txt.")
+    conversation.run()

-conversation.send_message("Now delete the FACTS.txt file you just created.")
-conversation.run()
+    conversation.send_message("Now delete the FACTS.txt file you just created.")
+    conversation.run()

-# Report cost
-cost = llm.metrics.accumulated_cost
-print(f"EXAMPLE_COST: {cost}")
+    # Report cost
+    cost = llm.metrics.accumulated_cost
+    print(f"EXAMPLE_COST: {cost}")

Alternatively, just add a comment:

+# Note: FACTS.txt will be created in the directory where this script is run
 cwd = os.getcwd()
 conversation = Conversation(agent=agent, workspace=cwd)
openhands-tools/openhands/tools/preset/gpt5.py (3)

35-56: Consider consolidating tool imports.

Tools are imported in both register_gpt5_tools() (lines 21-30) and get_gpt5_tools() (lines 43-53). Since get_gpt5_tools() calls register_gpt5_tools(), the duplicate imports add minimal overhead but reduce code clarity.

🔎 Proposed consolidation
 def get_gpt5_tools(enable_browser: bool = True) -> list[Tool]:
     """Get the GPT-5 tool specifications using ApplyPatchTool for edits.
 
     Args:
         enable_browser: Whether to include browser tools.
     """
     register_gpt5_tools(enable_browser=enable_browser)
 
-    from openhands.tools.apply_patch import ApplyPatchTool
-    from openhands.tools.task_tracker import TaskTrackerTool
-    from openhands.tools.terminal import TerminalTool
-
     tools: list[Tool] = [
-        Tool(name=TerminalTool.name),
-        Tool(name=ApplyPatchTool.name),
-        Tool(name=TaskTrackerTool.name),
+        Tool(name="TerminalTool"),
+        Tool(name="ApplyPatchTool"),
+        Tool(name="TaskTrackerTool"),
     ]
     if enable_browser:
-        from openhands.tools.browser_use import BrowserToolSet
-
-        tools.append(Tool(name=BrowserToolSet.name))
+        tools.append(Tool(name="BrowserToolSet"))
     return tools

71-73: Consider more descriptive usage_id.

The usage_id "condenser" is less specific than patterns used in other presets (e.g., "planning_condenser" in planning.py). Consider using "gpt5_condenser" for consistency and clearer metrics attribution.

🔎 Proposed change
         condenser=get_gpt5_condenser(
-            llm=llm.model_copy(update={"usage_id": "condenser"})
+            llm=llm.model_copy(update={"usage_id": "gpt5_condenser"})
         ),

19-33: Function name is misleading—it logs tools without explicit registration.

register_gpt5_tools() only imports tools (to trigger auto-registration) and logs debug messages. It performs no explicit registration to a registry. Since the pattern mirrors register_default_tools() and register_planning_tools(), which contain similar comments about auto-registration on import, consider renaming to better reflect actual behavior (e.g., log_gpt5_tools()) or updating the docstring to clarify that tools are auto-registered upon import.

openhands-sdk/openhands/sdk/tool/registry.py (1)

141-156: Minor: Consider matching the order of type checks with the resolver dispatch.

The module qualname extraction checks type → callable → ToolDefinition instance, while the resolver dispatch at lines 128-139 checks ToolDefinition instance → type → callable. While this works correctly because Pydantic BaseModel instances aren't callable, aligning the order would improve readability and prevent subtle bugs if ToolDefinition ever becomes callable.

🔎 Suggested reorder to match resolver dispatch
     # Track the module qualname for this tool
     module_qualname = None
-    if isinstance(factory, type):
-        module_qualname = factory.__module__
-    elif callable(factory):
-        module_qualname = getattr(factory, "__module__", None)
-    elif isinstance(factory, ToolDefinition):
+    if isinstance(factory, ToolDefinition):
         module_qualname = factory.__class__.__module__
+    elif isinstance(factory, type):
+        module_qualname = factory.__module__
+    elif callable(factory):
+        module_qualname = getattr(factory, "__module__", None)
openhands-sdk/openhands/sdk/context/view.py (2)

166-189: Address unused loop variable llm_response_id.

Static analysis flags llm_response_id as unused in the loop body (B007). Prefix with underscore to indicate intentional non-use.

🔎 Proposed fix
-        for llm_response_id, action_indices in batches.items():
+        for _llm_response_id, action_indices in batches.items():

251-254: Address unused loop variable has_thinking.

Static analysis flags has_thinking as unused in this loop (B007). Prefix with underscore to indicate intentional non-use.

🔎 Proposed fix
-            for min_idx, max_idx, has_thinking in batch_ranges:
+            for min_idx, max_idx, _has_thinking in batch_ranges:
.github/workflows/pypi-release.yml (2)

201-208: Consider quoting $VERSION to satisfy shellcheck (SC2086).

While $VERSION is a controlled value (version string), quoting prevents potential issues with word splitting if the value ever contains spaces or special characters.

🔎 Proposed fix
             - name: Summary
               run: |
-                  echo "## ✅ Version Bump PRs Created" >> $GITHUB_STEP_SUMMARY
-                  echo "" >> $GITHUB_STEP_SUMMARY
-                  echo "PRs have been created to bump SDK packages to version **$VERSION**:" >> $GITHUB_STEP_SUMMARY
-                  echo "" >> $GITHUB_STEP_SUMMARY
-                  echo "- [OpenHands](https://github.com/All-Hands-AI/OpenHands/pulls?q=is%3Apr+bump-sdk-$VERSION)" >> $GITHUB_STEP_SUMMARY
-                  echo "- [OpenHands-CLI](https://github.com/All-Hands-AI/openhands-cli/pulls?q=is%3Apr+bump-sdk-$VERSION)" >> $GITHUB_STEP_SUMMARY
+                  {
+                    echo "## ✅ Version Bump PRs Created"
+                    echo ""
+                    echo "PRs have been created to bump SDK packages to version **${VERSION}**:"
+                    echo ""
+                    echo "- [OpenHands](https://github.com/All-Hands-AI/OpenHands/pulls?q=is%3Apr+bump-sdk-${VERSION})"
+                    echo "- [OpenHands-CLI](https://github.com/All-Hands-AI/openhands-cli/pulls?q=is%3Apr+bump-sdk-${VERSION})"
+                  } >> "$GITHUB_STEP_SUMMARY"

90-144: Consider handling pre-existing branches and PRs.

If a previous workflow run failed partway through, the branch bump-sdk-$VERSION may already exist, causing git checkout -b to fail. Similarly, gh pr create will fail if a PR for that branch already exists.

🔎 Proposed defensive handling
                   # Create branch
-                  git checkout -b "$BRANCH"
+                  # Delete remote branch if it exists (from failed previous run)
+                  git push origin --delete "$BRANCH" 2>/dev/null || true
+                  git checkout -b "$BRANCH"

And for PR creation:

                   # Create PR
-                  gh pr create \
+                  # Check if PR already exists
+                  if gh pr view "$BRANCH" --repo "$REPO" >/dev/null 2>&1; then
+                    echo "⚠️ PR already exists for branch $BRANCH"
+                  else
+                    gh pr create \
                     --repo "$REPO" \
                     ...
+                  fi
openhands-agent-server/openhands/agent_server/conversation_service.py (1)

407-437: LGTM! Consistent tool registration pattern on conversation resume.

The resume path correctly mirrors the start path behavior, ensuring tools are re-registered when conversations are loaded from persistence.

💡 Optional: Extract common tool registration logic

The tool registration logic is duplicated between start_conversation and __aenter__. Consider extracting to a helper method:

def _register_tools_from_qualnames(
    self, tool_module_qualnames: dict[str, str], context: str = ""
) -> None:
    """Dynamically register tools by importing their modules.
    
    Args:
        tool_module_qualnames: Mapping of tool names to module qualnames
        context: Description for logging (e.g., "when starting conversation X")
    """
    if not tool_module_qualnames:
        return
        
    import importlib
    
    for tool_name, module_qualname in tool_module_qualnames.items():
        try:
            importlib.import_module(module_qualname)
            logger.debug(
                f"Tool '{tool_name}' registered via module '{module_qualname}' {context}"
            )
        except ImportError as e:
            logger.warning(
                f"Failed to import module '{module_qualname}' for tool "
                f"'{tool_name}' {context}: {e}. Tool will not be available."
            )
    
    logger.info(
        f"Dynamically registered {len(tool_module_qualnames)} tools {context}: "
        f"{list(tool_module_qualnames.keys())}"
    )

Then call it from both locations:

# In start_conversation
self._register_tools_from_qualnames(
    request.tool_module_qualnames,
    f"for conversation {conversation_id}"
)

# In __aenter__
self._register_tools_from_qualnames(
    stored.tool_module_qualnames,
    f"when resuming conversation {stored.id}"
)
examples/02_remote_agent_server/05_custom_tool/README.md (2)

26-37: Add language specifier to fenced code block.

The ASCII diagram code block should have a language specifier for proper rendering. Use text or plaintext for ASCII art.

🔎 Proposed fix
-```
+```text
 ┌─────────────────┐         ┌──────────────────────────┐

140-173: Add language specifier to expected output code block.

Use text or plaintext for the expected output block to satisfy markdown linting.

🔎 Proposed fix
-```
+```text
 🔍 Checking for custom base image: custom-base-image:latest
examples/02_remote_agent_server/05_custom_tool/custom_tools/log_data.py (2)

151-151: Remove unused noqa directive.

The # noqa: E501 directive is unused since E501 (line too long) isn't enabled in this project's linting configuration.

🔎 Proposed fix
-- Log progress: message="Completed analysis", level="info", data={"files_checked": 10}
-"""  # noqa: E501
+- Log progress: message="Completed analysis", level="info", data={"files_checked": 10}
+"""

124-133: Consider wrapping the write operation in error handling.

The file write at lines 126-127 could fail (e.g., disk full, permissions). Currently, an unhandled exception would propagate. For a production tool, consider returning a failed observation instead.

Since this is example code, this is optional, but worth noting for users who adapt this pattern.

🔎 Proposed fix
         # Write back to file
-        self.log_file.parent.mkdir(parents=True, exist_ok=True)
-        with open(self.log_file, "w") as f:
-            json.dump(entries, f, indent=2)
-
-        return LogDataObservation(
-            success=True,
-            log_file=str(self.log_file),
-            entry_count=len(entries),
-        )
+        try:
+            self.log_file.parent.mkdir(parents=True, exist_ok=True)
+            with open(self.log_file, "w") as f:
+                json.dump(entries, f, indent=2)
+        except OSError as e:
+            logger.warning(f"Failed to write log file: {e}")
+            return LogDataObservation(
+                success=False,
+                log_file=str(self.log_file),
+                entry_count=len(entries) - 1,  # exclude failed entry
+            )
+
+        return LogDataObservation(
+            success=True,
+            log_file=str(self.log_file),
+            entry_count=len(entries),
+        )

Note: You'd need to add from openhands.sdk.logger import get_logger and logger = get_logger(__name__) at the top.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b17a63d and fab1d57.

📒 Files selected for processing (52)
  • .github/PULL_REQUEST_TEMPLATE.md
  • .github/scripts/check_documented_examples.py
  • .github/workflows/README-RELEASE.md
  • .github/workflows/assign-reviews.yml
  • .github/workflows/auto-label-issues.yml
  • .github/workflows/check-documented-examples.yml
  • .github/workflows/pr-review-by-openhands.yml
  • .github/workflows/pypi-release.yml
  • .github/workflows/run-examples.yml
  • .github/workflows/stale.yml
  • .github/workflows/todo-management.yml
  • .gitignore
  • examples/01_standalone_sdk/31_llm_profiles.py
  • examples/02_remote_agent_server/01_convo_with_local_agent_server.py
  • examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py
  • examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py
  • examples/02_remote_agent_server/05_custom_tool/Dockerfile
  • examples/02_remote_agent_server/05_custom_tool/README.md
  • examples/02_remote_agent_server/05_custom_tool/build_custom_image.sh
  • examples/02_remote_agent_server/05_custom_tool/custom_tool_example.py
  • examples/02_remote_agent_server/05_custom_tool/custom_tools/__init__.py
  • examples/02_remote_agent_server/05_custom_tool/custom_tools/log_data.py
  • examples/02_remote_agent_server/07_convo_with_apptainer_sandboxed_server.py
  • examples/04_llm_specific_tools/01_gpt5_apply_patch_preset.py
  • examples/04_llm_specific_tools/02_gemini_file_tools.py
  • examples/llm-profiles/gpt-5-mini.json
  • openhands-agent-server/openhands/agent_server/conversation_service.py
  • openhands-agent-server/openhands/agent_server/models.py
  • openhands-agent-server/pyproject.toml
  • openhands-sdk/openhands/sdk/agent/base.py
  • openhands-sdk/openhands/sdk/agent/prompts/model_specific/openai_gpt/gpt-5-codex.j2
  • openhands-sdk/openhands/sdk/context/agent_context.py
  • openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py
  • openhands-sdk/openhands/sdk/context/skills/skill.py
  • openhands-sdk/openhands/sdk/context/view.py
  • openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
  • openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py
  • openhands-sdk/openhands/sdk/conversation/state.py
  • openhands-sdk/openhands/sdk/llm/llm.py
  • openhands-sdk/openhands/sdk/llm/llm_registry.py
  • openhands-sdk/openhands/sdk/llm/utils/model_features.py
  • openhands-sdk/openhands/sdk/llm/utils/verified_models.py
  • openhands-sdk/openhands/sdk/persistence/__init__.py
  • openhands-sdk/openhands/sdk/tool/registry.py
  • openhands-sdk/pyproject.toml
  • openhands-tools/openhands/tools/browser_use/__init__.py
  • openhands-tools/openhands/tools/browser_use/definition.py
  • openhands-tools/openhands/tools/browser_use/impl.py
  • openhands-tools/openhands/tools/browser_use/server.py
  • openhands-tools/openhands/tools/preset/__init__.py
  • openhands-tools/openhands/tools/preset/gemini.py
  • openhands-tools/openhands/tools/preset/gpt5.py
💤 Files with no reviewable changes (5)
  • examples/02_remote_agent_server/02_convo_with_docker_sandboxed_server.py
  • openhands-sdk/openhands/sdk/llm/utils/verified_models.py
  • .gitignore
  • examples/02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py
  • examples/02_remote_agent_server/01_convo_with_local_agent_server.py
✅ Files skipped from review due to trivial changes (3)
  • examples/02_remote_agent_server/05_custom_tool/custom_tools/__init__.py
  • .github/PULL_REQUEST_TEMPLATE.md
  • openhands-sdk/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • openhands-sdk/openhands/sdk/agent/base.py
🧰 Additional context used
🧬 Code graph analysis (15)
examples/02_remote_agent_server/05_custom_tool/build_custom_image.sh (1)
openhands-agent-server/openhands/agent_server/docker/build.py (1)
  • build (519-624)
openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py (1)
openhands-sdk/openhands/sdk/tool/registry.py (1)
  • get_tool_module_qualnames (176-184)
openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py (6)
openhands-sdk/openhands/sdk/llm/llm_registry.py (1)
  • LLMRegistry (35-241)
tests/integration/base.py (1)
  • max_iteration_per_run (213-219)
openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py (2)
  • execution_status (327-335)
  • execution_status (338-347)
openhands-sdk/openhands/sdk/conversation/base.py (1)
  • execution_status (55-57)
openhands-sdk/openhands/sdk/conversation/state.py (1)
  • ConversationExecutionStatus (40-52)
openhands-sdk/openhands/sdk/event/conversation_error.py (1)
  • ConversationErrorEvent (6-25)
openhands-sdk/openhands/sdk/context/agent_context.py (3)
tests/cross/test_agent_secrets_integration.py (1)
  • llm (26-27)
openhands-sdk/openhands/sdk/llm/utils/model_prompt_spec.py (1)
  • get_model_prompt_spec (81-95)
openhands-sdk/openhands/sdk/context/skills/skill.py (1)
  • Skill (36-368)
examples/01_standalone_sdk/31_llm_profiles.py (1)
openhands-sdk/openhands/sdk/llm/llm_registry.py (4)
  • list_profiles (120-123)
  • save_profile (139-170)
  • load_profile (131-137)
  • get (215-236)
examples/02_remote_agent_server/05_custom_tool/custom_tool_example.py (4)
openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py (8)
  • RemoteConversation (429-926)
  • workspace (386-392)
  • run (686-729)
  • append (229-231)
  • agent (377-383)
  • state (650-652)
  • id (322-324)
  • id (646-647)
openhands-sdk/openhands/sdk/tool/spec.py (1)
  • Tool (6-39)
openhands-workspace/openhands/workspace/docker/dev_workspace.py (1)
  • DockerDevWorkspace (10-116)
openhands-tools/openhands/tools/preset/default.py (2)
  • get_default_condenser (60-66)
  • get_default_tools (33-57)
examples/04_llm_specific_tools/01_gpt5_apply_patch_preset.py (1)
openhands-tools/openhands/tools/preset/gpt5.py (1)
  • get_gpt5_agent (64-75)
openhands-tools/openhands/tools/preset/gpt5.py (9)
openhands-sdk/openhands/sdk/agent/agent.py (1)
  • Agent (67-637)
openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py (1)
  • LLMSummarizingCondenser (29-238)
openhands-sdk/openhands/sdk/context/condenser/base.py (1)
  • CondenserBase (15-61)
openhands-sdk/openhands/sdk/llm/llm.py (1)
  • LLM (106-1201)
openhands-sdk/openhands/sdk/tool/spec.py (1)
  • Tool (6-39)
openhands-tools/openhands/tools/apply_patch/definition.py (1)
  • ApplyPatchTool (129-178)
openhands-tools/openhands/tools/task_tracker/definition.py (1)
  • TaskTrackerTool (403-431)
openhands-tools/openhands/tools/terminal/definition.py (1)
  • TerminalTool (234-296)
openhands-tools/openhands/tools/browser_use/definition.py (1)
  • BrowserToolSet (671-726)
openhands-tools/openhands/tools/preset/__init__.py (2)
openhands-tools/openhands/tools/preset/gpt5.py (1)
  • get_gpt5_agent (64-75)
openhands-tools/openhands/tools/preset/planning.py (1)
  • get_planning_agent (145-170)
openhands-sdk/openhands/sdk/persistence/__init__.py (1)
openhands-sdk/openhands/sdk/persistence/settings.py (1)
  • should_inline_conversations (13-17)
openhands-sdk/openhands/sdk/tool/registry.py (2)
openhands-sdk/openhands/sdk/tool/tool.py (1)
  • ToolDefinition (147-460)
openhands-sdk/openhands/sdk/mcp/tool.py (1)
  • name (123-125)
openhands-tools/openhands/tools/browser_use/impl.py (2)
openhands-tools/openhands/tools/browser_use/definition.py (2)
  • BrowserGetStorageAction (585-588)
  • BrowserSetStorageAction (626-631)
openhands-tools/openhands/tools/browser_use/server.py (2)
  • _get_storage (16-31)
  • _set_storage (33-104)
examples/04_llm_specific_tools/02_gemini_file_tools.py (1)
openhands-sdk/openhands/sdk/tool/spec.py (1)
  • Tool (6-39)
openhands-sdk/openhands/sdk/llm/llm_registry.py (1)
openhands-sdk/openhands/sdk/llm/llm.py (2)
  • LLM (106-1201)
  • load_from_json (1107-1110)
openhands-tools/openhands/tools/browser_use/definition.py (1)
openhands-sdk/openhands/sdk/tool/tool.py (4)
  • ToolDefinition (147-460)
  • create (207-222)
  • ToolAnnotations (60-92)
  • title (226-229)
🪛 actionlint (1.7.9)
.github/workflows/pypi-release.yml

22-22: shellcheck reported issue in this script: SC2086:info:9:28: Double quote to prevent globbing and word splitting

(shellcheck)


67-67: label "blacksmith-4vcpu-ubuntu-2404" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-slim", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)


202-202: shellcheck reported issue in this script: SC2086:info:1:41: Double quote to prevent globbing and word splitting

(shellcheck)


202-202: shellcheck reported issue in this script: SC2086:info:2:12: Double quote to prevent globbing and word splitting

(shellcheck)


202-202: shellcheck reported issue in this script: SC2086:info:3:79: Double quote to prevent globbing and word splitting

(shellcheck)


202-202: shellcheck reported issue in this script: SC2086:info:4:12: Double quote to prevent globbing and word splitting

(shellcheck)


202-202: shellcheck reported issue in this script: SC2086:info:5:102: Double quote to prevent globbing and word splitting

(shellcheck)


202-202: shellcheck reported issue in this script: SC2086:info:6:110: Double quote to prevent globbing and word splitting

(shellcheck)


202-202: shellcheck reported issue in this script: SC2129:style:1:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)

🪛 LanguageTool
examples/02_remote_agent_server/05_custom_tool/README.md

[style] ~80-~80: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...l ### 2. Dockerfile The Dockerfile is very simple: ```dockerfile FROM nikolaik/python-nod...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.18.1)
examples/02_remote_agent_server/05_custom_tool/README.md

26-26: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


140-140: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.10)
openhands-sdk/openhands/sdk/context/view.py

167-167: Loop control variable llm_response_id not used within loop body

(B007)


251-251: Loop control variable has_thinking not used within loop body

(B007)

examples/01_standalone_sdk/31_llm_profiles.py

113-113: Avoid specifying long messages outside the exception class

(TRY003)

examples/02_remote_agent_server/05_custom_tool/custom_tool_example.py

78-78: subprocess call: check for execution of untrusted input

(S603)


79-79: Starting a process with a partial executable path

(S607)


90-90: subprocess call: check for execution of untrusted input

(S603)


97-97: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


98-98: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


119-119: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


206-206: Probable insecure usage of temporary file or directory: "/tmp/agent_data.json"

(S108)

examples/04_llm_specific_tools/01_gpt5_apply_patch_preset.py

25-25: Avoid specifying long messages outside the exception class

(TRY003)

examples/02_remote_agent_server/05_custom_tool/custom_tools/log_data.py

78-78: Probable insecure usage of temporary file or directory: "/tmp/agent_data.json"

(S108)


151-151: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

openhands-tools/openhands/tools/browser_use/server.py

31-31: Use explicit conversion flag

Replace with conversion flag

(RUF010)


101-101: Consider moving this statement to an else block

(TRY300)


104-104: Use explicit conversion flag

Replace with conversion flag

(RUF010)

openhands-sdk/openhands/sdk/llm/llm_registry.py

108-108: Avoid specifying long messages outside the exception class

(TRY003)


110-110: Avoid specifying long messages outside the exception class

(TRY003)


112-114: Avoid specifying long messages outside the exception class

(TRY003)


136-136: Avoid specifying long messages outside the exception class

(TRY003)


167-167: Do not catch blind exception: Exception

(BLE001)

openhands-tools/openhands/tools/browser_use/definition.py

83-83: Do not catch blind exception: Exception

(BLE001)


630-630: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)

openhands-sdk/openhands/sdk/llm/llm.py

406-411: Avoid specifying long messages outside the exception class

(TRY003)


414-416: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-binary-and-test (macos-latest)
  • GitHub Check: build-binary-and-test (ubuntu-latest)
🔇 Additional comments (61)
openhands-agent-server/pyproject.toml (1)

3-3: Verify the version bump strategy aligns with the scope of changes.

The version increments from 1.7.0 to 1.7.2 (skipping 1.7.1). Given the PR introduces new LLM profile management features and significant persistence refactoring, please confirm:

  1. Skipping 1.7.1 is intentional (e.g., it was used in another branch or release)
  2. A patch version bump (vs. minor bump to 1.8.0) aligns with the project's versioning strategy for this scope of changes
.github/scripts/check_documented_examples.py (1)

31-38: No action needed. The regex pattern r"examples/(?:[-\w]+/)+[-\w]+\.py" correctly matches all example files in the repository. Verification confirms that all 54 Python files under examples/ are nested at least one level deep; no flat files exist directly under examples/. The pattern and comment are accurate.

Likely an incorrect or invalid review comment.

openhands-sdk/openhands/sdk/context/condenser/llm_summarizing_condenser.py (1)

39-41: The default value changes are intentional and properly tested.

The test file explicitly documents the rationale:

  • max_size: 120 → 240 to "allow more room for tool loops"
  • keep_first: 4 → 2 to "leave more room for condensation"

These new defaults are validated by test_default_values_are_tuned_for_manipulation_indices and covered by extensive integration tests. The change is safe and behavioral impact is acceptable.

Note: All preset configurations (gpt5, default, gemini) explicitly override these defaults with max_size=80, keep_first=4, so the new defaults only affect users creating LLMSummarizingCondenser directly. However, documentation examples in agent.py and local_conversation.py still reference the old values and should be updated for consistency.

openhands-sdk/openhands/sdk/context/skills/skill.py (2)

414-417: LGTM: Code formatting improvement.

The multi-line formatting with trailing comma improves readability without any behavioral change.


706-707: Remove the unnecessary None guard.

Skill.load cannot return None. The type annotation correctly declares -> "Skill" (line 190), and all code paths in the implementation return a Skill object:

  • _handle_third_party can return None, but if it does, execution continues to frontmatter parsing (line 208-209)
  • The three remaining branches (lines 240, 253, 261) all explicitly return Skill objects

The None check at lines 706-707 is defensive code that will never catch anything. Additionally, the call site at line 414 (load_skills_from_dir) correctly accesses skill.trigger without a None check, which is the consistent pattern. Remove the unnecessary guard here.

examples/02_remote_agent_server/07_convo_with_apptainer_sandboxed_server.py (4)

1-28: LGTM! Clean setup with proper secret handling.

The imports are appropriate, and the LLM initialization correctly uses SecretStr for API key handling. The assertion message is clear and helpful.


93-97: Event settling pattern is acceptable for this example.

The busy-wait loop with sleep(0.1) is straightforward and appropriate for a demonstration. The 2-second inactivity threshold provides a reasonable settling period.

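A minimal sketch of this settling pattern as a reusable helper (hypothetical names; the example inlines the loop directly):

```python
import time


def wait_until_settled(get_event_count, quiet_seconds=2.0, poll=0.1, timeout=60.0):
    """Return True once no new events arrive for `quiet_seconds`; False on timeout."""
    deadline = time.monotonic() + timeout
    last_count = get_event_count()
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        count = get_event_count()
        now = time.monotonic()
        if count != last_count:
            # New activity observed; restart the quiet-period clock.
            last_count, last_change = count, now
        elif now - last_change >= quiet_seconds:
            return True
        time.sleep(poll)
    return False
```

The overall timeout is an addition the example does not need, but it keeps the helper safe against a conversation that never settles.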

107-109: LGTM! Proper cleanup with finally block.

The cleanup is correctly placed in a finally block to ensure the conversation is closed even if errors occur during execution.


74-79: The assertion is safe and guaranteed to pass.

ApptainerWorkspace inherits from RemoteWorkspace, and the Conversation factory explicitly checks isinstance(workspace, RemoteWorkspace) in its __new__ method to determine whether to return a RemoteConversation or LocalConversation. Since ApptainerWorkspace is a subclass of RemoteWorkspace, the factory will always return a RemoteConversation instance, making the assertion on line 79 reliable.

.github/workflows/check-documented-examples.yml (1)

11-11: LGTM - Logical exclusion for LLM-specific tool examples.

Excluding examples/04_llm_specific_tools/** from documentation checks is consistent with the existing exclusion for examples/03_github_workflows/**. This appropriately scopes the workflow to focus on core examples that require documentation.

.github/workflows/todo-management.yml (1)

133-133: Verify ALLHANDS_BOT_GITHUB_PAT has necessary permissions and consider workflow triggering implications.

Using a PAT instead of the default GITHUB_TOKEN for checkout (line 133) and operations (line 173) has two implications:

  1. Permissions: Ensure the PAT has repo scope (or public_repo + write:issues/pull_requests for public repos) to push branches and create PRs.

  2. Workflow triggering: When using a PAT, PRs created by this workflow will trigger other workflows (e.g., CI checks). This differs from the default GITHUB_TOKEN, which prevents recursive workflow triggers. Verify this is intentional and won't cause workflow loops.

Based on learnings from GitHub Actions best practices, using a PAT to enable downstream workflow triggers should be carefully considered.

Also applies to: 173-173

.github/workflows/run-examples.yml (1)

54-57: Apptainer setup is justified—specific example uses containerization.

The example examples/02_remote_agent_server/07_convo_with_apptainer_sandboxed_server.py explicitly requires Apptainer, importing ApptainerWorkspace to manage sandboxed execution for the remote agent server. The workflow setup correctly includes Apptainer to support this example.

If a concern remains, consider whether version 1.3.6 is the appropriate minimum required version or if a more recent stable version should be used.

Likely an incorrect or invalid review comment.

.github/workflows/assign-reviews.yml (1)

191-191: Ensure ALLHANDS_BOT_GITHUB_PAT has required scopes for GitHub API operations.

The prompt performs multiple GitHub API operations including reading timelines, managing labels, posting comments, requesting reviews, and checking collaborator permissions. Since this is a public repository, the token should have at minimum: public_repo, write:issues, and write:pull_requests scopes. Verify the PAT has these permissions before deployment.

.github/workflows/pr-review-by-openhands.yml (1)

101-101: Verify that ALLHANDS_BOT_GITHUB_PAT is configured with minimal scopes.

The PAT used at line 101 is only needed for posting PR review comments (via gh pr review), which requires pull-requests: write scope on the target repository. Ensure the PAT is configured with:

  1. Minimal scopes: Only pull-requests: write and contents: read for this specific use case (not broader repo access, admin, or workflow scopes)
  2. Repository restrictions: If supported, limit the PAT to the intended repository only

The token handling is secure—it's passed only to the GitHub CLI subprocess and is not exposed to PR code due to persist-credentials: false.

openhands-tools/openhands/tools/browser_use/impl.py (3)

217-217: LGTM!

The storage action imports are correctly placed and follow the same pattern as other browser action imports.

Also applies to: 223-223


239-242: LGTM!

The action dispatcher correctly routes storage actions to their corresponding methods, following the established pattern for other browser actions.


343-351: LGTM!

The storage methods correctly follow the established pattern: ensuring initialization before delegating to the underlying server implementation.

openhands-tools/openhands/tools/browser_use/__init__.py (1)

12-13: LGTM!

The storage action and tool exports are correctly added to the public API, maintaining consistency with the existing module structure.

Also applies to: 23-24, 45-46, 58-59

openhands-tools/openhands/tools/browser_use/server.py (1)

16-31: LGTM!

The _get_storage method correctly retrieves browser storage state and handles errors gracefully. The approach of returning error messages as strings is consistent with other browser methods in this codebase.

openhands-tools/openhands/tools/browser_use/definition.py (3)

3-6: LGTM!

The new imports support the screenshot saving functionality and are appropriately scoped.


110-119: LGTM!

The screenshot saving logic appropriately saves to disk when a save directory is configured, while preserving the existing inline image functionality.


722-723: LGTM!

The storage tools are correctly registered in the BrowserToolSet, making them available alongside other browser tools.

examples/04_llm_specific_tools/02_gemini_file_tools.py (4)

1-14: LGTM! Clear documentation of the Gemini tools pattern.

The module docstring effectively explains the purpose and highlights the key difference—replacing Tool(name=FileEditorTool.name) with *GEMINI_FILE_TOOLS.


16-25: LGTM! Imports and logging setup are appropriate.

The imports are correctly structured, and creating a dedicated log directory for this example is good practice for tracing.


35-41: LGTM! Agent configuration correctly demonstrates the Gemini tools pattern.

The use of *GEMINI_FILE_TOOLS with the spread operator is the correct way to include multiple Gemini-style file editing tools in place of the single FileEditorTool.


53-55: LGTM! Cost reporting is a helpful addition.

Displaying the accumulated cost helps users understand the expense of running this example.

openhands-tools/openhands/tools/preset/__init__.py (1)

23-23: LGTM! Export pattern is consistent.

The GPT-5 agent export follows the same pattern as other presets (gemini, planning, default).

Also applies to: 31-31

examples/04_llm_specific_tools/01_gpt5_apply_patch_preset.py (1)

22-26: API key handling is appropriate.

The fallback from LLM_API_KEY to OPENAI_API_KEY provides flexibility, and the error message clearly directs users.

openhands-tools/openhands/tools/preset/gpt5.py (2)

59-61: Verify condenser configuration values.

The condenser uses max_size=80 and keep_first=4, which differ significantly from the defaults (max_size=240, keep_first=2). This more aggressive condensation (80 vs 240 events) may impact context retention for GPT-5 workloads. Please confirm these values are intentional and appropriate for the GPT-5 use case.


64-75: Agent construction looks solid.

The agent properly configures tools, condenser, and system prompt with cli_mode awareness. The browser tool conditional logic based on cli_mode is appropriate.

.github/workflows/auto-label-issues.yml (1)

18-18: LGTM!

Passing the PAT to github-script enables authenticated API calls for cross-repo or elevated permissions. The token is properly sourced from secrets.

openhands-sdk/openhands/sdk/persistence/__init__.py (1)

1-23: LGTM!

Clean re-export pattern that provides a stable public API surface while allowing internal module reorganization. The __all__ declaration correctly matches the imports.

openhands-sdk/openhands/sdk/tool/registry.py (1)

176-184: LGTM!

The get_tool_module_qualnames() function correctly returns a copy of the mapping under the lock, preventing callers from mutating internal state and ensuring thread safety.

openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py (1)

492-509: LGTM!

The local import avoids circular dependencies, and including tool_module_qualnames in the payload enables dynamic tool registration on the server side. The debug log helps with troubleshooting registration issues.

openhands-sdk/openhands/sdk/context/view.py (1)

117-268: Well-designed tool loop detection algorithm.

The two-phase approach (detect tool loops → merge ranges) correctly preserves atomicity for Claude API requirements. The docstring accurately describes the behavior, and the boundary computation handles the interleaving of thinking blocks with subsequent batches properly.

.github/workflows/pypi-release.yml (1)

65-79: LGTM on the new version-bump automation.

The create-version-bump-prs job properly chains after publish, extracts the version from the previous job's outputs, and creates coordinated PRs across dependent repositories. The conditional trigger (if: github.event_name == 'release') ensures manual dispatches don't create bump PRs.

Note: The blacksmith-4vcpu-ubuntu-2404 runner label is flagged by actionlint as unknown, but this is expected for custom/self-hosted runners configured in your infrastructure.

.github/workflows/README-RELEASE.md (1)

58-74: LGTM! Clear documentation of the automated version bump workflow.

The documentation clearly explains the automated PR creation process for downstream repositories after PyPI publication. This will help maintainers understand what to expect and what actions they need to take.

openhands-sdk/openhands/sdk/agent/prompts/model_specific/openai_gpt/gpt-5-codex.j2 (1)

1-2: LGTM! Simplified prompt removes rigid preamble constraints.

The removal of the 8-12 word preamble constraint and friendly/curious tone requirements streamlines the prompt and allows for more flexible tool usage patterns.

examples/02_remote_agent_server/05_custom_tool/Dockerfile (1)

1-17: LGTM! Clean Dockerfile for custom tool integration.

The Dockerfile correctly sets up the environment for custom tools by:

  • Using a suitable base image with Python 3.12 and Node.js 22
  • Copying custom tools into a well-defined location
  • Extending PYTHONPATH to make the tools importable
openhands-agent-server/openhands/agent_server/models.py (1)

101-108: LGTM! Clean API extension for dynamic tool registration.

The new tool_module_qualnames field:

  • Is backward compatible (optional with default empty dict)
  • Has a clear, descriptive docstring explaining its purpose
  • Follows the existing field pattern in the model
  • Enables the server to dynamically import and register client-provided tools
examples/02_remote_agent_server/05_custom_tool/build_custom_image.sh (1)

1-45: LGTM! Well-structured build script with good UX.

The script demonstrates good practices:

  • Error handling with set -e
  • Clear user feedback with emoji-prefixed messages
  • Sensible defaults (TAG parameter with fallback)
  • Helpful usage instructions after successful build
  • Proper script directory detection
openhands-agent-server/openhands/agent_server/conversation_service.py (1)

211-236: LGTM! Clean dynamic tool registration on conversation start.

The implementation correctly:

  • Guards execution with if request.tool_module_qualnames
  • Uses local import of importlib to avoid overhead
  • Handles ImportError gracefully without failing the conversation
  • Provides appropriate debug/warning/info logging
  • Allows the agent to fail gracefully if unregistered tools are used
examples/02_remote_agent_server/05_custom_tool/custom_tool_example.py (4)

1-68: LGTM! Well-documented example with comprehensive setup.

The example provides:

  • Detailed docstring explaining the workflow and prerequisites
  • Clear use case descriptions
  • Proper environment variable handling
  • Cross-platform support via detect_platform()

70-102: LGTM! Image building logic handles both existing and new images.

The code correctly:

  • Checks if the custom base image exists before building
  • Uses the build script with proper error handling
  • Provides helpful feedback to users
  • Exits cleanly on build failure

The static analysis warnings (S603, S607) are acceptable here since this is example code with controlled, local paths.


140-191: LGTM! Clean conversation execution with event collection.

The implementation demonstrates:

  • Proper callback registration for event tracking
  • Workspace command validation before conversation
  • Appropriate wait logic for event settling
  • Good user feedback throughout the process

192-251: LGTM! Proper data retrieval with good error handling.

The code correctly:

  • Uses the workspace file download API
  • Handles JSON parsing errors gracefully
  • Displays structured log data clearly
  • Cleans up temporary files
  • Ensures conversation closure in finally block

The static analysis warning (S108) about /tmp/agent_data.json is a false positive - this path refers to a location inside the workspace container, not on the host system.

openhands-sdk/openhands/sdk/conversation/state.py (4)

6-6: LGTM! Proper use of TYPE_CHECKING for circular import avoidance.

The lazy import pattern with TYPE_CHECKING correctly:

  • Provides type hints without runtime import overhead
  • Avoids circular dependency issues
  • Imports LLMRegistry at runtime only when needed (line 217)

Also applies to: 23-27


169-176: LGTM! Inline mode context properly propagated during serialization.

The modification correctly:

  • Retrieves the inline mode setting via should_inline_conversations()
  • Passes it through the context dict with INLINE_CONTEXT_KEY
  • Enables LLM serialization to decide whether to inline credentials or use profile references

188-188: LGTM! Clean API extension for profile reference expansion.

The llm_registry parameter:

  • Is optional (allows None)
  • Has clear documentation explaining its purpose
  • Supports the profile-reference persistence mode
  • Follows the established parameter pattern

Also applies to: 195-197


210-225: LGTM! Context-based profile expansion correctly implemented.

The implementation properly addresses the past review feedback by:

  • Building a context dict with both INLINE_CONTEXT_KEY and llm_registry
  • Passing the context to model_validate, which propagates to nested LLM fields
  • Relying on Pydantic's validator (_coerce_inputs) to handle profile expansion
  • Ensuring profile references work in non-inline mode with the registry

This approach centralizes the profile expansion logic in the LLM model's validator as intended.

Based on learnings, the context propagation approach was confirmed to work correctly after adding the llm_registry to the context dict.

openhands-sdk/openhands/sdk/context/agent_context.py (1)

158-191: Model-aware skill gating logic looks correct.

The filtering logic properly gates vendor-specific skills (claude, gemini) based on model family detection. The case-insensitive comparison via .lower() is appropriate.

One minor consideration: if other vendor-specific skills are added in the future, this filtering block will need extension. Consider documenting this behavior or extracting the vendor-to-family mapping to a constant for maintainability.

openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py (3)

109-111: Good: Early LLMRegistry initialization for profile resolution.

Initializing the registry early ensures profile references can be resolved during conversation resume, which aligns with the PR objective.


386-401: Max iteration guard implementation looks correct.

The guard properly:

  • Sets execution status to ERROR
  • Emits a ConversationErrorEvent with descriptive code and detail
  • Breaks the run loop

The >= comparison is correct since iteration increments after each step (line 371), so with max_iteration_per_run=500, the 500th iteration completes before the check triggers on iteration 500.

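The off-by-one reasoning can be checked with a minimal model of the run loop (a hypothetical `step` callable stands in for the agent step; names are illustrative):

```python
def run_loop(step, max_iteration_per_run=500):
    """Run `step` until it reports completion or the iteration budget is spent."""
    iteration = 0
    while True:
        finished = step()
        iteration += 1  # incremented after the step, as in the real run loop
        if finished:
            return iteration, "finished"
        if iteration >= max_iteration_per_run:
            # The Nth step has already completed before the guard trips.
            return iteration, "max_iterations"
```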

214-217: Registry subscription and LLM population is well-structured.

Subscribing before adding ensures all LLMs trigger stats registration, and iterating over list(self.agent.get_all_llms()) creates a defensive copy.

examples/01_standalone_sdk/31_llm_profiles.py (1)

66-125: Example implementation is well-structured.

The main function demonstrates a complete workflow:

  1. Profile creation/loading via registry
  2. Agent creation with loaded LLM
  3. Conversation execution with persistence
  4. Verification that profile_id is correctly serialized in base_state.json

This effectively validates the PR's profile reference persistence behavior.

openhands-sdk/openhands/sdk/llm/llm.py (2)

357-380: Conditional serialization for profile references is well-designed.

The _serialize_with_profiles serializer correctly:

  • Defaults to inlining full LLM data (backward compatible)
  • Respects explicit opt-out via INLINE_CONTEXT_KEY in context
  • Emits minimal {"profile_id": ...} when not inlining

This achieves the PR objective of "expand profile references without mutating LLM instances."


392-423: Profile expansion in _coerce_inputs handles edge cases appropriately.

The validation logic:

  1. Detects profile-only payloads (profile_id without model)
  2. Raises a clear error when inlining is enabled but profile ref is encountered
  3. Requires llm_registry in context for expansion
  4. Merges expanded profile data while preserving profile_id

The error message references OPENHANDS_INLINE_CONVERSATIONS consistently with the env var name defined in persistence/settings.py and used by should_inline_conversations().
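
The merge semantics can be modeled without Pydantic (plain dicts stand in for the validator's payload and the registry lookup; the real logic lives in the `_coerce_inputs` validator):

```python
def expand_profile_reference(
    payload: dict, profiles: dict[str, dict], inline_mode: bool
) -> dict:
    """Expand a {'profile_id': ...} payload into full LLM data."""
    if "model" in payload or "profile_id" not in payload:
        return payload  # already an inline payload; nothing to expand
    if inline_mode:
        raise ValueError(
            "profile reference found while inline persistence is enabled; "
            "unset OPENHANDS_INLINE_CONVERSATIONS to use profile references"
        )
    profile_id = payload["profile_id"]
    if profile_id not in profiles:
        raise KeyError(profile_id)
    # Merge the expanded data while preserving the reference itself.
    return {**profiles[profile_id], "profile_id": profile_id}
```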

openhands-sdk/openhands/sdk/llm/llm_registry.py (4)

106-115: Path traversal protection is comprehensive.

The _ensure_safe_profile_id method properly guards against:

  • Empty or special directory names (., ..)
  • Path separators (/, \) via Path(profile_id).name != profile_id
  • Invalid characters via regex whitelist

This addresses the security concern from a previous review.

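The three guards combine into a validator along these lines (an illustrative standalone function, not the registry's actual method):

```python
import re
from pathlib import Path

_PROFILE_ID_RE = re.compile(r"^[A-Za-z0-9._-]+$")  # assumed whitelist


def ensure_safe_profile_id(profile_id: str) -> str:
    """Reject ids that could escape the profiles directory."""
    if not profile_id or profile_id in {".", ".."}:
        raise ValueError(f"invalid profile id: {profile_id!r}")
    # Path(...).name strips directory components; a mismatch means the id
    # contained a separator such as "/" (or "\\" on Windows).
    if Path(profile_id).name != profile_id:
        raise ValueError(f"profile id must not contain path separators: {profile_id!r}")
    if not _PROFILE_ID_RE.match(profile_id):
        raise ValueError(f"profile id contains disallowed characters: {profile_id!r}")
    return profile_id
```

Note the checks overlap deliberately: on POSIX, `Path("a\\b").name` does not split on the backslash, so the character whitelist is what rejects it there.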

139-170: Profile saving implementation handles secrets correctly.

The save_profile method:

  • Uses context={"expose_secrets": include_secrets} to control secret exposure during serialization
  • Applies restrictive 0o600 permissions on new files (best-effort for non-POSIX)
  • Properly sanitizes profile_id before path construction

This addresses the previous review concern about secret handling.

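The write path can be sketched with the standard library (an illustrative helper, assuming the profile data has already been serialized with the secret-exposure context):

```python
import json
import os
import stat
from pathlib import Path


def save_profile(profiles_dir: Path, profile_id: str, data: dict) -> Path:
    """Persist a profile as JSON with owner-only permissions where supported."""
    profiles_dir.mkdir(parents=True, exist_ok=True)
    path = profiles_dir / f"{profile_id}.json"
    path.write_text(json.dumps(data, indent=2))
    try:
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0o600, best effort
    except OSError:
        pass  # non-POSIX filesystems may not honor chmod semantics
    return path
```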

196-213: Immutable profile loading pattern is correct.

_load_profile_with_synced_id creates a new LLM instance via model_copy when the stored profile_id differs from the filename. This respects SDK immutability conventions while ensuring the active profile is accurately reflected.

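The copy-on-mismatch pattern, modeled here with a frozen dataclass in place of the Pydantic model (`dataclasses.replace` plays the role of `model_copy(update=...)`; names are illustrative):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StoredProfile:
    profile_id: str
    model: str


def load_with_synced_id(stored: StoredProfile, filename_id: str) -> StoredProfile:
    """Return an instance whose profile_id matches the on-disk filename."""
    if stored.profile_id == filename_id:
        return stored  # already in sync; reuse the immutable instance
    return replace(stored, profile_id=filename_id)  # new instance, no mutation
```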

163-168: Best-effort chmod is appropriate for cross-platform support.

The bare except Exception at line 167 is intentional for handling non-POSIX systems where chmod may not work. The debug logging provides visibility without failing the operation.

@enyst
Copy link
Owner Author

enyst commented Dec 30, 2025

Reviewed outstanding AI threads:

- Clarified docs-check skip rationale for LLM-specific examples (removed brittle PR-number reference).
- Other remaining threads refer to code that no longer exists after branch updates/force-pushes; resolving as obsolete.

enyst and others added 16 commits January 1, 2026 07:21
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
Resolve conflicts keeping both LLM profile registry support (llm_registry)
and cipher encryption support for secrets.

- local_conversation.py: Pass both llm_registry and cipher to ConversationState.create()
- state.py: Accept both llm_registry and cipher parameters, add cipher to context
  for resume path, preserve pre-created event_log from tool verification
- test_state_serialization.py: Keep _DifferentAgentForVerifyTest class from main,
  preserve fixture parameters in test function, keep both verify test variants

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@examples/01_standalone_sdk/34_llm_profiles.py`:
- Around line 1-8: This example file conflicts with another example numbered 34;
rename the example file (e.g., change the filename from 34_llm_profiles.py to a
unique number like 35_llm_profiles.py) and update any references to it (example
listings, README, docs, import paths or test/CI manifests that mention
"34_llm_profiles.py" or the example number) so all references use the new
filename; ensure any environment variables or examples that refer to the old
number are updated accordingly and run the examples collection check to confirm
the duplicate is resolved.

In `@openhands-sdk/openhands/sdk/conversation/state.py`:
- Around line 257-270: The current call to registry.ensure_default_profile(...)
always persists a default profile; change it to only run when persistence is
enabled on the registry: check a persistence flag (e.g.
registry.is_persistence_enabled() or registry.persistence_dir is not None)
before calling ensure_default_profile, and otherwise leave agent.llm unchanged
when building the new agent via agent.model_copy(update={...}); reference
ensure_default_profile, llm_registry/registry, agent.model_copy, and agent.llm
to locate and gate the behavior.

In `@openhands-sdk/openhands/sdk/llm/llm.py`:
- Around line 374-390: The _coerce_inputs routine expands profile references by
calling llm.model_dump(exclude_none=True) which currently causes secrets to be
redacted; update the expansion to pass the Pydantic context to expose secrets
(e.g., call model_dump with context={"expose_secrets": True} or by merging
info.context if present) so secret fields are serialized with real values when
registry.load_profile(profile_id) is used; make the change inside _coerce_inputs
where registry.load_profile/profile_id are handled and ensure the context key
"expose_secrets" is set to True when calling model_dump.

In `@tests/sdk/conversation/local/test_state_serialization.py`:
- Around line 174-185: The test compares serialized state stats using the wrong
key name: it reads service_to_metrics but the schema uses usage_to_metrics,
causing the loop to skip and metrics not validated; update the accesses in the
block that examines loaded_stats and original_stats (variables loaded_dump,
original_dump, loaded_stats, original_stats) to use "usage_to_metrics"
consistently (both when retrieving the dict and iterating keys/values) and keep
the existing assertions that loaded_metrics contains each key and equals the
expected metric.

Comment on lines +1 to +8
"""Create and use an LLM profile with :class:`LLMRegistry`.

Run with::

uv run python examples/01_standalone_sdk/34_llm_profiles.py

Profiles are stored under ``~/.openhands/llm-profiles/<name>.json`` by default.
Set ``LLM_PROFILE_NAME`` to pick a profile.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix duplicate example number to unblock CI.
The workflow reports example number 34 conflicts with 34_critic_example.py. Please renumber this example (and update any references) so the check passes.

🧰 Tools
🪛 GitHub Actions: Check duplicate example numbers

[error] 1-1: Duplicate example number 34 detected in folder examples/01_standalone_sdk. This file and 34_critic_example.py share the same number. Please renumber to ensure uniqueness.

🤖 Prompt for AI Agents
In `@examples/01_standalone_sdk/34_llm_profiles.py` around lines 1 - 8, This
example file conflicts with another example numbered 34; rename the example file
(e.g., change the filename from 34_llm_profiles.py to a unique number like
35_llm_profiles.py) and update any references to it (example listings, README,
docs, import paths or test/CI manifests that mention "34_llm_profiles.py" or the
example number) so all references use the new filename; ensure any environment
variables or examples that refer to the old number are updated accordingly and
run the examples collection check to confirm the duplicate is resolved.

Comment on lines +257 to +270
        context: dict[str, object] = {}
        registry = llm_registry
        if registry is None:
            from openhands.sdk.llm.llm_registry import LLMRegistry

            registry = LLMRegistry()
        context["llm_registry"] = registry

        # Ensure that any runtime-provided LLM without an explicit profile is
        # persisted as a stable "default" profile, so conversation state can
        # safely store only a profile reference.
        agent = agent.model_copy(
            update={"llm": registry.ensure_default_profile(agent.llm)}
        )
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid writing default profiles when persistence is disabled.
ensure_default_profile() persists to disk even when persistence_dir is None, which defeats the “no persistence” intent and can leak secrets to ~/.openhands/llm-profiles. Gate this behavior on persistence being enabled.

🛠️ Proposed fix
-        # Ensure that any runtime-provided LLM without an explicit profile is
-        # persisted as a stable "default" profile, so conversation state can
-        # safely store only a profile reference.
-        agent = agent.model_copy(
-            update={"llm": registry.ensure_default_profile(agent.llm)}
-        )
+        # Only persist a default profile when persistence is enabled.
+        if persistence_dir is not None:
+            agent = agent.model_copy(
+                update={"llm": registry.ensure_default_profile(agent.llm)}
+            )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        context: dict[str, object] = {}
        registry = llm_registry
        if registry is None:
            from openhands.sdk.llm.llm_registry import LLMRegistry

            registry = LLMRegistry()
        context["llm_registry"] = registry

-        # Ensure that any runtime-provided LLM without an explicit profile is
-        # persisted as a stable "default" profile, so conversation state can
-        # safely store only a profile reference.
-        agent = agent.model_copy(
-            update={"llm": registry.ensure_default_profile(agent.llm)}
-        )
+        # Only persist a default profile when persistence is enabled.
+        if persistence_dir is not None:
+            agent = agent.model_copy(
+                update={"llm": registry.ensure_default_profile(agent.llm)}
+            )
🤖 Prompt for AI Agents
In `@openhands-sdk/openhands/sdk/conversation/state.py` around lines 257 - 270,
The current call to registry.ensure_default_profile(...) always persists a
default profile; change it to only run when persistence is enabled on the
registry: check a persistence flag (e.g. registry.is_persistence_enabled() or
registry.persistence_dir is not None) before calling ensure_default_profile, and
otherwise leave agent.llm unchanged when building the new agent via
agent.model_copy(update={...}); reference ensure_default_profile,
llm_registry/registry, agent.model_copy, and agent.llm to locate and gate the
behavior.

Comment on lines +374 to +390
    def _coerce_inputs(cls, data: Any, info: ValidationInfo):
        if not isinstance(data, Mapping):
            return data
        d = dict(data)

        profile_id = d.get("profile_id")
        if profile_id and "model" not in d:
            if info.context is None or "llm_registry" not in info.context:
                raise ValueError(
                    "LLM registry required in context to load profile references."
                )

            registry = info.context["llm_registry"]
            llm = registry.load_profile(profile_id)
            expanded = llm.model_dump(exclude_none=True)
            expanded["profile_id"] = profile_id
            d.update(expanded)
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's understand the secret serialization implementation
cat openhands-sdk/openhands/sdk/utils/pydantic_secrets.py | head -100

Repository: enyst/agent-sdk

Length of output: 2244


🏁 Script executed:

#!/bin/bash
# Check the Pydantic version and imports in the project
fd "pyproject.toml|requirements.txt|setup.py" --max-depth 3 | head -5

Repository: enyst/agent-sdk

Length of output: 206


🏁 Script executed:

#!/bin/bash
# Look at the LLM class definition and model_dump context usage
rg -n "class.*LLM|model_dump" openhands-sdk/openhands/sdk/llm/llm.py | head -20

Repository: enyst/agent-sdk

Length of output: 184


🏁 Script executed:

#!/bin/bash
# Check Pydantic version used
rg "pydantic" openhands-sdk/pyproject.toml

Repository: enyst/agent-sdk

Length of output: 82


🏁 Script executed:

#!/bin/bash
# Verify that LLM model uses the serialize_secret function
rg -n "serialize_secret|field_serializer|SecretStr" openhands-sdk/openhands/sdk/llm/llm.py | head -20

Repository: enyst/agent-sdk

Length of output: 1108


🏁 Script executed:

#!/bin/bash
# Check if model_dump is called with context parameter elsewhere in the codebase
rg -n "model_dump\(" openhands-sdk/openhands/sdk/ -A 2 | grep -E "context|expose_secrets" | head -20

Repository: enyst/agent-sdk

Length of output: 682


🏁 Script executed:

#!/bin/bash
# Look at the context around line 388 to see the full _coerce_inputs method
sed -n '374,395p' openhands-sdk/openhands/sdk/llm/llm.py

Repository: enyst/agent-sdk

Length of output: 872


Add expose_secrets context when expanding profile references to prevent credential loss.

When llm.model_dump(exclude_none=True) is called without the context={"expose_secrets": True} parameter, the serialize_secret function will redact secret fields (API keys, AWS credentials) instead of exposing their actual values. The redacted placeholders are then used to update the expanded profile, replacing real credentials with masked values. This breaks authenticated calls after profile restoration.

The serialize_secret implementation in pydantic_secrets.py explicitly supports this pattern:

if info.context and info.context.get("expose_secrets"):
    return v.get_secret_value()

This same pattern is already used elsewhere in the codebase (llm_registry.py, remote_conversation.py).

🛠️ Proposed fix
-            expanded = llm.model_dump(exclude_none=True)
+            expanded = llm.model_dump(
+                exclude_none=True,
+                context={"expose_secrets": True},
+            )
🧰 Tools
🪛 Ruff (0.14.14)

382-384: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@openhands-sdk/openhands/sdk/llm/llm.py` around lines 374 - 390, The
_coerce_inputs routine expands profile references by calling
llm.model_dump(exclude_none=True) which currently causes secrets to be redacted;
update the expansion to pass the Pydantic context to expose secrets (e.g., call
model_dump with context={"expose_secrets": True} or by merging info.context if
present) so secret fields are serialized with real values when
registry.load_profile(profile_id) is used; make the change inside _coerce_inputs
where registry.load_profile/profile_id are handled and ensure the context key
"expose_secrets" is set to True when calling model_dump.

Comment on lines +174 to +185
# Test model_dump equality ignoring any additional runtime stats
loaded_dump = loaded_state.model_dump(mode="json")
original_dump = state.model_dump(mode="json")
loaded_stats = loaded_dump.pop("stats", None)
original_stats = original_dump.pop("stats", None)
assert loaded_dump == original_dump
if original_stats is not None:
    assert loaded_stats is not None
    loaded_metrics = loaded_stats.get("service_to_metrics", {})
    for key, metric in original_stats.get("service_to_metrics", {}).items():
        assert key in loaded_metrics
        assert loaded_metrics[key] == metric
⚠️ Potential issue | 🟡 Minor

Stats key mismatch risks a no‑op comparison.

This block uses service_to_metrics, but later tests assert usage_to_metrics. If the real key is usage_to_metrics, this loop never runs and won’t validate metrics preservation. Align keys to the stats schema.

🔧 Proposed fix (align to usage_to_metrics)
-        loaded_metrics = loaded_stats.get("service_to_metrics", {})
-        for key, metric in original_stats.get("service_to_metrics", {}).items():
+        loaded_metrics = loaded_stats.get("usage_to_metrics", {})
+        for key, metric in original_stats.get("usage_to_metrics", {}).items():
             assert key in loaded_metrics
             assert loaded_metrics[key] == metric
🤖 Prompt for AI Agents
In `@tests/sdk/conversation/local/test_state_serialization.py` around lines 174 -
185, The test compares serialized state stats using the wrong key name: it reads
service_to_metrics but the schema uses usage_to_metrics, causing the loop to
skip and metrics not validated; update the accesses in the block that examines
loaded_stats and original_stats (variables loaded_dump, original_dump,
loaded_stats, original_stats) to use "usage_to_metrics" consistently (both when
retrieving the dict and iterating keys/values) and keep the existing assertions
that loaded_metrics contains each key and equals the expected metric.

- Remove empty persistence/ package (was scaffolding, no longer needed)
- Fix docstring whitespace in local_conversation.py
- Add debug logging to LLMRegistry.load_profile()

Co-authored-by: openhands <openhands@all-hands.dev>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@openhands-sdk/openhands/sdk/llm/llm_registry.py`:
- Around line 166-173: The chmod call is currently catching broad Exception
which can hide bugs; narrow it to OS-related errors by catching OSError (or
optionally PermissionError) around path.chmod in the same block that checks
existed_before, and update the logger call (logger.debug or logger.warning) to
include the error details; locate the code around path.chmod and the
existed_before flag inside the function in llm_registry.py and replace the broad
except Exception with except OSError as e (or except (OSError, PermissionError)
as e) to ensure only filesystem-related failures are handled.

Comment on lines +166 to +173
        with path.open("w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
        # Apply restrictive permissions when creating a new file
        if not existed_before:
            try:
                path.chmod(0o600)
            except Exception as e:  # best-effort on non-POSIX systems
                logger.debug(f"Unable to chmod profile file {path}: {e}")
⚠️ Potential issue | 🟡 Minor

Narrow exception handling around chmod.
Catching Exception can mask unrelated bugs; limit to OS-related errors (and optionally warn).

🔧 Proposed fix
-        if not existed_before:
-            try:
-                path.chmod(0o600)
-            except Exception as e:  # best-effort on non-POSIX systems
-                logger.debug(f"Unable to chmod profile file {path}: {e}")
+        if not existed_before:
+            try:
+                path.chmod(0o600)
+            except (OSError, NotImplementedError) as e:  # best-effort on non-POSIX systems
+                logger.warning(f"Unable to chmod profile file {path}: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        with path.open("w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, ensure_ascii=False)
        # Apply restrictive permissions when creating a new file
        if not existed_before:
            try:
                path.chmod(0o600)
-            except Exception as e:  # best-effort on non-POSIX systems
-                logger.debug(f"Unable to chmod profile file {path}: {e}")
+            except (OSError, NotImplementedError) as e:  # best-effort on non-POSIX systems
+                logger.warning(f"Unable to chmod profile file {path}: {e}")
🧰 Tools
🪛 Ruff (0.14.14)

172-172: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In `@openhands-sdk/openhands/sdk/llm/llm_registry.py` around lines 166 - 173, The
chmod call is currently catching broad Exception which can hide bugs; narrow it
to OS-related errors by catching OSError (or optionally PermissionError) around
path.chmod in the same block that checks existed_before, and update the logger
call (logger.debug or logger.warning) to include the error details; locate the
code around path.chmod and the existed_before flag inside the function in
llm_registry.py and replace the broad except Exception with except OSError as e
(or except (OSError, PermissionError) as e) to ensure only filesystem-related
failures are handled.
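Put together, the narrowed handler the prompt describes looks roughly like this. The save_profile function and its signature are illustrative stand-ins; only the try/except shape mirrors the suggestion:

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def save_profile(path: Path, data: dict) -> None:
    """Write a profile JSON file; restrict permissions on first creation."""
    existed_before = path.exists()
    with path.open("w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)
    if not existed_before:
        try:
            path.chmod(0o600)
        except (OSError, NotImplementedError) as e:  # best-effort on non-POSIX
            logger.warning("Unable to chmod profile file %s: %s", path, e)
```

A TypeError or AttributeError introduced by a future refactor now surfaces instead of being silently logged at debug level.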

@openhands-ai

openhands-ai bot commented Jan 29, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Check duplicate example numbers
    • [Optional] Docs example

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #16 at branch `agent-sdk-18-profile-manager`

Feel free to include any additional details that might help me get this PR into a better state.
