Open
Labels: llm (About LLMs.)
Description
Summary:
- Gemini 3 Flash Preview sometimes enters a long repetition loop during browser tasks (e.g., the Microsoft login account picker), repeating phrases like "Wait, I'll try to click on index 3090" hundreds of times.
- These looping responses run until they hit max_output_tokens (~65k tokens), take ~5 minutes, and cost ~$0.20 each.
- This is suspected to be Gemini API behavior: Claude (Haiku/Sonnet/Opus) with MCP Playwright does not show it.
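A loop like the one described above can at least be flagged after the fact. The following is a minimal sketch that assumes the full response is available as a plain string. It splits the text into sentences and flags the response when any single sentence recurs many times, as "Wait, I'll try to click on index 3090" does here. `most_repeated_sentence`, `looks_like_repetition_loop`, and the threshold of 20 are hypothetical illustrations. They are not part of the agent SDK.

```python
import re
from collections import Counter


def most_repeated_sentence(text: str) -> tuple[str, int]:
    """Return the most frequent normalized sentence in `text` and its count.

    Hypothetical helper, not an agent-SDK API.
    """
    # Crude split on sentence-ending punctuation or newlines; good enough for
    # verbatim loops such as "Wait, I'll try to click on index 3090."
    sentences = [s.strip().lower() for s in re.split(r"[.!?\n]+", text)]
    sentences = [s for s in sentences if s]
    if not sentences:
        return "", 0
    sentence, count = Counter(sentences).most_common(1)[0]
    return sentence, count


def looks_like_repetition_loop(text: str, min_repeats: int = 20) -> bool:
    """Heuristic: flag text if any one sentence recurs >= min_repeats times.

    The default threshold of 20 is an arbitrary assumption.
    """
    _, count = most_repeated_sentence(text)
    return count >= min_repeats


if __name__ == "__main__":
    looping = "Wait, I'll try to click on index 3090. " * 200
    print(looks_like_repetition_loop(looping))  # True
    print(looks_like_repetition_loop("Click the account picker. Then enter the password."))  # False
```

A post-hoc check like this does not save the ~5 minutes or the cost of the looping response. It can only mark the step as failed so the agent retries or stops, rather than acting on the output.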
Questions to investigate:
- Does the agent SDK have built-in repetition/loop detection or loop breakers that should catch this?
- Are there recommended frequency_penalty / repetition_penalty settings for Gemini (via LiteLLM) to mitigate this?
- Can streaming be enabled in the OpenHands UI so that long single-response loops can be detected and aborted?
- How often does this occur in CI/evals (see .github/workflows/integration-runner.yml), and can we make it reproducible?
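On the streaming question: a loop check only saves time and money if it runs while tokens are still arriving. The following is a minimal sketch that assumes the client exposes the reply as an iterator of text deltas (for example, content pulled from an OpenAI-compatible streaming response). It passes each delta through unchanged and keeps a rolling window of recent text. It raises as soon as any 8-word sequence in that window has appeared 15 times. `guard_stream`, `RepetitionLoopError`, and all thresholds are hypothetical choices. They are not OpenHands or LiteLLM APIs.

```python
from collections import Counter
from typing import Iterable, Iterator


class RepetitionLoopError(RuntimeError):
    """Hypothetical error: a streamed response appears stuck in a loop."""


def guard_stream(
    chunks: Iterable[str],
    window_chars: int = 4000,
    ngram_words: int = 8,
    max_repeats: int = 15,
) -> Iterator[str]:
    """Yield text chunks unchanged, but raise once the recent text loops.

    After each chunk, keep only the last `window_chars` characters, split
    them into words, and count every `ngram_words`-word sequence. If any
    sequence occurs `max_repeats` or more times, raise RepetitionLoopError
    so the caller can stop instead of waiting for max_output_tokens.
    """
    text = ""
    for chunk in chunks:
        yield chunk
        text = (text + chunk)[-window_chars:]
        words = text.split()
        ngrams = Counter(
            tuple(words[i : i + ngram_words])
            for i in range(len(words) - ngram_words + 1)
        )
        top = max(ngrams.values(), default=0)
        if top >= max_repeats:
            raise RepetitionLoopError(
                f"a {ngram_words}-word sequence repeated {top} times "
                f"in the last {window_chars} characters"
            )


if __name__ == "__main__":
    def fake_stream() -> Iterator[str]:
        # Simulated model output stuck on one phrase.
        for _ in range(1000):
            yield "Wait, I'll try to click on index 3090. "

    received = []
    try:
        for piece in guard_stream(fake_stream()):
            received.append(piece)
    except RepetitionLoopError:
        print(f"aborted after {len(received)} chunks")  # → aborted after 15 chunks
```

Raising only stops consuming the iterator. Whether the underlying HTTP request is actually cancelled depends on the client, so a real integration would also need to close the stream. Recounting n-grams over the whole window on every chunk is simple but not incremental. That is acceptable for a sketch with a 4000-character window.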
Context:
- The user runs the OpenHands Agent SDK through the OpenHands UI, with a VNC browser, on complex sites (e.g., Outlook).
- For this user, the issue is "not rare".
- Benchmarks show Gemini 3 Flash performing well, but its reliability on complex UIs is in question.
References:
- Discussion mentions the integration-runner workflow (`llm-config:`)
- Benchmark: https://index.openhands.dev/home