Investigate Gemini 3 Flash repetition/looping in browser tasks #1872

@enyst

Summary:

  • Gemini 3 Flash Preview sometimes enters a long repetition loop during browser tasks (e.g., Microsoft login account picker), repeating phrases like "Wait, I'll try to click on index 3090" hundreds of times.
  • Responses hit max_output_tokens (~65k), take ~5 minutes, and cost ~$0.20.
  • Suspected to be Gemini API behavior; Claude (Haiku/Sonnet/Opus) with MCP Playwright does not show this.

Questions to investigate:

  1. Does the agent SDK have built-in repetition/loop detection or loop breakers that should catch this?
  2. Are there recommended frequency_penalty / repetition_penalty settings for Gemini (via LiteLLM) to mitigate this?
  3. Can streaming in the OpenHands UI be enabled to detect/abort long single-response loops?
  4. How often does this occur in CI/evals (see .github/workflows/integration-runner.yml) and can we make it reproducible?
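As a starting point for questions 1 and 3, here is a minimal sketch of a streaming loop breaker. It assumes the response arrives as an iterable of text chunks. The names `find_repeated_phrase` and `consume_stream` and all thresholds are hypothetical, chosen for illustration; none of this is an existing agent SDK or LiteLLM API.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple


def find_repeated_phrase(text: str, n_words: int = 6, min_repeats: int = 5) -> Optional[str]:
    """Return the first n-word phrase seen at least min_repeats times, else None."""
    words = re.findall(r"\S+", text)
    counts: Counter = Counter()
    for i in range(len(words) - n_words + 1):
        phrase = " ".join(words[i:i + n_words])
        counts[phrase] += 1
        if counts[phrase] >= min_repeats:
            return phrase
    return None


def consume_stream(
    chunks: Iterable[str],
    window_chars: int = 4000,   # hypothetical: only inspect the most recent text
    check_every: int = 200,     # hypothetical: re-check after this many new characters
) -> Tuple[str, Optional[str]]:
    """Accumulate streamed text and stop early if the recent window looks like a loop.

    Returns (text_received_so_far, looping_phrase_or_None).
    """
    buffer = ""
    since_check = 0
    for chunk in chunks:
        buffer += chunk
        since_check += len(chunk)
        if since_check >= check_every:
            since_check = 0
            phrase = find_repeated_phrase(buffer[-window_chars:])
            if phrase is not None:
                # Stop consuming; the caller can cancel the request and retry.
                return buffer, phrase
    # Final check for short responses that never reached check_every.
    return buffer, find_repeated_phrase(buffer[-window_chars:])


if __name__ == "__main__":
    simulated = ["Wait, I'll try to click on index 3090. "] * 300
    text, phrase = consume_stream(simulated)
    print(f"stopped after {len(text)} chars; repeated phrase: {phrase!r}")
```

A check like this only helps if the response is streamed, since a non-streamed call returns only after the model has already run to max_output_tokens. The n-gram size and repeat count would need tuning against real transcripts to avoid false positives on legitimately repetitive output such as tables or logs.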

Context:

  • The reporting user runs the OpenHands Agent-SDK through the OpenHands UI with a VNC browser for complex sites (e.g., Outlook).
  • They describe the issue as "not rare".
  • Benchmark results show Gemini 3 Flash performing well, but its reliability on complex UIs is in question.

References:

Labels: llm