Thanks for setting up this repository! I would like to add our paper, FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games, to this list.
It best fits the Video Adventure Games & Benchmark sections.
🎓 Venue: EMNLP 2025 Main Conference
🔗 Website: https://ahnjaewoo.github.io/flashadventure/
📄 Paper: https://arxiv.org/abs/2509.01052
🌎 Code: https://github.com/ahnjaewoo/FlashAdventure
TL;DR: Key Findings
- Current GUI agents struggle with full story arc completion (best: 5.88% success rate).
- COAST improves goal / milestone completion by 5.88 / 2.78 percentage points over the baseline.
- Still, a significant gap remains between human and GUI agent performance (97.06% vs. 5.88%).
- Agents exhibit weak planning, poor visual perception, and deficient lateral thinking.
Thanks!