Don't Overthink It: Intermittent Self-Evaluation in Reasoning Language Models Playing Textual Games

Repository for the code related to my MSc thesis project. The thesis can be read in full, along with a summary, at this link.

The project explores the capabilities of Reasoning Language Models (RLMs) playing textual games, and tentatively defines a game-agnostic framework that exploits the advantages of reasoning and Chain-of-Thought (CoT) prompting while avoiding some of the associated pitfalls.

Abstract

Reasoning Language Models have remarkable problem-solving capabilities that bring them closer to human performance than standard LLMs, but they also acquire two traits typical of human agents: an increased response time and a heightened risk of overthinking. We choose the text-based games of TextWorld as a comprehensive example of a complex task environment, and present two novel, related techniques that counteract high response time and overthinking: $n$-think and ephemerality.

$n$-think models reason only once every $n$ turns. In that turn, they follow a self-evaluation prompt that increases context awareness, recall, and performance; in all other turns, reasoning is deactivated, which minimizes inference time. Ephemeral $n$-think models, in addition, do not retain their thought process in the context once the self-evaluation turn ends, but only their final response; in this way, the game content is not diluted by excessive thinking. These techniques curtail answer length and context length respectively, two critical components that slow down inference and carry an increased risk of overthinking.
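The turn schedule and the ephemeral context handling described above can be sketched as follows. This is a minimal illustration, not the code in this repository: the `ModelFn` interface, the `GAME:`/`THOUGHTS:`/`ACTION:` context labels, the stub model, and the choice to reason on turns 0, $n$, 2$n$, ... are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface (an assumption, not the repository's API):
# takes the context and a flag enabling reasoning, returns (thoughts, action).
ModelFn = Callable[[List[str], bool], Tuple[str, str]]


def is_thinking_turn(turn: int, n: int) -> bool:
    """Reason on turns 0, n, 2n, ... (the starting phase is an assumption)."""
    return turn % n == 0


def play(model: ModelFn, observations: List[str], n: int,
         ephemeral: bool) -> Tuple[List[str], List[str]]:
    """Run an n-think loop over a fixed list of game observations."""
    context: List[str] = []
    actions: List[str] = []
    for turn, obs in enumerate(observations):
        context.append(f"GAME: {obs}")
        think = is_thinking_turn(turn, n)
        thoughts, action = model(context, think)
        # Non-ephemeral: the thought process stays in the context.
        # Ephemeral: only the final response (the action) is retained.
        if think and thoughts and not ephemeral:
            context.append(f"THOUGHTS: {thoughts}")
        context.append(f"ACTION: {action}")
        actions.append(action)
    return actions, context


def stub_model(context: List[str], reasoning: bool) -> Tuple[str, str]:
    """Stand-in for a real RLM call; always answers 'look'."""
    thoughts = "evaluating progress so far" if reasoning else ""
    return thoughts, "look"


if __name__ == "__main__":
    obs = ["You are in a kitchen.", "Nothing happens.",
           "You see a key.", "The door is locked."]
    _, kept = play(stub_model, obs, n=2, ephemeral=False)
    _, dropped = play(stub_model, obs, n=2, ephemeral=True)
    print(len(kept), len(dropped))
```

With $n = 1$ every turn is a self-evaluation turn, so the ephemeral flag alone decides whether any reasoning accumulates in the context.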

The ephemeral 1-think configuration exhibits the highest performance by drastically reducing overthinking, whereas at higher values of $n$ the impact of ephemerality is either negative or negligible. Non-ephemeral $n$-think with low $n$ (e.g. $n = 4$) is also a promising configuration that noticeably reduces execution time at the cost of a small decrease in score.

We then implement two successful improvements to the $n$-think technique, namely random $n$-think and Chain-of-Thought-based self-evaluation; perform a qualitative analysis of the behaviors and patterns exhibited during self-evaluation turns with and without CoT; and finally identify future developments of the framework such as ask-to-think, dynamic $n$-think, semi-ephemerality, or an application in real-time games.

Paper

TBD

Folders and files

Code
Docker
Logs
Plots
.gitattributes
.gitignore
README.md

MizuGreg/LLMs-Play-Textual-Games

Folders and files

Latest commit

History

Repository files navigation

Don't Overthink It: Intermittent Self-Evaluation in Reasoning Language Models Playing Textual Games

Repository for the code related to my MSc thesis project. The thesis can be read in full, along with a summary, at this link.

Abstract

Paper

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages