Repository for the code related to my MSc thesis project. The thesis can be read in full, along with a summary, at this link.
The project explores the capabilities of Reasoning Language Models (RLMs) playing text-based games, and tentatively defines a game-agnostic framework that exploits the advantages of reasoning and chain-of-thought (CoT) prompting while avoiding some of the associated pitfalls.
Reasoning Language Models have remarkable problem-solving capabilities that bring them closer to human performance than standard LLMs, but they also acquire two traits typical of human agents: increased response time and a heightened risk of overthinking. We choose the text-based games of TextWorld as a comprehensive example of a complex task environment, and present two novel, related techniques that counteract high response time and overthinking: n-think and ephemerality.
N-think models employ reasoning only every
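As a rough illustration of an n-think schedule, the sketch below assumes that reasoning is enabled once every `n` game steps, starting from the first step. That reading is an assumption, and the names `should_reason` and `reasoning_schedule` are hypothetical helpers, not part of this repository. Ephemerality is not modelled here.

```python
def should_reason(step: int, n: int) -> bool:
    """Return True on the steps where reasoning is enabled.

    Assumption: reasoning happens on steps 0, n, 2n, ... (hypothetical rule).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return step % n == 0


def reasoning_schedule(num_steps: int, n: int) -> list[bool]:
    """Build the per-step reasoning flags for an episode of num_steps steps."""
    return [should_reason(step, n) for step in range(num_steps)]
```

Under this assumption, `reasoning_schedule(6, 3)` enables reasoning on the first and fourth steps only, while `n = 1` enables it on every step.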
The ephemeral 1-think configuration exhibits the highest performance by drastically reducing overthinking, whereas at higher values of
We then implement two successful improvements to the
TBD