
# Awesome LLM- and VLM-Integrated Reinforcement Learning

This repository accompanies our survey paper:

**The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning**
Sheila Schoepp, Masoud Jafaripour\*, Yingyue Cao\*, Tianpei Yang, Fatemeh Abdollahi, Shadan Golestan, Zahin Sufiyan, Osmar R. Zaiane, Matthew E. Taylor (\*equal contribution)
University of Alberta, Nanjing University, Alberta Machine Intelligence Institute (Amii)
📄 [arXiv Paper](https://arxiv.org/abs/2502.15214)

📌 This work provides a systematic taxonomy and analysis of how large language models (LLMs) and vision-language models (VLMs) enhance reinforcement learning (RL) tasks.


## ✨ Overview

We identify three core roles that LLMs/VLMs play when integrated into RL (sketched in code after this list):

- **LLM/VLM as Agent**: the model acts as a parametric or non-parametric decision-maker.
- **LLM/VLM as Planner**: the model generates comprehensive or incremental plans.
- **LLM/VLM as Reward**: the model defines or generates a reward function or model.
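
The toy sketch below shows where each role hooks into a standard RL loop. It is purely illustrative: `query_llm`, the prompts, and the returned values are hypothetical placeholders, not taken from any surveyed method.

```python
# Minimal sketch of the three integration points for an LLM/VLM in an RL loop.
# `query_llm` is a hypothetical stub standing in for any LLM/VLM call.

def query_llm(prompt: str) -> str:
    return "move forward"  # placeholder response

def llm_as_agent(observation: str) -> str:
    # Role 1: the model itself selects the action.
    return query_llm(f"Observation: {observation}\nChoose an action:")

def llm_as_planner(goal: str) -> list[str]:
    # Role 2: the model decomposes a goal into subgoals for a low-level RL policy.
    plan = query_llm(f"Goal: {goal}\nList the subgoals, one per line:")
    return plan.splitlines()

def llm_as_reward(observation: str, goal: str) -> float:
    # Role 3: the model scores progress, replacing a hand-engineered reward.
    answer = query_llm(f"Goal: {goal}\nObservation: {observation}\nProgress 0-1:")
    try:
        return float(answer)
    except ValueError:
        return 0.0  # fall back when the model's answer is not a number

print(llm_as_agent("a door is ahead"))
```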

We also discuss:

- **Challenges**: sample inefficiency, reward engineering, poor generalization, etc.
- **Future Directions**: grounding, bias mitigation, action advice, multimodal representation.

## 🚀 Paper Highlights

- **Unified taxonomy**: three key roles for FMs in RL: Agent, Planner, and Reward
- **Comprehensive coverage**: 40+ recent methods categorized and compared
- **Multimodal perspective**: both LLM and VLM applications across domains
- **Open challenges**: in-depth discussion of limitations and paths forward

## 🧠 Table of Contents

- [Overview](#-overview)
- [Paper Highlights](#-paper-highlights)
- [Datasets](#-datasets)
- [LLM/VLM as Agent](#-llmvlm-as-agent)
- [LLM/VLM as Planner](#-llmvlm-as-planner)
- [LLM/VLM as Reward](#-llmvlm-as-reward)
- [Future Research Directions](#-future-research-directions)
- [Citation](#-citation)
- [Contributing](#-contributing)
## 🔍 Datasets

| Dataset | Year | Domain | Modality | Link |
|---|---|---|---|---|
| MineDojo | 2022 | Minecraft | Vision+Text | GitHub |
| SayCan | 2022 | Robotics | Language+Action | GitHub |
| RL4VLM | 2024 | Multi-task RL | Vision+Language | GitHub |

(See paper for full list.)


## 🤖 LLM/VLM as Agent

### Parametric LLM/VLM Agents

Parametric (fine-tuned) agents: AGILE, TWOSOME, POAD, Retroformer, Zhai et al.

| Method | Model(s) | FT | RL Role | Metrics | Code |
|---|---|---|---|---|---|
| AGILE [Feng et al., 2024] | Meerkat, Vicuna-1.5 | ✓* | $\pi_l, v_l, \text{rft}$ | acc, rew | Link |
| Retroformer [Yao et al., 2024] | GPT-3, GPT-4, LongChat | | $\pi_l, \text{rft}$ | sr, se | - |
| TWOSOME [Tan et al., 2024] | Llama | ✓* | $\pi_l, v_l, \text{rft}$ | sr, rew, gen, se | Link |
| POAD [Wen et al., 2024] | CodeLlama, Llama 2 | ✓* | $\pi_l, v_l, \text{rft}$ | rew, gen, se | Link |
| GLAM [Carta et al., 2023] | FLAN-T5 | | $\pi_l, v_l, \text{rft}$ | se, gen | Link |
| Zhai et al. [Zhai et al., 2024] | LLaVA-v1.6-Mistral | ✓* | $\pi_l, v_l, \text{rft}$ | sr | Link |

(`✓*` marks a fine-tuned model; `×` marks a frozen one. See the paper for the full column legend.)
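
To make the parametric-agent pattern concrete, here is a minimal sketch of turning an LLM into a policy over a fixed action set by scoring each action with its token log-probabilities, roughly in the spirit of TWOSOME/GLAM. `gpt2` is a stand-in model, and the PPO fine-tuning these methods apply on top of this policy is omitted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def action_logprob(prompt: str, action: str) -> torch.Tensor:
    """Sum of token log-probs of `action` given `prompt`.

    Assumes tokenizing prompt + action begins with the prompt's own tokens,
    which holds for simple cases like the one below.
    """
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full = tok(prompt + action, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full).logits                      # [1, T, vocab]
    logps = torch.log_softmax(logits[0, :-1], dim=-1)    # row i predicts token i+1
    action_ids = full[0, prompt_len:]
    rows = logps[prompt_len - 1 : full.shape[1] - 1]
    return rows.gather(1, action_ids.unsqueeze(1)).sum()

prompt = "You are in the kitchen. The task is to heat the soup. Next action:"
actions = [" open the fridge", " turn on the stove", " leave the room"]
scores = torch.stack([action_logprob(prompt, a) for a in actions])
policy = torch.softmax(scores, dim=0)  # normalize over the valid action set
print(dict(zip(actions, policy.tolist())))
```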

### Non-Parametric LLM/VLM Agents

Non-parametric (frozen) agents: Reflexion, ExpeL, ICPI, RLingua, REMEMBERER

| Method | Model(s) | FT | RL Role | Metrics | Code |
|---|---|---|---|---|---|
| ICPI [Brooks et al., 2023] | Codex | × | $\pi_l, v_l, \tau_\pi$ | rew, gen | Link |
| Reflexion [Shinn et al., 2023] | GPT-3, GPT-3.5-Turbo, GPT-4 | × | $\pi_l, \tau_\pi$ | sr, acc | Link |
| REMEMBERER [Zhang et al., 2023a] | GPT-3.5 | × | $\pi_l, v_l, \tau_\pi$ | sr, rob | Link |
| ExpeL [Zhao et al., 2024] | GPT-3.5-Turbo, GPT-4 | × | $\pi_l, \tau_\pi$ | sr, gen | Link |
| RLingua [Chen et al., 2024] | GPT-4 | × | $\pi_g, v_g, \tau_\pi$ | sr, se | - |
| Xu et al. [Xu et al., 2024] | GPT-3.5-Turbo | × | $\pi_l, v_l$ | sr, rob | - |
| LangGround [Li et al., 2024] | GPT-4 | × | $\pi_l$ | sr, gen, se, int | Link |
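
Non-parametric methods leave the model weights frozen and improve behavior through the context instead. Below is a minimal Reflexion-flavored sketch: failed attempts are critiqued by the model and the critiques accumulate in the prompt for the next attempt. The LLM and environment calls are stubs, purely illustrative.

```python
# Sketch of a Reflexion-style non-parametric loop: no weights are updated;
# "learning" happens by accumulating verbal self-reflections in the prompt.
# `query_llm` and `run_episode` are hypothetical stubs.

def query_llm(prompt: str) -> str:
    return "I should check the drawer before the cabinet."  # placeholder

def run_episode(policy_prompt: str) -> tuple[bool, str]:
    # Would run the frozen LLM as a policy in the environment; stubbed here.
    return False, "Searched the cabinet; the key was not there."

reflections: list[str] = []
for attempt in range(3):
    prompt = "Task: find the key.\n" + "\n".join(reflections)
    success, trajectory = run_episode(prompt)
    if success:
        break
    # Ask the model to critique its own failed trajectory, then remember it.
    reflections.append(
        query_llm(f"Trajectory:\n{trajectory}\nWhat should be done differently?")
    )
```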

## 🧭 LLM/VLM as Planner

### Comprehensive Planning Approaches

Comprehensive planners: SayTap, PSL, LMA3, Inner Monologue

| Method | Model(s) | FT | RL Role | Metrics | Code |
|---|---|---|---|---|---|
| SayTap [Tang et al., 2023] | GPT-4 | × | $\pi_g, v_g$ | sr, acc | - |
| LgTS [Shukla et al., 2024] | Llama 2 | × | $\pi_g, v_g$ | sr, se | - |
| PSL [Dalal et al., 2024] | GPT-4 | × | $\pi, v$ | sr, gen, se | Link |
| LLaRP [Szot et al., 2024] | Llama | × | $\pi_g, v_g$ | sr, gen, rob, se | Link |
| LMA3 [Colas et al., 2023] | GPT-3.5-Turbo | × | $\pi_g$ | gen, exp | - |
| When2Ask [Hu et al., 2024] | Vicuna | × | $\pi, v$ | sr | - |
| Inner Monologue [Huang et al., 2022] | GPT-3, PaLM | × | $\pi, v$ | sr, rob, al | - |

### Incremental Planning Approaches

Incremental planners: SayCan, BOSS, AdaRefiner, LLM4Teach

| Method | Model(s) | FT | RL Role | Metrics | Code |
|---|---|---|---|---|---|
| SayCan [Ichter et al., 2022] | PaLM | × | $\pi_l, v_l, \text{rft}$ | sr, rob | Link |
| LLM4Teach [Zhou et al., 2024] | ChatGLM-Turbo, Vicuna | × | $\pi, v$ | sr, se | Link |
| AdaRefiner [Zhang and Lu, 2024] | Llama 2, GPT-4 | ✓* | $\pi_l, v_l, \tau_\pi$ | sr, rew, gen, exp | Link |
| BOSS [Zhang et al., 2023b] | Llama | × | $\pi_l, v_l, \text{rft}, \tau_\pi$ | sr, gen, rob, se | - |
| Text2Motion [Lin et al., 2023] | Codex, GPT-3.5 | × | $\pi, v$ | sr, gen, int | - |
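
As a concrete example of incremental planning, SayCan-style selection combines the LLM's estimate of a skill's usefulness for the instruction with a learned value function's estimate of its feasibility from the current state, choosing one skill at a time. A toy sketch with made-up numbers:

```python
# Toy sketch of SayCan-style skill selection. The LLM scores how useful each
# skill is for the instruction; a learned affordance value function scores
# how likely the skill is to succeed from the current state; their product
# picks the next skill. All numbers below are invented for illustration.

llm_usefulness = {          # p(skill helps with "bring me a sponge")
    "pick up sponge": 0.70,
    "go to counter": 0.25,
    "pick up apple": 0.05,
}
affordance_value = {        # learned value: p(skill succeeds from this state)
    "pick up sponge": 0.10,  # sponge not visible yet
    "go to counter": 0.95,
    "pick up apple": 0.90,
}

scores = {s: llm_usefulness[s] * affordance_value[s] for s in llm_usefulness}
next_skill = max(scores, key=scores.get)
print(next_skill)  # "go to counter": useful enough and actually feasible now
```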

## 🎯 LLM/VLM as Reward

### Reward Function Approaches

Reward function methods: Text2Reward, Eureka, Zeng et al.

| Method | Model(s) | FT | RL Role | Metrics | Code |
|---|---|---|---|---|---|
| Text2Reward [Xie et al., 2024] | GPT-4 | × | $\pi, \text{ref}$ | sr, se, al | Link |
| Zeng et al. [2024] | GPT-4 | × | $\pi, \tau_\pi$ | sr, se | - |
| Eureka [Ma et al., 2024] | GPT-4 | × | $\pi, v, \tau_\pi$ | sr, gen, se, al | Link |
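
These methods share a reward-as-code pattern: the LLM emits Python source for a reward function, which is compiled and plugged into RL training (Eureka additionally iterates on it using training statistics). A minimal sketch, with a hard-coded "generated" snippet standing in for actual LLM output:

```python
# Sketch of the reward-as-code pattern (Text2Reward / Eureka style): the LLM
# writes Python source for a reward function, which is compiled and handed to
# the RL training loop. The generated code below is hard-coded for
# illustration only.

generated_source = '''
def reward(state):
    # Dense shaping: closer to the goal is better; reaching it gives a bonus.
    distance = abs(state["x"] - state["goal_x"])
    return -distance + (10.0 if distance < 0.1 else 0.0)
'''

namespace: dict = {}
exec(generated_source, namespace)   # trust boundary: sandbox this in practice
reward_fn = namespace["reward"]

print(reward_fn({"x": 0.5, "goal_x": 2.0}))   # -1.5
print(reward_fn({"x": 1.95, "goal_x": 2.0}))  # 9.95
```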

### Reward Model Approaches

Reward model methods: VLM-RM, MineCLIP, RL-VLM-F

| Method | Model(s) | FT | RL Role | Metrics | Code |
|---|---|---|---|---|---|
| Kwon et al. [2023] | GPT-3 | × | $\pi, v$ | acc, se, al | - |
| PREDILECT [Holk et al., 2024] | GPT-4 | × | $\pi$ | rew, se, al | - |
| ELLM [Du et al., 2023] | Codex, GPT-3 | × | $\pi_l, v_l$ | sr, gen, se, exp | - |
| RL-VLM-F [Wang et al., 2024] | Gemini-Pro, GPT-4V | × | $\pi, v$ | sr, rew, se | Link |
| VLM-RM [Rocamonde et al., 2024] | CLIP | × | $\pi, v$ | sr, al | Link |
| MineCLIP [Fan et al., 2022] | CLIP | ✓* | $\pi_l, v_l$ | sr, gen, se, al | Link |
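
A reward model, by contrast, scores states directly. Below is a minimal sketch of a CLIP-similarity reward in the spirit of VLM-RM: the reward is the cosine similarity between the rendered observation and a text description of the goal. The checkpoint and goal text are illustrative choices, not taken from the paper's experiments.

```python
# Sketch of a CLIP-similarity reward (VLM-RM flavor): reward each frame by
# its cosine similarity to a text description of the goal state.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_reward(frame: Image.Image, goal_text: str) -> float:
    inputs = processor(
        text=[goal_text], images=frame, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(
            input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
        )
    return torch.cosine_similarity(img, txt).item()

frame = Image.new("RGB", (224, 224))  # placeholder for an environment render
print(clip_reward(frame, "a humanoid robot in a kneeling position"))
```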

## 🧭 Future Research Directions

- **Grounding**: bridging high-level plans with low-level controllers
- **Bias Mitigation**: debiasing pretrained FMs for RL safety
- **Multimodal Representation**: richer integration of language, vision, and control
- **Action Advice**: using FMs as virtual oracles to guide agents

## 📚 Citation

If you find this work helpful, please consider citing our survey:

```bibtex
@article{schoepp2025llmrlsurvey,
  title={The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning},
  author={Schoepp, Sheila and Jafaripour, Masoud and Cao, Yingyue and Yang, Tianpei and Abdollahi, Fatemeh and Golestan, Shadan and Sufiyan, Zahin and Zaiane, Osmar R. and Taylor, Matthew E.},
  journal={arXiv preprint arXiv:2502.15214},
  year={2025}
}
```

## 🤝 Contributing

We welcome pull requests to add missing papers, implementations, or benchmarks!

**How to contribute:**

1. Fork the repository.
2. Add your paper or code to the relevant section in README.md.
3. Use the format: `| [Title](Paper Link) | Category | [Code](Code Link) |`
4. Open a pull request.

## 🗓️ Last Update

2025/06/07
