Fine-tuning Vision-Language Models for Autonomous Driving Decision Planning
VLMDriveTune is an open-source project focused on fine-tuning Vision-Language Models (VLMs) for decision planning in autonomous driving scenarios. By leveraging the expressive power of pre-trained VLMs, this project adapts them to downstream driving tasks such as behavior prediction, maneuver classification, and goal-directed planning.
This repository provides tools, datasets, and training pipelines to adapt InternVL2-1B for real-world autonomous driving decision modules.
- 🧠 VLM Fine-Tuning Pipeline: Modular pipeline to fine-tune VLMs on driving-specific tasks.
- 📦 Dataset Integration: Supports structured scene data (e.g., nuPlan, Waymo, or custom vectorized environments).
- 🧾 Prompt Engineering for Driving Tasks: Custom vision-language prompts for planning-relevant tasks (see the sample record after the use-case list below).
- 🧪 Evaluation Tools: Custom metrics for VLM output quality and scenario performance.
Typical downstream use cases include:
- Planning-aware scene understanding
- Maneuver prediction with vision-language reasoning
- Goal-directed trajectory selection
- Safety-critical decision refinement using natural language context
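For concreteness, here is a hypothetical training record in the style of data/sample.jsonl. The field names (`image`, `conversations`) follow the chat format commonly used for InternVL/LLaVA-style fine-tuning and are illustrative only; data/sample.jsonl in this repository is the authoritative schema.

```bash
# Append one hypothetical driving-decision record to the training set.
# Field names are illustrative; check data/sample.jsonl for the real schema.
cat >> data/sample.jsonl << 'EOF'
{"image": "data/vlm_samples/scene_0001.jpg", "conversations": [{"from": "human", "value": "<image>\nThe ego vehicle approaches an unprotected left turn with oncoming traffic. What maneuver should it take?"}, {"from": "gpt", "value": "Yield: decelerate and wait for the oncoming vehicle to pass before initiating the left turn."}]}
EOF
```

Keeping each record on a single line, as the heredoc above does, is what the JSONL format expects.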
Project structure:

```
VLMDriveTune/
│
├── README.md
├── setup.py
├── command
│   ├── InternVL2-1B.sh
│   └── eval.sh
├── config
│   └── zero_stage1_config.json
├── data
│   ├── sample.jsonl
│   └── vlm_samples
├── docs
│   ├── env_install.md
│   ├── vlm_finetune.md
│   └── vlm_trt.md
├── internvl
│   ├── __init__.py
│   ├── conversation.py
│   ├── dist_utils.py
│   ├── model
│   ├── patch
│   └── train
├── scripts
│   ├── internvl_eval.py
│   ├── pytorch_internvl_infer.py
│   └── trt_internvl_infer.py
├── tools
│   ├── __init__.py
│   ├── arrow2jsonl.py
│   ├── bart_score.py
│   ├── convert_parquet.py
│   ├── convert_to_int8.py
│   ├── extract_mlp.py
│   ├── extract_video_frames.py
│   ├── extract_vit.py
│   ├── json2jsonl.py
│   ├── jsonl2jsonl.py
│   ├── merge_lora.py
│   ├── replace_llm.py
│   └── resize_pos_embed.py
└── vlmtrt
    ├── build_vit_engine.py
    ├── conversation.py
    └── convert_qwen2_ckpt.py
```

Clone the repository and set up the environment:

```bash
git clone https://github.com/zf-account/VLMDriveTune.git
cd VLMDriveTune

conda create -n vlmdrivetune python=3.10
conda activate vlmdrivetune
pip install -r requirements.txt
```

You can use scene data from nuPlan, Waymo, or a custom driving dataset. Please follow docs/data_prepare.md for instructions.
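The tools/ directory ships format converters for common annotation exports. A minimal sketch of how they might be invoked; the `--input`/`--output` flags are hypothetical, so run each script with `-h` (or read it) for the actual interface:

```bash
# Convert plain JSON annotations into the JSONL format used for fine-tuning.
python tools/json2jsonl.py --input raw_annotations.json --output data/train.jsonl

# Arrow-based exports (e.g. from HuggingFace datasets) have their own converter.
python tools/arrow2jsonl.py --input dataset.arrow --output data/train.jsonl
```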
Please follow docs/vlm_finetune.md to fine-tune VLMs on your prepared data.
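In practice, the training entry point is the shell script under command/. A minimal sketch of a typical run, with all flags beyond the script names treated as hypothetical:

```bash
# Launch InternVL2-1B fine-tuning; hyperparameters live inside the script.
bash command/InternVL2-1B.sh

# config/zero_stage1_config.json indicates training runs under DeepSpeed
# ZeRO stage 1; adjust that file for your GPU memory budget.

# If you trained LoRA adapters, merge them back into the base weights
# afterwards (flag names are hypothetical):
python tools/merge_lora.py --base InternVL2-1B --lora checkpoints/lora --output checkpoints/merged
```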
Please follow docs/vlm_finetune.md to evaluate your fine-tuned model on benchmark scenarios.
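A sketch of the evaluation entry points, based on the file names in the tree above (arguments are hypothetical; see the scripts for the actual interface):

```bash
# Packaged evaluation driver.
bash command/eval.sh

# Or call the Python evaluator directly.
python scripts/internvl_eval.py --model checkpoints/merged --data data/val.jsonl
```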
The VLM Converter is a performance-oriented module that converts and quantizes large Vision-Language Models (VLMs) with TensorRT-LLM, significantly improving inference speed while preserving accuracy.
Please follow docs/vlm_trt.md to convert and quantize VLMs.
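A minimal sketch of the conversion flow implied by the file names under vlmtrt/ and scripts/; all flags are hypothetical, and docs/vlm_trt.md remains the authoritative reference:

```bash
# Stage 1: export the Qwen2-based language-model checkpoint for TensorRT-LLM.
python vlmtrt/convert_qwen2_ckpt.py --model checkpoints/merged --output trt_ckpt

# Stage 2: build a TensorRT engine for the vision encoder (ViT).
python vlmtrt/build_vit_engine.py --model checkpoints/merged --output vit.engine

# tools/convert_to_int8.py suggests INT8 quantization is also supported.
# Sanity-check the export by comparing the two inference scripts:
#   scripts/pytorch_internvl_infer.py  (PyTorch baseline)
#   scripts/trt_internvl_infer.py      (TensorRT engine)
```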
This project builds upon InternVL and the broader open-source autonomous driving and VLM communities.