
Conversation

@ruiheng123 ruiheng123 commented Nov 9, 2025

🚀 Feature(wrh): Add implicit 3D feature injection to the action head (evaluated on LIBERO-Long tasks)

This PR implements my modification to VLA-Adapter: injecting implicit 3D features into the action head.

📚 Method:

Inspired by PointVLA and Evo-0, I use π3, a foundation model for 3D reconstruction (similar to VGGT), and inject its 3D hidden features into the action head to fuse them with the action features. Specifically, the fusion is formulated as follows:

As described in the original paper, given the learnable action tokens (the input of the action head, denoted $$\mathbf{h_{\text{AT}}}$$), three attention operations are executed: $$\text{SA}(\mathbf{h_{\text{AT}}})$$, $$\text{CA}(\mathbf{h_{\text{AT}}}, \mathbf{h_{\text{AQ}}})$$, and $$\text{CA}(\mathbf{h_{\text{AT}}}, \mathbf{h_{\text{vis}}})$$. Written compactly:

$$\text{BridgeAttention}(\mathbf{h_{\text{AT}}}, \mathbf{h_{\text{AT}}} \odot \mathbf{h_{\text{AQ}}} \odot \mathbf{h_{\text{vis}}})$$

where $$\odot$$ means concatenation.

Remark: $$\mathbf{h_{\text{vis}}}$$ is the hidden state of the VLM and $$\mathbf{h_{\text{AQ}}}$$ is the hidden state of the Action Query.

In my implementation, I add a 4th cross-attention, denoted $$\text{CA}(\mathbf{h_{\text{AT}}}, \mathbf{h_{\text{3D}}})$$, where $$\mathbf{h_{\text{3D}}}$$ is the hidden state of π3. The bridge attention then becomes

$$\text{BridgeAttention}(\mathbf{h_{\text{AT}}}, \mathbf{h_{\text{AT}}} \odot \mathbf{h_{\text{AQ}}} \odot \mathbf{h_{\text{vis}}} \odot \mathbf{h_{\text{3D}}})$$
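As a minimal sketch of the extra branch (module and variable names here are illustrative, not the repository's actual code), the action tokens attend to the π3 hidden state through one additional cross-attention alongside the existing self-attention and the two cross-attentions to the Action Query and VLM hidden states:

```python
# Illustrative sketch only; the PR's real module names and fusion details may differ.
import torch
import torch.nn as nn

class BridgeAttention3D(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # SA(h_AT)
        self.ca_aq  = nn.MultiheadAttention(dim, num_heads, batch_first=True)     # CA(h_AT, h_AQ)
        self.ca_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)     # CA(h_AT, h_vis)
        self.ca_3d  = nn.MultiheadAttention(dim, num_heads, batch_first=True)     # CA(h_AT, h_3D), added branch

    def forward(self, h_at, h_aq, h_vis, h_3d):
        # Self-attention over the learnable action tokens
        x_sa, _ = self.self_attn(h_at, h_at, h_at)
        # Cross-attention branches; the 3D branch attends to the π3 hidden state
        x_aq,  _ = self.ca_aq(h_at, h_aq, h_aq)
        x_vis, _ = self.ca_vis(h_at, h_vis, h_vis)
        x_3d,  _ = self.ca_3d(h_at, h_3d, h_3d)
        # Fuse the branches with a residual sum (the exact fusion in the PR may differ)
        return h_at + x_sa + x_aq + x_vis + x_3d
```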

I ran experiments on my 4090D GPUs, varying the injection layer (13 is the middle layer, 23 is the last) to explore whether π3 enhances task execution and spatial understanding.

Remark: the training loss (logged on wandb) when injecting at the first layer is higher than when injecting at the middle or last layer, so the first-layer result is not shown here.

You can also view this diagram (the left is the default architecture, the right is my modification):

[Diagram: default bridge attention (left) vs. bridge attention with 3D injection (right)]

🧪 Experimental Setup

  • Device: 2× NVIDIA RTX 4090D (48GB) (author's setting: 4× H100)
  • Batch size: 8
  • Gradient accumulation steps: 2
  • Number of action-head layers: 24 (default)
  • Settings compared: inject at layer 13 (middle) vs. layer 23 (last)
  • Evaluation benchmark: LIBERO-Long
  • Comparison protocol: train for the same number of steps and evaluate at the same steps

📊 Performance Results

The results are shown in the following image, together with a comparison against the author's checkpoint.

[Per-task success rates on LIBERO-Long, compared with the author's checkpoint]

As a result, my runs show some improvement on task 9 (a task on which it is commonly difficult to achieve a high success rate in LIBERO-Long), while other tasks such as tasks 1 and 2 show a slight decrease.

Moreover, the overall result is shown in the following image.

[Overall success rate comparison]

Injecting the 3D features yields a modest overall improvement.

🔧 Where to set it

The added run.sh shows where to enable the 3D injection: set use_3d to True and choose inject_layers; passing 'all' injects the 3D features into all action-head layers.

  --use_3d True \ 
  --inject_layers all \

For evaluation (eval.sh for LIBERO and eval2.sh for CALVIN), add the same flags.
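For reference, here is a minimal sketch of how the inject_layers value could map to action-head layer indices. The flag names are the ones shown above, but this helper itself is hypothetical and may not match the repository's actual parsing:

```python
# Hedged sketch; the actual handling of --inject_layers in the training code may differ.
def resolve_inject_layers(inject_layers: str, num_layers: int = 24) -> list[int]:
    """Map the --inject_layers value to action-head layer indices."""
    if inject_layers == "all":
        return list(range(num_layers))                 # inject into every action-head layer
    return [int(i) for i in inject_layers.split(",")]  # e.g. "13" (middle) or "23" (last)

print(resolve_inject_layers("13"))   # [13]
print(resolve_inject_layers("all"))  # [0, 1, ..., 23]
```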

@WangYH-BUPT
Contributor

wow!!! Perfect!! I finished processing over a dozen meetings in the last two days and then worked on this! Thank you so much for your improvements to the adapter!
