[NPUW] Fix eagle3 with chunk prefill #33975
Conversation
Force-pushed from 5d192f4 to dfefbd5 (compare)
AsyaPronina left a comment:
Great fix, thank you!
const uint32_t target_total_len = static_cast<uint32_t>(target_shape[1]);

OPENVINO_ASSERT(m_chunked_seq_offset + chunk_token_count <= target_total_len,
                "Chunked sequence offset exceeds pre-allocated size");
"Can't write chunk by stored chunked sequence offset and requested number of tokens, as it will exceed pre-allocated size"
Done
// Copy chunk data directly to the correct position in pre-allocated tensor
uint8_t* dst_ptr = reinterpret_cast<uint8_t*>(m_last_hidden_state->data());
dst_ptr += m_chunked_seq_offset * row_bytes;  // Move to the current write position
Could we please use ov::npuw::util::make_tensor_slice and tensor->copy_to(another_tensor) here? Some examples can be found in LLMInferRequest: https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_npu/src/plugin/npuw/llm_infer_request.cpp
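For reference, a minimal sketch of what the suggested refactor could look like. The helper name write_chunk_hidden_state is hypothetical, and the make_tensor_slice signature and include paths are assumptions based on the reviewer's pointer to llm_infer_request.cpp, not the actual patch:

// Hypothetical helper sketching the suggestion: take a view over the pre-allocated
// tensor with make_tensor_slice() and copy into it with copy_to(), instead of raw
// pointer arithmetic. Signatures and header paths are assumed, not verified here.
#include <openvino/runtime/itensor.hpp>  // ov::ITensor (dev API)
#include <openvino/runtime/so_ptr.hpp>   // ov::SoPtr

#include "util.hpp"  // ov::npuw::util::make_tensor_slice (npuw plugin-internal header, path assumed)

void write_chunk_hidden_state(const ov::SoPtr<ov::ITensor>& chunk_hidden_state,
                              const ov::SoPtr<ov::ITensor>& last_hidden_state,
                              uint32_t chunked_seq_offset,
                              uint32_t chunk_token_count) {
    // View of rows [offset, offset + chunk_token_count) along the sequence dimension (dim 1).
    auto dst_view = ov::npuw::util::make_tensor_slice(last_hidden_state,
                                                      1u,
                                                      chunked_seq_offset,
                                                      chunked_seq_offset + chunk_token_count);
    // Copy the chunk's hidden states into their slot in the pre-allocated tensor.
    chunk_hidden_state->copy_to(dst_view._ptr);
}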
Good proposal, done
Force-pushed from f04e2c0 to 2827581 (compare)
// Reset chunked prefill state before starting a new chunked prefill session
void reset_chunked_prefill_state() {
    m_last_hidden_state = {};
Why do we need to do this?
It means that on each prefill stage we are allocating a new tensor. Why?
Good question. Please consider this scenario, where we run two prompts consecutively using infer():
For the first prompt: m_last_hidden_state is null -> pre-allocate a tensor for the full sequence -> copy each chunk's last_hidden_state into the pre-allocated memory.
After the first prefill completes, the generate phase also updates m_last_hidden_state. When the generate phase finishes, m_last_hidden_state remains non-null.
For the second prompt: since m_last_hidden_state is still non-null, prefill will not enter the "Pre-allocate tensor on first chunk" path, causing a memory size mismatch that triggers the assertion.
Given that each prompt inference only prefills once, it is reasonable to reset the tensor here.
So m_last_hidden_state points to different tensors in the prefill and generate phases?
It would be nice to have an explanatory comment here.
Having an allocation per prefill is not a big deal, I think, but we could also keep a pre-allocated tensor for the prefill phase and avoid allocating it every time.
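To make the discussed lifecycle concrete, here is a minimal sketch assuming the member names visible in the diff (m_last_hidden_state, m_chunked_seq_offset); the surrounding struct, method names, and include paths are illustrative assumptions, not the plugin's actual code:

// Sketch of the allocation/reset lifecycle discussed above. Member names follow the
// diff; everything else is illustrative only.
#include <cstdint>

#include <openvino/core/shape.hpp>
#include <openvino/runtime/itensor.hpp>      // dev API
#include <openvino/runtime/make_tensor.hpp>  // dev API: ov::make_tensor (signature assumed)
#include <openvino/runtime/so_ptr.hpp>

struct ChunkedPrefillState {
    ov::SoPtr<ov::ITensor> m_last_hidden_state;
    uint32_t m_chunked_seq_offset = 0;

    void on_prefill_chunk(const ov::SoPtr<ov::ITensor>& chunk_hidden_state,
                          const ov::Shape& full_shape,  // e.g. [1, total_prompt_len, hidden_dim]
                          uint32_t chunk_token_count) {
        if (!m_last_hidden_state._ptr) {
            // First chunk of a new prompt: pre-allocate storage for the whole sequence.
            m_last_hidden_state = ov::SoPtr<ov::ITensor>(
                ov::make_tensor(chunk_hidden_state->get_element_type(), full_shape));
            m_chunked_seq_offset = 0;
        }
        // ... copy the chunk into rows [m_chunked_seq_offset, m_chunked_seq_offset + chunk_token_count) ...
        m_chunked_seq_offset += chunk_token_count;
    }

    // Called before a new prompt: the generate phase re-points m_last_hidden_state to its
    // own tensor, so the next prefill must re-enter the pre-allocation branch above.
    void reset_chunked_prefill_state() {
        m_last_hidden_state = {};
        m_chunked_seq_offset = 0;
    }
};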
Force-pushed from 2827581 to 8babf24 (compare)
Force-pushed from 8babf24 to 2aaccdb (compare)
build_jenkins

@dmatveev All CI passed, can we merge this PR?
Details:

Background:
- Eagle3 requires last_hidden_state in addition to logits.
- logits only needs the last token (via the slice output), while last_hidden_state requires the tensors for all tokens.
- Chunk prefill therefore has to accumulate last_hidden_state across chunks, unlike logits, which only needs the final chunk.

Changes in this PR:
- Added logic to accumulate and concatenate last_hidden_state outputs across chunks during chunk prefill in the Eagle3 pipeline (see the conceptual sketch below).

Tickets:
- CVS-180647
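For illustration, a self-contained sketch of the accumulation pattern described above; ChunkOutputs, run_chunk, and chunked_prefill are hypothetical names, not the plugin's API:

// Conceptual sketch only: accumulate last_hidden_state across all prefill chunks,
// while logits only need to reflect the final chunk.
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct ChunkOutputs {
    std::vector<float> logits;             // per-chunk logits (sliced to the last token)
    std::vector<float> last_hidden_state;  // per-chunk hidden states for every token
};

void chunked_prefill(const std::vector<std::vector<int64_t>>& chunks,
                     const std::function<ChunkOutputs(const std::vector<int64_t>&)>& run_chunk,
                     std::vector<float>& all_hidden_states,  // accumulated across chunks
                     std::vector<float>& final_logits) {     // only the last chunk matters
    for (const auto& chunk : chunks) {
        ChunkOutputs out = run_chunk(chunk);
        // Eagle3 needs last_hidden_state for every prompt token, so concatenate chunk by chunk.
        all_hidden_states.insert(all_hidden_states.end(),
                                 out.last_hidden_state.begin(),
                                 out.last_hidden_state.end());
        // logits are only consumed for the last token of the last chunk, so overwriting is enough.
        final_logits = std::move(out.logits);
    }
}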