[NPUW] Support multiple outputs in LLMCompiledModel #33664
AsyaPronina wants to merge 10 commits into openvinotoolkit:master
Conversation
If there's already a Result, why don't you use one in your 3mp?
I believe this change fixes the issue where a single Node has two Results connected to it. And this happens when you introduce one more Result to feed the 3rd model. Can this be avoided?
If our partitioning doesn't work for that case, I'd probably fix it later
I understand the concern; however, the CutLMHead transformation happens before the model is split into prefill and generate. When we work with the prefill model, we also add a Slice, but before that point there is only one Result node. So the two Result nodes get separated: one is now after the Slice, the other is still connected to the layer before the LM head. However, we don't need the Slice for the generate model, as it already outputs 1 token. And only here do we face the issue of two Results coming from the single layer before the LM head.
If we merged the two Results into one in advance, the Slice would be added to both results.
After testing, it turns out that preserving the changes in partitioning is the safer approach. Merging multiple Result nodes per output layer for the generate model works; however, that single Result node would then carry multiple names (collected from the merged Result nodes to keep their meanings). But in LLMInferRequest, on the contrary, we use output_port->get_any_name() to pick one arbitrary tensor name for the name-to-output mapping. This may cause issues. So I'm preserving the partitioning changes for now.
GuoliangShiIntel
left a comment
@AsyaPronina I have tested this change on both the Eagle3 and Omni pipelines, and both work well.
I'd like to leave an additional comment here in case it's related to the "multi-outputs" feature. Since "multi-outputs" may result in incorrect KV cache redirection, I made some related changes in the previous Eagle3 PR.
I noticed that Eugene added a comment after that PR was merged. I'm not sure whether there's anything we should address regarding it?
@AsyaPronina And another question: is this PR planned for the OV 26.0 release? If yes, I will also update the Eagle3 pipeline to align with this PR.
@GuoliangShiIntel there are a lot of PR checks failing here for some reason..
```cpp
// then all Result nodes share the same shape.
if (maybe_results.size() > 1) {
    const auto shape = (*maybe_results.begin())->get_shape();
    for (auto i = 1; i < maybe_results.size(); ++i) {
```
Linux builds failed here:

```
error: comparison of integer expressions of different signedness: 'int' and 'std::vector<std::shared_ptr<ov::Node> >::size_type' {aka 'long unsigned int'} [-Werror=sign-compare]
for (auto i = 1; i < maybe_results.size(); ++i) {
```
Hello @GuoliangShiIntel!
Details:
- `CutLMHead` transformation.
- `OnnxConfig`, `OnnxConfigWithPast`, `TextDecoderOnnxConfig` and `TextDecoderWithPositionIdsOnnxConfig` from optimum-onnx name the LLM output "logits": https://github.com/huggingface/optimum-onnx/blob/main/optimum/exporters/onnx/base.py#L137, https://github.com/huggingface/optimum-onnx/blob/main/optimum/exporters/onnx/config.py#L100
- The `export()` function sets names for output tensors from the Exporter config: https://github.com/huggingface/optimum-intel/blob/main/optimum/exporters/openvino/convert.py#L442-L445

Tickets: