
Conversation

@zzzzzzzxh
Contributor

No description provided.

- Add the @jit decorator to LlamaModelEagle3.construct to enable static graph mode
- Add a set_model_inputs method to declare dynamic shapes for the inputs
- This allows the MindSpore static graph compiler to handle Eagle3's dynamic batch sizes properly (see the sketch below)
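
For illustration only, here is a minimal, self-contained sketch of that pattern, with a toy cell and invented shapes rather than the PR's actual LlamaModelEagle3 code: @jit compiles construct into a static graph, and set_inputs with None-shaped placeholder Tensors declares which dimensions are dynamic. Note that the follow-up commit described below replaces the @jit decorator with explicit per-phase compilation.

```python
import numpy as np
import mindspore as ms
from mindspore import Tensor, nn, jit

class TinyEagle3Head(nn.Cell):
    """Toy stand-in for LlamaModelEagle3; only the shape-declaration pattern matters."""

    def __init__(self, hidden_size=64):
        super().__init__()
        self.proj = nn.Dense(hidden_size, hidden_size)

    @jit  # compile construct into a static graph
    def construct(self, hidden_states):
        return self.proj(hidden_states)

    def set_model_inputs(self):
        # A None dimension marks the axis as dynamic (e.g. a variable batch /
        # token count), so the compiled graph is not specialized to one shape.
        dyn_hidden = Tensor(shape=[None, 64], dtype=ms.float32)
        self.set_inputs(dyn_hidden)

net = TinyEagle3Head()
net.set_model_inputs()
print(net(Tensor(np.ones((4, 64)), ms.float32)).shape)   # batch of 4
print(net(Tensor(np.ones((16, 64)), ms.float32)).shape)  # batch of 16
```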
…ecode graphs

Problem:
- The original implementation used the @jit decorator on construct methods, causing control flow
  statements (the is_prefill, capture_hidden_mode, and forward_mode conditions) to be
  compiled into the static graph, resulting in 6 sub-graphs and triggering the
  "multi sub graph" error

Solution:
- Follow the vllm-mindspore qwen2_eagle3 implementation as a reference
- Remove the @jit decorators and implement exec_model to compile and cache separate
  prefill_graph and decode_graph objects based on the is_prefill flag (see the sketch below)
- Each compiled graph contains no control flow, meeting the MindSpore static graph
  optimization requirements
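
A rough sketch of the exec_model idea follows. The model, runner, and model-context helpers are invented stand-ins based on the description above, not the real vllm-mindspore code: the is_prefill decision is made in Python before compilation and the flag is read from a context at trace time, so each cached graph is specialized to one phase and contains no control flow.

```python
import numpy as np
import mindspore as ms
from mindspore import Tensor, nn, jit

_MODEL_CONTEXT = {"is_prefill": True}  # stand-in for vllm-mindspore's model context

def set_model_context(key, value):
    _MODEL_CONTEXT[key] = value

def get_model_context(key):
    return _MODEL_CONTEXT[key]

class TinyDraftModel(nn.Cell):
    """Toy stand-in for the Eagle3 draft model."""

    def __init__(self, hidden_size=64):
        super().__init__()
        self.prefill_proj = nn.Dense(hidden_size, hidden_size)
        self.decode_proj = nn.Dense(hidden_size, hidden_size)

    def construct(self, hidden_states):
        # Evaluated while the graph is traced, so only the selected branch is
        # recorded; the compiled graph itself contains no control flow.
        if get_model_context("is_prefill"):
            return self.prefill_proj(hidden_states)
        return self.decode_proj(hidden_states)

class ModelRunner:
    def __init__(self, model):
        self.model = model
        self.prefill_graph = None  # compiled once, then reused
        self.decode_graph = None

    def exec_model(self, hidden_states, is_prefill):
        set_model_context("is_prefill", is_prefill)
        if is_prefill:
            if self.prefill_graph is None:
                def _prefill(h):
                    return self.model(h)
                self.prefill_graph = jit(_prefill)
            return self.prefill_graph(hidden_states)
        if self.decode_graph is None:
            def _decode(h):
                return self.model(h)
            self.decode_graph = jit(_decode)
        return self.decode_graph(hidden_states)

runner = ModelRunner(TinyDraftModel())
x = Tensor(np.ones((2, 64)), ms.float32)
out_prefill = runner.exec_model(x, is_prefill=True)
out_decode = runner.exec_model(x, is_prefill=False)
```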
Problem:
- LlamaAttention.construct has an "if is_prefill" branch (llama.py)
- When is_prefill is passed as a parameter, this control flow is compiled into the graph

Solution:
- Remove the is_prefill parameter from the LlamaDecoderLayerEagle3 attention call
- Rely on get_model_context("is_prefill") inside LlamaAttention instead (see the sketch below)
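
A similarly hedged sketch of this second fix, with toy cells and simplified signatures (the real LlamaAttention and LlamaDecoderLayerEagle3 take KV caches and other arguments): the decoder layer no longer forwards is_prefill, and the attention cell reads the flag from the model context, so only one branch ends up in each compiled graph.

```python
import numpy as np
import mindspore as ms
from mindspore import Tensor, nn

_MODEL_CONTEXT = {"is_prefill": True}  # stand-in for vllm-mindspore's model context

def get_model_context(key):
    return _MODEL_CONTEXT[key]

class AttentionSketch(nn.Cell):
    """Toy stand-in for LlamaAttention; the real branches select prefill vs. decode kernels."""

    def __init__(self, hidden_size=64):
        super().__init__()
        self.proj = nn.Dense(hidden_size, hidden_size)

    def construct(self, hidden_states):
        # The flag is read from the model context rather than taken as a call
        # argument, so it is a compile-time constant, not graph control flow.
        if get_model_context("is_prefill"):
            return self.proj(hidden_states)        # prefill path (placeholder)
        return self.proj(hidden_states) * 0.5      # decode path (placeholder)

class DecoderLayerSketch(nn.Cell):
    """Toy stand-in for LlamaDecoderLayerEagle3."""

    def __init__(self):
        super().__init__()
        self.attention = AttentionSketch()

    def construct(self, hidden_states):
        # Note: no is_prefill argument is forwarded to attention any more.
        return self.attention(hidden_states)

layer = DecoderLayerSketch()
print(layer(Tensor(np.ones((2, 64)), ms.float32)).shape)
```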