
Conversation

@Honry (Contributor) commented Feb 4, 2026

Add support for GroupQueryAttention with:

  • do_rotary=true (cos_cache/sin_cache inputs)
  • Packed QKV (optional key/value inputs)
  • Optional past_key/past_value for prefill mode

Also remove the fp16->fp32 casting workaround and add an ApplyRotaryEmbedding helper function (sketched below).
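
For illustration only, here is a minimal NumPy sketch of what such a helper can look like. It assumes the non-interleaved (rotate-half) rotary variant and half-width cos_cache/sin_cache tables as in the GroupQueryAttention spec; the function name and shapes are illustrative, not the actual WebNN EP code:

```python
import numpy as np

def apply_rotary_embedding(x, cos_cache, sin_cache, positions):
    """Rotate query/key features by position-dependent angles (sketch).

    x:         (batch, num_heads, seq_len, head_dim) query or key tensor
    cos_cache: (max_position, head_dim // 2)  -- GQA cos_cache input
    sin_cache: (max_position, head_dim // 2)  -- GQA sin_cache input
    positions: (batch, seq_len) integer absolute positions of each token
    """
    cos = cos_cache[positions][:, None, :, :]  # (batch, 1, seq_len, head_dim/2)
    sin = sin_cache[positions][:, None, :, :]
    x1, x2 = np.split(x, 2, axis=-1)           # non-interleaved halves
    return np.concatenate((x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos), axis=-1)
```

The same rotation is applied to both the query and key heads before attention, so the helper can be shared between the two paths.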

Fix the decode stage by using qkv_sequence_length (instead of has_past_key) to distinguish prefill from decode, and by using the runtime seqlens_k instead of the static past_sequence_length for the rotary position calculation.
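
Again as a hypothetical sketch rather than the EP implementation, the prefill/decode position logic can be pictured like this, assuming ORT's GroupQueryAttention convention that each seqlens_k entry holds total_sequence_length - 1 for its batch:

```python
import numpy as np

def rotary_positions(qkv_sequence_length, seqlens_k):
    """Positions fed to the rotary embedding (sketch).

    qkv_sequence_length: number of new tokens in this call
    seqlens_k: (batch,) runtime tensor; assumed to hold
               total_sequence_length - 1 per batch entry
    """
    batch_size = seqlens_k.shape[0]
    if qkv_sequence_length > 1:
        # Prefill: new tokens occupy positions 0 .. seq_len - 1.
        return np.tile(np.arange(qkv_sequence_length), (batch_size, 1))
    # Decode: the single new token sits at total_sequence_length - 1,
    # which is exactly what the runtime seqlens_k provides; a static
    # past_sequence_length would go stale across decode steps.
    return seqlens_k.reshape(batch_size, 1)
```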

@Honry (Contributor, Author) commented Feb 4, 2026

@fdwr, @guschmue, PTAL, thanks!

@fdwr (Contributor) previously approved these changes Feb 4, 2026 and left a comment:

Minor comment, else LGTM.
