
Conversation

@mohammed-saalim
Contributor

Summary

This PR fixes a KeyError in the InsertIOQDQ pass that occurs when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend.

Problem

In insert_io_qdq.py, the q_dq_map dictionary was missing entries for dequantize operations. When a node's quantization encoding was already a dequantize operation (e.g., dequantize_per_tensor.default), looking it up in the map during the _insert phase raised a KeyError.

Solution

Extended the q_dq_map to include dequantize-to-self (identity) mappings for:

  • quantized_decomposed.dequantize_per_tensor.default
  • quantized_decomposed.dequantize_per_tensor.tensor
  • quantized_decomposed.dequantize_per_channel.default

This allows the pass to correctly handle nodes that have already been processed into dequantized form.
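In outline, the change can be sketched as follows. This is illustrative only: op handles are shown as strings, whereas the actual pass keys the map with exir_ops edge-operator objects, and the pre-existing quantize-to-dequantize pairings are assumed from the PR description.

```python
# Sketch of the extended q_dq_map (string keys for illustration only;
# the real pass uses exir_ops edge-op objects as keys and values).
q_dq_map = {
    # existing quantize -> dequantize pairings (assumed from the PR text)
    "quantized_decomposed.quantize_per_tensor.default":
        "quantized_decomposed.dequantize_per_tensor.default",
    "quantized_decomposed.quantize_per_tensor.tensor":
        "quantized_decomposed.dequantize_per_tensor.tensor",
    "quantized_decomposed.quantize_per_channel.default":
        "quantized_decomposed.dequantize_per_channel.default",
    # new in this PR: dequantize encodings map to themselves (identity),
    # so the lookup during the _insert phase no longer raises KeyError
    "quantized_decomposed.dequantize_per_tensor.default":
        "quantized_decomposed.dequantize_per_tensor.default",
    "quantized_decomposed.dequantize_per_tensor.tensor":
        "quantized_decomposed.dequantize_per_tensor.tensor",
    "quantized_decomposed.dequantize_per_channel.default":
        "quantized_decomposed.dequantize_per_channel.default",
}
```

With the identity entries in place, a node whose encoding is already a dequantize op resolves to itself rather than falling through to a missing key.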

Testing

  • Verified that the modified file parses correctly via Python's ast module.
  • Confirmed that q_dq_map now contains the expected 6 keys.
  • Manual verification on Qualcomm hardware is requested from the maintainers to confirm resolution for the SmolLM2 workflow.

Fixes: Qualcomm Quantization and Lowering for LLM fails (#16690)

Extend q_dq_map to include dequantize ops mapping to themselves.
This fixes KeyError when nodes have dequantize encodings (e.g.,
dequantize_per_tensor.default) instead of quantize encodings.

Fixes pytorch#16690
Copilot AI review requested due to automatic review settings February 4, 2026 06:06
@pytorch-bot

pytorch-bot bot commented Feb 4, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17194

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 Awaiting Approval, 1 New Failure

As of commit 5ebb788 with merge base 2ace1cc:

AWAITING APPROVAL - The following workflows need approval before CI can run:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 4, 2026
@mohammed-saalim
Contributor Author

While changing the quantization recipe (for example, using an 8-bit KV cache) might change the graph structure, the InsertIOQDQ pass should still be robust enough to handle dequantize operations in the IR without raising a KeyError. This PR ensures the pass is forward-compatible with models that already carry these encodings.

@mohammed-saalim
Contributor Author

@pytorchbot label "release notes: none"

@pytorch-bot pytorch-bot bot added the release notes: none Do not include this in the release notes label Feb 4, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes a KeyError in the InsertIOQDQ pass that occurred when quantizing LLMs (such as SmolLM2) for the Qualcomm QNN backend. The error was caused by missing entries in the q_dq_map dictionary for dequantize operations.

Changes:

  • Extended q_dq_map with identity mappings for dequantize operations to handle nodes that already have dequantize encodings
  • Added three new entries mapping dequantize operations to themselves (per-tensor default, per-tensor tensor, and per-channel default)

