[ET-VK][quantization] Implement layout-flexible quantize/dequantize operators #17106

SS-JIA · 2026-02-02T17:13:55Z

Stack from ghstack (oldest at bottom):

Implemented quantize_per_tensor and dequantize_per_tensor GLSL shaders
and C++ dispatch logic to support the new single-dimension packed INT8 layouts
(kPackedInt8_4W, kPackedInt8_4C, kPackedInt8_4H). These operators enable
conversion between floating-point tensors and packed int8 representations with
per-tensor scale and zero-point parameters.
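One way to picture a "single-dimension packed" layout is the following minimal sketch. It assumes that in kPackedInt8_4W, kPackedInt8_4H and kPackedInt8_4C, four consecutive int8 values along W, H or C respectively share one 32-bit word, with that dimension padded up to a multiple of 4. It also assumes that words are stored in NCHW row-major order. The real ET-VK buffer and texture indexing may differ. `PackedDim`, `Sizes`, `PackedLoc` and `locate()` are hypothetical names, not the ET-VK API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the kPackedInt8_4W / 4H / 4C layouts.
enum class PackedDim { kWidth, kHeight, kChannels };

struct Sizes {
  int64_t n, c, h, w;  // logical NCHW sizes
};

struct PackedLoc {
  int64_t word;  // index of the 32-bit word in the buffer
  int lane;      // byte position (0..3) inside that word
};

// Number of 4-element blocks needed to cover `x` elements (rounds up).
constexpr int64_t div_up4(int64_t x) { return (x + 3) / 4; }

// Assumed layout: the packed dim is split into blocks of 4, and words are
// stored in row-major N, C, H, W order with the packed dim counted in blocks.
PackedLoc locate(const Sizes& s, PackedDim pd,
                 int64_t n, int64_t c, int64_t h, int64_t w) {
  assert(n < s.n && c < s.c && h < s.h && w < s.w);
  int64_t C = s.c, H = s.h, W = s.w;
  int lane = 0;
  switch (pd) {
    case PackedDim::kWidth:
      W = div_up4(s.w); lane = static_cast<int>(w % 4); w /= 4; break;
    case PackedDim::kHeight:
      H = div_up4(s.h); lane = static_cast<int>(h % 4); h /= 4; break;
    case PackedDim::kChannels:
      C = div_up4(s.c); lane = static_cast<int>(c % 4); c /= 4; break;
  }
  return {((n * C + c) * H + h) * W + w, lane};
}
```

Under these assumptions, changing the packed dimension only changes which index is divided into (block, lane). For a 1x3x2x5 tensor, element (0, 1, 1, 4) lands in word 7, lane 0 when W is packed. The same element lands in word 9, lane 1 when H is packed.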

The implementation includes:

- GLSL shaders: quantize_per_tensor and dequantize_per_tensor, with support for both texture->buffer and buffer->buffer data flows, including GL_EXT_debug_printf statements for debugging
- QuantizeDequantize.cpp: added dispatch functions for the new layouts and registered the etvk.q_dq_8bit_per_tensor.default operator
- Test infrastructure: created the q_dq_8bit_per_tensor test binary with DEBUG_MODE support and a reference CPU implementation for validation

The shaders implement the quantization formula Q = clamp(round(x/scale) + zp, -128, 127)
and the dequantization formula x' = (Q - zp) * scale. They pack and unpack int8 values
using little-endian byte ordering and sign-extend them on unpack.
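The following CPU sketch covers the per-tensor formulas and the int8x4 packing. It assumes four int8 values per 32-bit word, with element 0 in the least significant byte. It also assumes that round() resolves ties to even, which std::nearbyint does under the default rounding mode. The shaders' round() may handle .5 ties differently. The function names are hypothetical and are not taken from QuantizeDequantize.cpp or from the test binary's reference implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper names; a sketch of the formulas, not the PR's code.

// Q = clamp(round(x / scale) + zp, -128, 127)
// std::nearbyint rounds half to even under the default FP rounding mode.
inline int8_t quantize(float x, float scale, int32_t zp) {
  float q = std::nearbyint(x / scale) + static_cast<float>(zp);
  q = std::clamp(q, -128.0f, 127.0f);  // clamp in float so the cast cannot overflow
  return static_cast<int8_t>(q);
}

// x' = (Q - zp) * scale
inline float dequantize(int8_t q, float scale, int32_t zp) {
  return static_cast<float>(static_cast<int32_t>(q) - zp) * scale;
}

// Pack four int8 values into one 32-bit word, little-endian:
// v[0] occupies bits 0..7 and v[3] occupies bits 24..31.
inline uint32_t pack_int8x4(const std::array<int8_t, 4>& v) {
  uint32_t word = 0;
  for (int i = 0; i < 4; ++i) {
    word |= static_cast<uint32_t>(static_cast<uint8_t>(v[i])) << (8 * i);
  }
  return word;
}

// Extract byte `lane` (0..3) and sign-extend it back to a signed int8 value.
inline int8_t unpack_int8(uint32_t word, int lane) {
  assert(lane >= 0 && lane < 4);
  int32_t byte = static_cast<int32_t>((word >> (8 * lane)) & 0xFFu);
  if (byte >= 128) byte -= 256;  // sign extension: 0x80..0xFF -> -128..-1
  return static_cast<int8_t>(byte);
}
```

For example, packing {1, -1, 127, -128} yields 0x807FFF01, and unpacking each lane recovers the original values. Clamping in float before the cast keeps the conversion defined even for large |x / scale|. The explicit `byte -= 256` step is the CPU analogue of the sign extension described above.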

Differential Revision: D92061370


pytorch-bot · 2026-02-02T17:13:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17106

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 2 Pending

As of commit 67f063b with merge base 477867a:

NEW FAILURES - The following jobs have failed:

- pull / test-vulkan-operators-linux / linux-job
  RuntimeError: Command docker exec -t f507d2ce9904df37c1b7bae3c5cb948794f33aafed6a6f9249e6fae6f3cfc5a2 /exec failed with exit code 134
- pull / unittest / macos / macos-job
  export/tests/test_target_recipes.py::TestTargetRecipes::test_linear_model
- Test CUDA Builds / test-models-cuda (conv1d) / linux-job
  RuntimeError: Command docker exec -t a6dcae6b83d74401c4df1128a2c0b85786ac62f0f44502a02691fef2f16bacb2 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-02-02T17:15:06Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


meta-cla bot added the CLA Signed label Feb 2, 2026

This was referenced Feb 2, 2026

- [ET-VK][testing] Add per-shader timing breakdown to benchmark output #17105 (Open)
- [ET-VK][ez] Implement helper functions to get fastest moving dim #17107 (Open)
- [ET-VK][qconv] Add layout-flexible impl of quantized depthwise conv2d #17108 (Open)

meta-codesync bot added fb-exported meta-exported labels Feb 2, 2026

meta-codesync bot added fb-exported meta-exported labels Feb 2, 2026

This was referenced Feb 3, 2026

- [ET-VK] Add alignment fields to PackedDimInfo for padded size calculation #17170 (Open)
- [ET-VK][quantization] Add layout-flexible clone for int8x4 tensors #17171 (Open)

This was referenced Feb 4, 2026

- [ET-VK][qconv] Add layout-agnostic general shader for quantized conv #17219 (Open)
- [ET-VK][testing] Create dedicated test binary for pointwise convolutions #17220 (Open)
- [ET-VK][qconv] Add flexible layout impl for quantized pointwise conv #17221 (Open)

SS-JIA requested review from kirklandsign and larryliu0820 as code owners February 4, 2026 20:25
