
[ET-VK][qconv] Add flexible layout impl for im2col#17249

Merged
meta-codesync[bot] merged 2 commits into gh/SS-JIA/411/base from gh/SS-JIA/411/head
Feb 5, 2026

Conversation


@SS-JIA (Contributor) commented Feb 5, 2026

Stack from ghstack (oldest at bottom):

This implements an im2col-based approach for quantized conv2d, which
transforms convolution into matrix multiplication. The im2col
transformation extracts sliding windows from the input tensor and
reshapes them into a 2D matrix, enabling reuse of the optimized
pointwise convolution shader for the compute-intensive portion.
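
For context, here is a minimal NumPy sketch of the im2col idea described above (illustrative only; the batch dimension, quantization, and all shader-level details are omitted, and the helper names and shapes are hypothetical):

```python
import numpy as np

def im2col(x, kh, kw, stride=1, padding=0):
    # x: (C, H, W) input; returns a (C*kh*kw, OH*OW) matrix whose columns are
    # the flattened sliding windows of x.
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    OH = (H + 2 * padding - kh) // stride + 1
    OW = (W + 2 * padding - kw) // stride + 1
    cols = np.empty((C * kh * kw, OH * OW), dtype=x.dtype)
    for oh in range(OH):
        for ow in range(OW):
            window = xp[:, oh * stride:oh * stride + kh, ow * stride:ow * stride + kw]
            cols[:, oh * OW + ow] = window.reshape(-1)
    return cols, OH, OW

# Convolution becomes a matrix multiplication: weights (OC, C, kh, kw) are
# flattened to (OC, C*kh*kw) and multiplied against the im2col matrix.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6)).astype(np.float32)     # C=8, H=W=6
w = rng.standard_normal((4, 8, 3, 3)).astype(np.float32)  # OC=4, 3x3 kernel

cols, OH, OW = im2col(x, 3, 3, stride=1, padding=1)
out = (w.reshape(4, -1) @ cols).reshape(4, OH, OW)        # equivalent to conv2d
```

In this framing the matmul step is exactly a 1x1 (pointwise) convolution over an input with C*kh*kw channels, which is why the existing pointwise convolution shader can be reused for the compute-intensive portion.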

Two im2col shaders are added:

  • `q8ta_im2col.glsl`: Generic shader with layout-agnostic input access
    via BufferMetadata and specialization constants
  • `q8ta_im2col_4w4c.glsl`: Optimized shader for 4W4C input layout that
    exploits the alignment between consecutive width positions and packed
    channel values

The im2col output is always stored in 4W4C layout to match the expected
input format of the pointwise convolution shader. The operator is
registered as `etvk.q8ta_conv2d_im2col.default` and currently supports
non-grouped convolutions where the number of input channels is a multiple of 4.
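
Purely as a sketch of what a packed 4W4C layout could look like (one plausible interpretation inferred from the shader names; the actual block ordering used by the ET-VK shaders is not spelled out here, and the helper below is hypothetical), the multiple-of-4 channel requirement follows naturally from packing 4 channels together with 4 consecutive width positions:

```python
def linear_offset_4w4c(c, h, w, C, H, W):
    # Hypothetical 4W4C addressing: elements are grouped into blocks of
    # 4 channels x 4 width positions, with blocks laid out over
    # (C/4, H, ceil(W/4)). Assumes C % 4 == 0, matching the operator's
    # current restriction.
    c_blk, c_in = divmod(c, 4)
    w_blk, w_in = divmod(w, 4)
    w_blocks = (W + 3) // 4
    block = (c_blk * H + h) * w_blocks + w_blk
    return block * 16 + w_in * 4 + c_in  # 16 elements per packed block
```

Under this kind of packing, consecutive width positions and channel groups sit contiguously, which is the alignment the specialized `q8ta_im2col_4w4c.glsl` shader exploits, while the generic `q8ta_im2col.glsl` path resolves indices through BufferMetadata and specialization constants.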

Authored with assistance from Claude.

Differential Revision: D92407723

pytorch-bot bot commented Feb 5, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17249

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Pending, 3 Unrelated Failures

As of commit 6fa8b78 with merge base 1cffd23:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla[bot] added the CLA Signed label on Feb 5, 2026
SS-JIA pushed a commit that referenced this pull request Feb 5, 2026
github-actions bot commented Feb 5, 2026

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e., would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track of your work and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

SS-JIA pushed a commit that referenced this pull request Feb 5, 2026
meta-codesync[bot] merged commit 752fdb3 into gh/SS-JIA/411/base on Feb 5, 2026
176 of 184 checks passed
meta-codesync[bot] deleted the gh/SS-JIA/411/head branch on February 5, 2026 at 23:29
SS-JIA pushed a commit that referenced this pull request Feb 6, 2026

Labels

CLA Signed, fb-exported, meta-exported
