[ET-VK][qconv] Add flexible layout impl for im2col #17249
meta-codesync[bot] merged 2 commits into gh/SS-JIA/411/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17249
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 1 Pending, 3 Unrelated Failures
As of commit 6fa8b78 with merge base 1cffd23:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull Request resolved: #17249
ghstack-source-id: 338638552
@exported-using-ghexport
Commit 752fdb3 merged into gh/SS-JIA/411/base
Stack from ghstack (oldest at bottom):
This implements an im2col-based approach for quantized conv2d, which
transforms convolution into matrix multiplication. The im2col
transformation extracts sliding windows from the input tensor and
reshapes them into a 2D matrix, enabling reuse of the optimized
pointwise convolution shader for the compute-intensive portion.
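As a rough illustration of the transformation described above, here is a minimal pure-Python sketch. It assumes a single-batch input stored as nested lists in `[C][H][W]` order and uses plain integer arithmetic, so quantization scales and zero points are ignored. The helper names (`im2col`, `flatten_weights`, `conv2d_im2col`, `conv2d_direct`) and the `(ky, kx, c)` column ordering are illustrative choices, not necessarily what the ET-VK shaders use. Each output pixel becomes one row of the im2col matrix, so the whole convolution collapses into one matrix multiply with the flattened weights; the last lines compare that result against a direct nested-loop convolution.

```python
# Minimal im2col sketch (illustrative only; not the ET-VK shader logic).
# Input x is [C][H][W]; weights w are [OC][C][KH][KW]; integers only.

def out_size(n, k, stride, pad, dil):
    # Standard convolution output-size formula.
    return (n + 2 * pad - dil * (k - 1) - 1) // stride + 1


def im2col(x, kh, kw, stride=1, pad=0, dil=1):
    """Return (rows, H_out, W_out): one row per output pixel, columns ordered (ky, kx, c)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    ho, wo = out_size(H, kh, stride, pad, dil), out_size(W, kw, stride, pad, dil)
    rows = []
    for oy in range(ho):
        for ox in range(wo):
            row = []
            for ky in range(kh):
                for kx in range(kw):
                    iy = oy * stride - pad + ky * dil
                    ix = ox * stride - pad + kx * dil
                    inside = 0 <= iy < H and 0 <= ix < W
                    for c in range(C):
                        # Out-of-bounds taps read zero padding.
                        row.append(x[c][iy][ix] if inside else 0)
            rows.append(row)
    return rows, ho, wo


def flatten_weights(w):
    """[OC][C][KH][KW] -> K x OC matrix, K ordered (ky, kx, c) to match im2col columns."""
    OC, C, kh, kw = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    return [[w[oc][c][ky][kx] for oc in range(OC)]
            for ky in range(kh) for kx in range(kw) for c in range(C)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def conv2d_im2col(x, w, stride=1, pad=0, dil=1):
    kh, kw = len(w[0][0]), len(w[0][0][0])
    cols, ho, wo = im2col(x, kh, kw, stride, pad, dil)
    m = matmul(cols, flatten_weights(w))  # (ho*wo) x OC
    # Row oy*wo + ox of the product holds every output channel for pixel (oy, ox).
    return [[[m[oy * wo + ox][oc] for ox in range(wo)] for oy in range(ho)]
            for oc in range(len(w))]


def conv2d_direct(x, w, stride=1, pad=0, dil=1):
    """Reference nested-loop convolution used to check the im2col path."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    OC, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    ho, wo = out_size(H, kh, stride, pad, dil), out_size(W, kw, stride, pad, dil)
    out = [[[0] * wo for _ in range(ho)] for _ in range(OC)]
    for oc in range(OC):
        for oy in range(ho):
            for ox in range(wo):
                acc = 0
                for c in range(C):
                    for ky in range(kh):
                        for kx in range(kw):
                            iy = oy * stride - pad + ky * dil
                            ix = ox * stride - pad + kx * dil
                            if 0 <= iy < H and 0 <= ix < W:
                                acc += x[c][iy][ix] * w[oc][c][ky][kx]
                out[oc][oy][ox] = acc
    return out


# Usage: a 4-channel 6x7 input and two 3x3 filters.
x = [[[(c * 7 + y * 3 + xx * 5) % 11 - 5 for xx in range(7)] for y in range(6)] for c in range(4)]
w = [[[[(oc * 5 + c * 3 + ky * 2 + kx) % 7 - 3 for kx in range(3)] for ky in range(3)]
      for c in range(4)] for oc in range(2)]
assert conv2d_im2col(x, w, stride=2, pad=1) == conv2d_direct(x, w, stride=2, pad=1)
```

The column ordering only has to agree between `im2col` and `flatten_weights`; any consistent order yields the same product.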
Two im2col shaders are added:
- `q8ta_im2col.glsl`: Generic shader with layout-agnostic input access via BufferMetadata and specialization constants
- `q8ta_im2col_4w4c.glsl`: Optimized shader for 4W4C input layout that exploits the alignment between consecutive width positions and packed channel values
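The exact memory order of the 4W4C layout is not spelled out here. The following sketch therefore assumes one plausible packing to illustrate the alignment that `q8ta_im2col_4w4c.glsl` exploits. Under this assumption, the int8 tensor is `[C][H][W]` with C and W multiples of 4. Each 32-bit word packs 4 consecutive channels (lowest channel in the lowest byte), and each 4W4C block holds 4 such words for 4 consecutive width positions. Blocks are ordered by (channel group, row, width group). All helper names are hypothetical. Under this packing, 4 consecutive output widths at stride 1 read 4 consecutive input words for a given kernel tap, and those words form exactly one block when the starting input width is a multiple of 4.

```python
# Hypothetical 4W4C packing used only for illustration; the real ET-VK layout may differ.

def pack_int8x4(vals):
    """Pack four int8 values into one uint32 word, channel 0 in the lowest byte."""
    assert len(vals) == 4 and all(-128 <= v <= 127 for v in vals)
    word = 0
    for i, v in enumerate(vals):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack_int8x4(word):
    """Inverse of pack_int8x4: recover four signed int8 values."""
    out = []
    for i in range(4):
        b = (word >> (8 * i)) & 0xFF
        out.append(b - 256 if b >= 128 else b)
    return out


def word_index(c4, h, w, H, W):
    """Word holding channels 4*c4..4*c4+3 at (h, w); blocks ordered (c4, h, w // 4)."""
    w4, wi = divmod(w, 4)
    block = (c4 * H + h) * (W // 4) + w4
    return block * 4 + wi  # each block = 4 words, one per width position


def to_4w4c(x):
    """Pack a [C][H][W] int8 tensor (C, W multiples of 4) into a flat list of words."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    assert C % 4 == 0 and W % 4 == 0
    buf = [0] * ((C // 4) * H * W)
    for c4 in range(C // 4):
        for h in range(H):
            for w in range(W):
                buf[word_index(c4, h, w, H, W)] = pack_int8x4(
                    [x[4 * c4 + j][h][w] for j in range(4)])
    return buf


def blocks_touched(ix_start, stride=1):
    """Distinct 4-wide input blocks read by 4 consecutive output widths for one kernel tap."""
    return len({(ix_start + i * stride) // 4 for i in range(4)})


# Stride 1 with an aligned start reads exactly one block (16 bytes under this packing);
# a misaligned start or a larger stride spills into a second block.
print(blocks_touched(0), blocks_touched(1), blocks_touched(0, stride=2))  # 1 2 2
```

Under these assumptions, the aligned stride-1 case is where a specialized shader can replace per-element gathers with whole-block copies. The generic shader has to handle the other cases through BufferMetadata.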
The im2col output is always stored in 4W4C layout to match the expected
input format of the pointwise convolution shader. The operator is
registered as `etvk.q8ta_conv2d_im2col.default` and currently supports
non-grouped convolutions where the number of input channels is a multiple of 4.
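To make the stated restrictions concrete, here is a small shape calculator. `Conv2dParams`, `im2col_supported`, and `im2col_shape` are hypothetical names, not ExecuTorch APIs, and the check mirrors only the two constraints stated above. The im2col matrix has `C_in * KH * KW` columns, so requiring `C_in % 4 == 0` makes that reduction dimension a multiple of 4. Assuming channels are the innermost column index, each packed group of 4 then holds 4 channels of the same kernel tap. If the `H_out * W_out` rows are viewed as an image with `C_in * KH * KW` channels, multiplying by a `(C_in * KH * KW) x C_out` weight matrix is exactly a 1x1 (pointwise) convolution. That is one way to see why the pointwise shader can be reused.

```python
from dataclasses import dataclass


@dataclass
class Conv2dParams:
    # Hypothetical parameter bundle; symmetric stride/padding/dilation for brevity.
    in_channels: int
    out_channels: int
    kernel_h: int
    kernel_w: int
    stride: int = 1
    padding: int = 0
    dilation: int = 1
    groups: int = 1


def im2col_supported(p: Conv2dParams) -> bool:
    # Only the constraints stated in the PR description: non-grouped, C_in % 4 == 0.
    return p.groups == 1 and p.in_channels % 4 == 0


def out_dim(n: int, k: int, s: int, pad: int, d: int) -> int:
    return (n + 2 * pad - d * (k - 1) - 1) // s + 1


def im2col_shape(p: Conv2dParams, in_h: int, in_w: int) -> tuple:
    """(rows, cols) of the im2col matrix: rows = output pixels, cols = C_in * KH * KW."""
    oh = out_dim(in_h, p.kernel_h, p.stride, p.padding, p.dilation)
    ow = out_dim(in_w, p.kernel_w, p.stride, p.padding, p.dilation)
    return oh * ow, p.in_channels * p.kernel_h * p.kernel_w


# A 3x3, padding-1 conv with 16 -> 32 channels on a 56x56 input.
p = Conv2dParams(16, 32, 3, 3, padding=1)
rows, cols = im2col_shape(p, 56, 56)
print(im2col_supported(p), rows, cols, cols % 4)  # True 3136 144 0
# The pointwise stage then multiplies (3136 x 144) by a (144 x 32) weight matrix.
```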
Authored with assistance from Claude.
Differential Revision: [D92407723](https://our.internmc.facebook.com/intern/diff/D92407723/)