Feat (equalize): adding initial support for MixQuant #1448
base: dev
Conversation
```python
# When both rotation and permutation are enabled, use the unified context manager
if args.apply_permute:
    print("Applying permutations...")
    with rotate_permute_mode(
```
If we decouple rotation and permutations, we could first compute all rotations unconditionally, and then enter rotate_permute_mode. In that case, there should be a way to pass the precomputed rotation rewriters to the class so that it only takes care of applying them (rather than computing them).
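A minimal sketch of that decoupled shape (all names and signatures here are hypothetical, not Brevitas APIs): the context manager accepts precomputed rewriters and only applies them, computing them itself only as a fallback.

```python
from contextlib import contextmanager


@contextmanager
def rotate_permute_mode(model, rotation_rewriters=None, compute_fn=None):
    """Hypothetical sketch: if rotation rewriters were precomputed,
    just apply them; otherwise compute them via compute_fn."""
    if rotation_rewriters is None:
        rotation_rewriters = compute_fn(model)
    for rewriter in rotation_rewriters:
        rewriter(model)  # each rewriter mutates the model in place
    yield model


# Toy usage: rewriters computed up front, then only applied inside the context.
log = []
precomputed = [lambda m: log.append("rotate"), lambda m: log.append("permute")]
with rotate_permute_mode({}, rotation_rewriters=precomputed):
    pass  # model is in its rotated+permuted state here
```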
Reason for this PR
This PR extends graph equalization to enable MixQuant, which calibrates permutations to improve quantization accuracy when using block rotations. If calibrated intentionally (e.g., with mass diffusion), permutations can help balance the distribution of activation magnitudes across blocks prior to rotation. This is particularly beneficial for low-bit quantization (e.g., INT4, FP4) where outlier management is critical.
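To see why outliers matter at low bit widths, consider plain 4-bit symmetric quantization (a generic numpy sketch, not Brevitas code): a single outlier inflates the max-based scale and crushes the small values that share it.

```python
import numpy as np


def quantize_symmetric(x, n_bits=4):
    """Round-to-nearest symmetric quantization with a max-based scale."""
    qmax = 2 ** (n_bits - 1) - 1           # 7 positive levels for 4 bits
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale


small = np.array([0.1, 0.2, -0.15, 0.05])
with_outlier = np.concatenate([small, [10.0]])

# Worst-case error on the small values, with and without the outlier present.
err_alone = np.abs(small - quantize_symmetric(small)).max()
err_with_outlier = np.abs(small - quantize_symmetric(with_outlier)[:4]).max()
```

With the outlier present, the scale grows by ~50x and all four small values quantize to zero, so their error equals their magnitude.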
Changes Made in this PR
New Permutation Infrastructure (
src/brevitas/graph/equalize.py):PermuteGraphclass to manage permutation computation and applicationmassdiff,zigzag,absmax, andrandomapply_permutemethod toRegionclass for applying permutationsrotate_permute_modecontext manager for unified rotation+permutation workflowCLI Integration (
src/brevitas_examples/llm/llm_args.py):--apply-permuteflag to enable permutation equalization--permute-fnargument to select permutation strategy (default:massdiff)Main Workflow Updates (
src/brevitas_examples/llm/main.py):fused_rotation_no_fx()to support permutation modeUtility Functions (
src/brevitas/graph/utils.py):find_node_for_modulehelper function for graph traversalContext Manager Design: The
rotate_permute_modecontext manager encapsulates the entire workflow:Notable Implementation Details:
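As a generic illustration of the apply-then-restore pattern such equalization context managers typically follow (a sketch with toy names, not the PR's implementation):

```python
from contextlib import contextmanager


@contextmanager
def apply_rewriters(model, rewriters):
    """Apply rewriters on entry and undo them in reverse order on exit,
    so the model is only transformed inside the `with` block."""
    applied = []
    try:
        for rw in rewriters:
            rw.apply(model)
            applied.append(rw)
        yield model
    finally:
        for rw in reversed(applied):
            rw.undo(model)


class ScaleWeight:
    """Toy rewriter: scales a weight in place and can undo it."""

    def __init__(self, factor):
        self.factor = factor

    def apply(self, model):
        model["weight"] *= self.factor

    def undo(self, model):
        model["weight"] /= self.factor


model = {"weight": 2.0}
with apply_rewriters(model, [ScaleWeight(3.0)]):
    inside = model["weight"]   # transformed while the context is active
after = model["weight"]        # restored on exit
```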
Expected Results
MixQuant demonstrates significant improvements over block rotations alone on Llama-3.2-1B-Instruct with W4A4 per-channel quantization. Using block rotations with `block_rotation_dim: 32` and the `massdiff` permutation strategy, MixQuant achieves:

The improvements stem from better activation outlier management through channel permutations that balance magnitude distributions within rotation blocks.
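The balancing effect of such a permutation can be illustrated with a small numpy sketch (a generic serpentine "dealing" of channels by magnitude, not the PR's `massdiff` implementation): sorting channels by magnitude and dealing them into blocks in alternating order evens out the per-block maxima.

```python
import numpy as np


def zigzag_permute(mags, block_size):
    """Return a channel permutation that deals channels, sorted by
    magnitude, into blocks in serpentine order so that each block
    mixes large- and small-magnitude channels."""
    order = np.argsort(mags)[::-1]          # channel indices, largest first
    n_blocks = len(mags) // block_size
    blocks = [[] for _ in range(n_blocks)]
    for i, ch in enumerate(order):
        rnd = i // n_blocks                 # "round" of dealing
        pos = i % n_blocks
        if rnd % 2 == 1:                    # reverse direction every round
            pos = n_blocks - 1 - pos
        blocks[pos].append(ch)
    return np.concatenate(blocks)


# Toy example: 8 channels, 2 blocks of 4.
mags = np.array([10.0, 9.0, 8.0, 7.0, 3.0, 2.0, 1.0, 0.5])
baseline_max = [mags[i:i + 4].max() for i in range(0, 8, 4)]   # [10.0, 3.0]
perm = zigzag_permute(mags, block_size=4)
per_block_max = [mags[perm[i:i + 4]].max() for i in range(0, 8, 4)]
```

In the identity ordering the block maxima are badly skewed (10.0 vs 3.0); after dealing, each block receives a comparable share of the large channels.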
Configuration: Both methods use Qronos for error correction with dynamic per-row activations, MSE weight scales, and fused Hadamard rotations. See `src/brevitas_examples/papers/mixquant/llama3-mixquant-int4.yml` for the full config. You can run this as:

Please use https://github.com/i-colbert/brevitas/tree/mixquant/src/brevitas_examples/papers/mixquant to reproduce the experiments from the paper.
Testing Summary
Added `test_rotate_permute_mode` to `tests/brevitas/graph/test_equalization.py`:
- Covers the `massdiff`, `zigzag`, `absmax`, and `random` permutation strategies
- Covers the `block_rotation_dim`, `disable_block_rotation_for_fused`, and `expansion_step` parameters

Risk Highlight
Checklist
`dev` branch.