Can we benefit from sparse sage attention (spargeattn)? #485
Replies: 6 comments 33 replies
-
I'm planning to integrate this and run tests today. Theoretically, sparse attention on top of SageAttention2 (SA2) is the fastest combination; let's see if it's as fast as they claim.
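To make the idea concrete, here is a toy NumPy sketch of block-sparse attention: score blocks that look negligible for a given query block are skipped entirely. This only illustrates the pruning idea behind SpargeAttn; the skip criterion below (a mean-score threshold) is a made-up stand-in, not the paper's method, and the real speedup comes from a fused CUDA kernel, not Python loops.

```python
import numpy as np

def block_sparse_attention(q, k, v, block=64, threshold=-np.inf):
    """Toy block-sparse attention; threshold=-inf degenerates to dense."""
    n, d = q.shape
    out = np.zeros_like(q)
    used = total = 0
    for i in range(0, n, block):
        qi = q[i:i + block]
        kept_scores, kept_vals = [], []
        for j in range(0, n, block):
            total += 1
            s = qi @ k[j:j + block].T / np.sqrt(d)
            # Always keep the diagonal block so every row attends to something;
            # otherwise skip blocks whose scores look negligible (toy criterion).
            if j != i and s.mean() < threshold:
                continue
            used += 1
            kept_scores.append(s)
            kept_vals.append(v[j:j + block])
        s = np.concatenate(kept_scores, axis=1)
        p = np.exp(s - s.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        out[i:i + block] = p @ np.concatenate(kept_vals, axis=0)
    return out, used / total

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((256, 64)).astype(np.float32) for _ in range(3))
dense, _ = block_sparse_attention(q, k, v)                    # keeps all blocks
sparse, frac = block_sparse_attention(q, k, v, threshold=0.0)  # prunes some
print(f"blocks kept: {frac:.2f}, max drift vs dense: {np.abs(dense - sparse).max():.3f}")
```

The trade-off this exposes is exactly the one being tested in this thread: the fewer blocks you keep, the faster the kernel, but the further the output drifts from dense attention.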
-
Nice work! I see you updated the ComfyUI GUI too.
-
Actually, logically, the 3B fp16 model was converted to 3B Q8. Q8 means quality is preserved essentially 1:1, while K6 retains slightly less quality in exchange for a slightly smaller file. So in principle, 3B fp16 and 3B Q8 should have the same quality.

My daily ComfyUI install runs PyTorch 2.7.1 + cu128 on an older ComfyUI version, while the one I use for testing runs a newer ComfyUI with PyTorch 2.9.1 + cu130. Theoretically, the newer stack should be faster, but in reality the opposite is true: the old one is faster, so I use the old one every day and keep the new one only for testing.

This is my old system: (screenshot not captured in this export)
This is my new system for testing: (screenshot not captured in this export)
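The Q8-vs-K6 quality claim above can be illustrated numerically. The sketch below uses toy round-to-nearest symmetric quantization with a single per-tensor scale; it is not llama.cpp's actual Q8_0/K-quant block formats (those use per-block and super-block scales), but it shows why 8-bit retains measurably more precision than 6-bit:

```python
import numpy as np

def fake_quantize(x, bits):
    """Toy symmetric quantization: scale to the integer grid, round, dequantize."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8-bit, 31 for 6-bit
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)  # stand-in for a weight tensor

err8 = np.abs(w - fake_quantize(w, 8)).mean()
err6 = np.abs(w - fake_quantize(w, 6)).mean()
print(f"mean abs error  8-bit: {err8:.5f}   6-bit: {err6:.5f}")
```

Each bit removed doubles the grid spacing, so the 6-bit error lands roughly 4x higher than the 8-bit error, matching the intuition that Q8 is near-lossless while K6 gives up a little quality for a smaller file.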
-
Is it possible you could share the NVFP4 3B model? And would it be possible to create an NVFP4 7B model for educational purposes?

Edit: Also, this repo has some NVFP4 models: https://huggingface.co/Nexus24/vaeGGUF/tree/main I've downloaded them and placed them in the seedvr2 folder under models, but they are not showing up. Any idea how I can make them available in the drop-down list?

Edit2:

-
Ref: https://github.com/thu-ml/SpargeAttn