
feat: support cpu and xpu devices in llama4 #326

Open
dvrogozh wants to merge 1 commit into meta-llama:main from dvrogozh:xpu

Conversation

@dvrogozh dvrogozh commented Apr 9, 2025

The change is similar to the one previously made for Llama3.

CPU support was tested on Intel Xeon.

XPU support was tested on Intel Data Center GPU Max series. Note that a fix in fairscale for facebookresearch/fairscale#1195 is required to make XPU work (CPU is not affected). I believe this fix might also be needed for CUDA, since the issue appears to be device agnostic and I can see it with a simplified reproducer on an NVIDIA A10.

Verified on the platforms named above with the Llama4 sample completion and chat completion scripts.

Requires: facebookresearch/fairscale#1196
CC: @ashwinb, @raghotham
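
The heart of a change like this is typically a small device-selection helper. The sketch below is hypothetical (the actual diff is not shown on this page): it prefers an Intel XPU if one is present, then CUDA, and falls back to CPU. It works on plain booleans so it is self-contained; in real code the flags would come from `torch.xpu.is_available()` and `torch.cuda.is_available()`.

```python
# Hypothetical sketch of device-agnostic selection logic, not the actual
# patch: prefer xpu, then cuda, fall back to cpu. In practice the two flags
# would be torch.xpu.is_available() and torch.cuda.is_available().
def pick_device(xpu_available: bool, cuda_available: bool) -> str:
    """Return the device string to pass to torch.device()."""
    if xpu_available:
        return "xpu"
    if cuda_available:
        return "cuda"
    return "cpu"
```

Keeping the rest of the code free of hard-coded `"cuda"` strings is what lets the same model code run on all three backends.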

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
@facebook-github-bot added the CLA Signed label on Apr 9, 2025
dvrogozh commented:

@ashwinb: can you please help review this PR, and perhaps also the fairscale one (facebookresearch/fairscale#1196)?



ashwinb commented Apr 28, 2025

Thank you @dvrogozh, especially for fixing the fairscale issue. I had observed it myself and it caused me much pain. I am not sure about the release cadence for fairscale packages, so I will figure that out. Do we need that fix to land (in pypi) before this PR can work?

dvrogozh commented:

<...>the fairscale issue. <...> Do we need that fix to land (in pypi) before this PR can work?

API-wise, no. That is an internal fairscale issue and we did not change the fairscale API.

Functionality-wise:

  • We don't need the fix for CPU
  • We do need the fix for XPU, otherwise there is a runtime error (with the latest fairscale version, v0.4.13, which I tried)
  • I suppose we need the fix for CUDA as well, but I can't verify that since I don't have a system with multiple CUDA devices and Llama4 does not fit on one. The CUDA part of the story leaves me somewhat confused: I assume Llama4 was verified on CUDA, yet I don't see why the fairscale issue would not show up there. I wonder whether Llama4 was verified against some internal/specific/patched version of fairscale where the issue did not exist?

@dvrogozh
Copy link
Contributor Author

dvrogozh commented Jun 2, 2025

@ashwinb: do we have a way forward for making a release in fairscale and merging this PR?
