# Petals × LLaMA-2-70B — Decentralised Inference + Prompt Tuning over the Internet

This project runs LLaMA-2-70B inference on a global Petals swarm and prompt-tunes the model via swarm model-parallelism. This repo is a concise overview, with links to the full write-ups and the exact commands.

**At a glance (stack):** Petals · PyTorch · Transformers · Hugging Face Hub


## Why this matters

Petals splits very large models across many volunteer/peer GPUs and stitches them together at runtime. That lets you serve or tune 70B-scale models without owning a single 80 GB GPU. Peers can join and leave at any time; Petals routes each request across whichever peers currently serve the blocks it needs.
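The client side of this is only a few lines. The sketch below is a minimal, illustrative example of swarm inference using Petals' public `AutoDistributedModelForCausalLM` API, not the exact script from the write-ups; it assumes `pip install petals` and a Hugging Face account with access to the gated Llama 2 weights, and the chat checkpoint name is an assumption (the write-ups state the exact checkpoint and flags used).

```python
# Minimal swarm-inference sketch (illustrative, not the exact script from the
# write-ups). Assumes `pip install petals` and access to the gated Llama 2 repo.
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

MODEL_NAME = "meta-llama/Llama-2-70b-chat-hf"  # assumed checkpoint; see write-ups

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Only the embeddings and LM head load locally; the 80 transformer blocks are
# executed remotely by whichever swarm peers currently serve them.
model = AutoDistributedModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16
)

inputs = tokenizer("Swarm model-parallelism means", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

On the supply side, a peer contributes a GPU by serving a slice of the blocks via Petals' server entry point, e.g. `python -m petals.cli.run_server meta-llama/Llama-2-70b-chat-hf`; peers announce themselves over a DHT, which is how the client above finds them.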


## What’s in this repo

This is a meta-repo (README + links). For the exact scripts, flags, logs, and troubleshooting, use the two articles below.
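As a taste of the prompt-tuning side, here is a hedged sketch following the pattern of Petals' official prompt-tuning examples: the 70B backbone stays frozen on the swarm while a small set of trainable prefix embeddings lives locally. The `tuning_mode="ptune"` and `pre_seq_len` arguments are from the Petals API; the training text, prefix length, and learning rate below are placeholders, not the values from the write-ups.

```python
# Hedged prompt-tuning sketch (placeholder data and hyperparameters; the exact
# training script, flags, and logs are in the linked write-ups).
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

MODEL_NAME = "meta-llama/Llama-2-70b-chat-hf"  # assumed checkpoint; see write-ups

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # needed if you batch multiple texts

model = AutoDistributedModelForCausalLM.from_pretrained(
    MODEL_NAME,
    tuning_mode="ptune",   # learn a soft prompt; remote blocks stay frozen
    pre_seq_len=16,        # number of trainable prefix tokens (placeholder)
    torch_dtype=torch.bfloat16,
)

# Only the local soft-prompt parameters have requires_grad=True.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-2
)

batch = tokenizer(["<your training text here>"], return_tensors="pt")
loss = model(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
loss.backward()          # gradients flow back through the remote blocks
optimizer.step()
optimizer.zero_grad()
```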


## Links
