🔍 Awesome Edge LLMs

A comprehensive survey on Edge AI, covering hardware, software, frameworks, applications, performance optimization, and the deployment of LLMs on edge devices.

Open Source Edge Models

The listed models are base models that meet at least one of the following criteria:

Parameter count ≤ 10B
Officially described as an edge model
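
As a rough illustration of the inclusion rule above, the sketch below checks an entry against both criteria. The `ModelEntry` type, the `claimed_edge` flag, the `Example-14B` entry, and the reading of the 10B limit as "any listed size qualifies" are all assumptions made for this example, not part of the list itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelEntry:
    """Hypothetical record for one row of the model table."""
    name: str
    sizes_b: List[float]          # parameter counts in billions
    claimed_edge: bool = False    # assumed flag: vendor describes it as an edge model


def qualifies(entry: ModelEntry, max_params_b: float = 10.0) -> bool:
    """Return True if the entry meets at least one inclusion criterion."""
    # Assumption: one variant at or under the limit is enough to list the family.
    small_enough = any(size <= max_params_b for size in entry.sizes_b)
    return small_enough or entry.claimed_edge


if __name__ == "__main__":
    qwen = ModelEntry("Qwen2.5", [0.5, 1.5, 3, 7])
    large = ModelEntry("Example-14B", [14])  # hypothetical entry
    print(qualifies(qwen), qualifies(large))
```

Under this reading, a model above 10B appears in the table only when its publisher positions it for edge use.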

Model	Size	Org	Time	Download	Paper
SmolLM3	3B	Hugging Face	2025.7.9	🤗	📖
MiniCPM4	8B	OpenBMB	2025.6.6	🤗
Qwen2.5-Omni	7B	Qwen	2025.3.26	🤗
MiniCPM-o 2.6	8B	OpenBMB	2025.1.14	🤗	-
Phi-4	14B	Microsoft	2025.1.9 2024.12.12(release)	🤗
VITA-1.5	7B	VITA	2025.1.6	-
Megrez-3B-Omni	3B	Infinigence	2024.12.16	🤗	-
OmniAudio	2.6B	Nexa AI	2024.12.12	🤗	📖
InternVL 2.5	8B	OpenGVLab	2024.12.5	🤗	-
GLM-Edge	1.5B 2B 4B 5B	THUDM	2024.11.29	🤗	-
SmolVLM	2B	Hugging Face	2024.11.26	🤗	📖
SmolLM2	135M 360M 1.7B	Hugging Face	2024.11.1	🤗	📖
Ministral	3B 8B	Mistral AI	2024.10.16	🤗	📖
Qwen2.5	0.5B 1.5B 3B 7B	Qwen	2024.9.19	🤗	📖
Pixtral 12B	12B	Mistral AI	2024.9.17	🤗	📖
Qwen2-VL	2B 7B	Qwen	2024.8.30	🤗	📖
Phi 3.5	3.8B 4.1B	Microsoft	2024.8.21	🤗	-
MiniCPM-V 2.6	8B	OpenBMB	2024.8.6	🤗	-
SmolLM	135M 360M 1.7B	Hugging Face	2024.8.2	🤗	📖
Gemma2	2B 9B	Google	2024.7.31	🤗	📖
DCLM 7B	7B	Apple	2024.7.18	🤗
Phi-3	3.8B 7B	Microsoft	2024.4.23	🤗
Mistral NeMo	12B	Mistral AI	2024.6.18	🤗	📖
Gemma	2B 7B	Google	2024.2.21	🤗	📖
Mistral 7B	7B	Mistral AI	2023.9.27	🤗	📖

Embodied Model

LLM Inference

Title	Date	Org	Paper
DashInfer-VLM	2025.1	ModelScope	📖
SparseInfer	2024.11	University of Seoul, etc
Mooncake	2024.6	Moonshot AI	📖
flashinfer	2024.2	flashinfer-ai	📖
inferflow	2024.2	Tencent AI Lab
PowerInfer	2023.12	SJTU
PETALS	2023.12	HSE University, etc
TensorRT-LLM	2023.10	NVIDIA	-
LightSeq	2023.10	UC Berkeley, etc
vLLM	2023.9	UC Berkeley, etc
StreamingLLM	2023.9	Meta AI, etc
MLC-LLM	2023.5	mlc-ai	📖
Medusa	2023.9	Tianle Cai, etc	📖
LightLLM	2023.8	ModelTC	-
FastServe	2023.5	Peking University
SpecInfer	2023.5	Peking University, etc
Ollama	2023.8	Ollama Inc	-
LMDeploy	2023.6	InternLM	📖
Megatron-LM	2020.5	NVIDIA

Processor

NVIDIA

✅ 50 Series @2025

	GeForce RTX 5090	GeForce RTX 5080	GeForce RTX 5070 Ti	GeForce RTX 5070
NVIDIA CUDA Cores	21760	10752	8960	6144
Shader Cores	Blackwell	Blackwell	Blackwell	Blackwell
Tensor Cores (AI)	5th Generation 3352 AI TOPS	5th Generation 1801 AI TOPS	5th Generation 1406 AI TOPS	5th Generation 988 AI TOPS
Ray Tracing Cores	4th Generation 318 TFLOPS	4th Generation 171 TFLOPS	4th Generation 133 TFLOPS	4th Generation 94 TFLOPS
Boost Clock (GHz)	2.41	2.62	2.45	2.51
Base Clock (GHz)	2.01	2.30	2.30	2.16
Standard Memory Config	32 GB GDDR7	16 GB GDDR7	16 GB GDDR7	12 GB GDDR7
Memory Interface Width	512-bit	256-bit	256-bit	192-bit
Price	$1999	$999	$749	$549

✅ 40 Super Series @2024

GPU Specs	GeForce RTX 4080 Super	GeForce RTX 4070 Ti Super	GeForce RTX 4070 Super
CUDA Cores	10240	8448	7168
Memory Configuration	16 GB GDDR6X	16 GB GDDR6X	12 GB GDDR6X
Memory Interface Width	256-bit	256-bit	192-bit
Memory Bandwidth	736 GB/s	672 GB/s	504 GB/s
Base Clock (GHz)	2.29	2.34	1.98
Boost Clock (GHz)	2.55	2.61	2.48
Graphics Card Power	320W	285W	220W
Recommended PSU	750W	700W	650W
Price	$999	$799	$599
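
The memory bandwidth row follows from the interface width and the per-pin data rate of the memory. A minimal sketch of that arithmetic is shown below; the function name is made up for this example, and the 23 Gbps GDDR6X data rate for the RTX 4080 Super is an assumption that does not appear in the table above.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s.

    bus_width_bits: memory interface width in bits (from the table).
    data_rate_gbps: effective per-pin data rate in Gbit/s (assumed, not in the table).
    """
    # Each pin moves data_rate_gbps gigabits per second; divide by 8 for bytes.
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    # RTX 4080 Super: 256-bit interface, assumed 23 Gbps GDDR6X.
    print(f"{peak_bandwidth_gb_s(256, 23.0):.0f} GB/s")  # prints "736 GB/s"
```

Bandwidth matters for edge LLM inference because token generation is usually limited by how fast the weights can be read from memory rather than by raw compute.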

✅ 40 Series @2022

GPU Specs	GeForce RTX 4090	GeForce RTX 4080	GeForce RTX 4070 Ti	GeForce RTX 4070	GeForce RTX 4060 Ti	GeForce RTX 4060
NVIDIA CUDA Cores	16384	9728	7680	5888	4352	3072
Shader Cores	Ada Lovelace	Ada Lovelace	Ada Lovelace	Ada Lovelace	Ada Lovelace	Ada Lovelace
Tensor Cores (AI)	4th Gen 330 AI TFLOPS	4th Gen 200 AI TFLOPS	4th Gen 150 AI TFLOPS	4th Gen 100 AI TFLOPS	4th Gen 90 AI TFLOPS	4th Gen 60 AI TFLOPS
Ray Tracing Cores	3rd Gen 191 TFLOPS	3rd Gen 112 TFLOPS	3rd Gen 92 TFLOPS	3rd Gen 64 TFLOPS	3rd Gen 54 TFLOPS	3rd Gen 35 TFLOPS
Boost Clock (GHz)	2.52	2.51	2.61	2.48	2.54	2.42
Base Clock (GHz)	2.23	2.21	2.31	1.92	2.31	1.83
Standard Memory Config	24 GB GDDR6X	16 GB GDDR6X	12 GB GDDR6X	12 GB GDDR6X	8 GB GDDR6	8 GB GDDR6
Memory Interface Width	384-bit	256-bit	192-bit	192-bit	128-bit	128-bit
Graphics Card Power (W)	450W	320W	285W	200W	160W	115W
Recommended PSU (W)	850W	750W	700W	650W	550W	450W
Price	$1,599	$1,199	$799	$599	$399 (8GB) $499 (16GB)	$299

Hardware Applications

AI Glasses

Name	Company	Model	Time	Price
RayNeo V3 (雷鸟V3)	RayNeo (雷鸟创新)	Qwen	2025.1.7	¥1799+
Sharge AI Glasses (闪极拍拍镜)	Sharge (闪极科技)	Qwen Kimi GLM, etc.	2024.12.19	¥999+
INMO GO2	INMO (影目科技)	-	2024.11.29	¥3999
Rokid Glasses	Rokid	Qwen	2024.11.18	¥2499
Looktech	Looktech	ChatGPT Claude Gemini	2024.11.16	$199
Ray-Ban	Meta	Meta AI	2023.9	$299

Reference

Awesome-LLMs-on-device

Awesome-LLM-Inference

数字生命卡兹克 (Digital Life Khazix) - AI Hardware Compendium (AI硬件大全)

Lynncc6/Awesome-Edge-LLMs

Folders and files

Latest commit

History

Repository files navigation

🔍 Awesome Edge LLMs

Open Source Edge Models

LLM Inference

Processor

NVIDIA

Hardware Applications

AI Glasses

Reference

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages