A comprehensive survey of Edge AI, covering hardware, software, frameworks, applications, performance optimization, and the deployment of LLMs on edge devices.
The listed models are base models that meet at least one of the following criteria:
- Parameters ≤ 10B
- Officially claimed edge models
| Model | Size | Org | Time | Download | Paper |
|---|---|---|---|---|---|
| SmolLM3 | 3B | Hugging Face | 2025.7.9 | 🤗 | 📖 |
| MiniCPM4 | 8B | OpenBMB | 2025.6.6 | 🤗 | |
| Qwen2.5-Omni | 7B | Qwen | 2025.3.26 | 🤗 | |
| MiniCPM-o 2.6 | 8B | OpenBMB | 2025.1.14 | 🤗 | - |
| Phi-4 | 14B | Microsoft | 2025.1.9, 2024.12.12 (release) | 🤗 | |
| VITA-1.5 | 7B | VITA | 2025.1.6 | - | |
| Megrez-3B-Omni | 3B | Infinigence | 2024.12.16 | 🤗 | - |
| OmniAudio | 2.6B | Nexa AI | 2024.12.12 | 🤗 | 📖 |
| InternVL 2.5 | 8B | OpenGVLab | 2024.12.5 | 🤗 | - |
| GLM-Edge | 1.5B 2B 4B 5B | THUDM | 2024.11.29 | 🤗 | - |
| SmolVLM | 2B | Hugging Face | 2024.11.26 | 🤗 | 📖 |
| SmolLM2 | 135M 360M 1.7B | Hugging Face | 2024.11.1 | 🤗 | 📖 |
| Ministral | 3B 8B | Mistral AI | 2024.10.16 | 🤗 | 📖 |
| Qwen2.5 | 0.5B 1.5B 3B 7B | Qwen | 2024.9.19 | 🤗 | 📖 |
| Pixtral 12B | 12B | Mistral AI | 2024.9.17 | 🤗 | 📖 |
| Qwen2-VL | 2B 7B | Qwen | 2024.8.30 | 🤗 | 📖 |
| Phi 3.5 | 3.8B 4.1B | Microsoft | 2024.8.21 | 🤗 | - |
| MiniCPM-V 2.6 | 8B | OpenBMB | 2024.8.6 | 🤗 | - |
| SmolLM | 135M 360M 1.7B | Hugging Face | 2024.8.2 | 🤗 | 📖 |
| Gemma2 | 2B 9B | Google | 2024.7.31 | 🤗 | 📖 |
| DCLM 7B | 7B | Apple | 2024.7.18 | 🤗 | |
| Phi-3 | 3.8B 7B | Microsoft | 2024.4.23 | 🤗 | |
| Mistral NeMo | 12B | Mistral AI | 2024.6.18 | 🤗 | 📖 |
| Gemma | 2B 7B | Google | 2024.2.21 | 🤗 | 📖 |
| Mistral 7B | 7B | Mistral AI | 2023.9.27 | 🤗 | 📖 |
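
The inclusion rule stated above the table (at most 10B parameters, or officially claimed as an edge model) can be sketched as a small filter. This is only an illustrative sketch, not tooling used by this list: `ModelEntry`, `parse_sizes`, `qualifies`, and the `claimed_edge` flag are hypothetical names, the "Example-14B" entries are made up for the demonstration, and the parser assumes every size token ends in `M` or `B` as in the Size column.

```python
from dataclasses import dataclass
from typing import List

# Threshold from the first criterion, in billions of parameters.
PARAM_LIMIT_B = 10.0


def parse_sizes(cell: str) -> List[float]:
    """Parse a Size cell such as '135M 360M 1.7B' into billions of parameters.

    Assumption: each token is a number followed by 'M' (millions) or 'B' (billions).
    """
    sizes = []
    for token in cell.replace(",", " ").split():
        value = float(token[:-1])
        unit = token[-1].upper()
        sizes.append(value / 1000 if unit == "M" else value)
    return sizes


@dataclass
class ModelEntry:
    name: str
    size_cell: str              # raw text of the Size column
    claimed_edge: bool = False  # hypothetical flag: vendor officially calls it an edge model


def qualifies(entry: ModelEntry) -> bool:
    """True if any listed variant is <= 10B, or the model is claimed as an edge model."""
    small_enough = any(s <= PARAM_LIMIT_B for s in parse_sizes(entry.size_cell))
    return small_enough or entry.claimed_edge


if __name__ == "__main__":
    entries = [
        ModelEntry("Qwen2.5", "0.5B 1.5B 3B 7B"),
        ModelEntry("Example-14B", "14B"),                      # made-up entry
        ModelEntry("Example-14B-Edge", "14B", claimed_edge=True),  # made-up entry
    ]
    for e in entries:
        print(e.name, qualifies(e))
```

Treating a family as eligible when any one of its variants is under the limit is a design choice of this sketch; a stricter reading would require every listed variant to qualify.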
Embodied Model
| Title | Date | Org | Paper |
|---|---|---|---|
| DashInfer-VLM | 2025.1 | ModelScope | 📖 |
| SparseInfer | 2024.11 | University of Seoul, etc | |
| Mooncake | 2024.6 | Moonshot AI | 📖 |
| flashinfer | 2024.2 | flashinfer-ai | 📖 |
| inferflow | 2024.2 | Tencent AI Lab | |
| PowerInfer | 2023.12 | SJTU | |
| PETALS | 2023.12 | HSE University, etc | |
| TensorRT-LLM | 2023.10 | NVIDIA | - |
| LightSeq | 2023.10 | UC Berkeley, etc | |
| vLLM | 2023.9 | UC Berkeley, etc | |
| StreamingLLM | 2023.9 | Meta AI, etc | |
| MLC-LLM | 2023.5 | mlc-ai | 📖 |
| Medusa | 2023.9 | Tianle Cai, etc | 📖 |
| LightLLM | 2023.8 | ModelTC | - |
| FastServe | 2023.5 | Peking University | |
| SpecInfer | 2023.5 | Peking University, etc | |
| Ollama | 2023.8 | Ollama Inc | - |
| LMDeploy | 2023.6 | InternLM | 📖 |
| Megatron-LM | 2020.5 | NVIDIA | |
✅ 50 Series @2025
| GPU Specs | GeForce RTX 5090 | GeForce RTX 5080 | GeForce RTX 5070 Ti | GeForce RTX 5070 |
|---|---|---|---|---|
| NVIDIA CUDA Cores | 21760 | 10752 | 8960 | 6144 |
| Shader Cores | Blackwell | Blackwell | Blackwell | Blackwell |
| Tensor Cores (AI) | 5th Generation 3352 AI TOPS | 5th Generation 1801 AI TOPS | 5th Generation 1406 AI TOPS | 5th Generation 988 AI TOPS |
| Ray Tracing Cores | 4th Generation 318 TFLOPS | 4th Generation 171 TFLOPS | 4th Generation 133 TFLOPS | 4th Generation 94 TFLOPS |
| Boost Clock (GHz) | 2.41 | 2.62 | 2.45 | 2.51 |
| Base Clock (GHz) | 2.01 | 2.30 | 2.30 | 2.16 |
| Standard Memory Config | 32 GB GDDR7 | 16 GB GDDR7 | 16 GB GDDR7 | 12 GB GDDR7 |
| Memory Interface Width | 512-bit | 256-bit | 256-bit | 192-bit |
| Price | $1999 | $999 | $749 | $549 |
✅ 40 Super Series @2024
| GPU Specs | GeForce RTX 4080 Super | GeForce RTX 4070 Ti Super | GeForce RTX 4070 Super |
|---|---|---|---|
| CUDA Cores | 10240 | 8448 | 7168 |
| Memory Configuration | 16 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X |
| Memory Interface Width | 256-bit | 256-bit | 192-bit |
| Memory Bandwidth | 736 GB/s | 672 GB/s | 504 GB/s |
| Base Clock (GHz) | 2.21 | 2.31 | 1.92 |
| Boost Clock (GHz) | 2.55 | 2.61 | 2.48 |
| Graphics Card Power | 320W | 285W | 200W |
| Recommended PSU | 750W | 700W | 650W |
| Price | $999 | $799 | $599 |
✅ 40 Series @2022
| GPU Specs | GeForce RTX 4090 | GeForce RTX 4080 | GeForce RTX 4070 Ti | GeForce RTX 4070 | GeForce RTX 4060 Ti | GeForce RTX 4060 |
|---|---|---|---|---|---|---|
| NVIDIA CUDA Cores | 16384 | 9728 | 7680 | 5888 | 4352 | 3072 |
| Shader Cores | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace | Ada Lovelace |
| Tensor Cores (AI) | 4th Gen 330 AI TFLOPS | 4th Gen 200 AI TFLOPS | 4th Gen 150 AI TFLOPS | 4th Gen 100 AI TFLOPS | 4th Gen 90 AI TFLOPS | 4th Gen 60 AI TFLOPS |
| Ray Tracing Cores | 3rd Gen 191 TFLOPS | 3rd Gen 112 TFLOPS | 3rd Gen 92 TFLOPS | 3rd Gen 64 TFLOPS | 3rd Gen 54 TFLOPS | 3rd Gen 35 TFLOPS |
| Boost Clock (GHz) | 2.52 | 2.51 | 2.61 | 2.48 | 2.54 | 2.42 |
| Base Clock (GHz) | 2.23 | 2.21 | 2.31 | 1.92 | 2.31 | 1.83 |
| Standard Memory Config | 24 GB GDDR6X | 16 GB GDDR6X | 12 GB GDDR6X | 12 GB GDDR6X | 8 GB GDDR6 | 8 GB GDDR6 |
| Memory Interface Width | 384-bit | 256-bit | 192-bit | 192-bit | 128-bit | 128-bit |
| Graphics Card Power (W) | 450 | 320 | 285 | 200 | 160 | 115 |
| Recommended PSU (W) | 850 | 750 | 700 | 650 | 550 | 450 |
| Price | $1,599 | $1,199 | $799 | $599 | $399 (8GB) $499 (16GB) | $299 |
| Name | Company | Model | Time | Price |
|---|---|---|---|---|
| RayNeo V3 (雷鸟V3) | RayNeo (雷鸟创新) | Qwen | 2025.1.7 | ¥1799+ |
| Sharge Paipai Glasses (闪极拍拍镜) | Sharge (闪极科技) | Qwen Kimi GLM, etc. | 2024.12.19 | ¥999+ |
| INMO GO2 | INMO (影目科技) | - | 2024.11.29 | ¥3999 |
| Rokid Glasses | Rokid | Qwen | 2024.11.18 | ¥2499 |
| Looktech | Looktech | ChatGPT Claude Gemini | 2024.11.16 | $199 |
| Ray-Ban | Meta | Meta AI | 2023.9 | $299 |