
Commit 9564876

bump version to v0.12.0 (#4300)
* bump version to v0.12.0
* constrain transformers<5.0.0
* update supported models
* minor fix
Parent: ad7d2e1

8 files changed: 26 additions, 7 deletions

README.md

Lines changed: 9 additions & 1 deletion
````diff
@@ -24,6 +24,13 @@
 ## Latest News 🎉
 
 <details open>
+<summary><b>2026</b></summary>
+
+- \[2026/02\] Support [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor) 4bit symmetric/asymmetric quantization. Refer [here](./docs/en/quantization/llm_compressor.md) for a detailed guide
+
+</details>
+
+<details close>
 <summary><b>2025</b></summary>
 
 - \[2025/09\] TurboMind supports MXFP4 on NVIDIA GPUs starting from V100, achieving 1.5x the performance of vLLM on H800 for openai gpt-oss models!
@@ -175,6 +182,7 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
 <li>InternVL3.5 (1B-241BA28B)</li>
 <li>Intern-S1 (241B)</li>
 <li>Intern-S1-mini (8.3B)</li>
+<li>Intern-S1-Pro (1TB)</li>
 <li>Mono-InternVL (2B)</li>
 <li>ChemVLM (8B-26B)</li>
 <li>CogVLM-Chat (17B)</li>
@@ -216,7 +224,7 @@ The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
 For the GeForce RTX 50 series, please install the LMDeploy prebuilt package compiled with **CUDA 12.8**
 
 ```shell
-export LMDEPLOY_VERSION=0.11.1
+export LMDEPLOY_VERSION=0.12.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````
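A quick post-install sanity check (a minimal sketch; it assumes the package re-exports `__version__` from `lmdeploy/version.py`, per the version.py diff later in this commit):

```shell
# Confirm the installed wheel reports the bumped version
python -c "import lmdeploy; print(lmdeploy.__version__)"  # expected: 0.12.0
```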

README_zh-CN.md

Lines changed: 11 additions & 2 deletions
````diff
@@ -24,15 +24,23 @@
 ## Latest News 🎉
 
 <details open>
-<summary><b>2025</b></summary>
+<summary><b>2026</b></summary>
+
+- \[2026/02\] Support [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor) 4bit symmetric/asymmetric quantization. See [here](./docs/zh_cn/quantization/llm_compressor.md) for a detailed guide
+
 </details>
 
+<details close>
+<summary><b>2025</b></summary>
+
 - \[2025/09\] TurboMind supports MXFP4 on NVIDIA V100 and later GPUs. When serving openai gpt-oss models on H800, it achieves 1.5x the performance of vLLM!
 - \[2025/06\] Deeply optimized FP8 MoE model inference
 - \[2025/06\] Integrated [DLSlime](https://github.com/DeepLink-org/DLSlime) and [Mooncake](https://github.com/kvcache-ai/Mooncake) to enable DeepSeek PD-disaggregated deployment. Sincere thanks to both teams!
 - \[2025/04\] Integrated the deepseek-ai components FlashMLA, DeepGemm, DeepEP, MicroBatch, eplb, etc., improving DeepSeek inference performance
 - \[2025/01\] Added support for DeepSeek V3 and R1
 
+</details>
+
 <details close>
 <summary><b>2024</b></summary>
 
@@ -176,6 +184,7 @@ The LMDeploy TurboMind engine has excellent inference capabilities across models of all sizes
 <li>InternVL3.5 (1B-241BA28B)</li>
 <li>Intern-S1 (241B)</li>
 <li>Intern-S1-mini (8.3B)</li>
+<li>Intern-S1-Pro (1TB)</li>
 <li>Mono-InternVL (2B)</li>
 <li>ChemVLM (8B-26B)</li>
 <li>CogVLM-Chat (17B)</li>
@@ -217,7 +226,7 @@ pip install lmdeploy
 For GeForce RTX 50 series GPUs, please install the LMDeploy prebuilt package compiled with **CUDA 12.8**.
 
 ```shell
-export LMDEPLOY_VERSION=0.11.1
+export LMDEPLOY_VERSION=0.12.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
 ```
````

docs/en/get_started/installation.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:
 
 ```shell
-export LMDEPLOY_VERSION=0.11.1
+export LMDEPLOY_VERSION=0.12.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
````
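To confirm the cu118 wheel matches the runtime, a hedged check of the CUDA version the installed PyTorch was built against:

```shell
# torch.version.cuda reports the CUDA toolkit the wheel was compiled for
python -c "import torch; print(torch.version.cuda)"  # expect 11.8 for the cu118 build
```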

docs/en/supported_models/supported_models.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -72,6 +72,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes |
 | Intern-S1 | 241B | MLLM | Yes | Yes | Yes | Yes | - |
 | Intern-S1-mini | 8.3B | MLLM | Yes | Yes | Yes | Yes | - |
+| Intern-S1-Pro | 1TB | MLLM | Yes | - | - | - | No |
 | Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
 | Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
 | ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
```

docs/zh_cn/get_started/installation.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy with the following commands:
 
 ```shell
-export LMDEPLOY_VERSION=0.11.1
+export LMDEPLOY_VERSION=0.12.0
 export PYTHON_VERSION=310
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
````

docs/zh_cn/supported_models/supported_models.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -19,6 +19,7 @@
 | InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
 | Intern-S1 | 241B | MLLM | Yes | Yes | Yes | No |
 | Intern-S1-mini | 8.3B | MLLM | Yes | Yes | Yes | No |
+| Intern-S1-Pro | 1TB | MLLM | Yes | - | - | No |
 | Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
 | Qwen1.5<sup>\[1\]</sup> | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
 | Qwen2<sup>\[2\]</sup> | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
```

lmdeploy/version.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.11.1'
+__version__ = '0.12.0'
 short_version = __version__
 
 
```
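For downstream scripts that need to gate on this release, a hedged sketch of a minimum-version guard (it assumes the `packaging` distribution is importable, which is typical in pip-managed environments):

```shell
# Illustrative guard: fail fast if the environment predates v0.12.0
python -c "
from packaging.version import Version
import lmdeploy
assert Version(lmdeploy.__version__) >= Version('0.12.0'), 'lmdeploy >= 0.12.0 required'
print('version ok:', lmdeploy.__version__)
"
```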
requirements/runtime_cuda.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -23,7 +23,7 @@ shortuuid
 tiktoken
 torch<=2.8.0,>=2.0.0
 torchvision<=0.23.0,>=0.15.0
-transformers
+transformers<5.0.0
 triton<=3.4.0,>=3.0.0; sys_platform == "linux" and "aarch64" not in platform_machine and "arm" not in platform_machine
 uvicorn
 xgrammar
```
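The new upper bound keeps dependency resolution on transformers 4.x, since a future 5.0 release could carry breaking API changes. A quick hedged check after installation:

```shell
# Verify the resolved transformers version respects the new pin
python -c "import transformers; print(transformers.__version__)"  # expect a 4.x release
pip check  # surface any dependency conflicts introduced by the constraint
```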
