Fix the `_approx_token_len` method logic to avoid double-counting Chinese characters by zzhRooT1998 · Pull Request #327 · datawhalechina/hello-agents

zzhRooT1998 · 2026-02-01T09:52:55Z

Refactor token length estimation for mixed Chinese and English text. Improve handling of CJK characters and non-CJK tokens.
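The PR diff is not shown on this page, so the following is only a minimal sketch of the idea described above, not the merged implementation. It assumes the estimate is "one token per CJK character plus one token per whitespace-separated non-CJK run", and that double counting arises when CJK characters are counted individually and then counted again as part of the whitespace split. The function name `approx_token_len`, the regex, and the restriction to the basic CJK Unified Ideographs block are all illustrative assumptions.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is treated as CJK.
# The real method may cover more ranges (extensions, kana, hangul, ...).
_CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def approx_token_len(text: str) -> int:
    """Roughly estimate the token count of mixed Chinese/English text.

    Each CJK character counts as one token. The remaining text is split
    on whitespace and each non-CJK run counts as one token.
    """
    cjk_count = len(_CJK_CHAR.findall(text))

    # Blank out CJK characters before splitting, so a run such as "你好"
    # is not counted once per character and then again as a "word".
    non_cjk_text = _CJK_CHAR.sub(" ", text)
    non_cjk_count = len(non_cjk_text.split())

    return cjk_count + non_cjk_count


if __name__ == "__main__":
    for sample in ["hello world", "你好世界", "hello你好world"]:
        print(repr(sample), approx_token_len(sample))
```

Under these assumptions, a naive `cjk_count + len(text.split())` would report 5 for `"你好世界"` (four characters plus one whitespace-delimited chunk), whereas the sketch above reports 4.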

Fix the `_approx_token_len` method logic to avoid double-counting Chinese characters

a17e554


zzhRooT1998 mentioned this pull request Feb 1, 2026

[Issue] Chapter 8.3.4: the logic of the `_approx_token_len` method is incorrect #325

Open

3 tasks

Edwina-coco approved these changes Feb 2, 2026

zzhRooT1998 wants to merge 1 commit into datawhalechina:main from zzhRooT1998:fix/approx_token_len
