Hanzi Chaizi | 汉字拆字

Chinese character decomposition library for NLP and deep learning applications.

Decompose Chinese characters into basic structural components. Characters with similar structures yield similar decomposition results, making this useful as a glyph feature for deep learning models.

汉字拆字：将汉字拆解为基础部件。字形相似的字会得到相似的拆解结果，可用于深度学习模型的字形特征提取。

Features | 特性

Covers 20,000+ Chinese characters | 覆盖 20,000+ 汉字
Zero third-party dependencies | 零第三方依赖
Python 3.10+ support | 支持 Python 3.10+

Installation | 安装

pip install hanzi_chaizi

Usage | 使用

from hanzi_chaizi import HanziChaizi

hc = HanziChaizi()
result = hc.query('名')

print(result)

Output | 输出:

['夕', '口']
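As a rough sketch of how decomposition results might serve as a glyph feature, the snippet below scores two characters by the Jaccard overlap of their component sets. The `SAMPLE` table and `lookup` function are hypothetical stand-ins for `hc.query`, so the block runs without the package installed. Only the entry for 名 is taken from the example above; the other entries are illustrative, and `glyph_similarity` is not part of the library.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for hc.query. Only the entry for '名' comes from
# the README example; the other entries are illustrative assumptions.
SAMPLE: Dict[str, List[str]] = {
    "名": ["夕", "口"],
    "各": ["夂", "口"],
    "林": ["木", "木"],
}


def lookup(ch: str) -> Optional[List[str]]:
    """Return the component list for ch, or None if it is not in the table."""
    return SAMPLE.get(ch)


def glyph_similarity(
    a: str, b: str, query: Callable[[str], Optional[List[str]]]
) -> float:
    """Jaccard similarity between the component sets of two characters.

    A character with no decomposition is treated as its own single component.
    """
    parts_a = set(query(a) or [a])
    parts_b = set(query(b) or [b])
    return len(parts_a & parts_b) / len(parts_a | parts_b)


print(glyph_similarity("名", "各", lookup))  # the two sets share 口
print(glyph_similarity("名", "林", lookup))  # no shared component
```

With the real package, passing `hc.query` in place of `lookup` should work as long as it returns a list of components, as in the example above.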

Notes | 说明

Some characters (e.g. 农, 表, 衣, 囊) contain \uf7ee in their decomposition results. This is a Unicode Private Use Area character (U+F7EE) that represents the bottom part of 衣 (the left-falling and right-falling strokes), a component with no code point of its own in standard Unicode.

部分汉字（如农、表、衣、囊）的拆解结果中包含 \uf7ee。这是一个 Unicode 私有区域字符，用于表示"衣"的下半部分（撇捺结构），该部件在标准 Unicode 中没有独立编码。
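Because a Private Use Area character usually renders as a blank or a placeholder box, it can help to label it explicitly when printing or logging results. The sketch below uses the standard `unicodedata` module to detect private-use code points (general category `Co`). The helper name `describe_components` is hypothetical, and the input list is illustrative rather than actual library output.

```python
import unicodedata
from typing import List


def describe_components(components: List[str]) -> List[str]:
    """Replace private-use characters with a printable U+XXXX label."""
    labelled = []
    for part in components:
        # General category "Co" marks Private Use code points such as U+F7EE.
        if len(part) == 1 and unicodedata.category(part) == "Co":
            labelled.append(f"U+{ord(part):04X}")
        else:
            labelled.append(part)
    return labelled


# Illustrative input only; not claimed to be the library's output
# for any particular character.
print(describe_components(["亠", "\uf7ee"]))
```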

Some characters cannot be decomposed. See non_decomposable.txt for the list.

部分汉字无法被拆解，详见 non_decomposable.txt。
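When decomposing running text, non-decomposable characters need a fallback. The following is a minimal sketch that keeps such a character as its own component. It assumes the lookup returns `None` or an empty list for a character it cannot decompose; this behavior is not confirmed by this README, so check it against the installed version. `decompose_text` and `fake_query` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List, Optional


def decompose_text(
    text: str, query: Callable[[str], Optional[List[str]]]
) -> List[str]:
    """Flatten a string into components, keeping undecomposed characters as-is.

    Assumes query returns None or an empty list when a character
    has no decomposition.
    """
    components: List[str] = []
    for ch in text:
        parts = query(ch)
        components.extend(parts if parts else [ch])
    return components


# Hypothetical stand-in for hc.query, built from the README example.
TABLE: Dict[str, List[str]] = {"名": ["夕", "口"]}
fake_query = TABLE.get

print(decompose_text("名一", fake_query))
```

If the real `query` raises an exception for unknown characters instead, the call would need to be wrapped in a `try`/`except` rather than checked for a falsy value.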

Development | 开发

See develop.md for the development guide. | 参见 develop.md 开发指南。

Credits | 致谢

Data from 漢語拆字字典 (CC BY 3.0) | 数据来自漢語拆字字典 (CC BY 3.0)

Citation | 引用

@misc{kong2018hanzichaizi,
  title={Hanzi Chaizi},
  author={Xiaoquan Kong},
  howpublished={https://github.com/howl-anderson/hanzi_chaizi},
  year={2018}
}

If this package is cited in books, seminars, or academic research papers, or used in company products, you are welcome (but not required) to email me about it. I'm glad to see the package being used and proving valuable to others.

如果本项目被书籍、学术论文引用，或被公司产品使用，欢迎（但不强求）告知我。很高兴看到这个项目对大家有价值。

