feat: Add mineru_loader service, allowing the MinerU API service to be used as the PDF document parser, and update related configuration #1549
Open
pzc163 wants to merge 4 commits into dataelement:main
pzc163 commented on Aug 15, 2025
- Add the mineru_loader.py service file
- Update the knowledge_imp.py service
- Update the nginx configuration
- Update the initdb configuration
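- Change the MinerU parser logic to support only PDF, PNG, JPG, and JPEG formats
- Other formats (doc, docx, ppt, pptx) continue to use the existing parsing path
- Add image format support to filetype_load_map
- Maintain backward compatibility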
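The format restriction described above can be pictured as a small dispatch on the file extension. This is only a sketch under assumptions: `MINERU_EXTENSIONS`, `choose_parser`, and the `mineru_enabled` flag are hypothetical names for illustration and are not taken from the PR's code.

```python
from pathlib import Path

# Extensions routed to MinerU, as listed in the commit message.
# The constant name is hypothetical.
MINERU_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def choose_parser(file_name: str, mineru_enabled: bool = True) -> str:
    """Return "mineru" or "default" for a file name (hypothetical helper)."""
    suffix = Path(file_name).suffix.lower()  # case-insensitive match
    if mineru_enabled and suffix in MINERU_EXTENSIONS:
        return "mineru"
    # doc, docx, ppt, pptx and everything else keep the existing path.
    return "default"
```

Falling back to the existing path whenever MinerU is disabled or the extension is not listed is one way to keep older deployments behaving as before.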
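- Add the MinerUTextSplitter class, dedicated to handling MinerU parsing results
- Detect and preserve Markdown structure: headings, tables, formulas, and code blocks
- Restrict the MinerU parser to PDF and image formats
- Optimize the chunk-splitting logic so that a heading stays in the same chunk as its content
- Add detailed debug logging to ease troubleshooting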
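To illustrate the idea of keeping a heading together with its content, here is a minimal structure-aware chunker. It is not the PR's MinerUTextSplitter; the function names, the blank-line block rule, and the `chunk_size` default are assumptions made for this sketch.

```python
import re
from typing import List


def split_markdown_blocks(text: str) -> List[str]:
    """Split Markdown on blank lines, keeping fenced code blocks whole."""
    blocks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.strip().startswith("```"):
            in_fence = not in_fence  # entering or leaving a code fence
        elif not in_fence and line.strip() == "":
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def _is_heading(block: str) -> bool:
    # A block that is a single ATX heading line such as "## Title".
    return "\n" not in block and re.match(r"#{1,6}\s", block) is not None


def chunk_markdown(text: str, chunk_size: int = 500) -> List[str]:
    """Pack blocks into chunks without leaving a heading at a chunk's end."""
    chunks, current = [], []
    for block in split_markdown_blocks(text):
        candidate = "\n\n".join(current + [block])
        if current and len(candidate) > chunk_size:
            # Carry trailing headings over so they stay with their content.
            carried = []
            while current and _is_heading(current[-1]):
                carried.insert(0, current.pop())
            if current:
                chunks.append("\n\n".join(current))
            current = carried
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Tables and multi-line formulas survive intact here only because they contain no blank lines. A single block larger than `chunk_size` is emitted as is rather than split.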
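- Use the same splitting logic for all Office-type documents (doc/docx/html/mhtml/ppt/pptx)
- Fix separator escaping: convert the escaped '\n\n' and '\n' into real newline characters
- Use RecursiveCharacterTextSplitter to support splitting on multiple separators
- Keep using the separator parameter passed in by the caller rather than overriding it
- Remove debug logging to keep the code concise
- Fix the undefined texts variable
- Add parameter validation and default-value guards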
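The separator fix can be sketched as a small normalization step that runs before the splitter is built. The helper name `normalize_separators` and the default list are assumptions for illustration; the PR's actual code is not shown on this page.

```python
from typing import List, Optional

# Fallback used when the caller passes nothing usable (assumed default).
DEFAULT_SEPARATORS = ["\n\n", "\n"]


def normalize_separators(separators: Optional[List[str]]) -> List[str]:
    """Turn escaped sequences such as the two characters "\\n" into real newlines."""
    if not separators:
        return list(DEFAULT_SEPARATORS)
    normalized = []
    for sep in separators:
        # Parameter validation: skip non-strings and empty strings.
        if not isinstance(sep, str) or sep == "":
            continue
        normalized.append(sep.replace("\\n", "\n"))
    return normalized or list(DEFAULT_SEPARATORS)
```

The resulting list could then be passed as the `separators` argument of LangChain's RecursiveCharacterTextSplitter, which tries each separator in order. Without this step a separator that arrives as a literal backslash followed by n never matches the real newlines in the text.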
Such a great feature. Why hasn't it been merged yet?
Collaborator
We'll merge it in version 2.3.
Has this branch been merged into version 2.3?