feat(parse): add AST-based code skeleton extraction mode #334

Merged
MaojiaSheng merged 1 commit into volcengine:main from yangxinxin-7:feat/ast
Feb 27, 2026

@yangxinxin-7
Collaborator

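Adds tree-sitter-based AST code skeleton extraction as a lightweight alternative to LLM summarization.

Core logic

  • Adds a code_summary_mode config option ("llm" | "ast" | "ast_llm"); the default is changed to "ast"
  • ast mode: for code files of ≥100 lines, extracts a structural skeleton (class names, method signatures, comments) and skips the LLM call
  • ast_llm mode: after skeleton extraction, still calls the LLM, passing the skeleton as context to assist summarization
  • When the language is not supported by the AST extractors or extraction fails, automatically falls back to the LLM and logs the event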
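
The mode handling described above can be sketched roughly as follows. This is an illustrative sketch, not the code merged in this PR: `summarize_code`, `extract_skeleton`, `llm_summarize`, and `MIN_LINES_FOR_AST` are hypothetical names, and routing files shorter than 100 lines to the LLM is an assumption (the description only says what happens for files of ≥100 lines).

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Threshold taken from the PR description; the constant name is hypothetical.
MIN_LINES_FOR_AST = 100


def summarize_code(
    source: str,
    mode: str,
    extract_skeleton: Callable[[str], Optional[str]],
    llm_summarize: Callable[[str, Optional[str]], str],
) -> str:
    """Dispatch on code_summary_mode ("llm" | "ast" | "ast_llm").

    extract_skeleton returns None for unsupported languages (assumed contract).
    llm_summarize takes the source plus an optional skeleton as context.
    """
    if mode not in ("llm", "ast", "ast_llm"):
        raise ValueError(f"unknown code_summary_mode: {mode!r}")

    if mode == "llm":
        return llm_summarize(source, None)

    skeleton: Optional[str] = None
    if len(source.splitlines()) >= MIN_LINES_FOR_AST:
        try:
            skeleton = extract_skeleton(source)
        except Exception as exc:  # extraction failure: fall back to the LLM
            logger.warning("AST extraction failed, falling back to LLM: %s", exc)

    if skeleton is None:
        # Unsupported language, failed extraction, or (assumed) a short file.
        logger.info("No AST skeleton available, using LLM summary")
        return llm_summarize(source, None)

    if mode == "ast":
        return skeleton  # skip the LLM call entirely
    return llm_summarize(source, skeleton)  # ast_llm: skeleton as context
```

Passing the extractor and the LLM call in as callables keeps the sketch self-contained; the actual integration point named in this PR is semantic_processor.py.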

Supported languages

Dedicated extractors: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++. All other languages go straight to the LLM.
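
As a rough illustration of how a file could be routed to one of these extractors, the sketch below maps file extensions to language names. The mapping, the `detect_language` name, and the choice of extension-based detection are all assumptions; the PR does not say how extractor.py detects the language.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension table covering the languages listed in the PR.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".rs": "rust",
    ".go": "go",
    ".java": "java",
    ".c": "c",
    ".h": "c",
    ".cpp": "cpp",
    ".cc": "cpp",
    ".hpp": "cpp",
}


def detect_language(path: str) -> Optional[str]:
    """Return a language name, or None when no dedicated extractor applies.

    A None result corresponds to the "go straight to the LLM" case.
    """
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
```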

File structure

openviking/parse/parsers/code/ast/
├── extractor.py # language detection + dispatch
├── skeleton.py # CodeSkeleton / FunctionSig / ClassSkeleton data structures
└── languages/ # dedicated per-language extractors (tree-sitter based)
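
To make the skeleton idea concrete, here is a minimal sketch of what the three data structures might look like, with a toy Python-only extractor. The class names come from skeleton.py above, but every field is an assumption. The extractor deliberately uses Python's standard-library `ast` module as a stand-in so the example runs without extra dependencies; the PR itself uses tree-sitter, which is what allows it to cover non-Python languages.

```python
import ast
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionSig:  # fields are hypothetical
    name: str
    params: List[str]
    docstring: Optional[str] = None


@dataclass
class ClassSkeleton:  # fields are hypothetical
    name: str
    docstring: Optional[str] = None
    methods: List[FunctionSig] = field(default_factory=list)


@dataclass
class CodeSkeleton:  # fields are hypothetical
    functions: List[FunctionSig] = field(default_factory=list)
    classes: List[ClassSkeleton] = field(default_factory=list)


_FUNC_NODES = (ast.FunctionDef, ast.AsyncFunctionDef)


def _to_sig(node) -> FunctionSig:
    # Only plain positional parameters are collected (no *args / **kwargs /
    # keyword-only parameters), to keep the sketch short.
    return FunctionSig(
        name=node.name,
        params=[arg.arg for arg in node.args.args],
        docstring=ast.get_docstring(node),
    )


def extract_python_skeleton(source: str) -> CodeSkeleton:
    """Collect top-level functions and classes with their direct methods."""
    skeleton = CodeSkeleton()
    for node in ast.parse(source).body:
        if isinstance(node, _FUNC_NODES):
            skeleton.functions.append(_to_sig(node))
        elif isinstance(node, ast.ClassDef):
            cls = ClassSkeleton(name=node.name, docstring=ast.get_docstring(node))
            cls.methods = [_to_sig(n) for n in node.body if isinstance(n, _FUNC_NODES)]
            skeleton.classes.append(cls)
    return skeleton
```

A skeleton like this keeps names, signatures, and docstrings while dropping function bodies, which is what makes it a cheap substitute for (or input to) an LLM summary.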

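Changed files

  • semantic_processor.py: wires the AST mode into the processing pipeline
  • parser_config.py: adds the code_summary_mode config option
  • examples/ov.conf.example: updates the example config to match
  • tests/parse/test_ast_extractor.py: extraction tests for each language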

@MaojiaSheng MaojiaSheng merged commit fac6068 into volcengine:main Feb 27, 2026
21 of 22 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Feb 27, 2026