fix: use frontmatter description for skill vectorization instead of overview by ZaynJarvis · Pull Request #229 · volcengine/OpenViking

ZaynJarvis · 2026-02-20T05:31:39Z

Context

PR #228 fixed two issues with skill search ranking:

1. Retriever visited set bug: correctly fixed; the most relevant results were being dropped
2. Skill embedding text: changed from frontmatter description to LLM-generated overview

This PR reverts change (2) while keeping (1). Skills should embed using the frontmatter description (abstract), not the overview.

Why revert to abstract/frontmatter

Skills are not resources. The two have fundamentally different retrieval patterns:

Resources are document collections where users search for content within files. Using overview/content for embedding makes sense — users query with natural language about what is inside the documents.
Skills are tools selected by matching a short description. In system prompts, agents see only name + description to decide which skill to activate. The embedding should match this same selection surface — the frontmatter description — so vector search aligns with how skills are actually discovered and used.
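The distinction above can be sketched as a small selection function. This is a minimal illustration, not the OpenViking implementation: `ContextType`, `Context`, and `embedding_text` are hypothetical names, and the only assumption carried over from the PR is that a skill has an `abstract` (the frontmatter description) and an LLM-generated `overview`.

```python
from dataclasses import dataclass
from enum import Enum


class ContextType(Enum):
    # Hypothetical enum; the real project may model context types differently.
    RESOURCE = "resource"
    SKILL = "skill"


@dataclass
class Context:
    context_type: ContextType
    abstract: str  # for skills: the human-authored frontmatter description
    overview: str  # LLM-generated summary of the content


def embedding_text(ctx: Context) -> str:
    """Pick the text to embed so it matches how the item is discovered."""
    if ctx.context_type is ContextType.SKILL:
        # Agents choose skills from name + description, so embed the description.
        return ctx.abstract
    # Resources are searched by content, so a content summary is the better match.
    return ctx.overview or ctx.abstract


skill = Context(ContextType.SKILL, abstract="short description", overview="long summary")
resource = Context(ContextType.RESOURCE, abstract="short description", overview="long summary")
print(embedding_text(skill))     # short description
print(embedding_text(resource))  # long summary
```

Keeping the choice in one function makes the skill/resource asymmetry explicit rather than leaving it implicit in each processor.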

Using overview introduces unnecessary indirection:

The overview is an LLM-generated summary that may emphasize different aspects than what users/agents query for
The frontmatter description is human-authored and intentionally crafted for skill discovery (includes trigger keywords, use cases)
Embedding the description directly means search results reflect the same text the skill author optimized for matching

Recall testing confirms abstract works well:

| Query | #1 Result | Correct? | Score |
| --- | --- | --- | --- |
| "adding memory" | adding-memory | ✓ | 0.472 |
| "search context" | searching-context | ✓ | 0.551 |
| "RAG semantic search" | openviking | ✓ | 0.646 |
| "remember this" | adding-memory | ✓ | 0.317 |
| "add file to knowledge base" | adding-resource | ✓ | 0.448 |

5/5 semantic queries rank correctly with abstract-only embedding. The retriever fix from #228 was the real improvement — it ensures all skills appear in results regardless of embedding text quality.
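A check like the one in the table can be expressed as a top-1 accuracy computation. The sketch below is an assumption about how such a test might be written, not the script used for this PR: `top1_accuracy` is a hypothetical helper, and the `observed` dictionary simply restates the #1 results reported above instead of calling a real search API.

```python
def top1_accuracy(results: dict, expected: dict) -> float:
    """Fraction of queries whose first-ranked result is the expected skill."""
    hits = 0
    for query, want in expected.items():
        ranked = results.get(query) or [None]  # a missing or empty ranking counts as a miss
        if ranked[0] == want:
            hits += 1
    return hits / len(expected)


# Expected skill per query, taken from the table above.
expected = {
    "adding memory": "adding-memory",
    "search context": "searching-context",
    "RAG semantic search": "openviking",
    "remember this": "adding-memory",
    "add file to knowledge base": "adding-resource",
}

# Stand-in for search output: only the reported #1 result per query is known.
observed = {query: [skill] for query, skill in expected.items()}

print(top1_accuracy(observed, expected))  # 1.0
```

In a real run, `observed` would be filled from the retriever's ranked output for each query.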

Changes

skill_processor.py: Revert vectorization text from overview back to context.abstract (frontmatter description)
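For readers unfamiliar with the term, the frontmatter description is the `description` field in the metadata block at the top of a skill file. The sketch below shows one way to pull it out with the standard library only. It is not the code in `skill_processor.py`: `frontmatter_description` is a hypothetical helper, the parser handles only single-line `key: value` pairs (real frontmatter is YAML and would normally go through a YAML parser), and the sample description text is made up for illustration.

```python
from typing import Optional


def frontmatter_description(skill_markdown: str) -> Optional[str]:
    """Return the `description` value from a leading '---' frontmatter block."""
    lines = skill_markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip().strip("\"'") or None
    return None


# Hypothetical skill file; only the skill name appears in the PR.
SAMPLE = """---
name: adding-memory
description: Store a fact for later recall. Use when the user says "remember this".
---
# Adding memory
Longer instructions for the agent go here.
"""

print(frontmatter_description(SAMPLE))
```

The returned string is what this PR treats as the abstract, and therefore as the text to vectorize for a skill.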

fix: use frontmatter description for skill vectorization instead of o…

8bafab2

…verview Reverts the skill_processor embedding change from volcengine#228 while keeping the retriever fix. Skills should embed using the frontmatter description (abstract), not the LLM-generated overview.

github-project-automation bot added this to OpenViking project Feb 20, 2026

github-project-automation bot moved this to Backlog in OpenViking project Feb 20, 2026

MaojiaSheng approved these changes Feb 20, 2026

MaojiaSheng merged commit 5d70786 into volcengine:main Feb 20, 2026
5 checks passed

github-project-automation bot moved this from Backlog to Done in OpenViking project Feb 20, 2026

MaojiaSheng merged 1 commit into volcengine:main from ZaynJarvis:fix/skill-vectorize-use-abstract
