fix: use frontmatter description for skill vectorization instead of overview #229

Merged
MaojiaSheng merged 1 commit into volcengine:main from
ZaynJarvis:fix/skill-vectorize-use-abstract
Feb 20, 2026

@ZaynJarvis
Collaborator

Context

PR #228 fixed two issues with skill search ranking:

  1. Retriever visited-set bug: correctly fixed. The most relevant results were being dropped.
  2. Skill embedding text: changed from the frontmatter description to the LLM-generated overview.

This PR reverts change (2) while keeping (1). Skills should embed using the frontmatter description (abstract), not the overview.

Why revert to abstract/frontmatter

Skills are not resources. The two have fundamentally different retrieval patterns:

  • Resources are document collections where users search for content within files. Using overview/content for embedding makes sense — users query with natural language about what is inside the documents.

  • Skills are tools selected by matching a short description. In system prompts, agents see only name + description to decide which skill to activate. The embedding should match this same selection surface — the frontmatter description — so vector search aligns with how skills are actually discovered and used.

Using overview introduces unnecessary indirection:

  • The overview is an LLM-generated summary that may emphasize different aspects than what users/agents query for
  • The frontmatter description is human-authored and intentionally crafted for skill discovery (includes trigger keywords, use cases)
  • Embedding the description directly means search results reflect the same text the skill author optimized for matching
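To make the distinction concrete, here is a minimal sketch of selecting the embedding text for a skill. It is an illustration under assumptions, not the actual `skill_processor.py` code: `SkillContext`, `parse_frontmatter`, `build_context`, and `vectorization_text` are hypothetical names, and the frontmatter parser only handles flat `key: value` pairs.

```python
from dataclasses import dataclass


@dataclass
class SkillContext:
    """Hypothetical stand-in for the processor's skill context."""
    name: str
    abstract: str  # frontmatter description, written by the skill author
    overview: str  # LLM-generated summary


def parse_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading `---` fences.

    Deliberately minimal: no nested YAML and no multi-line values.
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def build_context(skill_md: str, overview: str) -> SkillContext:
    """Build a context whose abstract is the frontmatter description."""
    fm = parse_frontmatter(skill_md)
    return SkillContext(
        name=fm.get("name", ""),
        abstract=fm.get("description", ""),
        overview=overview,
    )


def vectorization_text(ctx: SkillContext) -> str:
    """Return the text to embed for a skill: the description, not the overview."""
    return ctx.abstract
```

With this shape, the text sent to the embedder is the same string an agent reads in its system prompt, so vector search and prompt-based selection match on the same surface.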

Recall testing confirms abstract works well:

| Query | #1 Result | Correct? | Score |
| --- | --- | --- | --- |
| "adding memory" | adding-memory | yes | 0.472 |
| "search context" | searching-context | yes | 0.551 |
| "RAG semantic search" | openviking | yes | 0.646 |
| "remember this" | adding-memory | yes | 0.317 |
| "add file to knowledge base" | adding-resource | yes | 0.448 |

5/5 semantic queries rank correctly with abstract-only embedding. The retriever fix from #228 was the real improvement — it ensures all skills appear in results regardless of embedding text quality.
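A recall check of this kind could be scripted roughly as follows. This is a sketch, not the harness used for the numbers above: `top1_report` and the `search` callable are hypothetical, and `search` is assumed to return hits as `(skill name, score)` pairs ordered best first. The query and expected-skill pairs are the five listed above.

```python
from typing import Callable, Sequence

Hit = tuple[str, float]  # (skill name, similarity score)
SearchFn = Callable[[str], Sequence[Hit]]

# (query, expected top-1 skill) pairs taken from the recall results above.
CASES: list[tuple[str, str]] = [
    ("adding memory", "adding-memory"),
    ("search context", "searching-context"),
    ("RAG semantic search", "openviking"),
    ("remember this", "adding-memory"),
    ("add file to knowledge base", "adding-resource"),
]


def top1_report(
    cases: Sequence[tuple[str, str]], search: SearchFn
) -> tuple[int, list[str]]:
    """Return (number of correct top-1 hits, queries whose top hit was wrong)."""
    correct = 0
    failures: list[str] = []
    for query, expected in cases:
        hits = search(query)
        # A query passes only if the first-ranked hit is the expected skill.
        if hits and hits[0][0] == expected:
            correct += 1
        else:
            failures.append(query)
    return correct, failures
```

Passing the real skill search function as `search` and asserting that `failures` is empty would turn the manual check into a regression test for future changes to the embedding text.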

Changes

  • skill_processor.py: Revert vectorization text from overview back to context.abstract (frontmatter description)

…verview

Reverts the skill_processor embedding change from volcengine#228 while keeping
the retriever fix. Skills should embed using the frontmatter description
(abstract), not the LLM-generated overview.
@MaojiaSheng MaojiaSheng merged commit 5d70786 into volcengine:main Feb 20, 2026
5 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Feb 20, 2026