
Commit f2abea0

Dale Hurley and cursoragent committed
fix: filter stop words and short tokens from skill relevance scoring
Prevents false positive skill matches caused by common English stop words (e.g. "in" matching "guidel-in-es") by adding a STOP_WORDS constant and filtering out words shorter than 3 characters before scoring.

Co-authored-by: Cursor <cursoragent@cursor.com>
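The commit message attributes the false positives to substring matching. The loop that does the matching is outside this hunk, so the sketch below assumes, hypothetically, that each query word is checked with a plain substring test against the skill's text. naiveMatchCount() is an illustrative name, not a method of Skill, and str_contains() requires PHP 8.0 or later.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical naive matcher: counts how many query words occur as
 * substrings of the skill text. It illustrates the substring behaviour the
 * commit message describes; it is NOT the actual Skill::relevanceScore() body.
 */
function naiveMatchCount(string $query, string $skillText): int
{
    $haystack = strtolower($skillText);
    $words = array_filter(explode(' ', strtolower($query)));

    $matches = 0;
    foreach ($words as $word) {
        // str_contains() is a plain substring test, so "in" is found
        // inside "writing", "guidelines" and "financial".
        if (str_contains($haystack, $word)) {
            $matches++;
        }
    }

    return $matches;
}

// "in" is the only query word present in either description, yet it matches both.
echo naiveMatchCount('help in excel', 'Writing style guidelines'), PHP_EOL;
echo naiveMatchCount('help in excel', 'Financial report templates'), PHP_EOL;
```

Both calls print 1: the only query word either description "contains" is "in", found inside "Writing", "guidelines" and "Financial". Dropping short tokens and stop words before a loop like this removes that spurious match.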
1 parent d4f8328 commit f2abea0


1 file changed, +21 -2 lines


src/Skills/Skill.php

Lines changed: 21 additions & 2 deletions
@@ -213,13 +213,32 @@ public function matchesQuery(string $query): bool
         return false;
     }
 
+    /**
+     * Common English stop words that should be ignored during relevance scoring.
+     *
+     * These short, frequent words (like "in", "a", "the") cause false positives
+     * because they appear as substrings in unrelated skill names/descriptions
+     * (e.g. "in" matches "guidel-in-es", "f-in-ancial", "coauthor-in-g").
+     */
+    private const STOP_WORDS = [
+        'a', 'an', 'the', 'in', 'on', 'at', 'to', 'for', 'of', 'by',
+        'is', 'it', 'or', 'and', 'but', 'not', 'no', 'so', 'if',
+        'do', 'my', 'me', 'we', 'be', 'am', 'are', 'was', 'has',
+        'can', 'will', 'how', 'what', 'who', 'this', 'that', 'with',
+    ];
+
     /**
      * Calculate relevance score for a query (0.0 to 1.0).
+     *
+     * Words shorter than 3 characters and common stop words are filtered
+     * out before scoring to prevent false positive matches.
      */
     public function relevanceScore(string $query): float
     {
-        $query = strtolower($query);
-        $words = array_filter(explode(' ', $query));
+        $query = strtolower(trim($query, '?!., '));
+        $words = array_values(array_filter(explode(' ', $query), function (string $w): bool {
+            return strlen($w) >= 3 && !in_array($w, self::STOP_WORDS, true);
+        }));
         $score = 0.0;
         $maxScore = count($words) > 0 ? count($words) : 1;
 
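The new tokenising step can be exercised on its own. The sketch below lifts the three added lines out of relevanceScore() into a hypothetical free function, scoringWords(), with STOP_WORDS copied from the diff as a global constant. The rest of the scoring loop is not part of this hunk and is not reproduced.

```php
<?php

declare(strict_types=1);

// Copied from Skill::STOP_WORDS in the diff above.
const STOP_WORDS = [
    'a', 'an', 'the', 'in', 'on', 'at', 'to', 'for', 'of', 'by',
    'is', 'it', 'or', 'and', 'but', 'not', 'no', 'so', 'if',
    'do', 'my', 'me', 'we', 'be', 'am', 'are', 'was', 'has',
    'can', 'will', 'how', 'what', 'who', 'this', 'that', 'with',
];

/**
 * Hypothetical helper reproducing the filtering the commit adds to
 * relevanceScore(): lowercase, trim edge punctuation, split on spaces,
 * then drop tokens shorter than 3 characters and stop words.
 *
 * @return list<string>
 */
function scoringWords(string $query): array
{
    $query = strtolower(trim($query, '?!., '));

    // array_values() reindexes the filtered array into a 0-based list.
    return array_values(array_filter(explode(' ', $query), function (string $w): bool {
        return strlen($w) >= 3 && !in_array($w, STOP_WORDS, true);
    }));
}

print_r(scoringWords('How do I write guidelines?'));
```

For "How do I write guidelines?" this returns ['write', 'guidelines'], so $maxScore falls from 5 (the old, unfiltered word count) to 2. Because the strlen($w) >= 3 check runs first, the one- and two-letter entries in STOP_WORDS ("in", "a", "do" and so on) are already excluded by length; only the entries of three or more letters, such as "the", "how" and "with", change the result. Note also that trim() strips the listed characters only from the ends of the whole query, so interior punctuation stays attached: "excel, word" yields the tokens "excel," and "word".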