-
I don't understand why the results in my site's search are generally right, but they're not when I search for "sphinx".
Here's the search: https://nicolas-hoizey.photo/search/?q=sphinx
It finds this content, which is not relevant: https://nicolas-hoizey.photo/galleries/travels/europe/spain/andalusia/the-arch-from-once-upon-a-time-in-the-west-in-texas-hollywood/
But it doesn't find this one, which should be found: https://nicolas-hoizey.photo/galleries/animals/arthropods/insects/butterflies-and-moths/a-sphinx-moth-in-the-making/
Any advice on how to understand why the results are wrong?
Replies: 2 comments 2 replies
-
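Ah, yes, this is an interesting one. At the moment, Pagefind doesn't include metadata in the searchable index; they're two separate systems, essentially. In this case, your title is outside your indexed body. This puts it in the metadata, but not in the index. As a result, nothing is in the index with "sphinx", and Pagefind regresses your search term all the way back to "s" to try to find some result. (Possibly not the most helpful step, but Pagefind really likes giving some result over nothing.) The fix here is to chuck a data-pagefind-body on your h1. This is a common pitfall, so the longer term fix for Pagefind is #532.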
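To make the fallback behaviour easier to picture, here is a toy model of a search term being shortened until something in the index matches. It is only a sketch of the idea described in this reply, not Pagefind's actual matching code: the function name regressTerm and the list of indexed words are made up for illustration.

```javascript
// Toy model only: this is NOT Pagefind's real implementation.
// Shorten the term one character at a time until some indexed word
// starts with the remaining prefix.
function regressTerm(term, indexedWords) {
  for (let len = term.length; len > 0; len--) {
    const prefix = term.slice(0, len);
    const matches = indexedWords.filter((word) => word.startsWith(prefix));
    if (matches.length > 0) {
      return { prefix, matches };
    }
  }
  return { prefix: "", matches: [] };
}

// Hypothetical index contents: the title word "sphinx" was never indexed,
// because the title sat outside the indexed body.
const indexedWords = ["arch", "west", "texas", "hollywood", "scene"];
const outcome = regressTerm("sphinx", indexedWords);
console.log(outcome.prefix, outcome.matches);
```

With these made-up words, nothing starts with "sphinx", "sphin", "sphi", "sph" or "sp", so the first hit only comes at the prefix "s". That is how an unrelated page can end up as the result.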
-
After some wrangling, I have been able to remove the title from the excerpt. Note that in the following PagefindUI setup, I have chosen to use the highlightParam option. As a result, the code here also does the work of stripping the <mark> and </mark> tags from the excerpt, as they have to be removed before the duplication can be identified. Note also that I am only processing subResults here, not the main result. Hopefully, some will find this helpful.

    document.addEventListener("DOMContentLoaded", () => {
      new PagefindUI({
        element: "#search",
        translations: {
          placeholder: "Enter search terms...",
          zero_results: "Could not find [SEARCH_TERM]",
        },
        excerptLength: 100,
        highlightParam: "highlight",
        resetStyles: true,
        pageSize: 5,
        showImages: false,
        showEmptyFilters: false,
        showSubResults: true,
        processResult: function (result) {
          if (result.sub_results && Array.isArray(result.sub_results)) {
            result.sub_results.forEach((subResult) => {
              // --- Remove Title from Excerpt ---
              const title = subResult.title;
              const excerpt = subResult.excerpt;
              // Remove all <mark> and </mark> tags from excerpt for comparison
              const cleanExcerpt = excerpt.replace(/<\/?mark>/g, "");
              // Check if cleaned excerpt starts with title
              if (title && cleanExcerpt.startsWith(title)) {
                // Find the position in the original excerpt where the title ends,
                // counting only characters that are not part of a tag
                let charCount = 0;
                let position = 0;
                while (charCount < title.length && position < excerpt.length) {
                  if (excerpt.startsWith("<mark>", position)) {
                    position += 6; // Skip "<mark>"
                  } else if (excerpt.startsWith("</mark>", position)) {
                    position += 7; // Skip "</mark>"
                  } else {
                    charCount++;
                    position++;
                  }
                }
                // Remove the title portion and trim
                let newExcerpt = excerpt.substring(position).trim();
                // Remove any orphaned leading </mark> (the title ended inside a
                // highlight); a leading <mark> is kept so the tags stay balanced
                newExcerpt = newExcerpt.replace(/^(<\/mark>)+/, "");
                // Remove leading punctuation and whitespace
                newExcerpt = newExcerpt.replace(/^[.,;:!?\s]+/, "");
                if (newExcerpt.length > 0) {
                  // Capitalize first letter (skip over any leading <mark> tag)
                  const markMatch = newExcerpt.match(/^(<mark>)?(.)/);
                  if (markMatch) {
                    const prefix = markMatch[1] || "";
                    const firstChar = markMatch[2].toUpperCase();
                    newExcerpt =
                      prefix + firstChar + newExcerpt.slice(prefix.length + 1);
                  }
                  subResult.excerpt = newExcerpt;
                }
              }
            });
          }
          // Return the modified result object
          return result;
        },
      });
    });
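If you want to try the title-stripping logic outside the browser, it can be pulled out into a pure function. This is a minimal sketch under a few assumptions: stripTitleFromExcerpt is a name made up here and is not a Pagefind API, highlights are assumed to be plain <mark> tags with no attributes, and an opening <mark> at the start of the remaining text is kept so the highlight tags stay balanced.

```javascript
// Hypothetical helper (not a Pagefind API): strips a leading title
// from an excerpt that may contain <mark> highlight tags.
function stripTitleFromExcerpt(title, excerpt) {
  const plain = excerpt.replace(/<\/?mark>/g, "");
  if (!title || !plain.startsWith(title)) {
    return excerpt; // Nothing to strip.
  }

  // Advance through the original excerpt until title.length visible
  // characters have been consumed, skipping <mark> and </mark> tags.
  let seen = 0;
  let pos = 0;
  while (seen < title.length && pos < excerpt.length) {
    if (excerpt.startsWith("<mark>", pos)) {
      pos += "<mark>".length;
    } else if (excerpt.startsWith("</mark>", pos)) {
      pos += "</mark>".length;
    } else {
      seen++;
      pos++;
    }
  }

  const rest = excerpt
    .slice(pos)
    .trim()
    .replace(/^(<\/mark>)+/, "") // Drop a closing tag orphaned by the cut.
    .replace(/^[.,;:!?\s]+/, ""); // Drop leading punctuation and whitespace.

  if (rest.length === 0) {
    return excerpt; // The excerpt was only the title; leave it untouched.
  }

  // Capitalize the first visible character, keeping a leading <mark> intact.
  return rest.replace(
    /^(<mark>)?(.)/,
    (match, tag = "", ch) => tag + ch.toUpperCase()
  );
}

console.log(
  stripTitleFromExcerpt(
    "A sphinx moth",
    "A <mark>sphinx</mark> moth: resting on a leaf"
  )
);
```

Inside processResult, each sub result could then be updated with subResult.excerpt = stripTitleFromExcerpt(subResult.title, subResult.excerpt), which keeps the callback short and lets the string handling be checked on its own.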