chore: add SEO strategy cookbook #894
base: main
Conversation
📝 Walkthrough

A new file, SEO-STRATEGY.md, was added describing a technical SEO strategy for npmx.dev. It specifies serving an SSR-rendered npm registry mirror for organic crawling, returning real HTTP 404 responses, and using robots.txt to block high-cost paths. It documents the i18n choices (English as the default, no language URL prefixes), a single canonical URL per package, meta-tag rules for noindex/nofollow, internal linking and dynamic SEO metadata handling, and a decision not to generate a sitemap due to scale.
Actionable comments posted: 1
🧹 Nitpick comments (1)
SEO-STRATEGY.md (1)
23-25: Soften the “immediately discard” claim for 404s. Search engines can take time to drop URLs or treat them as soft 404s. Consider wording like “eventually de-indexes” and optionally mention 410 for permanently removed packages.
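As an illustration of that suggestion, here is a minimal sketch of a Nitro route handler that returns a real 404, and a 410 for a permanently removed package. The `fetchPackument` helper, the route path, and the `removed` flag are hypothetical, not the actual npmx code:

```ts
// server/routes/package/[name].get.ts — a sketch, not the project's handler.
import { createError, defineEventHandler, getRouterParam } from 'h3'

// Hypothetical lookup against the registry mirror; resolves to null
// when the package does not exist.
async function fetchPackument(name: string): Promise<{ removed?: boolean } | null> {
  return null
}

export default defineEventHandler(async (event) => {
  const name = getRouterParam(event, 'name') ?? ''
  const pkg = await fetchPackument(name)

  if (!pkg) {
    // A real HTTP 404 lets search engines eventually drop the URL.
    throw createError({ statusCode: 404, statusMessage: 'Package not found' })
  }
  if (pkg.removed) {
    // 410 signals permanent removal and is typically de-indexed faster.
    throw createError({ statusCode: 410, statusMessage: 'Package removed' })
  }
  return pkg
})
```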
Based on the `robots.txt` strategy, the following Vue pages should explicitly include the `<meta name="robots" content="noindex, nofollow">` tag via `useSeoMeta`. This acts as a second layer of defense against indexing low-value content.

- **`app/pages/search.vue`**: Internal search results.
- **`app/pages/settings.vue`**: User preferences.
- **`app/pages/compare.vue`**: Dynamic comparison tool.
- **`app/pages/package-code/[...path].vue`**: Source code explorer.
- **`app/pages/package-docs/[...path].vue`**: Generated documentation (consistent with robots.txt block).
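For reference, a minimal sketch of that tag in one of the listed pages (Nuxt auto-imports `useSeoMeta`; the surrounding page code is omitted):

```ts
// Inside <script setup> of e.g. app/pages/search.vue — sketch only.
// The `robots` key renders <meta name="robots" content="noindex, nofollow">
// into the page head during SSR.
useSeoMeta({
  robots: 'noindex, nofollow',
})
```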
Robots.txt disallow prevents crawlers from seeing the noindex meta.
If a path is disallowed, most bots won’t fetch the page, so the meta tag is not an effective “second layer.” Either allow crawling and use `noindex, follow`, or keep the disallow and drop the meta requirement for those paths.
That's correct: technically, if a path is blocked, the metadata isn't read. But we keep the Disallow directive to prioritize crawl budget. The noindex tag is proposed as a defensive measure in case crawling is accidentally allowed, or for bots that don't strictly adhere to robots.txt but do respect the metadata.
@userquin, that makes perfect sense—thank you for the clarification! You're absolutely right that this "defense in depth" approach is a solid SEO best practice. The Disallow directive optimizes your crawl budget for well-behaved bots, while the noindex meta tags provide a safety net for edge cases like accidental configuration changes or non-compliant crawlers. This dual-layer strategy is prudent for production environments.
We do not use a massive `sitemap.xml`. We rely on natural link discovery by bots (Googlebot, Bingbot, etc.):

1. **Entry Point:** The Home page (`/`) links to popular packages.
2. **Expansion:** Each package page links to its **Dependencies**, **DevDependencies**, and **PeerDependencies**.
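As a sketch of the expansion step, the crawlable link targets can be derived from the packument's dependency maps. The `Packument` shape and helper name here are illustrative, not the actual npmx code:

```ts
// Sketch: turn a package's dependency maps into internal links that
// bots can follow to discover the next layer of the graph.
interface Packument {
  dependencies?: Record<string, string>
  devDependencies?: Record<string, string>
  peerDependencies?: Record<string, string>
}

function dependencyLinks(pkg: Packument): string[] {
  const groups = [pkg.dependencies, pkg.devDependencies, pkg.peerDependencies]
  return groups
    .flatMap((group) => Object.keys(group ?? {}))
    .map((name) => `/package/${name}`)
}

// Each path is rendered as a regular <NuxtLink>/<a> on the package page,
// so crawlers can walk dependencies, their dependencies, and so on.
```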
I'm not sure about this step.
This makes it possible for even very popular packages to be missed entirely if no crawled package depends on them, or if the same gap exists further down the chain.
In other words, we would roughly only index what would get installed by `pnpm install nuxt vue nitro react svelte vite next astro typescript angular`, plus their devDependencies, which intuitively sounds like a tiny fraction of the useful packages out there.
It's effectively infinite recursion: the bot will keep following those links (dependencies, devDependencies, and peerDependencies) from page to page.
```txt
User-agent: *
Allow: /

# Block internal search results (duplicate/infinite content)
Disallow: /search

# Block user utilities and settings
Disallow: /settings
Disallow: /compare
Disallow: /auth/

# Block code explorer and docs (high crawl cost, low SEO value for general search)
Disallow: /package-code/
Disallow: /package-docs/

# Block internal API endpoints
Disallow: /api/
```
npmjs also blocks old versions from being indexed:
https://www.npmjs.com/robots.txt
I think it makes a lot of sense.
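If npmx exposes versioned package pages under a path such as `/package/<name>/v/<version>` (an assumption about the routing, not a confirmed URL scheme), an equivalent rule could be appended to the block above:

```txt
# Hypothetical: keep old, versioned package pages out of the index
Disallow: /package/*/v/
```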
- Search traffic is predominantly in English (package names, technical terms).
- We avoid the complexity of managing `hreflang` and duplicate content across 20+ languages.
- User Experience (UX) remains localized: users land on the page (indexed in English), and the client hydrates the app in their preferred language.
...and the vast majority of READMEs are in English anyway, and they make up a significant share of npmx's displayed content.
### Canonical URLs & i18n

- **Canonical Rule:** The canonical URL is **always the English (default) URL**, regardless of the user's selected language or browser settings.
  - Example: `https://npmx.dev/package/react`
- **Reasoning:** Since we do not use URL prefixes for languages (e.g., `/es/...`), there is technically only _one_ URL per resource. The language change happens client-side. Therefore, the canonical tag must point to this single, authoritative URL to prevent confusion for search engines.
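As a sketch, the canonical tag could be emitted from the package page with Nuxt's `useHead`, always pointing at the unprefixed default URL (the composable placement and base-URL handling are assumptions, not the actual npmx implementation):

```ts
// Inside <script setup> of the package page — sketch only.
// `useHead` and `useRoute` are auto-imported in Nuxt.
const route = useRoute()

useHead({
  link: [
    // One canonical per resource: the default (English) URL, no locale prefix.
    { rel: 'canonical', href: `https://npmx.dev${route.path}` },
  ],
})
```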
Given the i18n mechanics we have, is this section even relevant?
I just added a draft version; we should discuss the document on Discord: https://discord.com/channels/1464542801676206113/1468368119528685620
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>