feat: add sitemap.xml and robots.txt for SEO #55
base: master
Conversation
Add dynamic sitemap generation from nostrdb cache to improve search engine discoverability of Nostr content.

New routes:
- GET /robots.txt - crawler directives with sitemap reference
- GET /sitemap.xml - dynamic sitemap from cached notes/profiles/articles

The sitemap queries local nostrdb for:
- Notes (kind:1) → note1xxx URLs
- Long-form articles (kind:30023) → naddr1xxx URLs
- Profiles (kind:0) → npub1xxx URLs

Ref: damus-io#26

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Track aggregate stats (privacy-preserving, no user tracking):
- sitemap_generations_total: counter for generation requests
- sitemap_generation_duration_seconds: time to generate
- sitemap_urls_total: total URLs in sitemap
- sitemap_notes_count: notes included
- sitemap_articles_count: articles included
- sitemap_profiles_count: profiles included

Metrics available at /metrics endpoint.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Skip kind:30023 entries with missing/empty d-tag to avoid ambiguous URLs and potential collisions across authors
- Add since filter (90 days) to notes and articles queries to prioritize recent content for SEO freshness
- Log warning when NOTECRUMBS_BASE_URL is not set, to surface potential misconfiguration in production

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Cache base URL with OnceLock to avoid logging warning on every request
- Use separate lookback periods: 90 days for notes, 365 days for evergreen article content (kind:30023)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
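The base-URL caching described in this commit can be sketched with `std::sync::OnceLock`. This is only an illustration of the pattern, not the code in the PR: the names `normalize_base_url`, `base_url`, and `DEFAULT_BASE_URL`, the fallback value, and the use of `eprintln!` in place of a logging crate are all assumptions.

```rust
use std::sync::OnceLock;

/// Hypothetical fallback used when NOTECRUMBS_BASE_URL is unset (assumption).
const DEFAULT_BASE_URL: &str = "https://damus.io";

/// Strip surrounding whitespace and trailing slashes so that joined paths
/// never contain "//".
fn normalize_base_url(raw: &str) -> String {
    raw.trim().trim_end_matches('/').to_string()
}

/// Resolve the base URL once per process. The closure runs only on the
/// first call, so the warning is emitted at most once.
fn base_url() -> &'static str {
    static BASE_URL: OnceLock<String> = OnceLock::new();
    BASE_URL.get_or_init(|| match std::env::var("NOTECRUMBS_BASE_URL") {
        Ok(raw) if !raw.trim().is_empty() => normalize_base_url(&raw),
        _ => {
            // A real service would likely log through its logging crate;
            // eprintln! keeps this sketch dependency-free.
            eprintln!("warning: NOTECRUMBS_BASE_URL is not set, using {DEFAULT_BASE_URL}");
            DEFAULT_BASE_URL.to_string()
        }
    })
}
```

Every later call to `base_url()` returns the cached string without touching the environment again, which is what keeps the warning from repeating on each request.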
Use early returns and let-else patterns to reduce nesting depth in generate_sitemap loops. Improves readability by making the happy path linear instead of deeply indented.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
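A small sketch of the let-else style this commit refers to, combined with the empty d-tag skip from the earlier commit. The `Article` struct and `article_entries` function are hypothetical stand-ins and do not reflect nostrdb types or the PR's actual loop body.

```rust
/// Hypothetical stand-in for a cached kind:30023 event (not a nostrdb type).
struct Article {
    d_tag: Option<String>,
    created_at: u64,
}

/// Collect (identifier, created_at) pairs, skipping articles whose d-tag is
/// missing or empty, since those would produce ambiguous URLs.
fn article_entries(articles: &[Article]) -> Vec<(String, u64)> {
    let mut entries = Vec::new();
    for article in articles {
        // let-else: leave this iteration early instead of nesting the
        // happy path inside `if let Some(..) { .. }`.
        let Some(d_tag) = article.d_tag.as_deref() else {
            continue;
        };
        if d_tag.is_empty() {
            continue;
        }
        entries.push((d_tag.to_string(), article.created_at));
    }
    entries
}
```

With this shape, each rejected case exits at the top of the loop and the final `push` stays at a single indentation level.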
@coderabbitai review please
✅ Actions performed
Review triggered.
📝 Walkthrough
Introduces sitemap and robots.txt generation by adding a new sitemap module and two HTTP routes. The sitemap module queries the database for notes, articles, and profiles to generate an XML sitemap and a robots.txt file. The routes serve these responses with appropriate caching headers and error handling.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/sitemap.rs`:
- Around line 269-276: Update the metrics calls to use the metrics 0.21.0 API:
replace the incorrect usage of counter!("sitemap_generations_total", 1) with
either the increment_counter! convenience macro or call
counter!("sitemap_generations_total").increment(1), and replace
gauge!("sitemap_generation_duration_seconds", duration.as_secs_f64()) and the
other gauge calls for "sitemap_urls_total", "sitemap_notes_count",
"sitemap_articles_count", and "sitemap_profiles_count" with
gauge!("...").set(value) (or use a suitable gauge-set helper) so the macros
return handles before invoking .increment(...) or .set(...); use the existing
local variables start, duration (computed from start.elapsed()), entries.len(),
notes_count, articles_count, and profiles_count as the values passed to the
handle methods.
🧹 Nitpick comments (2)
src/sitemap.rs (2)
71-79: Redundant datetime conversion.
Lines 75-79 create a `datetime` from `timestamp`, then immediately extract the same value back as `secs_since_epoch`. This is unnecessary; you can use `timestamp` directly.
♻️ Suggested simplification
 fn format_lastmod(timestamp: u64) -> String {
-    use std::time::{Duration, UNIX_EPOCH};
-
-    let datetime = UNIX_EPOCH + Duration::from_secs(timestamp);
-    let secs_since_epoch = datetime
-        .duration_since(UNIX_EPOCH)
-        .unwrap_or_default()
-        .as_secs();
-    // Simple date formatting without external dependencies
-    let days_since_epoch = secs_since_epoch / 86400;
+    let days_since_epoch = timestamp / 86400;
     let mut year = 1970i32;
     let mut remaining_days = days_since_epoch as i32;
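For context, a dependency-free conversion from a Unix timestamp to a `YYYY-MM-DD` date could look roughly like the following. It starts from `timestamp / 86400` as suggested above; the rest of the body is an assumed reconstruction and is not copied from the PR, which only shows the first lines of its `format_lastmod`.

```rust
fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Format a Unix timestamp (seconds, UTC) as YYYY-MM-DD.
/// Intended for realistic timestamps; the year loop is linear in the
/// number of years since 1970.
fn format_lastmod(timestamp: u64) -> String {
    // Whole days since 1970-01-01; no SystemTime round-trip is needed.
    let mut remaining_days = timestamp / 86_400;

    // Peel off whole years.
    let mut year = 1970i32;
    loop {
        let days_in_year = if is_leap_year(year) { 366 } else { 365 };
        if remaining_days < days_in_year {
            break;
        }
        remaining_days -= days_in_year;
        year += 1;
    }

    // Peel off whole months within the final year.
    let feb = if is_leap_year(year) { 29 } else { 28 };
    let month_lengths = [31, feb, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];
    let mut month = 1;
    for len in month_lengths {
        if remaining_days < len {
            break;
        }
        remaining_days -= len;
        month += 1;
    }

    // remaining_days is now a zero-based day of the month.
    format!("{year:04}-{month:02}-{:02}", remaining_days + 1)
}
```

For example, `format_lastmod(0)` yields `1970-01-01` and `format_lastmod(1709164800)` yields `2024-02-29`.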
160-160: Consider logging query failures for observability.
Using `unwrap_or_default()` provides resilience, but query failures are silently ignored. Consider logging a warning when queries fail to help diagnose issues in production. The same applies to lines 187 and 235.
♻️ Optional: Log query errors
-    let results = ndb.query(&txn, &[notes_filter], MAX_SITEMAP_URLS as i32).unwrap_or_default();
+    let results = ndb
+        .query(&txn, &[notes_filter], MAX_SITEMAP_URLS as i32)
+        .unwrap_or_else(|e| {
+            tracing::warn!("Failed to query notes for sitemap: {e}");
+            vec![]
+        });
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/main.rs
src/sitemap.rs
🧰 Additional context used
🧬 Code graph analysis (1)
src/main.rs (1)
src/sitemap.rs (2)
generate_robots_txt (282-294)
generate_sitemap (129-279)
🔇 Additional comments (10)
src/sitemap.rs (7)
1-22: LGTM! Constants are well-documented and reasonable. Using `OnceLock` for lazy initialization of the base URL is appropriate for avoiding repeated environment variable lookups.
24-44: LGTM! The lazy initialization pattern with a one-time warning for missing configuration is a good approach. The normalization correctly handles trailing slashes.
46-53: LGTM! Good defensive coding with `unwrap_or_default()` and `saturating_sub()` to handle edge cases gracefully.
55-69: LGTM! Correct XML escaping implementation covering all required special characters with appropriate capacity pre-allocation.
80-118: LGTM! The date calculation logic is correct, including proper leap year handling with the standard formula.
281-294: LGTM! The robots.txt output is well-structured with appropriate directives. The wildcard pattern `/*.json` is supported by major search engines (Google, Bing).
296-356: LGTM! Good test coverage for the utility functions with meaningful test cases including edge cases like leap years.
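As an illustration of the XML escaping praised above (lines 55-69), here is a minimal sketch that escapes the five XML special characters and uses the result to render one sitemap `<url>` entry. The function names `xml_escape` and `render_url_entry` and the exact output layout are assumptions, not the PR's code.

```rust
/// Escape the five characters that are special in XML text and attributes.
fn xml_escape(input: &str) -> String {
    // Pre-allocate at least the input length; escaping can only grow it.
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Render a single sitemap entry (hypothetical helper).
fn render_url_entry(loc: &str, lastmod: &str) -> String {
    format!(
        "  <url>\n    <loc>{}</loc>\n    <lastmod>{}</lastmod>\n  </url>\n",
        xml_escape(loc),
        xml_escape(lastmod)
    )
}
```

Escaping `&` matters in practice because any URL with a query string such as `?a=1&b=2` would otherwise produce an invalid sitemap.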
src/main.rs (3)
36-36: LGTM! Module declaration is correctly placed.
143-150: LGTM! The route follows the established pattern for static content routes. The 24-hour cache duration is appropriate for robots.txt.
151-167: LGTM! Good error handling pattern with logging before returning 500. The 1-hour cache is appropriate for dynamic content, and the XML content type is correct.
Note: `generate_sitemap` performs synchronous database queries, which is consistent with how other routes in this file use `ndb`. If sitemap generation becomes slow with larger databases, consider offloading to a blocking task with `tokio::task::spawn_blocking`.
// Record metrics (aggregate stats, not user-tracking)
let duration = start.elapsed();
metrics::counter!("sitemap_generations_total", 1);
metrics::gauge!("sitemap_generation_duration_seconds", duration.as_secs_f64());
metrics::gauge!("sitemap_urls_total", entries.len() as f64);
metrics::gauge!("sitemap_notes_count", notes_count as f64);
metrics::gauge!("sitemap_articles_count", articles_count as f64);
metrics::gauge!("sitemap_profiles_count", profiles_count as f64);
🧩 Analysis chain
🏁 Script executed:
# Check for metrics crate version in Cargo.toml
find . -name "Cargo.toml" -type f | head -5 | xargs grep -l "metrics" 2>/dev/null
Repository: damus-io/notecrumbs
Length of output: 75
🏁 Script executed:
# Also verify the actual code at the specified lines
if [ -f "src/sitemap.rs" ]; then
sed -n '269,276p' src/sitemap.rs
fi
Repository: damus-io/notecrumbs
Length of output: 565
🌐 Web query:
metrics crate Rust 0.21.0 counter macro syntax increment
💡 Result:
Use the counter! macro to get a Counter handle, then call increment(n); or use the convenience macro increment_counter! to add 1 (or pass labels). Examples:
- Get handle and increment by n:
let c = counter!("my_metric");
c.increment(5);
[1]
- Increment by 1 directly:
increment_counter!("my_metric");
// or with labels:
increment_counter!("my_metric", "service" => "http");
[2]
Docs (macro reference and examples) show the counter! and increment_counter! usage. [1] [2]
Sources:
[1] docs.rs: metrics::counter! macro.
[2] docs.rs / metrics macro increment_counter! (examples).
🏁 Script executed:
# Read the Cargo.toml to find the metrics version
cat Cargo.toml | grep -A 2 "^\[dependencies\]" | head -20
Repository: damus-io/notecrumbs
Length of output: 147
🏁 Script executed:
# Or search more directly for metrics
grep "metrics" Cargo.toml
Repository: damus-io/notecrumbs
Length of output: 116
🌐 Web query:
metrics crate Rust gauge macro syntax 0.21
💡 Result:
The gauge! macro (as of metrics 0.21) accepts these forms:
- gauge!($name:expr $(, $label_key:expr $(=> $label_value:expr)?)* $(,)?)
- gauge!(level: $level:expr, $name:expr $(, $label_key:expr $(=> $label_value:expr)?)* $(,)?)
- gauge!(target: $target:expr, $name:expr $(, $label_key:expr $(=> $label_value:expr)?)* $(,)?)
- gauge!(target: $target:expr, level: $level:expr, $name:expr $(, $label_key:expr $(=> $label_value:expr)?)* $(,)?)
Examples:
- let g = gauge!("my_gauge");
- let g = gauge!("my_gauge", "service" => "http");
- let g = gauge!(level: Level::DEBUG, "my_gauge");
- let g = gauge!(format!("{}{}", prefix, "g"), &labels);
(You get a Gauge handle which supports .increment(), .decrement(), .set().) (docs.rs)
If you want the docs page for that macro/version, see the metrics macro docs and the 0.21.0 changelog. (docs.rs)
Citations:
- 1: https://docs.rs/metrics/latest/metrics/macro.gauge.html?utm_source=openai
- 2: https://docs.rs/metrics/latest/metrics/macro.gauge.html?utm_source=openai
Fix metrics macro syntax for version 0.21.0.
The counter!() and gauge!() macros in metrics 0.21.0 return handles that require method calls. The current syntax is incorrect:
- `counter!("name", value)` should be `counter!("name").increment(value)` (or use the `increment_counter!("name")` convenience macro)
- `gauge!("name", value)` should be `gauge!("name").set(value)`
All metrics calls in lines 269-276 need to be updated to use the correct API.
Summary
Adds dynamic sitemap generation and robots.txt to improve search engine discoverability of Nostr content.
Closes #26
New Routes
- /robots.txt
- /sitemap.xml
Sitemap Content
Queries local nostrdb cache for:
- Notes (kind:1): note1xxx URLs, 90-day lookback
- Long-form articles (kind:30023): naddr1xxx URLs, 365-day lookback (evergreen content)
- Profiles (kind:0): npub1xxx URLs
Limited to 10,000 URLs per content type.
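The lookback windows above translate into a `since` cutoff for each query. A minimal sketch of that arithmetic follows, using `saturating_sub` so that an unusual clock value cannot underflow. The constant and function names are assumptions rather than the PR's identifiers.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Assumed names; the PR states the values (90 and 365 days) but not the identifiers.
const NOTES_LOOKBACK_DAYS: u64 = 90;
const ARTICLES_LOOKBACK_DAYS: u64 = 365;
const SECS_PER_DAY: u64 = 86_400;

/// Pure helper: the earliest timestamp still inside the lookback window.
fn cutoff_from(now_secs: u64, lookback_days: u64) -> u64 {
    now_secs.saturating_sub(lookback_days.saturating_mul(SECS_PER_DAY))
}

/// Cutoff relative to the current system clock. If the clock reads earlier
/// than the Unix epoch, `unwrap_or_default()` falls back to zero.
fn since_cutoff(lookback_days: u64) -> u64 {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_secs();
    cutoff_from(now, lookback_days)
}
```

A notes query would then use `since_cutoff(NOTES_LOOKBACK_DAYS)` and an articles query `since_cutoff(ARTICLES_LOOKBACK_DAYS)`; keeping `cutoff_from` separate from the clock makes the arithmetic easy to unit test.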
robots.txt
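A rough sketch of how a robots.txt body with a sitemap reference could be produced, based only on the details mentioned in this PR (a sitemap reference and a `/*.json` wildcard pattern). The function name `robots_txt_sketch`, its signature, and the exact directives, including the assumption that `/*.json` is disallowed, are illustrative and may differ from the PR's `generate_robots_txt`.

```rust
/// Build a robots.txt body pointing crawlers at the sitemap.
/// The directives here are assumptions made for illustration.
fn robots_txt_sketch(base_url: &str) -> String {
    // Avoid "//sitemap.xml" if the base URL carries a trailing slash.
    let base = base_url.trim_end_matches('/');
    format!(
        "User-agent: *\nAllow: /\nDisallow: /*.json\n\nSitemap: {base}/sitemap.xml\n"
    )
}
```

The `Sitemap:` line takes an absolute URL, which is why a correct base URL matters for this endpoint.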
Prometheus Metrics
- sitemap_generations_total
- sitemap_generation_duration_seconds
- sitemap_urls_total
- sitemap_notes_count
- sitemap_articles_count
- sitemap_profiles_count
Configuration
NOTECRUMBS_BASE_URL=https://damus.io # Required for correct sitemap URLs
Test plan
- /robots.txt returns correct content
- /sitemap.xml generates valid XML
- /metrics
🤖 Generated with Claude Code
Summary by CodeRabbit
- /sitemap.xml endpoint that generates dynamic sitemaps including homepage, recent notes, articles, and user profiles for search engine indexing.
- /robots.txt endpoint to guide search engine crawlers.