Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Jan 20, 2026

What type of PR is this?

  • feature

What this PR does / why we need it:

Adds comprehensive OpenTelemetry tracing across the full stack with proper trace context propagation from frontend to backend to database.

Backend (backend/internal/sys/otel/)

  • otel.go - Provider init with OTLP/stdout exporters, configurable sampling
  • middleware.go - HTTP middleware via otelhttp
  • database.go - Ent ORM tracing via otelsql
  • proxy.go - Authenticated proxy endpoint at /api/v1/telemetry for frontend spans

Backend Config (backend/internal/sys/config/conf_otel.go)

  • Standard OTEL env vars: HBOX_OTEL_ENABLED, HBOX_OTEL_ENDPOINT, HBOX_OTEL_PROTOCOL, HBOX_OTEL_SAMPLE_RATE, etc.
  • Toggles for HTTP/DB tracing and frontend proxy

Backend Status Endpoint (/api/v1/status)

  • Added telemetry.enabled field to APISummary response
  • Frontend queries this endpoint to automatically enable telemetry when backend has it enabled

Frontend (frontend/lib/otel/)

  • Custom BackendProxyExporter sends spans through authenticated backend endpoint
  • Auto-instrumentation via FetchInstrumentation (same-origin only for security)
  • useOTel() composable for manual span creation in components
  • Automatic enablement via backend status endpoint (no manual frontend config required)

Usage:

# Backend - enabling this automatically enables frontend telemetry
HBOX_OTEL_ENABLED=true HBOX_OTEL_ENDPOINT=localhost:4317 HBOX_OTEL_INSECURE=true

Which issue(s) this PR fixes:

Special notes for your reviewer:

  • Frontend telemetry is automatically enabled when backend telemetry is enabled (via /api/v1/status endpoint)
  • Frontend trace headers restricted to same-origin requests to prevent leaking trace context to external domains
  • Proxy payload limit (1MB) defined as named constant
  • Status API call has a 3-second timeout to prevent blocking app startup
  • Pre-existing TypeScript errors in the repo are unrelated to these changes

Testing

  • Backend builds and passes go vet
  • Frontend OTel files pass eslint
  • CodeQL security analysis: 0 alerts
Original prompt

Add OpenTelemtry to the backend and frontend, all the standard OpenTelemtry configuration options should be available for configuration on the backend, and front-end Otel information should be proxied by the backend (with authentication to the proxy route handled by standard user auth).

For the database ORM interactions use the otelsql package to log database traces correctly.

Additionally, the tracing should be proper full chain in that API calls from the front-end should have a trace, that trace should get attached to the backend traces for that call, and the response from the API should then be attached to whatever the front-end does with it.

Ensure that all code generated is maintainable by humans, easy to work with, and easy to adjust as needed.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Summary by CodeRabbit

Release Notes

  • New Features
    • Added comprehensive OpenTelemetry telemetry and tracing integration for enhanced system monitoring and observability
    • Backend can now optionally trace database operations and HTTP requests when telemetry is enabled
    • Frontend can send telemetry data to the backend proxy endpoint
    • API status endpoint now reports telemetry configuration status
    • New telemetry configuration options available for service setup

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI and others added 4 commits January 20, 2026 16:58
Co-authored-by: tankerkiller125 <3457368+tankerkiller125@users.noreply.github.com>
…acing, and HTTP middleware

Co-authored-by: tankerkiller125 <3457368+tankerkiller125@users.noreply.github.com>
…proxy, and composables

Co-authored-by: tankerkiller125 <3457368+tankerkiller125@users.noreply.github.com>
…yload limit

Co-authored-by: tankerkiller125 <3457368+tankerkiller125@users.noreply.github.com>
Copilot AI changed the title [WIP] Add OpenTelemetry configuration for backend and frontend Add OpenTelemetry tracing to backend and frontend with full chain support Jan 20, 2026
Copilot AI requested a review from tankerkiller125 January 20, 2026 17:27
@tankerkiller125
Copy link
Contributor

@copilot the frontend telemetry should be enabled automatically when the backend telemetry is enabled. This should be done via the backend info endpoint which the front-end queries for information.

…bled via status endpoint

Co-authored-by: tankerkiller125 <3457368+tankerkiller125@users.noreply.github.com>
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Jan 20, 2026

Deploying homebox-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: ea84a16
Status:🚫  Build failed.

View logs

Copy link
Contributor Author

Copilot AI commented Jan 20, 2026

@copilot the frontend telemetry should be enabled automatically when the backend telemetry is enabled. This should be done via the backend info endpoint which the front-end queries for information.

Implemented in ea84a16. The frontend now automatically enables telemetry when the backend has it enabled:

  1. Added telemetry.enabled field to the /api/v1/status endpoint response
  2. Frontend plugin now queries this endpoint during initialization
  3. Removed the NUXT_PUBLIC_OTEL_ENABLED env var requirement
  4. Added a 3-second timeout to prevent blocking app startup if backend is slow

When you enable backend telemetry with HBOX_OTEL_ENABLED=true, the frontend will automatically detect it and enable client-side tracing.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 20, 2026

Walkthrough

This PR adds comprehensive OpenTelemetry integration to both backend and frontend systems. Backend changes include a new OTEL provider, database tracing instrumentation, HTTP middleware integration, and a proxy endpoint for frontend telemetry collection. Frontend changes provide Vue composables, client initialization, and automatic trace instrumentation. Configuration structures support OTEL enablement, exporters, protocols, and per-component tracing toggles. Dependencies are updated with OTEL packages across SDK, exporters, and instrumentation libraries.

Changes

Cohort / File(s) Summary
Backend OTEL Package Core
backend/internal/sys/otel/otel.go
Introduces comprehensive OTEL Provider type with tracer, meter, and logger initialization. Implements factory NewProvider with conditional exporter creation (OTLP, stdout, noop), resource setup, span filtering, and graceful shutdown sequencing. ~550 LOC of dense initialization logic.
Backend OTEL Database Tracing
backend/internal/sys/otel/database.go
Adds database instrumentation via otelsql, conditional on EnableDatabaseTracing flag. Exposes OpenDatabase, OpenEntDriver, and helpers for driver name mapping and semantic attributes. ~130 LOC.
Backend OTEL Proxy & Frontend Bridge
backend/internal/sys/otel/proxy.go
Implements OTLP proxy endpoint (ProxyHandler) for frontend telemetry. Defines FrontendSpan, SpanStatus, SpanEvent, TelemetryPayload DTOs. Includes gRPC and HTTP OTLP client implementations with conversion utilities. ~420 LOC.
Backend OTEL Logging Bridge
backend/internal/sys/otel/zerolog_bridge.go
Bridges zerolog to OTEL logging via ZerologOTelHook, translating zerolog levels to OTEL severities and emitting records.
Backend App Initialization
backend/app/api/main.go
Integrates OTEL provider initialization, database instrumentation, HTTP tracing middleware, and graceful shutdown hooks. Reorganizes imports for OTEL packages; conditionally wraps database driver. ~106 lines added.
Backend App Setup & Routing
backend/app/api/app.go, backend/app/api/routes.go
Adds otel field to app struct; registers /telemetry proxy route when OTEL enabled.
Backend Logger Setup
backend/app/api/logger.go
Adds setupOtelZerologBridge function to attach OTEL logging hook to zerolog when enabled.
Backend API Status Endpoint
backend/app/api/handlers/v1/controller.go
Introduces TelemetryStatus struct and extends APISummary with Telemetry field, exposing ctrl.config.Otel.Enabled to frontend.
Backend Configuration
backend/internal/sys/config/conf.go, backend/internal/sys/config/conf_otel.go
Adds new OTelConfig struct with ~16 configuration fields (Enabled, ServiceName, ServiceVersion, Exporter, Endpoint, Protocol, Insecure, Headers, SampleRate, EnableMetrics, EnableLogging, EnableDatabaseTracing, EnableHTTPTracing, ProxyEnabled).
Backend Dependencies
backend/go.mod
Adds OpenTelemetry SDK packages (otel, otelchi, otelsql, exporters for OTLP/stdout, instrumentation for HTTP/gRPC), updates to modernc.org/sqlite.
Backend Data Layer Instrumentation
backend/internal/data/repo/repo_item_attachments.go
Adds named OTEL tracing spans to Create, Delete, CreateThumbnail, CreateMissingThumbnails, UploadFile, processThumbnailFromImage methods with span events for processing steps.
Backend Repo Formatting & Imports
backend/internal/data/repo/repo_items.go, backend/internal/data/repo/repo_labels.go, backend/internal/data/repo/repo_group.go, backend/internal/data/repo/repo_item_templates.go, backend/internal/data/repo/repo_wipe_integration_test.go, backend/internal/data/repo/repo_items_test.go
Minor import reordering (ent/tag imports repositioned) and whitespace/formatting adjustments. No semantic changes.
Backend Error Handling
backend/internal/core/services/reporting/import.go
Updates readRawCsv to use errors.Is for proper handling of wrapped EOF errors.
Frontend OTEL Core Library
frontend/lib/otel/index.ts
Implements full frontend OTEL integration with BackendProxyExporter, batch processor, resource setup. Exports initializeOTel, getTracer, startSpan, withSpan, injectTraceHeaders, getCurrentTraceContext, isOTelInitialized, shutdownOTel. Auto-instruments fetch calls with origin-restricted header propagation. ~336 LOC.
Frontend OTEL Plugin
frontend/plugins/otel.client.ts
Nuxt client plugin that fetches backend telemetry status and conditionally initializes OTEL with 3-second timeout. Provides otelEnabled flag to application context.
Frontend OTEL Composable
frontend/composables/use-otel.ts
Vue composable exposing traced, trace, getTraceContext, tracePage, traceInteraction utilities with graceful no-op fallbacks when disabled.
Frontend Types & Config
frontend/lib/api/types/data-contracts.ts, frontend/nuxt.config.ts, frontend/package.json
Updates APISummary type with TelemetryStatus; adds runtime config for otelServiceName, otelServiceVersion, otelSampleRate, otelDebug; adds @opentelemetry/* dependencies (api, context-zone, core, exporter-trace-otlp-http, instrumentation, instrumentation-fetch, resources, sdk-trace-web, semantic-conventions).

Sequence Diagram(s)

sequenceDiagram
    participant Frontend as Frontend<br/>(Browser)
    participant Plugin as OTEL Plugin<br/>(otel.client.ts)
    participant Backend as Backend App<br/>(main.go)
    participant BEStatus as /api/v1/status<br/>Endpoint
    participant OTelLib as Frontend OTel<br/>Library

    Frontend->>Plugin: Page Load
    Plugin->>BEStatus: Fetch Backend Status<br/>(3s timeout)
    alt Backend Telemetry Enabled
        BEStatus-->>Plugin: APISummary {<br/>telemetry.enabled: true}
        Plugin->>OTelLib: initializeOTel(config)
        OTelLib->>OTelLib: Create Provider +<br/>BackendProxyExporter
        OTelLib-->>Plugin: Ready
        Plugin->>Frontend: Provide otelEnabled: true
        Frontend->>Frontend: Instrument fetch,<br/>spans
    else Backend Telemetry Disabled
        BEStatus-->>Plugin: APISummary {<br/>telemetry.enabled: false}
        Plugin->>Frontend: Provide otelEnabled: false
    end
Loading
sequenceDiagram
    participant Frontend as Frontend Span<br/>(FrontendSpan)
    participant Exporter as BackendProxy<br/>Exporter
    participant BEProxy as Backend<br/>/telemetry<br/>ProxyHandler
    participant OTLPClient as OTLP Client<br/>(gRPC/HTTP)
    participant OTLP as OTLP Collector<br/>(User's Endpoint)

    Frontend->>Exporter: Batch Spans
    Exporter->>BEProxy: POST /telemetry<br/>(TelemetryPayload JSON)
    BEProxy->>BEProxy: Parse & Validate<br/>Payload
    BEProxy->>BEProxy: Convert to OTLP<br/>Format
    BEProxy->>OTLPClient: Forward Spans
    OTLPClient->>OTLP: Send OTLP Protobuf<br/>(gRPC/HTTP)
    OTLP-->>OTLPClient: ACK
    OTLPClient-->>BEProxy: Success
    BEProxy-->>Exporter: HTTP 200
    Exporter-->>Frontend: Sent
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

The PR introduces substantial new OTEL integration logic across backend and frontend with ~1.5K lines of new code in the OTEL package alone (otel.go, proxy.go, database.go). Multiple file types (configuration structs, initialization code, instrumentation hooks, client plugins) and heterogeneous changes demand careful review. Key areas requiring attention: OTEL provider initialization sequencing, database instrumentation conditional logic, proxy endpoint security (payload size limits, validation), frontend initialization timing with backend status fetch, and dependency version compatibility.


Security Recommendations

  • Payload Size Limit: The proxy.go implements MaxTelemetryPayloadSize guard, but validate that the limit is appropriate and consistently enforced across all payload acceptance paths to prevent memory exhaustion.
  • Header Validation: In proxy.go, ensure that custom headers from configuration are sanitized and that no sensitive data (API keys, tokens) can be accidentally forwarded to OTLP endpoints.
  • Endpoint Validation: Verify that OTEL_EXPORTER_OTLP_ENDPOINT from configuration is validated (e.g., HTTPS enforcement, allowed hostnames) to prevent exfiltration to untrusted OTLP collectors.
  • Trace Context Propagation: Frontend's injectTraceHeaders with origin-restricted propagation is good, but confirm that trace IDs/span IDs cannot be correlated across isolated contexts unintentionally.
  • Frontend Fetch Auto-Instrumentation: Ensure auto-instrumentation of fetch calls doesn't inadvertently expose sensitive request/response headers in trace attributes; validate that sensitive headers are redacted.
  • Database Query Inclusion: database.go includes SQL comments and potentially query content in spans (EnableHTTPTracing, SQL in spans) — ensure that sensitive data (passwords, PII in WHERE clauses) isn't exposed in traces.
  • Timeout Configuration: Backend status fetch in otel.client.ts uses a 3-second timeout; confirm this is resilient and doesn't block page load if the backend is slow.

Observability blooms bright,
Traces dance from dawn to night,
Frontend whispers to the back,
Spans flow down the telemetry track,
Data glowing, insights taking flight! 🔍✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main feature: adding OpenTelemetry tracing across backend and frontend with full chain support, which aligns with the changeset's primary objective.
Description check ✅ Passed The PR description covers all required sections: type (feature), detailed explanation of changes across backend/frontend with bullet points, and special notes. However, it lacks a specific issue number in the 'Which issue(s) this PR fixes' section.
Docstring Coverage ✅ Passed Docstring coverage is 89.19% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch copilot/add-opentelemetry-integration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tankerkiller125 tankerkiller125 marked this pull request as ready for review January 26, 2026 02:47
@tankerkiller125
Copy link
Contributor

@coderabbitai full review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 26, 2026

✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/internal/sys/config/conf.go (1)

19-35: Redact OTEL headers when printing config.
Line 34: Config.Print() will now include Otel.Headers, which can carry OTLP auth tokens. This risks leaking secrets into logs/console. Consider redacting before marshaling.

🔒 Proposed redaction in Config.Print()
 func (c *Config) Print() {
-    res, err := json.MarshalIndent(c, "", "  ")
+    redacted := *c
+    if redacted.Otel.Headers != "" {
+        redacted.Otel.Headers = "[redacted]"
+    }
+    res, err := json.MarshalIndent(redacted, "", "  ")
     if err != nil {
         panic(err)
     }
🤖 Fix all issues with AI agents
In `@backend/app/api/main.go`:
- Around line 228-236: The middleware list registers
otelchimetric.NewResponseSizeBytes twice causing duplicate response-size
metrics; edit the router.Use call in main.go to remove the duplicate
otelchimetric.NewResponseSizeBytes entry (keep one instance alongside
otelchi.Middleware and otelchimetric.NewRequestInFlight), or replace the second
call with the intended different metric if a second metric was meant to be
added.

In `@backend/internal/sys/otel/otel.go`:
- Around line 368-379: The metric gRPC exporter uses
otlpmetricgrpc.WithInsecure() while traces use
grpc.WithTransportCredentials(insecure.NewCredentials()) via WithDialOption;
update the metric (otlpmetricgrpc.New call) and log exporter usage to use
otlpmetricgrpc.WithDialOption(grpc.WithTransportCredentials(insecure.NewCredentials()))
when cfg.Insecure is true instead of WithInsecure(), preserve existing endpoint
and headers options (WithEndpoint, WithHeaders), and add a short comment near
the cfg.Insecure branch stating that insecure mode disables TLS (only for
dev/testing or trusted networks).

In `@backend/internal/sys/otel/proxy.go`:
- Around line 238-253: The gRPC branch builds dial options only for the insecure
case and omits setting TLS transport credentials when p.cfg.Insecure is false;
update the grpc dial option construction used before grpc.NewClient so that when
p.cfg.Insecure is false you append a secure transport credential (e.g.,
grpc.WithTransportCredentials(credentials.NewClientTLSFromCert(nil, "")) or
another appropriate credentials.TransportCredentials) instead of leaving
dialOpts empty, ensure the change is applied where grpc.NewClient is called and
that the credentials package is imported, leaving the grpcClient /
collogspb.NewTraceServiceClient creation unchanged.

In `@frontend/lib/otel/index.ts`:
- Around line 104-109: The code incorrectly tries to read parentSpanId via a
fragile cast on span; instead read the parent span ID from the span's
parentSpanContext. Replace the cast and use span.parentSpanContext?.spanId (from
the ReadableSpan's parentSpanContext?: SpanContext) to populate parentSpanId so
the returned object uses spanContext.traceId, spanContext.spanId, and
parentSpanContext?.spanId for parentSpanId.
🧹 Nitpick comments (14)
backend/internal/data/repo/repo_item_attachments.go (1)

867-889: Consider recording errors on spans for better observability.

The span instrumentation is good, but you could enhance debugging by recording errors when they occur. This helps correlate failures in your tracing backend.

♻️ Optional: Add error recording to spans
 func (r *AttachmentRepo) processThumbnailFromImage(ctx context.Context, groupId uuid.UUID, img image.Image, title string, orientation uint16) (UploadResult, error) {
 	ctx, span := otel.Tracer("data").Start(ctx, "repo.AttachmentRepo.processThumbnailFromImage")
 	defer span.End()
 
 	bounds := img.Bounds()
 
 	span.AddEvent("applying EXIF orientation")
 	// Apply EXIF orientation if needed
 	if orientation > 1 {
 		img = utils.ApplyOrientation(img, orientation)
 		bounds = img.Bounds()
 	}
 
 	span.AddEvent("resizing image")
 	newWidth, newHeight := calculateThumbnailDimensions(bounds.Dx(), bounds.Dy(), r.thumbnail.Width, r.thumbnail.Height)
 	dst := image.NewRGBA(image.Rect(0, 0, newWidth, newHeight))
 	draw.CatmullRom.Scale(dst, dst.Rect, img, img.Bounds(), draw.Over, nil)
 
 	span.AddEvent("encoding thumbnail as webp")
 	buf := new(bytes.Buffer)
 	err := webp.Encode(buf, dst, webp.Options{Quality: 80, Lossless: false})
 	if err != nil {
+		span.RecordError(err)
 		return UploadResult{}, err
 	}
frontend/plugins/otel.client.ts (1)

18-41: AbortController refactoring requires API modifications.

The suggested improvement is architecturally sound—using AbortController is the proper way to cancel HTTP requests and prevent resource leaks—but requires changes beyond the current code:

  1. The RequestArgs type in frontend/lib/requests/requests.ts does not include a signal property
  2. PublicApi.status() takes no parameters; it would need to accept options
  3. The Requests.get() method would need to forward the signal to the fetch API

While the current Promise.race implementation works correctly and safely handles timeout scenarios, it does leave the API request running after the timeout fires, which is inefficient. To implement this refactoring cleanly, first extend RequestArgs to support signal: AbortSignal | undefined, then update the API method signatures accordingly.

From a security perspective: The current timeout handling is safe but could benefit from proper request cancellation to avoid potential memory leaks in long-lived applications.

backend/internal/sys/config/conf_otel.go (1)

33-34: Add validation for SampleRate bounds at config load time.

The SampleRate field should be validated to be within [0.0, 1.0]. While the initialization code gracefully handles out-of-bounds values, early validation at config load time would catch misconfigurations immediately and provide clearer error messages. Additionally, invalid sampling rates have security implications—misconfiguration could inadvertently disable tracing (audit trail loss) or apply unexpected sampling, affecting compliance and observability.

backend/internal/sys/otel/zerolog_bridge.go (2)

24-39: Consider propagating request context for better trace correlation.

The comment on line 37 acknowledges this, but using context.Background() means log records won't be correlated with active spans. If zerolog's Hook interface allowed passing context, this would enable better distributed tracing correlation.

For now, this is acceptable given the zerolog.Hook interface constraints.


54-57: PanicLevel maps to SeverityError - consider SeverityFatal4 for distinction.

Both FatalLevel and PanicLevel indicate critical failures, but mapping PanicLevel to SeverityError loses some severity information. OpenTelemetry defines multiple fatal severities (Fatal, Fatal2, Fatal3, Fatal4) that could differentiate these.

♻️ Optional enhancement
 	case zerolog.FatalLevel:
 		return otellog.SeverityFatal
 	case zerolog.PanicLevel:
-		return otellog.SeverityError
+		return otellog.SeverityFatal4
backend/internal/sys/otel/proxy.go (2)

312-332: 🔐 Security: Validate trace/span ID lengths to prevent malformed data.

Frontend-supplied trace IDs (32 hex chars = 16 bytes) and span IDs (16 hex chars = 8 bytes) should be validated. Malformed IDs could cause issues downstream or indicate malicious input.

♻️ Proposed validation
 func convertSpanToProto(frontendSpan FrontendSpan) (*tracepb.Span, error) {
 	// Parse trace ID
+	if len(frontendSpan.TraceID) != 32 {
+		return nil, fmt.Errorf("invalid trace ID length: expected 32, got %d", len(frontendSpan.TraceID))
+	}
 	traceID, err := hex.DecodeString(frontendSpan.TraceID)
 	if err != nil {
 		return nil, fmt.Errorf("invalid trace ID: %w", err)
 	}

 	// Parse span ID
+	if len(frontendSpan.SpanID) != 16 {
+		return nil, fmt.Errorf("invalid span ID length: expected 16, got %d", len(frontendSpan.SpanID))
+	}
 	spanID, err := hex.DecodeString(frontendSpan.SpanID)

386-393: Potential integer overflow when converting milliseconds to nanoseconds.

Multiplying int64 millisecond timestamps by 1000000 and casting to uint64 can overflow for timestamps far in the future (though unlikely in practice). Consider using explicit safe conversion.

The current approach works for dates until ~year 292471, so this is low-risk but worth noting for defensive coding.

frontend/lib/otel/index.ts (1)

203-204: Static analysis ReDoS warning - safe in this context.

The static analysis tool flags new RegExp(^${window.location.origin}) as a potential ReDoS risk. However, window.location.origin is browser-controlled and contains only scheme://host:port format without regex metacharacters that could cause catastrophic backtracking.

For defense in depth, you could escape the origin, but it's not strictly necessary here:

♻️ Optional defensive escaping
+function escapeRegExp(str: string): string {
+  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+}
+
 new FetchInstrumentation({
-  propagateTraceHeaderCorsUrls: [new RegExp(`^${window.location.origin}`)],
+  propagateTraceHeaderCorsUrls: [new RegExp(`^${escapeRegExp(window.location.origin)}`)],
frontend/composables/use-otel.ts (1)

26-29: Empty span object may cause runtime errors if callback uses span methods.

When OTEL is disabled, passing {} as Span to the callback could cause runtime errors if the callback attempts to call span methods like setAttribute() or setStatus(). Consider providing a no-op span implementation.

♻️ Proposed no-op span implementation
+// No-op span for when OTel is disabled
+const noopSpan: Span = {
+  spanContext: () => ({ traceId: '', spanId: '', traceFlags: 0 }),
+  setAttribute: () => noopSpan,
+  setAttributes: () => noopSpan,
+  addEvent: () => noopSpan,
+  setStatus: () => noopSpan,
+  updateName: () => noopSpan,
+  end: () => {},
+  isRecording: () => false,
+  recordException: () => {},
+} as unknown as Span;

   async function traced<T>(
     name: string,
     fn: (span: Span) => Promise<T> | T,
     options?: { ... }
   ): Promise<T> {
     if (!isEnabled.value) {
-      return fn({} as Span);
+      return fn(noopSpan);
     }
     return withSpan(name, fn, options);
   }
backend/internal/sys/otel/database.go (2)

40-43: 🔐 Security consideration: SQL queries in spans may expose sensitive data.

DisableQuery: false includes full SQL queries in trace spans. While useful for debugging, this could expose sensitive data (emails, passwords in queries, PII) in your telemetry backend.

Consider:

  1. Using DisableQuery: true in production
  2. Adding a configuration option to toggle this behavior
  3. Ensuring your telemetry backend has appropriate access controls
♻️ Optional: make configurable
 		otelsql.WithSpanOptions(otelsql.SpanOptions{
-			DisableQuery: false, // Include SQL queries in spans
+			DisableQuery: !p.cfg.IncludeSQLInSpans, // Configurable SQL query inclusion
 		}),

99-108: WrapEntDriver doesn't actually wrap - consider removing or documenting clearly.

The function name suggests it wraps a driver with tracing, but it simply returns the input driver unchanged. The comment explains the assumption, but callers may be confused.

Either:

  1. Remove this function if it serves no purpose
  2. Rename to clarify behavior (e.g., ValidateTracedDriver)
  3. Actually implement wrapping logic for drivers not created via OpenEntDriver
backend/internal/sys/otel/otel.go (3)

175-181: Security: Avoid logging potentially sensitive header values.

The cfg.Endpoint is logged here. While endpoint logging is generally acceptable, be cautious if headers contain authentication tokens or API keys. The current code does not log cfg.Headers, which is good, but consider adding a note or ensuring that sensitive configuration is never accidentally logged in future changes.


461-464: Hardcoded environment value limits observability utility.

getEnvironment() always returns "production", which may not accurately reflect actual deployment stages (development, staging, etc.). Consider making this configurable via OTelConfig or reading from an environment variable.

♻️ Suggested improvement
 // getEnvironment returns the current deployment environment.
-func getEnvironment() string {
-	return "production"
+func getEnvironment(cfg *config.OTelConfig) string {
+	if cfg.Environment != "" {
+		return cfg.Environment
+	}
+	return "production"
 }

Then update the call site at line 127 to pass cfg.


192-198: Silent failure on invalid MetricsInterval configuration.

If time.ParseDuration fails, the error is silently ignored and the default 15-second interval is used. Consider logging a warning so operators are aware their configuration was not applied.

♻️ Suggested improvement
 		metricsInterval := 15 * time.Second
 		if cfg.MetricsInterval != "" {
-			if parsed, err := time.ParseDuration(cfg.MetricsInterval); err == nil {
+			parsed, err := time.ParseDuration(cfg.MetricsInterval)
+			if err != nil {
+				log.Warn().Str("interval", cfg.MetricsInterval).Err(err).Msg("Invalid metrics interval, using default 15s")
+			} else {
 				metricsInterval = parsed
 			}
 		}

Comment on lines +228 to +236
if otelProvider.IsEnabled() && cfg.Otel.EnableHTTPTracing {
otelChiBaseCfg := otelchimetric.NewBaseConfig(cfg.Otel.ServiceName, otelchimetric.WithMeterProvider(otelProvider.MeterProvider()))
router.Use(
otelchi.Middleware(cfg.Otel.ServiceName, otelchi.WithChiRoutes(router)),
otelchimetric.NewResponseSizeBytes(otelChiBaseCfg),
otelchimetric.NewRequestInFlight(otelChiBaseCfg),
otelchimetric.NewResponseSizeBytes(otelChiBaseCfg),
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Duplicate NewResponseSizeBytes middleware - likely copy-paste error.

otelchimetric.NewResponseSizeBytes is added twice (lines 232 and 234), which will record duplicate metrics for response sizes. This appears to be a copy-paste error.

🐛 Proposed fix
 		otelChiBaseCfg := otelchimetric.NewBaseConfig(cfg.Otel.ServiceName, otelchimetric.WithMeterProvider(otelProvider.MeterProvider()))
 		router.Use(
 			otelchi.Middleware(cfg.Otel.ServiceName, otelchi.WithChiRoutes(router)),
-			otelchimetric.NewResponseSizeBytes(otelChiBaseCfg),
 			otelchimetric.NewRequestInFlight(otelChiBaseCfg),
 			otelchimetric.NewResponseSizeBytes(otelChiBaseCfg),
 		)
🤖 Prompt for AI Agents
In `@backend/app/api/main.go` around lines 228 - 236, The middleware list
registers otelchimetric.NewResponseSizeBytes twice causing duplicate
response-size metrics; edit the router.Use call in main.go to remove the
duplicate otelchimetric.NewResponseSizeBytes entry (keep one instance alongside
otelchi.Middleware and otelchimetric.NewRequestInFlight), or replace the second
call with the intended different metric if a second metric was meant to be
added.

Comment on lines +368 to +379
case "grpc", "":
opts := []otlpmetricgrpc.Option{}
if cfg.Endpoint != "" {
opts = append(opts, otlpmetricgrpc.WithEndpoint(cfg.Endpoint))
}
if cfg.Insecure {
opts = append(opts, otlpmetricgrpc.WithInsecure())
}
if len(headers) > 0 {
opts = append(opts, otlpmetricgrpc.WithHeaders(headers))
}
return otlpmetricgrpc.New(ctx, opts...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n backend/internal/sys/otel/otel.go | head -400 | tail -150

Repository: sysadminsmedia/homebox

Length of output: 5610


🌐 Web query:

otlpmetricgrpc.WithInsecure() grpc credentials insecure

💡 Result:

otlpmetricgrpc.WithInsecure() tells the OpenTelemetry OTLP metric gRPC exporter to disable TLS on its client connection (i.e., plaintext gRPC), similar in effect to the deprecated grpc.WithInsecure(). This is mainly for local/dev or when your collector endpoint is not using TLS. [1]

If your goal is specifically “use the new gRPC insecure credentials API (instead of grpc.WithInsecure)”, pass a dial option:

import (
  "google.golang.org/grpc"
  "google.golang.org/grpc/credentials/insecure"
  "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
)

exp, err := otlpmetricgrpc.New(ctx,
  otlpmetricgrpc.WithEndpoint("localhost:4317"),
  otlpmetricgrpc.WithDialOption(grpc.WithTransportCredentials(insecure.NewCredentials())),
)

This is the direct replacement pattern for deprecated grpc.WithInsecure() (use grpc.WithTransportCredentials(insecure.NewCredentials())). [2]

You can also set it via env vars (OTEL_EXPORTER_OTLP_METRICS_INSECURE=true / OTEL_EXPORTER_OTLP_INSECURE=true) when using the OTLP metric gRPC exporter. [1]

Security note: disabling TLS means data is unencrypted in transit, so avoid it on untrusted networks. [3]

Sources: [1] [2] [3]


🏁 Script executed:

cat -n backend/internal/sys/otel/otel.go | sed -n '400,450p'

Repository: sysadminsmedia/homebox

Length of output: 1881


Align all gRPC exporters to use modern insecure credentials pattern for consistency and best practices.

The trace exporter (line 320) applies grpc.WithTransportCredentials(insecure.NewCredentials()) via WithDialOption(), but the metric exporter (line 374) and log exporter (line 428) only use WithInsecure(). Standardize all three gRPC exporters to the modern credentials API pattern for consistency. Additionally, document that insecure mode disables TLS—data travels unencrypted—so this should only be used in secure network environments (dev/testing).

🤖 Prompt for AI Agents
In `@backend/internal/sys/otel/otel.go` around lines 368 - 379, The metric gRPC
exporter uses otlpmetricgrpc.WithInsecure() while traces use
grpc.WithTransportCredentials(insecure.NewCredentials()) via WithDialOption;
update the metric (otlpmetricgrpc.New call) and log exporter usage to use
otlpmetricgrpc.WithDialOption(grpc.WithTransportCredentials(insecure.NewCredentials()))
when cfg.Insecure is true instead of WithInsecure(), preserve existing endpoint
and headers options (WithEndpoint, WithHeaders), and add a short comment near
the cfg.Insecure branch stating that insecure mode disables TLS (only for
dev/testing or trusted networks).

Comment on lines +238 to +253
case "grpc", "":
dialOpts := []grpc.DialOption{}
if p.cfg.Insecure {
dialOpts = append(dialOpts, grpc.WithTransportCredentials(insecure.NewCredentials()))
}

conn, err := grpc.NewClient(p.cfg.Endpoint, dialOpts...)
if err != nil {
return nil, fmt.Errorf("failed to create gRPC client: %w", err)
}

client := collogspb.NewTraceServiceClient(conn)
return &grpcClient{
client: client,
conn: conn,
}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🔐 Security: gRPC client missing TLS credentials when Insecure is false.

When p.cfg.Insecure is false, the gRPC client is created without any transport credentials. This will likely cause connection failures or, worse, fall back to insecure connections depending on the gRPC version defaults.

🐛 Proposed fix - add TLS credentials for secure connections
+import "google.golang.org/grpc/credentials"

 	case "grpc", "":
 		dialOpts := []grpc.DialOption{}
 		if p.cfg.Insecure {
 			dialOpts = append(dialOpts, grpc.WithTransportCredentials(insecure.NewCredentials()))
+		} else {
+			dialOpts = append(dialOpts, grpc.WithTransportCredentials(credentials.NewClientTLSFromCert(nil, "")))
 		}
🤖 Prompt for AI Agents
In `@backend/internal/sys/otel/proxy.go` around lines 238 - 253, The gRPC branch
builds dial options only for the insecure case and omits setting TLS transport
credentials when p.cfg.Insecure is false; update the grpc dial option
construction used before grpc.NewClient so that when p.cfg.Insecure is false you
append a secure transport credential (e.g.,
grpc.WithTransportCredentials(credentials.NewClientTLSFromCert(nil, "")) or
another appropriate credentials.TransportCredentials) instead of leaving
dialOpts empty, ensure the change is applied where grpc.NewClient is called and
that the credentials package is imported, leaving the grpcClient /
collogspb.NewTraceServiceClient creation unchanged.

Comment on lines +104 to +109
// Get parent span ID from the span context if available
const parentSpanContext = (span as unknown as { parentSpanId?: string }).parentSpanId;
return {
traceId: spanContext.traceId,
spanId: spanContext.spanId,
parentSpanId: parentSpanContext || undefined,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n frontend/lib/otel/index.ts | head -120 | tail -30

Repository: sysadminsmedia/homebox

Length of output: 1323


🌐 Web query:

OpenTelemetry SDK ReadableSpan interface parentSpanId property

💡 Result:

In opentelemetry-js, the SDK ReadableSpan no longer has a parentSpanId property. The current ReadableSpan interface exposes the parent as an optional parentSpanContext?: SpanContext instead. (open-telemetry.github.io)

So, to get the parent span id you now do:

const parentSpanId = span.parentSpanContext?.spanId; // undefined for root spans

This was a documented breaking change: parentSpanId was removed and replaced by parentSpanContext (migration: span.parentSpanIdspan.parentSpanContext?.spanId). (classic.yarnpkg.com)

This aligns with the OpenTelemetry trace SDK spec guidance that implementations must expose at least the full parent SpanContext. (opentelemetry.io)

Citations:


🏁 Script executed:

grep -r "ReadableSpan" frontend/lib/otel/ --include="*.ts"

Repository: sysadminsmedia/homebox

Length of output: 366


🏁 Script executed:

grep -r "from.*opentelemetry" frontend/lib/otel/index.ts -A 5

Repository: sysadminsmedia/homebox

Length of output: 991


Fix incorrect parentSpanId access on ReadableSpan.

The fragile type assertion is attempting to access parentSpanId, which doesn't exist on the OpenTelemetry SDK's ReadableSpan interface. The parentSpanId property was removed in favor of parentSpanContext?: SpanContext. Access the parent span ID correctly via the span context:

♻️ Correct implementation
-    const parentSpanContext = (span as unknown as { parentSpanId?: string }).parentSpanId;
+    const parentSpanId = span.parentSpanContext?.spanId;
     return {
       traceId: spanContext.traceId,
       spanId: spanContext.spanId,
-      parentSpanId: parentSpanContext || undefined,
+      parentSpanId: parentSpanId,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Get parent span ID from the span context if available
const parentSpanContext = (span as unknown as { parentSpanId?: string }).parentSpanId;
return {
traceId: spanContext.traceId,
spanId: spanContext.spanId,
parentSpanId: parentSpanContext || undefined,
// Get parent span ID from the span context if available
const parentSpanId = span.parentSpanContext?.spanId;
return {
traceId: spanContext.traceId,
spanId: spanContext.spanId,
parentSpanId: parentSpanId,
🤖 Prompt for AI Agents
In `@frontend/lib/otel/index.ts` around lines 104 - 109, The code incorrectly
tries to read parentSpanId via a fragile cast on span; instead read the parent
span ID from the span's parentSpanContext. Replace the cast and use
span.parentSpanContext?.spanId (from the ReadableSpan's parentSpanContext?:
SpanContext) to populate parentSpanId so the returned object uses
spanContext.traceId, spanContext.spanId, and parentSpanContext?.spanId for
parentSpanId.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants