This document describes the internal architecture of the OpenShift Lightspeed Operator codebase.
The operator follows a modular, component-based architecture where each major component (application server, Lightspeed Core/Llama Stack, PostgreSQL, Console UI) is managed by its own dedicated package with independent reconciliation logic.
- Modularity: Each component is self-contained
- Maintainability: Changes to one component don't affect others
- Testability: Independent test suites per component
- Code Organization: Clear boundaries and responsibilities
- Avoid Circular Dependencies: Components don't import main controller
- Clean Testing: Easy to create test implementations
- Flexibility: Main controller can evolve without breaking components
- Owned Resources (ResourceVersion): Auto-cleanup on deletion, Kubernetes-native lifecycle
- External Resources (Data Comparison): Respects user ownership, supports cross-namespace sharing
- Right Tool for the Job: Operator resources need lifecycle management, user resources need change tracking without interference
- Efficiency: Only update when resources actually change
- Reliability: Leverages Kubernetes' built-in change tracking
- Simplicity: No custom hash computation or state management
- Correctness: Kubernetes guarantees ResourceVersion changes on modification
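The ResourceVersion-based detection described above can be sketched as follows. This is a stdlib-only illustration, not the operator's actual code: the annotation key, `resource` type, and `needsUpdate` helper are all hypothetical stand-ins for the real deployment-annotation tracking.

```go
package main

import "fmt"

// resource mimics the metadata the operator inspects; in the real codebase
// this would be a Kubernetes object's ObjectMeta.
type resource struct {
	ResourceVersion string
	Annotations     map[string]string
}

// Hypothetical annotation key: the consuming Deployment records the
// last-seen ResourceVersion of each dependency it was rendered from.
const lastSeenKey = "example.io/last-seen-resource-version"

// needsUpdate reports whether the dependency changed since the Deployment
// last recorded it, i.e. whether reconciliation should touch the Deployment.
func needsUpdate(dep, dependency resource) bool {
	return dep.Annotations[lastSeenKey] != dependency.ResourceVersion
}

func main() {
	dep := resource{Annotations: map[string]string{lastSeenKey: "41"}}
	cfg := resource{ResourceVersion: "42"} // dependency moved from 41 to 42
	fmt.Println(needsUpdate(dep, cfg))
}
```

Because the API server guarantees ResourceVersion changes on every modification, a plain string comparison is sufficient; no hashing of resource contents is needed.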
Core Orchestration:
- Main Reconcile() method coordinates all reconciliation phases
- SetupWithManager() configures controller watches and event handlers
- Selects backend: calls either appserver.ReconcileAppServer() or lcore.ReconcileLCore() based on the --enable-lcore flag

Support Functions:
- Implements the reconciler.Reconciler interface (provides config/images to components)
- Watcher predicate helpers for filtering events
- Status management and deployment health checks
- External resource annotation for change tracking
Operator Infrastructure:
- ServiceMonitor for operator metrics
- NetworkPolicy for operator security
Responsibilities:
- Parse command-line flags (images, namespace, reconcile interval, backend selection)
- Initialize controller manager with TLS, metrics, health probes
- Configure WatcherConfig - declarative setup defining all watched external resources (secrets, configmaps)
- Detect OpenShift version and select appropriate images
- Start controller and handle graceful shutdown
Key Flags: --enable-lcore (backend selection), --controller-namespace. See cmd/main.go for the complete list.
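Flag parsing along these lines can be sketched with the standard library. Only --enable-lcore and --controller-namespace are named in this document; the option struct, default values, and help strings below are illustrative assumptions, not the operator's actual definitions.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds a subset of the operator's command-line configuration.
type options struct {
	enableLCore         bool
	controllerNamespace string
}

// parseFlags builds a FlagSet so the parsing is testable in isolation;
// the real cmd/main.go defines many more flags (images, reconcile interval).
func parseFlags(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("lightspeed-operator", flag.ContinueOnError)
	fs.BoolVar(&o.enableLCore, "enable-lcore", false,
		"use the Lightspeed Core backend instead of the legacy app server")
	fs.StringVar(&o.controllerNamespace, "controller-namespace", "",
		"namespace the operator runs in")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, _ := parseFlags([]string{"--enable-lcore", "--controller-namespace=my-ns"})
	fmt.Println(o.enableLCore, o.controllerNamespace)
}
```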
Provides clean contract between main controller and component packages:
- Dependency Injection: Components receive only what they need
- No Circular Dependencies: Components don't import main controller
- Testability: Easy to mock for unit tests
- Exposes: Kubernetes client, logger, namespace, image getters, configuration
Purpose: Manages OpenShift Lightspeed application server (LEGACY backend - LLM API proxy)
Entry Point: ReconcileAppServer(reconciler.Reconciler, context, *OLSConfig)
Purpose: Manages Lightspeed Core + Llama Stack server (NEW backend - agent-based with MCP support)
Entry Point: ReconcileLCore(reconciler.Reconciler, context, *OLSConfig)
Key Features:
- Dynamic LLM configuration (supports OpenAI, Azure OpenAI, others)
- CA certificate support for custom TLS
- RAG support with vector database
- MCP (Model Context Protocol) integration
- Metrics with K8s authentication
Purpose: Manages PostgreSQL database for conversation cache storage
Entry Point: ReconcilePostgres(reconciler.Reconciler, context, *OLSConfig)
Purpose: Manages OpenShift Console plugin for web UI integration
Entry Points: ReconcileConsoleUI() (setup), RemoveConsoleUI() (cleanup when disabled)
Purpose: Shared functionality across all components
Contains:
- Constants (resource names, labels, annotations, error messages)
- Helper functions (hash computation, resource comparison, equality checks)
- Status utilities (condition management)
- Validation (certificates, version detection)
- Test helpers (shared fixtures, test reconciler, CR generators)
Purpose: External resource watching with multi-level filtering
Architecture:
- Predicate Filtering - Fast O(1) event filtering at watch level
- Data Comparison - Deep equality checks using apiequality.Semantic.DeepEqual()
- Restart Logic - Maps changed resources to affected deployments via WatcherConfig
Configuration: All watcher behavior defined in cmd/main.go via WatcherConfig (data-driven, no hardcoded resource names)
Watches: OpenShift system resources and user-provided resources referenced in OLSConfig.
See internal/controller/watchers/ and cmd/main.go for implementation details.
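The two filtering levels can be sketched as below. The annotation key is the one this document names; the `watchedObject` type and helper functions are illustrative, and reflect.DeepEqual stands in for apiequality.Semantic.DeepEqual.

```go
package main

import (
	"fmt"
	"reflect"
)

// watchedObject mimics the fields the watcher inspects on a Secret/ConfigMap.
type watchedObject struct {
	Annotations map[string]string
	Data        map[string]string
}

const watchAnnotation = "ols.openshift.io/watch-olsconfig"

// predicatePass is the fast O(1) filter applied at watch level: only
// annotated objects generate events worth examining.
func predicatePass(o watchedObject) bool {
	_, ok := o.Annotations[watchAnnotation]
	return ok
}

// dataChanged is the deeper comparison applied only to objects that pass
// the predicate, so spurious metadata-only updates do not trigger restarts.
func dataChanged(oldObj, newObj watchedObject) bool {
	return !reflect.DeepEqual(oldObj.Data, newObj.Data)
}

func main() {
	oldObj := watchedObject{
		Annotations: map[string]string{watchAnnotation: "true"},
		Data:        map[string]string{"apitoken": "a"},
	}
	newObj := watchedObject{Annotations: oldObj.Annotations, Data: map[string]string{"apitoken": "b"}}
	fmt.Println(predicatePass(newObj), dataChanged(oldObj, newObj))
}
```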
High-level reconciliation sequence:
1. Reconcile operator-level resources (ServiceMonitor, NetworkPolicy)
2. Check if CR is being deleted → run finalizer cleanup if needed
3. Add finalizer if not present
4. Validate OLSConfig CR exists
5. Reconcile LLM Secrets (validate credentials)
6. Reconcile Components:
- Console UI (if enabled)
- PostgreSQL (if conversation cache enabled)
- Backend (AppServer OR LCore - mutually exclusive, controlled by --enable-lcore flag)
7. Update Status Conditions based on deployment readiness
The operator uses a finalizer (ols.openshift.io/finalizer) to ensure proper cleanup when OLSConfig CR is deleted.
Why Needed:
- Console UI cleanup: ConsolePlugin is cluster-scoped and not cascade-deleted by owner references
- PVC cleanup: PersistentVolumeClaims can block deletion if not properly released
- Race condition prevention: Ensures complete cleanup before CR can be recreated (important for tests and sequential deployments)
Implementation (internal/controller/olsconfig_controller.go):
```go
// Finalizer is added on first reconciliation
if !controllerutil.ContainsFinalizer(olsconfig, utils.OLSConfigFinalizer) {
	controllerutil.AddFinalizer(olsconfig, utils.OLSConfigFinalizer)
	if err := r.Update(ctx, olsconfig); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}

// On deletion, run cleanup before removing finalizer
if !olsconfig.DeletionTimestamp.IsZero() {
	if controllerutil.ContainsFinalizer(olsconfig, utils.OLSConfigFinalizer) {
		r.finalizeOLSConfig(ctx, olsconfig) // cleanup logic; errors logged, not fatal
		controllerutil.RemoveFinalizer(olsconfig, utils.OLSConfigFinalizer)
		if err := r.Update(ctx, olsconfig); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
```

Cleanup Sequence (finalizeOLSConfig):
- Remove Console UI: Deactivate plugin from Console CR, delete ConsolePlugin CR
- Wait for owned resources: Poll for up to 3 minutes until deployments, services, PVCs are deleted (cascade deletion)
- Remove finalizer: Allows Kubernetes to remove CR from etcd
Error Handling:
- Cleanup errors are logged but don't block finalizer removal
- Prevents CRs from being stuck in a Terminating state
- Console UI removal handles a missing Console CR gracefully (test environments, non-OpenShift clusters)
Testing:
- Unit tests: internal/controller/olsconfig_finalizer_test.go
- Test helper: cleanupOLSConfig() in suite_test.go (removes finalizers, waits for deletion)
- E2E test timeout: 3 minutes for DeleteAndWait() to account for finalizer cleanup
Implementation: See internal/controller/watchers/ for watcher logic, cmd/main.go for WatcherConfig, olsconfig_helpers.go for annotation logic.
The operator uses two distinct approaches for different resource ownership models:
Resources created and fully managed by the operator (Deployments, Services, operator-generated ConfigMaps/Secrets).
Change Detection:
- Uses Kubernetes owner references (controllerutil.SetControllerReference)
- Monitored via Owns() in SetupWithManager()
- Changes detected through ResourceVersion tracking in deployment annotations
- Automatic reconciliation on modification/deletion
Benefits: Auto-cleanup on CR deletion, Kubernetes-native lifecycle, efficient change detection
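The effect of SetControllerReference can be sketched without the controller-runtime dependency. The `ownerReference` and `object` types below mimic metav1.OwnerReference and ObjectMeta; Controller=true is what marks the CR as the managing owner so garbage collection removes the child when the CR is deleted. All names here are illustrative.

```go
package main

import "fmt"

// ownerReference mimics the relevant fields of metav1.OwnerReference.
type ownerReference struct {
	Name       string
	Controller bool
}

// object mimics an owned resource's metadata.
type object struct {
	Name   string
	Owners []ownerReference
}

// setControllerRef mimics controllerutil.SetControllerReference: it records
// the CR as the controlling owner, enabling Kubernetes cascade deletion.
func setControllerRef(owner string, obj *object) {
	obj.Owners = append(obj.Owners, ownerReference{Name: owner, Controller: true})
}

func main() {
	dep := object{Name: "lightspeed-app-server"} // illustrative resource name
	setControllerRef("olsconfig", &dep)          // illustrative CR name
	fmt.Println(dep.Owners[0].Name, dep.Owners[0].Controller)
}
```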
Resources created by users, referenced but not owned by the operator (LLM credentials, user TLS certs, CA ConfigMaps, OpenShift system resources).
Change Detection:
- Uses watcher annotations (ols.openshift.io/watch-olsconfig) and name-based filtering
- Monitored via Watches() with custom event handlers
- Changes detected through data comparison (apiequality.Semantic.DeepEqual)
- Targeted deployment restarts configured via WatcherConfig in cmd/main.go
Benefits: Respects user ownership, supports cross-namespace sharing, fine-grained restart control
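The restart mapping driven by WatcherConfig might look like this sketch. The `watchEntry` type, field names, and resource/deployment names below are assumptions for illustration; the real configuration lives in cmd/main.go.

```go
package main

import "fmt"

// watchEntry sketches one WatcherConfig entry: when the named external
// resource changes, the listed deployments must be restarted so they pick
// up the new data.
type watchEntry struct {
	ResourceName string
	Deployments  []string
}

// deploymentsToRestart maps a changed resource to its affected deployments,
// giving the fine-grained restart control described above.
func deploymentsToRestart(cfg []watchEntry, changed string) []string {
	for _, e := range cfg {
		if e.ResourceName == changed {
			return e.Deployments
		}
	}
	return nil // resource is not watched; nothing to restart
}

func main() {
	cfg := []watchEntry{
		{ResourceName: "llm-credentials", Deployments: []string{"lightspeed-app-server"}},
	}
	fmt.Println(deploymentsToRestart(cfg, "llm-credentials"))
}
```

Because the mapping is data, adding a new watched resource means extending the config, not writing new watch code.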
Testing: See CONTRIBUTING.md for testing strategy, test helpers, and running tests. Unit tests use Ginkgo/Gomega, E2E tests in test/e2e/. Always use make test (never go test directly).
OLM Documentation: For operators deployed via OLM, see comprehensive guides in docs/:
- OLM Bundle Management
- OLM Catalog Management
- OLM Integration & Lifecycle
- OLM Testing & Validation
- OLM RBAC & Security
Contributing: For adding components or modifying existing ones, see CONTRIBUTING.md for detailed step-by-step instructions.
Coding Conventions: See AGENTS.md for coding conventions and patterns used in this codebase.
For user-facing documentation, see README.md.