Best practices for AI coding agents on NetObserv Operator.
Note: Symlinked as CLAUDE.md for Claude Code auto-loading.
NetObserv Operator - Kubernetes/OpenShift operator for network observability (operator-sdk)
Components:
- eBPF Agent: Network flow generation from packets (DaemonSet)
- flowlogs-pipeline: Flow collection, enrichment, export (Deployment/StatefulSet)
- Console Plugin: OpenShift visualization (optional)
- CRD: FlowCollector v1beta2 - single cluster-wide resource named `cluster`
- Integrations: Loki (optional), Prometheus, Kafka (optional)
Key Directories:
- api/flowcollector/v1beta2/: CRD definitions
- internal/controller/: Reconciliation logic
- config/: Kustomize manifests
- docs/: FlowCollector spec, architecture
Only ONE FlowCollector is allowed, named `cluster`:

    if flowCollector.Name != constants.FlowCollectorName {
        return fmt.Errorf("only one FlowCollector allowed, named %s", constants.FlowCollectorName)
    }

FlowCollector v1beta2 is stable:
- ✅ Add optional fields with defaults, using the `+optional` marker
- ❌ Never remove/rename fields or change types
After CRD/CSV changes: make update-bundle
Never hardcode image references. Use env vars:
- RELATED_IMAGE_EBPF_AGENT
- RELATED_IMAGE_FLOWLOGS_PIPELINE
- RELATED_IMAGE_CONSOLE_PLUGIN
Supported architectures: amd64, arm64, ppc64le, s390x
Good Example:
Update internal/controller/flowcollector_controller.go to add validation for
spec.agent.ebpf.logLevel (valid: trace, debug, info, warn, error).
Add webhook validation. Include unit tests and run make update-bundle.
Bad Example:
Add log level validation
Key Principles:
- Specify file paths explicitly
- Reference existing patterns
- Mention testing requirements
- Check dependencies in go.mod first
Add spec.agent.ebpf.newFeature (bool, default: false):
1. Update api/flowcollector/v1beta2/flowcollector_types.go (+kubebuilder markers)
2. Modify internal/controller/ to use field
3. Add unit tests
4. Run make update-bundle
Update RELATED_IMAGE_FLOWLOGS_PIPELINE to vX.Y.Z.
Check main.go and internal/controller/flp/ deployment templates.
FlowCollector reconciliation failing with error "X".
Check internal/controller/flowcollector_controller.go:
- Reconcile() logic
- Error handling
- Status conditions
Suggest fixes with proper error handling patterns.
Modify Kafka producer config in eBPF agent.
Context: spec.deploymentModel=Kafka
Update internal/controller/ for Kafka-enabled agent configuration.
Update console plugin UI columns, filters, or scopes.
Files to modify:
1. internal/controller/consoleplugin/config/static-frontend-config.yaml
- columns: Define table columns (id, name, field, filters, features)
- filters: Define filter components and UI behavior
- scopes: Define aggregation scopes (namespace, node, owner, etc.)
- fields: Field definitions for documentation
2. internal/controller/consoleplugin/config/config.go
- Update Go structs if adding new config properties
3. Rebuild: Changes are embedded at compile time via go:embed
Note: Static config changes require operator rebuild/redeploy.
Review for:
1. Code style consistency
2. Error handling (wrap with context)
3. Unit test coverage (Ginkgo/Gomega)
4. CRD validation markers
5. Documentation updates
6. Backward compatibility
7. Security (RBAC, TLS, input validation)
8. Performance and resource utilization, including memory usage impact on large-scale clusters
Generate tests for detectSubnets in internal/controller/flp/detect_subnets.go:
- Valid CIDR ranges
- Invalid input
- Edge cases (empty, nil)
Use Ginkgo/Gomega patterns.
Test on Kind cluster:
1. IMAGE="quay.io/me/netobserv:test" make image-build image-push deploy
2. make deploy-sample-cr
3. Verify logs and functionality
Three deployment modes (check spec.loki.mode):
- Monolithic: Single instance
- LokiStack: Loki Operator (multi-tenancy enabled)
- Microservices: Distributed
- Sampling: Default 50 (1:50 packets). Lower = more flows/resources
- Batching: cacheMaxFlows, cacheActiveTimeout (agent); writeBatchWait, writeBatchSize (Loki)
- Memory: Default limits 800MB
- Metrics: Prefix netobserv_*, watch cardinality
- OpenShift: openshift-netobserv-operator
- Community: netobserv
- Use flowCollector.Spec.Namespace for deployed resources
Two types of configuration:
- Dynamic (FlowCollector CR): spec.consolePlugin.* - reconciled at runtime (portNaming, quickFilters, logLevel, replicas, etc.)
- Static (embedded YAML): static-frontend-config.yaml
  - Table columns, filters, scopes, field definitions
  - Embedded via the go:embed directive - requires a rebuild
  - Merged with dynamic config in consoleplugin_objects.go
Before modifying workflows:
- Run hack/test-workflow.sh
- Test on the workflow-test branch
- Verify images on Quay.io
Essential Commands:
make build lint test # Build and test
make update-bundle # After CRD changes
make deploy-sample-cr # Deploy FlowCollector
make undeploy        # Clean up

Key Files:
- CRD: api/flowcollector/v1beta2/flowcollector_types.go
- Controller: internal/controller/flowcollector_controller.go
- FLP: internal/controller/flp/flp_transfo_reconciler.go
- Console Plugin Static Config: internal/controller/consoleplugin/config/static-frontend-config.yaml
- Docs: docs/FlowCollector.md
- Sample: config/samples/flows_v1beta2_flowcollector.yaml
API Stability:
- FlowCollector: v1beta2 (stable - backward compatible changes only)
- Min OpenShift: 4.10+
- Min Kubernetes: 1.23+
1. Research: "Explain packet drop detection in eBPF agent"
2. Plan: "Add field for drop reasons filtering - suggest changes"
3. Implement: "Implement with validation and tests"
4. Review: "Review for edge cases and errors"
5. Bundle: "Run make update-bundle to regenerate docs"
6. Test: "Provide test scenarios"
Before commit:
- AI code review
- make build lint test
- make update-bundle (if CRD/CSV changed)
- Update docs
- Conventional commit messages
- DEVELOPMENT.md - Build, test, deploy
- docs/Architecture.md - Component relationships
- docs/FlowCollector.md - API reference
- FAQ.md - Troubleshooting
- Contributing
Remember: AI agents need clear context. Always review generated code, test thoroughly, and follow project conventions.