
NETOBSERV-2534 Have a way to pause Network Observability functions#2362

Open
jpinsonneau wants to merge 2 commits into netobserv:main from jpinsonneau:2534

jpinsonneau (Member) commented Jan 22, 2026

Description

  • Add a HOLD flag in the CSV to hold controller reconciliation and delete all resources apart from user CRs and namespaces
  • Report the hold state in the FlowCollector status and conditions, with clear text explaining how to restore the operator's reconcile mechanism
Hold mode is active. All operator-managed resources have been deleted while preserving FlowCollector, FlowCollectorSlice, and FlowMetric CRDs and namespaces. To disable hold mode, set the HOLD environment variable to false in the operator CSV (ClusterServiceVersion) in the openshift-netobserv-operator namespace, or restart the operator with --hold=false.
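The message above names two controls for hold mode: the HOLD environment variable in the CSV and the --hold command-line flag. How main.go actually combines them is not shown in this PR page; the minimal sketch below assumes the flag's default is read from HOLD, so an explicit --hold=false on the command line overrides the environment. The helper names holdDefault and parseHold are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// holdDefault reads HOLD (assumed to be set through the CSV) and falls back
// to false when the variable is unset or not a valid boolean.
func holdDefault(lookup func(string) (string, bool)) bool {
	v, ok := lookup("HOLD")
	if !ok {
		return false
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		return false
	}
	return b
}

// parseHold registers a --hold flag whose default comes from HOLD, so an
// explicit --hold=false wins over the environment (assumed precedence).
func parseHold(args []string, lookup func(string) (string, bool)) (bool, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	hold := fs.Bool("hold", holdDefault(lookup), "pause reconciliation and delete operator-managed resources")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *hold, nil
}

func main() {
	hold, err := parseHold(os.Args[1:], os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if hold {
		// A real reconciler would run cleanup and set the hold condition here.
		fmt.Println("hold mode active: skipping reconciliation")
		return
	}
	fmt.Println("hold mode inactive: reconciling normally")
}
```

Taking the default from the environment keeps a single source of truth at runtime while still letting a restart with --hold=false take precedence, which matches the two rollback paths the status message offers.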

Also polished the plugin for clarity: netobserv/network-observability-console-plugin#1224

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labeled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

openshift-ci bot commented Jan 22, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jotak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jpinsonneau jpinsonneau changed the title NETOBSERV-2534 hold flag NETOBSERV-2534 Have a way to pause Network Observability functions Jan 22, 2026
@jpinsonneau jpinsonneau requested review from jotak and stleerh January 22, 2026 18:07
@jpinsonneau jpinsonneau added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 22, 2026
github-actions bot commented:
New images:

  • quay.io/netobserv/network-observability-operator:93e5ad9
  • quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-93e5ad9
  • quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-93e5ad9

They will expire after two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:93e5ad9 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-93e5ad9

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-93e5ad9
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

codecov bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 55.43478% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.93%. Comparing base (58a780c) to head (ed67290).
⚠️ Report is 32 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/pkg/cleanup/cleanup.go | 60.37% | 14 Missing and 7 partials ⚠️ |
| internal/controller/flowcollector_controller.go | 0.00% | 7 Missing and 1 partial ⚠️ |
| internal/controller/static/static_controller.go | 25.00% | 1 Missing and 2 partials ⚠️ |
| internal/controller/flp/flp_controller.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| ...nal/controller/monitoring/monitoring_controller.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| internal/controller/networkpolicy/np_controller.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| internal/pkg/manager/status/status_manager.go | 90.00% | 2 Missing ⚠️ |
| main.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2362      +/-   ##
==========================================
- Coverage   72.23%   71.93%   -0.30%     
==========================================
  Files          93       93              
  Lines       10346    10485     +139     
==========================================
+ Hits         7473     7542      +69     
- Misses       2401     2468      +67     
- Partials      472      475       +3     
| Flag | Coverage Δ |
|---|---|
| unittests | 71.93% <55.43%> (-0.30%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| api/flowcollector/v1beta2/flowcollector_types.go | 100.00% <ø> (ø) |
| internal/pkg/manager/config.go | 0.00% <ø> (ø) |
| main.go | 0.00% <0.00%> (ø) |
| internal/controller/flp/flp_controller.go | 75.59% <0.00%> (+5.99%) ⬆️ |
| ...nal/controller/monitoring/monitoring_controller.go | 69.36% <0.00%> (-1.28%) ⬇️ |
| internal/controller/networkpolicy/np_controller.go | 81.25% <0.00%> (-3.54%) ⬇️ |
| internal/pkg/manager/status/status_manager.go | 86.92% <90.00%> (+0.26%) ⬆️ |
| internal/controller/static/static_controller.go | 79.31% <25.00%> (-2.51%) ⬇️ |
| internal/controller/flowcollector_controller.go | 72.81% <0.00%> (-6.14%) ⬇️ |
| internal/pkg/cleanup/cleanup.go | 60.75% <60.37%> (-20.01%) ⬇️ |

... and 9 files with indirect coverage changes


@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jan 26, 2026
	if err := deleteResourcesByType(ctx, cl, listObj, labelSelector); err != nil {
		return err
	}
}
Contributor:

(nit) Is there any reason to separate it into two lists?

Member Author:

That was for clarity; there is no technical reason for it.
Would you prefer a single list, keeping comments to separate namespaced resources from cluster-scoped ones?

Contributor:

It's a nit.

/lgtm

}).WithTimeout(timeout).WithPolling(interval).Should(BeTrue())
})
})
})
Contributor:

Wow, that's a lot of testing code for something that's not likely to break. This is more of a comment than an action item.

The most likely issue in the future is that a new resource is used and the list of resources to delete from isn't updated.

Member Author:

I agree it's a defensive approach and we can simplify it. The goal is to cover the most important resource types and ensure the cleanup doesn't delete resources that are not managed by the operator.

If we prefer, we can reduce that test to the strict minimum and check the list of types in another test.
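A strict-minimum version of that test mainly has to assert one invariant: objects that do not carry the operator's label survive cleanup. The sketch below models equality-based label selection with plain maps; object, matches and toDelete are hypothetical, and while app.kubernetes.io/managed-by is a standard Kubernetes recommended label key, its use here and the value netobserv-operator are assumptions rather than details taken from the PR.

```go
package main

import "fmt"

// object is a minimal, hypothetical stand-in for a Kubernetes object.
type object struct {
	Kind, Name string
	Labels     map[string]string
}

// matches implements equality-based label selection: every key in selector
// must be present in labels with the same value.
func matches(labels, selector map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

// toDelete returns only the objects a label-driven cleanup would remove.
func toDelete(objs []object, selector map[string]string) []object {
	var out []object
	for _, o := range objs {
		if matches(o.Labels, selector) {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	// Assumed selector value, for illustration only.
	sel := map[string]string{"app.kubernetes.io/managed-by": "netobserv-operator"}
	objs := []object{
		{Kind: "Deployment", Name: "flowlogs-pipeline", Labels: map[string]string{"app.kubernetes.io/managed-by": "netobserv-operator"}},
		{Kind: "ConfigMap", Name: "user-config"},
		{Kind: "ConfigMap", Name: "other-tool", Labels: map[string]string{"app.kubernetes.io/managed-by": "helm"}},
	}
	for _, o := range toDelete(objs, sel) {
		fmt.Printf("would delete %s/%s\n", o.Kind, o.Name)
	}
}
```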

Member Author:

Thinking more about this, we could even parse the RBAC to ensure the list of kinds we delete is in sync with the roles 🤔
But the test file will be even bigger 😆


2 participants