fix: init NNC from reconciler instead of directly #4199
Conversation
Pull request overview
This pull request refactors the CNS (Container Networking Service) initialization to use the NNC (NodeNetworkConfig) reconciler flow instead of a separate out-of-band initialization path. This consolidates NNC reads to a single location, preventing bugs from ingesting stale NNCs and ensuring caching/filtering optimizations are consistently applied.
Changes:
- Removed direct NNC client initialization and replaced it with an initializer callback passed to the reconciler
- Modified the reconciler to invoke the initializer function on the first reconcile (see the sketch after this list)
- Updated function signatures to accept the initializer as a parameter
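A minimal sketch of that callback pattern, using illustrative names rather than the actual CNS types (the real reconciler lives in cns/kubecontroller/nodenetworkconfig):

```go
package nncsketch

// Sketch only: an initializer callback that the reconciler runs exactly once,
// on the first NNC it receives. Names and types are illustrative, not the
// actual CNS implementation.

import (
	"context"
	"fmt"
)

// nodeNetworkConfig stands in for the NodeNetworkConfig CRD object.
type nodeNetworkConfig struct {
	Name string
}

// Reconciler carries an optional initializer that is cleared after it succeeds.
type Reconciler struct {
	initializer func(*nodeNetworkConfig) error
}

// Reconcile invokes the initializer on the first NNC it sees. On failure the
// error is returned so the caller (controller-runtime in the real code) can
// retry with backoff; on success the initializer is cleared and never runs again.
func (r *Reconciler) Reconcile(_ context.Context, nnc *nodeNetworkConfig) error {
	if r.initializer != nil {
		if err := r.initializer(nnc); err != nil {
			return fmt.Errorf("initializer failed during reconcile: %w", err)
		}
		r.initializer = nil
	}
	// ... normal reconcile logic follows here ...
	return nil
}
```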
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| cns/service/main.go | Removed direct NNC client creation and retry logic; replaced with initialization closure passed to reconciler |
| cns/kubecontroller/nodenetworkconfig/reconciler.go | Added initializer callback field and invocation logic in Reconcile method |
| cns/kubecontroller/nodenetworkconfig/reconciler_test.go | Updated test calls to include no-op initializer function parameter |
/azp run Azure Container Networking PR

Azure Pipelines successfully started running 1 pipeline(s).
force-pushed from cfab669 to 14b9b41
/azp run Azure Container Networking PR

Azure Pipelines successfully started running 1 pipeline(s).
```go
			logger.Errorf("[cns-rc] initializer failed during reconcile: %v", err)
			return reconcile.Result{}, errors.Wrap(err, "initializer failed during reconcile")
		}
		r.initializer = nil
```
I think it's the same behavior for this to stay wedged until initialization succeeds, but do we want to eventually circuit-break? That seems to have been the original intention of the existing logic, as I read the comment above it... though whether that held in practice is dubious, given the swallowed retry error and the UntilSucceeded().
You read the prior intent correctly - we wanted to retry for a while and eventually fail out as a signal to the user that something was wrong. However, the retry window was long enough that CNS didn't exit frequently enough to end up in CrashLoopBackOff; the Pod would just clock 5-6 restarts per hour.
Eventually someone might have noticed, but in practice this was effectively retry-forever.
By tying this into the Reconcile loop, we can lean on the ctrlruntime machinery - it automatically applies retry backoff and exposes metrics for failed reconciles, which we can collect and use as a signal.
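For context, a minimal sketch of the mechanism being leaned on here, assuming the standard controller-runtime Reconciler contract (illustrative, not this PR's code): returning a non-nil error from Reconcile causes controller-runtime to requeue the request with its default exponential backoff and to count the failure in its reconcile error metrics (e.g. controller_runtime_reconcile_errors_total).

```go
package nncsketch

// Sketch only: with controller-runtime, surfacing the error from Reconcile is
// enough to get retries with backoff and error metrics; no bespoke retry loop
// is needed in CNS.

import (
	"context"
	"errors"

	ctrl "sigs.k8s.io/controller-runtime"
)

type demoReconciler struct{}

// Reconcile returns the error instead of swallowing it, handing retry/backoff
// and failure accounting to the controller-runtime workqueue.
func (demoReconciler) Reconcile(_ context.Context, _ ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, errors.New("initialization not ready yet")
}
```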
nddq left a comment
minor comments, but the changes make sense
```go
		// call initFunc on first reconcile and never again
		if r.initializer != nil {
			if err := r.initializer(nnc); err != nil {
				logger.Errorf("[cns-rc] initializer failed during reconcile: %v", err)
```
Someone else started this, actually - it's all over this file?
Reason for Change:
There's no clear reason that CNS initializes with an NNC that it gets out of band from the typical NNC reconciler flow.
This separate path for fetching the NNC leads to bugs such as ingesting an NNC that belonged to a previous Node of the same name, and to state corruption once the reconciler (which is smart enough to wait for the actual NNC for its Node) starts.
Additionally, any caching, filtering, or other selection done to NNCs for efficiency is not applied or must be duplicated.
Instead, initialization can run from the first NNC pushed to the reconciler. This consolidates all CNS NNC reads into a single location where list-watch optimizations and filtering can be centralized.
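A hedged sketch of the caller side of that flow, with hypothetical names for the constructor and closure (the real wiring in cns/service/main.go differs):

```go
package main

// Hypothetical sketch: CNS builds an initialization closure and hands it to
// the NNC reconciler, so the NNC is only ever read inside the reconciler flow.
// newReconciler and nodeNetworkConfig are stand-ins, not the actual CNS API.

import (
	"fmt"
	"log"
)

type nodeNetworkConfig struct {
	Name string
}

type reconciler struct {
	initializer func(*nodeNetworkConfig) error
}

func newReconciler(initializer func(*nodeNetworkConfig) error) *reconciler {
	return &reconciler{initializer: initializer}
}

func main() {
	// The closure captures whatever state CNS initialization needs; the first
	// NNC the reconciler is pushed is the one it runs against.
	initFromNNC := func(nnc *nodeNetworkConfig) error {
		if nnc == nil {
			return fmt.Errorf("nil NNC")
		}
		log.Printf("initializing CNS state from NNC %q", nnc.Name)
		// ... seed IPAM pool state from the NNC here ...
		return nil
	}

	_ = newReconciler(initFromNNC)
	// ... register the reconciler with the controller manager and start it ...
}
```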