
fix: allowing the exceptionhandler to see reconnection exceptions #7367

Draft

shawkins wants to merge 1 commit into fabric8io:main from shawkins:iss7164

Conversation

@shawkins
Contributor

closes: #7164

Draft of changes to allow the ExceptionHandler to deal with reconnection exceptions as well.

To keep this compatible with current behavior, the WatcherException is converted to a KubernetesClientException when passed into the ExceptionHandler - by default that means we'll keep retrying.

Users would need to update / set their own ExceptionHandler to cause the informer to give up in this case. This is mostly in line with what was requested on #7164
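
For illustration, a minimal sketch of what "set their own ExceptionHandler" could look like, assuming the existing ExceptionHandler.retryAfterException(boolean, Throwable) contract and SharedIndexInformer.exceptionHandler(...); the Pod type and "default" namespace are placeholders, not part of this PR:

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.KubernetesClientException;
import io.fabric8.kubernetes.client.informers.SharedIndexInformer;

public class StopOnReconnectFailure {
  public static void main(String[] args) {
    try (KubernetesClient client = new KubernetesClientBuilder().build()) {
      SharedIndexInformer<Pod> informer = client.pods()
          .inNamespace("default")
          .runnableInformer(0);

      // Returning false stops the informer instead of retrying; with this PR the
      // reconnection failure would arrive here as a KubernetesClientException.
      informer.exceptionHandler((isStarted, t) ->
          !(isStarted && t instanceof KubernetesClientException));

      informer.start();
      // register event handlers / keep the application running as needed
    }
  }
}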

Alternatives to this are:

  • Pass the exception in as a WatcherException, and let the informer stop watching - by itself this would be a breaking change.
  • Alternatively don't use the ExceptionHandler mechanism, and add Informer configuration for the number of watch reconnections it will attempt before giving up - it would default to unlimited for backwards compatibility. This is more in line with what is requested in #7365 (Infinite loop in AbstractWatchManager).

cc @csviri @manusa @metacosm what would you like to see done here?

Description

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I added a CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

closes: fabric8io#7164

Signed-off-by: Steve Hawkins <shawkins@redhat.com>
*
* @deprecated to be removed in a future release.
*/
@Deprecated
Collaborator


Suggested change
@Deprecated
@Deprecated(forRemoval = true)

* Called to reestablish the connection. Should only be called once per request.
*/
void scheduleReconnect(WatchRequestState state) {
void scheduleReconnect(WatchRequestState state, Throwable t) {
Collaborator


Might be better to keep the original, one-arg version and add the two-arg version to which it would delegate, so that call sites do not need to change? It also looks cleaner than passing null explicitly… I wish Java had a way to declare optional parameters…
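
A minimal sketch of that suggestion, reusing the signatures from the diff above (bodies elided; WatchRequestState is the internal type already shown there):

/**
 * Called to reestablish the connection. Should only be called once per request.
 */
void scheduleReconnect(WatchRequestState state) {
  // existing call sites keep compiling and simply report "no cause"
  scheduleReconnect(state, null);
}

void scheduleReconnect(WatchRequestState state, Throwable cause) {
  // reconnect logic, now with access to the exception that triggered it (may be null)
}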

@metacosm
Collaborator

metacosm commented Feb 6, 2026

Regarding the overall question of which approach to choose, I'm not sure… The more correct isWatching behavior is causing issues upstream because it's difficult to figure out, without human intervention that knows of the context, whether or not a watcher getting repeatedly disconnected is a "real" issue or not.

@shawkins
Contributor Author

shawkins commented Feb 6, 2026

Regarding the overall question of which approach to choose, I'm not sure… The more correct isWatching behavior is causing issues upstream because it's difficult to figure out, without human intervention that knows of the context, whether or not a watcher getting repeatedly disconnected is a "real" issue or not.

I understand it's now causing problems, but that's more because it was not implemented correctly - it was nearly identical to isRunning, which was not the intent.

As for the correct behavior, it's just a question of:

  • do we just want alternative behavior to be driven via a custom ExceptionHandler, in which case this PR is fine as is (see the sketch after this list)
  • or do we want a new property, either on the informer or in the client configuration or both, that dictates how many times an informer is allowed to retry starting a failed watch? It would probably have to default to unlimited to not be breaking.
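
To illustrate the first option, a bounded retry count can already be approximated with a custom ExceptionHandler today, without any new informer or client configuration; maxAttempts is an illustrative name, not an existing setting, and the retryAfterException signature is assumed from the current interface:

import java.util.concurrent.atomic.AtomicInteger;

import io.fabric8.kubernetes.client.informers.ExceptionHandler;

public class BoundedRetryHandler implements ExceptionHandler {

  private final int maxAttempts; // illustrative limit
  private final AtomicInteger attempts = new AtomicInteger();

  public BoundedRetryHandler(int maxAttempts) {
    this.maxAttempts = maxAttempts;
  }

  @Override
  public boolean retryAfterException(boolean isStarted, Throwable t) {
    // returning false tells the informer to give up instead of scheduling another retry
    return attempts.incrementAndGet() < maxAttempts;
  }
}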

@metacosm
Collaborator

metacosm commented Feb 6, 2026

* or do we want a new property, either on the informer or in the client configuration or both, that dictates how many times an informer is allowed to retry starting a failed watch? It would probably have to default to unlimited to not be breaking.

This solution would probably lead down the path of replicating the kube probe configuration 😄

@shawkins
Contributor Author

shawkins commented Feb 6, 2026

* or do we want a new property, either on the informer or in the client configuration or both, that dictates how many times an informer is allowed to retry starting a failed watch? It would probably have to default to unlimited to not be breaking.

This solution would probably lead down the path of replicating the kube probe configuration 😄

It's a little different. Here we're talking about what behavior to exhibit when the watch fails to re-establish. Currently it's unlimited retries, but there have been two user requests to refine that behavior.

In any case there's still absolutely no guarantee that restarting the application will help in these circumstances. In older versions of fabric8 there were some issues with watches not re-establishing, or cache sync issues, but I don't believe we have seen an issue along those lines in a while.



Development

Successfully merging this pull request may close these issues.

SharedIndexInformer endlessly attempts to reconnect to the down Kube API server
