-
Notifications
You must be signed in to change notification settings - Fork 136
IGNITE-27381 Support MessageService metrics #7488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| Throwable rootEx = unwrapRootCause(e); | ||
|
|
||
| if (!(rootEx instanceof NodeStoppingException)) { | ||
| log.error("Failed to unregister metrics source {}", e, src.name()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use ExceptionUtils.hasCause().
Also, how can an exception caused by a NodeStoppingException be thrown by this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's thrown by the busy lock in this fragment of code, then let's just use an idiom 'try to take a lock and if we're not successful, don't do anything' instead of producing the exception and immediately catching it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MessagingService-related metrics, when being unregistered, were triggering the NodeStoppingException in certain integration tests. While not a fatal error in itself (since the cluster and node were anyway shutting down), it was polluting test console output (and would likely pollute production output as well). I added this check to prevent that, while still logging any other, unexpected cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use ExceptionUtils.hasCause().
👍
| registry.unregisterSource(src); | ||
| }); | ||
| try { | ||
| if (metricSources().contains(src)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a left-over artifact from an intermediate development stage. Will remove.
| registry.unregisterSource(srcName); | ||
| }); | ||
| } | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern doesn't look very beautiful, and there is also duplication on top. Let's discuss it higher first and come back here after we are done there
| ChannelTypeRegistry channelTypeRegistry, | ||
| IgniteLogger log | ||
| IgniteLogger log, | ||
| @Nullable MetricManager metricManager, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are they nullable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CriticalStripedExecutors may be used outside of the messaging service in the future, and metrics may not be so important elsewhere as to be made mandatory.
| IgniteMessageServiceThreadFactory.create(nodeName, "MessagingService-outbound", LOG, NOTHING_ALLOWED) | ||
| ); | ||
| IgniteMessageServiceThreadFactory.create(nodeName, "MessagingService-outbound", LOG, NOTHING_ALLOWED)); | ||
| outboundExecutor.initMetricSource(metricManager, "network.messaging.default.executor.outbound", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does 'default' stand here for?
| private void handleHandshakeError(Throwable ex, UUID nodeId, ChannelType type, InetSocketAddress addr) { | ||
| if (ex != null) { | ||
| if (hasCause(ex, CriticalHandshakeException.class)) { | ||
| metrics.incrementConnectionFailures(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we even need a metric for this case as the case should be eliminated
| if (fut.isCompletedExceptionally()) { | ||
| metrics.incrementInvokeTimeouts(); | ||
| } else { | ||
| fut.complete(response); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a race between checking and completing. Also, why is it about timeouts?
modules/network/src/main/java/org/apache/ignite/internal/network/DefaultMessagingService.java
Outdated
Show resolved
Hide resolved
| * @param name the name of the metric. | ||
| * @param description the metric description. | ||
| */ | ||
| public void initMetricSource(MetricManager metricManager, String name, String description) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about doing this in the constructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doable, but I wanted to keep the initialization similar to that of the CriticalStripedThreadPoolExecutor.
...es/workers/src/main/java/org/apache/ignite/internal/worker/CriticalSingleThreadExecutor.java
Outdated
Show resolved
Hide resolved
…r/CriticalSingleThreadExecutor.java Co-authored-by: Roman Puchkovskiy <roman.puchkovskiy@gmail.com>
Co-authored-by: Roman Puchkovskiy <roman.puchkovskiy@gmail.com>
https://issues.apache.org/jira/browse/IGNITE-27381