[digiplex] Fix runtime exceptions by jlaur · Pull Request #17864 · openhab/openhab-addons

jlaur · 2024-12-08T00:02:44Z

This fixes possible NumberFormatException and IndexOutOfBoundsException for unexpected input.

Also adds test coverage and debug logging of messages that could not be parsed. The coverage is based on the current logic rather than real messages, since I don't know what they look like.

Related to #17829
Resolves #17828

Add test coverage Signed-off-by: Jacob Laursen <jacob-github@vindvejr.dk>

Signed-off-by: Jacob Laursen <jacob-github@vindvejr.dk>

rmichalak

It looks good on paper, however be aware I currently have no ways of testing it live with my alarm system. I'm still on openhab v2 and slowly upgrading my setup to v4 (which triggered me to raise the original issue after a few years of absence in the project).
I need to upgrade my dev env and it can take me a while to do so and be fully up to speed with everything that changed since v2.

rmichalak · 2024-12-08T20:57:36Z

Actually, there may be more to it... before your fix, these malformed responses where throwing an exception and this was triggering restart immediately. Now, they just return UnknownResponse which is considered a valid (but unknown) response and counted as such here.
There's also a check for 'stalled' messages (these that did not receive a response) here and it triggers restart as well.
With your changes, digiplex may return rubbish all the time and we will never know about it (except for log entries)

I agree that the changes you made make sense, but perhaps we should also distinguish between 'unknown' response (as in - not handled by this binding, like here) and erroneous response, which we should count and restart the binding if we cross the threshold.

This may require more changes and I would need to go down to the protocol details once more, plus I will need to setup my dev environment from scratch 😓

rmichalak · 2024-12-08T21:02:00Z

Also mind that there's an alternative binding for the same alarm system: https://www.openhab.org/addons/bindings/paradoxalarm/
which uses an IP150 module (an add-on module sold separately) with more functionality. Mine uses RS232 communication included in the base board (which is cheaper, less functional, and apparently not so stable, at least for me, hence all the hacks and restarts).

jlaur · 2024-12-08T22:31:23Z

Actually, there may be more to it... before your fix, these malformed responses where throwing an exception and this was triggering restart immediately. Now, they just return UnknownResponse which is considered a valid (but unknown) response and counted as such here.

I agree, see also #17829 (comment). So in other words, it seems we already had this issue with this existing sanity check:

openhab-addons/bundles/org.openhab.binding.digiplex/src/main/java/org/openhab/binding/digiplex/internal/communication/DigiplexResponseResolver.java

Lines 42 to 44 in f39415b

    
           if (message.length() < 4) { // sanity check: try to filter out malformed responses 
        
               return new UnknownResponse(message); 
        
           }

which I stayed consistent with, but expanded to cover all cases, so we avoid throwing RuntimeExceptions. But if in fact reconnection is needed, something must be changed. How do you distinguish between an unknown message and a malformed message that should cause reconnection?

And how is it currently handled when e.g. a NumberFormatException is thrown, I guess the binding will stop working?

This may require more changes and I would need to go down to the protocol details once more, plus I will need to setup my dev environment from scratch 😓

I guess there is no rush, or have you had this reported by others as well? If it won't make it for 4.3, it can still be backported into a 4.3.x patch later.

rmichalak · 2024-12-09T07:35:56Z

How do you distinguish between an unknown message and a malformed message that should cause reconnection?

I don't know yet, I guess I need to check the protocol again and make sure we handle all the proper responses, even if only with UnknownResponse case for responses/events we don't want to handle explicitly. The rest will be erroneous...

And how is it currently handled when e.g. a NumberFormatException is thrown, I guess the binding will stop working?

Correct, and that's perhaps why it makes sense to still merge #17829 (limited to receiver thread) as it applies a bandaid on this wound at least (even if not fitting perfectly).

I guess there is no rush, or have you had this reported by others as well?

No, I didn't have any reports. Honestly, I'm not sure if anyone is using this binding, apart from me... I guess the one with IP150 module is more popular (and the module itself is cheaper now, than it used to be a few years ago). I may even consider buying it, but since I'm already in this rabbit hole, I would like to leave my extension stable...

jlaur · 2024-12-10T19:53:03Z

And how is it currently handled when e.g. a NumberFormatException is thrown, I guess the binding will stop working?

Correct, and that's perhaps why it makes sense to still merge #17829 (limited to receiver thread) as it applies a bandaid on this wound at least (even if not fitting perfectly).

But doesn't that PR suffer from this same problem:

be aware I currently have no ways of testing it live with my alarm system.

?

So maybe it does way too many (potentially unneeded) reconnections because of running into the parser bugs? Maybe it doesn't restart enough, if malformed messages are shorter than 4 characters?

And then there is the already mentioned issue that it may hide bugs because it catches all exceptions - which is why I decided to give this PR a shot.

Preferable you could test this when you have your development system up running again, although I realize that might be too late for 4.3. Have in mind that we can also include such bugfixes in patch releases. I think as a temporary solution (any maybe no one, even including you, is currently using the 4.2/4.3 version of the binding) this PR could be merged, because it must be better to gracefully handle malformed message (without triggering reconnection) than to let the binding crash, which will not trigger reconnection either, or..?

rmichalak · 2024-12-16T19:37:45Z

But doesn't that PR suffer from this same problem:

The original PR I proposed (#17829) is tested as it's been working in my environment for over three years (on openhab2). I just reapplied it to openhab v4 codebase and as it's very simple in nature, it should work there as well. Agree it's rather rudimentary and we can do better, but it works at least....

So maybe it does way too many (potentially unneeded) reconnections because of running into the parser bugs? Maybe it doesn't restart enough, if malformed messages are shorter than 4 characters?

What I observed with my alarm system (which may be very specific to my env), is that it starts with a few malformed responses and then the alarm system stops responding at all. So when we receive a malformed message, there's really no going back from this stage. we need to restart the connection. I tried with a few different baud rates and other connection settings but I couldn't achieve long-term stability. I tried exchanging all the components in the middle (cable, rs232 -> usb converter) without any luck.

Perhaps some restarts are not needed because of shortcoming of the current implementation and you are right that some of the messages shorter than 4 characters may not trigger restart, but for this there's another safeguard to count the number of responses. If it does not match the number of messages sent (by a threshold), we trigger the restart as well.

I may be able to run more tests in January and I think there's really no rush with it... Seems like I may be the only one using this binding (or it's indeed only about my faulty alarm system going south), as otherwise we would have had some reports about binding not working.

jlaur · 2024-12-16T19:44:14Z

But doesn't that PR suffer from this same problem:

The original PR I proposed (#17829) is tested as it's been working in my environment for over three years (on openhab2). I just reapplied it to openhab v4 codebase and as it's very simple in nature, it should work there as well. Agree it's rather rudimentary and we can do better, but it works at least....

Sorry, I missed this point, that actually it is an old fix and already tested.

I may be able to run more tests in January and I think there's really no rush with it...

👍 Since there doesn't seem to be any rush, do we agree to leave both PR's open for the time being, and then try to "get it right" once you find the time?

rmichalak · 2024-12-16T19:50:12Z

Since there doesn't seem to be any rush, do we agree to leave both PR's open for the time being, and then try to "get it right" once you find the time?

Sure thing. let's do that!

jlaur · 2025-01-08T21:22:18Z

Superceded by #18035.

jlaur added 2 commits December 8, 2024 01:00

Fix RuntimeExceptions

2cac465

Add test coverage Signed-off-by: Jacob Laursen <jacob-github@vindvejr.dk>

Remove duplicated parsing

15c7684

Signed-off-by: Jacob Laursen <jacob-github@vindvejr.dk>

jlaur requested a review from rmichalak as a code owner December 8, 2024 00:02

jlaur added the bug An unexpected problem or unintended behavior of an add-on label Dec 8, 2024

This was referenced Dec 8, 2024

[digiplex] Received thread stops when an exception happens inside #17828

Closed

[digiplex] Fix exception handling in sender and receiver threads #17829

Closed

rmichalak approved these changes Dec 8, 2024

View reviewed changes

jlaur added the additional testing preferred The change works for the pull request author. A test from someone else is preferred though. label Dec 8, 2024

rmichalak mentioned this pull request Jan 4, 2025

[digiplex] Handle erroneous responses and restart the bridge #18035

Merged

jlaur closed this Jan 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comments

[digiplex] Fix runtime exceptions#17864

[digiplex] Fix runtime exceptions#17864
jlaur wants to merge 2 commits intoopenhab:mainfrom
jlaur:digiplex-fix-runtime-exceptions

jlaur commented Dec 8, 2024 •

edited

Loading

Uh oh!

rmichalak left a comment

Uh oh!

rmichalak commented Dec 8, 2024 •

edited

Loading

Uh oh!

rmichalak commented Dec 8, 2024 •

edited

Loading

Uh oh!

jlaur commented Dec 8, 2024

Uh oh!

rmichalak commented Dec 9, 2024 •

edited

Loading

Uh oh!

jlaur commented Dec 10, 2024 •

edited

Loading

Uh oh!

rmichalak commented Dec 16, 2024

Uh oh!

jlaur commented Dec 16, 2024

Uh oh!

rmichalak commented Dec 16, 2024

Uh oh!

jlaur commented Jan 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Comments

Conversation

jlaur commented Dec 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rmichalak left a comment

Choose a reason for hiding this comment

Uh oh!

rmichalak commented Dec 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rmichalak commented Dec 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jlaur commented Dec 8, 2024

Uh oh!

rmichalak commented Dec 9, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jlaur commented Dec 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rmichalak commented Dec 16, 2024

Uh oh!

jlaur commented Dec 16, 2024

Uh oh!

rmichalak commented Dec 16, 2024

Uh oh!

jlaur commented Jan 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jlaur commented Dec 8, 2024 •

edited

Loading

rmichalak commented Dec 8, 2024 •

edited

Loading

rmichalak commented Dec 8, 2024 •

edited

Loading

rmichalak commented Dec 9, 2024 •

edited

Loading

jlaur commented Dec 10, 2024 •

edited

Loading