[TEST][NO-MERGE] Stress test domain sockets #382
Draft
bell-db wants to merge 10 commits into scalapb:master
This was referenced Sep 23, 2024
3b2bb9e to d0d7cd0
d0d7cd0 to 8bbb0a5
We found that even when domain sockets are used to get around the port conflict issue, communication is still not fully reliable on macOS. With short messages, it gets stuck 6 out of 100k times. With 100KB messages, it gets stuck far more often (2 out of 100 times).
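A stress loop of the kind described above can be sketched as follows. This is a minimal sketch, not the PR's actual test harness: `stress` and `make_payload` are hypothetical helpers, the socket path in the usage comment is made up, and the sketch assumes GNU `timeout` is available (on macOS it is usually installed as `gtimeout` from coreutils, so `TIMEOUT_BIN` would need to be set).

```shell
#!/usr/bin/env bash
# Hypothetical stress loop; names and paths are illustrative assumptions.
# Assumes GNU `timeout` (exit status 124 means the deadline was hit).
TIMEOUT_BIN="${TIMEOUT_BIN:-timeout}"

# make_payload BYTES: print BYTES copies of the letter "a", no trailing newline.
make_payload() {
  head -c "$1" /dev/zero | tr '\0' 'a'
}

# stress ROUNDS BYTES DEADLINE_SECS CLIENT...
# Each round pipes the payload into CLIENT and expects it echoed back.
# Prints "stuck=<n> mismatched=<n> ok=<n>".
stress() {
  local rounds=$1 bytes=$2 deadline=$3
  shift 3
  local payload reply rc stuck=0 mismatched=0 ok=0 i
  payload=$(make_payload "$bytes")
  for ((i = 0; i < rounds; i++)); do
    reply=$(printf '%s' "$payload" | "$TIMEOUT_BIN" "$deadline" "$@")
    rc=$?
    if ((rc == 124)); then
      stuck=$((stuck + 1))            # client did not finish before the deadline
    elif [[ "$reply" != "$payload" ]]; then
      mismatched=$((mismatched + 1))  # client finished but the echo was wrong
    else
      ok=$((ok + 1))
    fi
  done
  echo "stuck=$stuck mismatched=$mismatched ok=$ok"
}

# Example (hypothetical socket path, echo server assumed to be running):
#   stress 100 102400 5 nc -U /tmp/echo.sock
```

Counting stuck rounds separately from mismatched ones keeps a hang distinguishable from a truncated reply.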
The same test (with nc -N) passes on Linux (Ubuntu 20.04.6 LTS, Xeon(R) Platinum 8375C).
Netcat client missing EOF
It turns out that the netcat bundled with macOS is old and buggy. It has no version number: man nc says 2001, while there is some speculation that it is from 2005. As a domain socket client (with a reliable socat echo server on the other end), it frequently gets stuck, especially with large messages (100KB), which is evident with:
nmap ncat and socat, on the other hand, are reliable clients:
However, neither is bundled with macOS.
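The setup described above (a socat echo server with interchangeable netcat, ncat, and socat clients) can be sketched roughly as below. The exact commands from the PR are not shown in the description, so the function names, the socket path, and the `-t 5` wait on the socat client are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical socket path; any writable location works.
SOCK="${SOCK:-/tmp/echo-demo.sock}"

# Echo server: fork a handler per connection and loop the bytes back through cat.
start_echo_server() {
  rm -f "$SOCK"
  socat UNIX-LISTEN:"$SOCK",fork EXEC:cat &
  ECHO_SERVER_PID=$!
  # Wait up to ~5 seconds for the socket file to appear.
  local i
  for ((i = 0; i < 50; i++)); do
    [[ -S "$SOCK" ]] && return 0
    sleep 0.1
  done
  return 1
}

stop_echo_server() {
  kill "$ECHO_SERVER_PID" 2>/dev/null
  wait "$ECHO_SERVER_PID" 2>/dev/null
  rm -f "$SOCK"
}

# Three interchangeable clients; each reads the request on stdin.
# Flags differ between netcat builds (the Linux run in the description uses nc -N).
client_nc()    { nc -U "$SOCK"; }
client_ncat()  { ncat -U "$SOCK"; }
# -t 5: after stdin hits EOF, wait up to 5 seconds for the rest of the reply.
client_socat() { socat -t 5 - UNIX-CONNECT:"$SOCK"; }
```

With the server started via `start_echo_server`, something like `printf 'hello' | client_socat` should print the request back if the round trip works; swapping in `client_nc` exercises the bundled netcat instead.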
Confusingly, this problem seems to go away just by setting a read timeout on the Scala server:
Netcat client incomplete message
Another (potentially unrelated) issue is that a netcat client pair can deliver incomplete messages in bash scripts if the server doesn't send anything back (instead of, e.g., echoing):
(There is no obvious way to implement an echo server with macOS netcat.)
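One way to check for the incomplete-message symptom described above is to compare the bytes sent with the bytes that arrive at a server that never replies. The sketch below is only an illustration under stated assumptions: it swaps in a one-shot socat sink for the server (the description's reproduction uses netcat, and its script is not shown), and `start_sink_server`, `send_and_count`, and both paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical paths; the real script in the PR is not shown in the description.
SOCK="${SOCK:-/tmp/sink-demo.sock}"
RECEIVED="${RECEIVED:-/tmp/sink-demo.out}"

# One-shot sink server: accept a single connection, save what arrives,
# and send nothing back (-u makes the transfer unidirectional).
start_sink_server() {
  rm -f "$SOCK" "$RECEIVED"
  socat -u UNIX-LISTEN:"$SOCK" OPEN:"$RECEIVED",creat,trunc &
  SINK_PID=$!
  # Wait up to ~5 seconds for the socket file to appear.
  local i
  for ((i = 0; i < 50; i++)); do
    [[ -S "$SOCK" ]] && return 0
    sleep 0.1
  done
  return 1
}

# send_and_count BYTES CLIENT...
# Sends BYTES bytes through CLIENT, waits for the sink to exit, and prints
# "sent=<n> received=<n>". Call it in the same shell that started the sink
# (not inside $(...)), otherwise `wait` cannot see the server process.
send_and_count() {
  local bytes=$1
  shift
  head -c "$bytes" /dev/zero | "$@" >/dev/null
  wait "$SINK_PID" 2>/dev/null
  local received
  received=$(wc -c < "$RECEIVED" | tr -d ' ')
  echo "sent=$bytes received=$received"
}

# Example: start_sink_server && send_and_count 102400 nc -U "$SOCK"
```

A `received` value smaller than `sent` would correspond to the truncated message; a client that never exits after stdin closes would make the call hang instead.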
This doesn't reproduce directly in the Terminal or with nmap ncat / socat clients:
The root cause is unclear, but it might have to do with the fact that the server doesn't send anything back, causing an incorrectly early termination.
Ncat / Socat
While nmap ncat and socat clients are reliable on their own, the stress test can still fail due to stuck timeouts. It's unclear whether the problem lies in the implementation here or in the junixsocket library.
Confusingly, this problem also goes away just by setting the server read timeout mentioned before: