-
Notifications
You must be signed in to change notification settings - Fork 729
Handle vsock spurious wakeups gracefully instead of asserting #3501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
vsock (virtual sockets) used by Apple's Containerization framework may not provide the same kqueue synchronization guarantees as regular TCP/Unix sockets. This causes spurious POLLIN notifications where no data is actually available. Instead of asserting on readResult == .none, handle it gracefully by calling readIfNeeded0() and returning .normal(.none). Refs: apple#3500 Refs: apple/containerization#503
EBADF (bad file descriptor) was already tolerated on iOS because file descriptors can be reaped in the background. The same issue occurs on macOS when using vsock through Apple's Containerization framework. The vsock hypervisor layer may invalidate file descriptors during normal operation, causing EBADF in fcntl and other syscalls. This change extends the iOS workaround to all Darwin platforms. Refs: apple#3500 Refs: apple/containerization#503
simonjbeaumont
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking the time to file the patch. I left some comments on the patch itself, but I think we also need to answer some higher-level questions first:
- Is this hypervisor behaviour something we should tolerate in this way?
- Have you tested this patch resolves the issue?
- Could we expand this PR to include tests?
| if readResult == .none { | ||
| // Spurious wakeup - no data available despite POLLIN notification. | ||
| // This can happen with vsock or other non-standard socket types that | ||
| // don't have the same kqueue synchronization guarantees. | ||
| self.readIfNeeded0() | ||
| return .normal(.none) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch and early-return has actually changed the semantics for io_uring -- when readResult == .none, we'll no longer conditionally fireChannelReadComplete down the pipeline.
| // NOTE: The original assert assumed kqueue/epoll always have implicit sync, | ||
| // but vsock (virtual sockets) may not provide the same guarantees as regular | ||
| // TCP/Unix sockets. Handle .none gracefully instead of crashing. | ||
| // See: https://github.com/apple/swift-nio/issues/3500 | ||
| // https://github.com/apple/containerization/issues/503 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is referring to an assert that will not exist once this patch is merged, so will be confusing to maintainers. I appreciate you added a link to some GitHub issues, but it would be better if the comments in the code explain the code that's actually here.
Relatedly, the lengthy commentary above, about skipping the assertion for io_uring, no longer makes sense.
| // issues. The vsock hypervisor layer may close descriptors in ways that cause EBADF | ||
| // during normal operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we confirm this statement is true? Specifically "during normal operation" -- should we really expect the hypervisor to be closing the FD and have we ruled out this being a bug in Container?
IIUC a file descriptor that is closed out-of-band, under NIO's feat, violates NIOs model, and it would be problematic to just gloss over it for everything.
The commentary above about tolerating for iOS is very specific, but this patch is opening it up to everything on macOS, not just VSOCK, even if that is something we wanted to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fwiw, I (containerization maintainer) cannot repro the issue Adrian is seeing so far. I can dig in on the hypervisor end, but I'm mostly in Simon's camp here in terms of his comment left. I don't think we should ignore EBADF either, or at the very least offer some escape hatch for folks who don't want the library to assert on EBADFs. On the hypervisor/VMM end, Virtualization.framework does seem to close all vsock FDs it has vended when the VM is stopped, but I imagine this is intended and I view this as a good thing. We had quite a few bugs originally for our container stop logic that would trigger the ebadf assert in NIO, but never on startup that I've seen yet apple/containerization#503.
The previous commit tolerated EBADF in isUnacceptableErrno() but the close() path uses a separate function isUnacceptableErrnoOnClose() which still treated EBADF as unacceptable. When vsock connections are closed by the hypervisor layer (e.g., when a container stops), NIO may attempt to close an already-invalid file descriptor, resulting in EBADF. This is normal behavior for vsock and should not crash. Refs: apple#3500 Refs: apple/containerization#503 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When vsock connections are closed by the hypervisor (e.g., container shutdown), the event loop may exit abnormally without going through the proper shutdown state machine. This would cause an assertion failure because the state is still runningAndAcceptingNewRegistrations instead of noLongerRunning. On Darwin, we now gracefully handle this case by forcing the state transition instead of asserting. This allows the event loop to clean up properly even after abnormal vsock disconnections. Refs: apple#3500 Refs: apple/containerization#503 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Motivation
When using vsock (virtual sockets) through Apple's Containerization framework, the app crashes with:
in
NIOPosix/BaseSocketChannel.swift:1205(inreadable0()).Root Cause
The assertion assumes kqueue/epoll always have implicit synchronization such that POLLIN notifications are only received when data is actually available. However, vsock (virtio-vsock) used for host-guest communication goes through a hypervisor layer and may not provide the same kqueue synchronization guarantees as regular TCP/Unix sockets.
This creates a race condition similar to io_uring where POLLIN fires but no data is ready.
Solution
Instead of asserting on
readResult == .none, handle it gracefully by:readIfNeeded0()to re-register for the next read notification.normal(.none)to indicate no data was readThis matches the existing pattern for handling io_uring spurious wakeups.
Related Issues
Testing