-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Kill build hook with SIGTERM instead of SIGKILL #15105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Kill build hook with SIGTERM instead of SIGKILL #15105
Conversation
| /* Give custom build hooks the chance to cleanup. */ | ||
| pid.setKillSignal(SIGTERM); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm doesn't this mean that we'd use SIGTERM in the destructor and could get stuck on waitpid after an interrupt indefinitely for a misbehaving build hook? Shouldn't we still use SIGKILL in the destructor and have a timeout in the happy path (busywait?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How's this? The alternative is adding a busywait + kill directly to Pid::wait, which seems messy.
a71d6b1 to
f72e16a
Compare
f72e16a to
d8e4a9c
Compare
src/libutil/unix/processes.cc
Outdated
| if (killTimeout > 0) | ||
| killThread = std::thread([&]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, should we bother doing this if we are using SIGKILL already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow, this patch uses SIGTERM initially, and falls back to SIGKILL (via the killThread) if SIGTERM doesn't do the trick after 500ms. Since SIGKILL is unhandleable, we need to use SIGTERM for the build hook if we want to give them the ability to do any sort of cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I meant for the non-hook case. Don't make sense to spawn a thread to kill a derivation builder for example, but I guess that is guarded by the timeout, which is 0 by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that was the intention behind the guard. I'll make the logic a little clearer by also checking killSignal.
| pid_t pid = -1; | ||
| bool separatePG = false; | ||
| int killSignal = SIGKILL; | ||
| unsigned killTimeout = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe std::chrono::duration would be a better fit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used std::chrono::milliseconds to avoid templating the function and because all chrono literals are automatically convertible.
xokdvium
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also one trivial improvement I think we could make is to have the progress bar indicate that nix is shutting down is isInterrupted() is set. Since the bar is rendered by another thread that should be doable and would provide immediate feedback to users in such cases.
d8e4a9c to
1143030
Compare
|
Example with a misbehaving build hook getting killed: https://asciinema.org/a/GMoTcJkTYHLvRfH7 |
| actPostBuildHook = 110, | ||
| actBuildWaiting = 111, | ||
| actFetchTree = 112, | ||
| actShuttingDown = 113, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so you think there could be a way to avoid adding a new activity type to handle this? The progress bar thread could install a callback for interrupts with ReceiveCallbacks.
Using a static activity seems hacky to me too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I didn't really like it either. I added a callback which just redraws the progress bar to display the shutdown message. https://asciinema.org/a/zpp6NYB15xtdjRDu
9be29a9 to
573ad90
Compare
| if (killTimeout > 0ms && killSignal != SIGKILL) | ||
| killThread = std::thread([&]() { | ||
| auto elapsed = 0ms; | ||
| while (elapsed < killTimeout) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will block the Worker::run() loop from making progress for up to killTimeout milliseconds, right? That's probably not too bad, but not ideal...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an ideal world we could try doing async cleanup if we had better facilities for async things (think a task that either polls pidfd or hits a timeout). I think ASIO + Boost.process2 might have some helpers for this sort of thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only if these conditions are met:
- The user interrupts nix
- The user is using a nonstandard build hook (the regular build hook responds timely to SIGTERM)
- The build hook is misbehaving during shutdown
573ad90 to
0a2b5e5
Compare
This gives custom build hooks the chance to perform any needed cleanup, which is impossible when killed with SIGKILL. After 500ms, if the hook still hasn't exited, it will be forcibly killed with SIGKILL. Resolves NixOS#14760 Signed-off-by: Lisanna Dettwyler <lisanna.dettwyler@gmail.com>
Signed-off-by: Lisanna Dettwyler <lisanna.dettwyler@gmail.com>
0a2b5e5 to
f25ec83
Compare
This feels like the most straightforward way to handle #14760. If there's a more preferred solution, I'd be happy to change my patch.
Motivation
This gives custom build hooks the chance to perform any needed cleanup, which is impossible when killed with SIGKILL. After 500ms, if the hook still hasn't exited, it will be forcibly killed with SIGKILL.
Context
#14760
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.