Skip to content

Conversation

@lisanna-dettwyler
Copy link
Contributor

@lisanna-dettwyler lisanna-dettwyler commented Jan 28, 2026

This feels like the most straightforward way to handle #14760. If there's a more preferred solution, I'd be happy to change my patch.

Motivation

This gives custom build hooks the chance to perform any needed cleanup, which is impossible when killed with SIGKILL. After 500ms, if the hook still hasn't exited, it will be forcibly killed with SIGKILL.

Context

#14760


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@github-actions github-actions bot added the new-cli Relating to the "nix" command label Jan 28, 2026
Comment on lines +75 to +78
/* Give custom build hooks the chance to cleanup. */
pid.setKillSignal(SIGTERM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm doesn't this mean that we'd use SIGTERM in the destructor and could get stuck on waitpid after an interrupt indefinitely for a misbehaving build hook? Shouldn't we still use SIGKILL in the destructor and have a timeout in the happy path (busywait?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's this? The alternative is adding a busywait + kill directly to Pid::wait, which seems messy.

@lisanna-dettwyler lisanna-dettwyler force-pushed the build-hook-signals branch 2 times, most recently from a71d6b1 to f72e16a Compare January 28, 2026 21:33
Comment on lines 73 to 74
if (killTimeout > 0)
killThread = std::thread([&]() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, should we bother doing this if we are using SIGKILL already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow, this patch uses SIGTERM initially, and falls back to SIGKILL (via the killThread) if SIGTERM doesn't do the trick after 500ms. Since SIGKILL is unhandleable, we need to use SIGTERM for the build hook if we want to give them the ability to do any sort of cleanup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I meant for the non-hook case. Don't make sense to spawn a thread to kill a derivation builder for example, but I guess that is guarded by the timeout, which is 0 by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was the intention behind the guard. I'll make the logic a little clearer by also checking killSignal.

pid_t pid = -1;
bool separatePG = false;
int killSignal = SIGKILL;
unsigned killTimeout = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe std::chrono::duration would be a better fit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used std::chrono::milliseconds to avoid templating the function and because all chrono literals are automatically convertible.

Copy link
Contributor

@xokdvium xokdvium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also one trivial improvement I think we could make is to have the progress bar indicate that nix is shutting down is isInterrupted() is set. Since the bar is rendered by another thread that should be doable and would provide immediate feedback to users in such cases.

@lisanna-dettwyler
Copy link
Contributor Author

lisanna-dettwyler commented Jan 30, 2026

Example with a misbehaving build hook getting killed: https://asciinema.org/a/GMoTcJkTYHLvRfH7

actPostBuildHook = 110,
actBuildWaiting = 111,
actFetchTree = 112,
actShuttingDown = 113,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, so you think there could be a way to avoid adding a new activity type to handle this? The progress bar thread could install a callback for interrupts with ReceiveCallbacks.

Using a static activity seems hacky to me too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I didn't really like it either. I added a callback which just redraws the progress bar to display the shutdown message. https://asciinema.org/a/zpp6NYB15xtdjRDu

@lisanna-dettwyler lisanna-dettwyler force-pushed the build-hook-signals branch 4 times, most recently from 9be29a9 to 573ad90 Compare February 1, 2026 16:23
if (killTimeout > 0ms && killSignal != SIGKILL)
killThread = std::thread([&]() {
auto elapsed = 0ms;
while (elapsed < killTimeout) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will block the Worker::run() loop from making progress for up to killTimeout milliseconds, right? That's probably not too bad, but not ideal...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an ideal world we could try doing async cleanup if we had better facilities for async things (think a task that either polls pidfd or hits a timeout). I think ASIO + Boost.process2 might have some helpers for this sort of thing

Copy link
Contributor Author

@lisanna-dettwyler lisanna-dettwyler Feb 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if these conditions are met:

  • The user interrupts nix
  • The user is using a nonstandard build hook (the regular build hook responds timely to SIGTERM)
  • The build hook is misbehaving during shutdown

This gives custom build hooks the chance to perform any needed cleanup,
which is impossible when killed with SIGKILL. After 500ms, if the hook
still hasn't exited, it will be forcibly killed with SIGKILL.

Resolves NixOS#14760

Signed-off-by: Lisanna Dettwyler <lisanna.dettwyler@gmail.com>
Signed-off-by: Lisanna Dettwyler <lisanna.dettwyler@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-cli Relating to the "nix" command

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants