libct: prepareCgroupFD: fall back to container init cgroup#5101

Open
kolyshkin wants to merge 3 commits into opencontainers:main from kolyshkin:fix-exec

@kolyshkin
Contributor

Previously, when prepareCgroupFD could not open the container's cgroup
(as configured in config.json and saved to state.json), it returned
a fatal error, as we presumed a container can't exist without its own
cgroup.

Apparently, it can. When a container is configured without cgroupns
(i.e. it uses the host's cgroups) and /sys/fs/cgroup is mounted
read-write, a rootful container's init can move itself to an entirely
different cgroup (even a new one that it just created), and then the
original container cgroup is removed by the kernel (or systemd?) as
it has no processes left. By the way, from the systemd point of view
the container is gone. And yet it is still there, and users want
runc exec to work!

And it worked, thanks to the "let's try container init's cgroup"
fallback added by commit c91fe9a ("cgroup2: exec: join the
cgroup of the init process on EBUSY"). The fallback was added for
an entirely different reason, but it happened to work in this very
case, too.

This behavior was broken with the introduction of CLONE_INTO_CGROUP
support.

While it is debatable whether a container moving itself into a
different cgroup is a valid scenario, this very setup is used by e.g.
buildkitd running in a privileged Kubernetes container (see issue #5089).

To restore the way things are expected to work, add the same "try
container init's cgroup" fallback into prepareCgroupFD.

Fixes: #5089.

@kolyshkin
Contributor Author

TODO: add an integration test case.

@kolyshkin marked this pull request as draft on February 6, 2026 01:54
Separate initProcessCgroupPath code out of addIntoCgroupV2.
To be used by the next patch.

While at it, describe the new scenario in which the container's
configured cgroup might not be available.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin force-pushed the fix-exec branch 2 times, most recently from 1b7acc8 to 6397833 on February 7, 2026 00:03
@kolyshkin
Contributor Author

Added a test case. #5102 checks that it fails without the fix.

Add a test case to reproduce runc issue 5089.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin marked this pull request as ready for review on February 7, 2026 01:09
Labels

backport/1.4-todo: A PR in the main branch which needs to be backported to release-1.4

Development

Successfully merging this pull request may close these issues.

Probe with exec fails because no cgroup directory is found (can't open cgroup: openat2 /sys/fs/cgroup/...: no such file or directory)