Conversation
Added a check to the osc-kata-install script to ensure the installation only proceeds when the NODE_LABEL environment variable is set. This prevents unintended behavior during daemonset deployment. Signed-off-by: Patrik Fodor <patrik.fodor@ibm.com>
…emonSet

- Migrated peer-pods configuration handling into the osc-rpm DaemonSet.
- Prepares for the transition where config files are bundled with the rpm package.
- Simplifies the overall installation process and operator logic.
- Lays groundwork for installing Kata Containers without requiring node reboot.

Signed-off-by: Patrik Fodor <patrik.fodor@ibm.com>
Hi @Pacho20. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with … Once the patch is verified, the new status will be reflected by the … I understand the commands that are listed here.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hey @Pacho20, I may be mistaken, but I think that … That would explain why you need to kill and restart the process.
The title of the first commit is a bit long. And GitHub still does not have a way to comment directly on commit messages.
exit 1
}

[[ -z "${NODE_LABEL:-}" ]] && {
The -u option is set, so if the variable is unset, it results in an error. Therefore, we don’t necessarily need :-, but using it allows us to set the variable to an empty string and thereby control the error message.
I meant: you are adding a test to check if NODE_LABEL is not set, with an error message that is almost exactly what bash itself would give you.
I see no functional difference between
ERROR: NODE_LABEL env var must be set
and
NODE_LABEL: unbound variable
So that code segment seems pointless to me.
Alternative: make the error message better, and send it to stderr instead of stdout.
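A minimal sketch of that alternative, keeping the check from the diff above but with a more actionable message sent to stderr (the message wording is illustrative):

```bash
# Keep the explicit check, but make the message actionable and send it to stderr.
[[ -z "${NODE_LABEL:-}" ]] && {
    echo "ERROR: NODE_LABEL must be set to the node label this daemonset targets (set it in the pod spec env)" >&2
    exit 1
}
```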
RUN mkdir -p /scripts

-ADD osc-kata-install.sh osc-configs-script.sh osc-log-level.sh lib.sh /scripts/
+ADD osc-kata-install.sh osc-log-level.sh lib.sh /scripts/
I don't really understand why merging two scripts into one makes things simpler. Isn't it clearer what each step does when they are separated? Aren't there cases where you would want to reconfigure without reinstalling?
I removed most of the content from that script, so I thought moving it into the other one wouldn’t be a problem. I don’t think there are any cases that require reconfiguration. The two functions I moved to the other script will be removed anyway, since these config files will be part of the RPM package. Nevertheless, I think you’re right - it makes more sense to keep them in separate scripts.
rm -rf /host/tmp/extensions/

# Copy configs
copy_kata_remote_config_files
If the goal is to simplify the workflow, the same overall simplification could be achieved by simply invoking the config script, no?
Yup, you're right about that.
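For reference, the agreed alternative would look roughly like this in the install script (the script names come from the Dockerfile above; the exact call site is an assumption):

```bash
# Keep configuration in its own script and just invoke it from the installer
# instead of duplicating copy_kata_remote_config_files here.
bash /scripts/osc-configs-script.sh
```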
controllers/daemonset_reconcile.go (outdated)
for _, node := range nodes {
	r.Log.Info("node must be rebooted", "node", node.Name)
}
//r.scheduleInstallation(UninstallKata)
Is that a TODO or a leftover? AFAICT Uninstall is implemented, so looks to me like that code should be uncommented and the code above it removed?
Ah, I think the problem is that uninstall without a reboot won't work. Add a comment here, then.
Correct. Uninstall ran into issues, so it's just disabled. Since the primary use case I want fixed with this is worker updates, I'm fine leaving uninstall as it is: the customer must manually reboot the worker to finish the uninstall. It was left as a maybe-implement-later.
exec_on_host "systemctl daemon-reload"
exec_on_host "systemctl restart crio"

wait_till_node_is_ready
Can we add a timeout on the various waits and error out if we exceed it?

If you want to use the timeout command, you will need to either export the wait functions or put them in a separate script. Or you can add your own custom timeout. But having no node reach the Ready state seems like a condition we should be ready to deal with gracefully.
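On the "add your own custom timeout" option, a minimal sketch of how the existing wait could be bounded; NODE_NAME, the NODE_READY_TIMEOUT env var, and the 600s default are assumptions, and oc is assumed to be usable from the daemonset pod:

```bash
# Same wait, but with a deadline so the pod fails loudly instead of hanging.
wait_till_node_is_ready() {
    local timeout_secs="${NODE_READY_TIMEOUT:-600}"
    local start
    start=$(date +%s)
    until [[ "$(oc get node "${NODE_NAME}" \
        -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')" == "True" ]]; do
        if (( $(date +%s) - start > timeout_secs )); then
            echo "ERROR: node ${NODE_NAME} did not become Ready within ${timeout_secs}s" >&2
            return 1
        fi
        sleep 10
    done
}
```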
I offered that node debug and lsof might answer this also.
I had a similar hypothesis. I checked the file descriptors with …

I checked where it comes from, and the config validator seems to ignore it and load the new runtime anyway. The relevant part of …

But even with successful reloading, when kubelet tries to invoke the …

You can find the code related to the error message here and the method using it, which is used in every …

The other thing I noticed (which is probably worse) while checking the kubelet logs is that the …

If anyone has more insight, I'd gladly hear it.
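For anyone reproducing the "binary exists but CRI-O can't stat it" symptom, the kind of checks being discussed can be run from a node debug shell; a hedged sketch (the shim path is the one referenced later in this PR):

```bash
# Open file descriptors of the running CRI-O process.
lsof -p "$(pidof crio)"

# Check whether the shim is visible from inside CRI-O's mount namespace.
nsenter -t "$(pidof crio)" -m -- stat /usr/bin/containerd-shim-kata
```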
@Pacho20 In the current state, this clearly needs quite a bit more work. Would you mind flagging it as do-not-merge until the PR is in a more complete state?
This change introduces the use of rpm-ostree apply-live and restarts CRI-O, allowing both kata and kata-remote to function without requiring node reboots. Signed-off-by: Patrik Fodor <patrik.fodor@ibm.com>
Force-pushed from 09b0a6c to 47d28ac.
PR needs rebase.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
# Wait again: rpm-ostree install stages changes, requiring a reboot
wait_for_reboot_clear
# Install SELinux policy
semodule -i /usr/share/kata-containers/defaults/osc_monitor.cil
Why? This is supposed to be done by the RPM already...
Maybe it's different on RHEL and RHCOS, but it does not load after the installation.
I got this error for the openshift-sandboxed-containers-monitor pods:
Error: container create failed: write to /proc/self/attr/keycreate: Invalid argument
And after this modification the error message was gone.
This seems to indicate that the kata-containers RPM scriptlets did not run. This change is thus basically a band-aid over something that is clearly wrong here and that must be investigated.
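If it helps the investigation, whether the policy was already loaded by the RPM scriptlets can be checked on the host before adding the manual semodule call; a small sketch (the module name is inferred from the osc_monitor.cil filename):

```bash
# If the kata-containers RPM scriptlets ran, the module should already be listed.
chroot /host semodule -l | grep -i osc_monitor \
  || echo "osc_monitor not loaded; scriptlets likely did not run" >&2
```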
This is a work-in-progress commit. Probably won't work properly. Signed-off-by: Patrik Fodor <patrik.fodor@ibm.com>
Small Status Update

Currently, I am facing the following issues with this installation method:

systemctl reload crio does not work as expected
systemctl reload crio does not behave as expected, so I need to use restart instead (which can lead to issues depending on the crio and kubelet configurations). More details about this issue can be found in my previous comment.

rpm-ostree uninstall with apply-live does not work as expected
Using the following commands: … results in a different filesystem than expected. As a short-term solution, I created functions that find all affected files and directories, back them up, and then restore them after apply-live runs (pushed that as a separate commit so you can check it). I thought this would solve the issues with the uninstallation, but it does not solve all of them.

After the uninstall and apply-live, if I restart kubelet or try to reboot the node, kubelet does not start. I always used a ROKS cluster for the whole process, so I can't access the worker node without kubelet running. I don't know what exactly happens, but I had another running worker node (after apply-live, without restarting anything); systemctl status shows a degraded state with a failed service called … I tried running systemctl daemon-reload, but that was not enough to resolve the issue.

So, uninstall with apply-live does not work properly and prevents kubelet from starting again. I have not yet identified the root cause.
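As a rough illustration of the uninstall flow being described (the package name is a placeholder, and this is only a sketch of the shape of the commands, not the PR's actual script):

```bash
# Illustrative only: remove the layered package, then apply the change live.
chroot /host bash -c "rpm-ostree uninstall kata-containers"
chroot /host bash -c "rpm-ostree apply-live --allow-replacement"
```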
Let's take a step back. You're trying to re-implement concepts that are already implemented in the MCO. Did you consider looking into their code to see how they use …?

In particular, the MCO doesn't use … Let's go back to something more aligned with what the MCO does: …
Yeah, the MCO uses a different approach. I hoped it would be easy to do the install/uninstall with apply-live, but clearly it's not. I agree that install/uninstall, cordon and drain, then reboot would be the way forward.
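A sketch of that MCO-aligned sequence, assuming the node name is available as NODE_NAME and using a placeholder package name:

```bash
# Take the node out of service, stage the change, then reboot to apply it.
oc adm cordon "${NODE_NAME}"
oc adm drain "${NODE_NAME}" --ignore-daemonsets --delete-emptydir-data
chroot /host rpm-ostree install kata-containers   # staged; takes effect on reboot
chroot /host systemctl reboot
# After the node comes back Ready (e.g. detected by the controller), uncordon it.
oc adm uncordon "${NODE_NAME}"
```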
I suggest we keep this PR as a reference for an attempt to use …
# Run install inside chroot
echo "Running in chroot: $install_cmd"
chroot /host bash -c "$install_cmd"
chroot /host bash -c "rpm-ostree apply-live --allow-replacement"
After reading a bit more on apply-live:
- https://github.com/coreos/rpm-ostree/blob/main/docs/apply-live.md
- https://github.com/coreos/rpm-ostree/blob/main/man/rpm-ostree.xml#L811
This adds a transient overlayfs layer that doesn't survive reboots. If the worker gets rebooted for some arbitrary reason later on, is rpm-ostree apply-live run again?
apply-live also creates a deployment that persists across reboots, so this should be fine.
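One way to sanity-check that claim on a node is to look at rpm-ostree's own view of the deployments and any live-applied changes (interpretation is left to the rpm-ostree docs linked above):

```bash
# Shows the booted deployment, any staged deployment, and live-applied changes.
chroot /host rpm-ostree status -v
```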
Analysis of MCO and rpm-ostree source code shows that apply-live has an asymmetry problem during uninstallation:

- /etc changes are immediate and persistent (deletions happen now)
- /usr changes are transient via overlayfs (deletions deferred to reboot)

This mismatch causes CRI-O config to be deleted while QEMU binaries still exist, leading to CRI-O restart failures when it can't find the runtime handler config. MCO never uses apply-live for extensions - it always requires a reboot (see pkg/daemon/update.go:737-739).

Also fixes a bug where stale RPMs from previous runs could be mixed with new RPMs by clearing the temp directory before copying.

References:
- rpm-ostree/docs/apply-live.md: "/etc changes are persistent"
- rpm-ostree/rust/src/live.rs: overlayfs is transient
- openshift/sandboxed-containers-operator#1349

Co-Authored-By: Claude <noreply@anthropic.com>
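A small, hedged way to observe the /etc vs /usr split described in this commit message on a running node:

```bash
# /etc is a writable per-deployment copy; this lists how it differs from defaults.
chroot /host ostree admin config-diff | head

# /usr comes from the read-only ostree commit; after apply-live it sits under
# a transient overlayfs that does not survive a reboot.
chroot /host findmnt /usr
```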
Regarding the use of …

No. This design keeps live changes minimal, avoids running arbitrary scripts on a live system, and acknowledges that live changes are non-transactional and can "leak" (e.g., …).

By contrast, in a normal …

Best web references: …
- Description of the problem which is fixed/What is the use case
The DaemonSet installation mode requires manual node reboots, which complicates both installation and uninstallation. This can confuse users and results in a poor user experience. The introduced changes aim to eliminate the need for reboots and make the process more seamless, although the current solution is not fully complete.
- What I did
Removed the configuration script and DaemonSet, as they made the installation process unnecessarily complex. Added rpm-ostree apply-live to enable Kata on worker nodes without rebooting. Extended the controller to schedule the installation process so that nodes are updated one at a time.

Initially, I tried to use systemctl reload for CRI-O instead of a full restart. This would have been a better solution because it avoids interrupting both CRI-O and kubelet and does not rely on their state restoration (which can fail in some cases). While reload works and CRI-O reloads its configuration, it fails to locate the executable for the Kata runtime, returning the error "stat: no such file or directory" for /usr/bin/containerd-shim-kata. The binary exists and works, but CRI-O cannot find it. I investigated multiple possibilities: checking the file in CRI-O's namespace using nsenter, verifying permissions, SELinux flags, mount options, kernel parameters; everything suggests CRI-O should be able to invoke the binary. I still have a few ideas to check the interaction between CRI-O and the kernel during this lookup. If you have any insight into why this happens or how to fix it, that would greatly simplify the installation process.

As a fallback, I verified that CRI-O can invoke the Kata runtime after a full restart, so that is the current approach. However, this is not ideal because restarting CRI-O also triggers a kubelet restart (at least on ROKS). Installation works reliably because rpm-ostree install takes time, giving kubelet a chance to recover. Uninstallation, however, fails: although the script waits for the node to be in a "Ready" state, the node never becomes "NotReady" during the kubelet restart. This means uninstall runs on other nodes while kubelet is still recovering, and triggers kubelet restarts on those nodes too, leaving the cluster in a broken state where most pods enter ImagePullBackOff or CrashLoopBackOff. Recovery is possible by restarting pods in the right order, but this should not happen. I could not find a reliable way to detect when kubelet and CRI-O are fully restored, so for now I reintroduced the manual reboot for uninstallation.

One reason I pursued this approach is that the community operator uses a similar method successfully. They use the kata-deploy script - see:
https://github.com/kata-containers/kata-containers/blob/main/tools/packaging/kata-deploy/scripts/kata-deploy.sh#L764C10-L778.
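On the point about detecting when kubelet and CRI-O are fully restored, one possible (untested) host-side check that goes beyond watching the node's Ready condition:

```bash
# Both services report active and the CRI endpoint actually answers.
chroot /host systemctl is-active crio kubelet
chroot /host crictl --runtime-endpoint unix:///var/run/crio/crio.sock info >/dev/null \
  && echo "CRI-O responding"
```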
Currently, I see three possible paths forward:

- Make systemctl reload work for CRI-O without breaking the Kata runtime.
- …
- …

- How to verify it
Build the operator using the updated scripts/kata-install/Dockerfile. Apply the KataConfig CR and wait until all nodes reach the "installed" status.

- Description for the changelog
Change DaemonSet mode to eliminate node reboots (installation only; uninstallation still requires reboot for now).
EDIT: this is expected to fix https://issues.redhat.com/browse/KATA-4233
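A rough verification sequence matching the steps above (the sample CR path is a placeholder; the runtime class names follow the kata/kata-remote naming used in this PR):

```bash
# Apply the KataConfig CR and watch the operator roll it out.
oc apply -f example-kataconfig.yaml   # placeholder file
oc get kataconfig -w

# Once installation finishes, the kata runtime classes should exist
# and the nodes should stay Ready without a manual reboot.
oc get runtimeclass
oc get nodes
```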