Add daemonset to disable core dumps #7588

Merged
yuvipanda merged 5 commits into 2i2c-org:main from sunu:daemonset-to-disable-coredumps
Feb 25, 2026
sunu (Contributor) commented Feb 5, 2026

github-actions bot (Contributor) commented Feb 5, 2026

Merging this PR will trigger the following deployment actions.

Support deployments

| Cloud Provider | Cluster Name | Reason for Redeploy |
| --- | --- | --- |
| gcp | cloudbank | Support helm chart has been modified |
| gcp | dubois | Support helm chart has been modified |
| kubeconfig | projectpythia-binder | Support helm chart has been modified |
| aws | jupyter-health | Support helm chart has been modified |
| aws | strudel | Support helm chart has been modified |
| aws | maap | Support helm chart has been modified |
| gcp | hhmi | Support helm chart has been modified |
| aws | aimatx-2i2c-hub | Support helm chart has been modified |
| gcp | leap | Support helm chart has been modified |
| aws | 2i2c-aws-us | Support helm chart has been modified |
| gcp | 2i2c-uk | Support helm chart has been modified |
| aws | bnext-bio | Support helm chart has been modified |
| gcp | 2i2c | Support helm chart has been modified |
| aws | disasters | Support helm chart has been modified |
| aws | nasa-cryo | Support helm chart has been modified |
| aws | nasa-ghg-hub | Support helm chart has been modified |
| kubeconfig | utoronto | Support helm chart has been modified |
| aws | reflective | Support helm chart has been modified |
| aws | smithsonian | Support helm chart has been modified |
| aws | projectpythia | Support helm chart has been modified |
| kubeconfig | 2i2c-jetstream2 | Support helm chart has been modified |
| aws | ucmerced | Support helm chart has been modified |
| aws | earthscope | Support helm chart has been modified |
| aws | nmfs-openscapes | Support helm chart has been modified |
| aws | berkeley-geojupyter | Support helm chart has been modified |
| aws | victor | Support helm chart has been modified |
| gcp | awi-ciroh | Support helm chart has been modified |
| aws | openscapeshub | Support helm chart has been modified |
| aws | temple | Support helm chart has been modified |
| aws | opensci | Support helm chart has been modified |
| aws | nasa-veda | Support helm chart has been modified |

Staging deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| gcp | cloudbank | staging | Core infrastructure has been modified |
| aws | jupyter-health | staging | Core infrastructure has been modified |
| aws | strudel | staging | Core infrastructure has been modified |
| aws | maap | staging | Core infrastructure has been modified |
| gcp | hhmi | staging | Core infrastructure has been modified |
| aws | aimatx-2i2c-hub | staging | Core infrastructure has been modified |
| gcp | leap | staging | Core infrastructure has been modified |
| aws | 2i2c-aws-us | staging | Core infrastructure has been modified |
| gcp | 2i2c-uk | staging | Core infrastructure has been modified |
| aws | bnext-bio | staging | Core infrastructure has been modified |
| gcp | 2i2c | staging | Core infrastructure has been modified |
| gcp | 2i2c | dask-staging | Core infrastructure has been modified |
| aws | disasters | staging | Core infrastructure has been modified |
| aws | nasa-cryo | staging | Core infrastructure has been modified |
| aws | nasa-ghg-hub | staging | Core infrastructure has been modified |
| kubeconfig | utoronto | staging | Core infrastructure has been modified |
| kubeconfig | utoronto | r-staging | Core infrastructure has been modified |
| aws | reflective | staging | Core infrastructure has been modified |
| aws | smithsonian | staging | Core infrastructure has been modified |
| aws | projectpythia | staging | Core infrastructure has been modified |
| kubeconfig | 2i2c-jetstream2 | staging | Core infrastructure has been modified |
| aws | ucmerced | staging | Core infrastructure has been modified |
| aws | earthscope | staging | Core infrastructure has been modified |
| aws | nmfs-openscapes | staging | Core infrastructure has been modified |
| aws | berkeley-geojupyter | staging | Core infrastructure has been modified |
| aws | victor | staging | Core infrastructure has been modified |
| gcp | awi-ciroh | staging | Core infrastructure has been modified |
| aws | openscapeshub | staging | Core infrastructure has been modified |
| aws | temple | staging | Core infrastructure has been modified |
| aws | opensci | staging | Core infrastructure has been modified |
| aws | nasa-veda | staging | Core infrastructure has been modified |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| gcp | cloudbank | ahs | Core infrastructure has been modified |
| gcp | cloudbank | authoring | Core infrastructure has been modified |
| gcp | cloudbank | bcc | Core infrastructure has been modified |
| gcp | cloudbank | bmcc | Core infrastructure has been modified |
| gcp | cloudbank | chaffey | Core infrastructure has been modified |
| gcp | cloudbank | ccsf | Core infrastructure has been modified |
| gcp | cloudbank | chabot | Core infrastructure has been modified |
| gcp | cloudbank | chicagostate | Core infrastructure has been modified |
| gcp | cloudbank | cmu | Core infrastructure has been modified |
| gcp | cloudbank | csm | Core infrastructure has been modified |
| gcp | cloudbank | csum | Core infrastructure has been modified |
| gcp | cloudbank | deanza | Core infrastructure has been modified |
| gcp | cloudbank | demo | Core infrastructure has been modified |
| gcp | cloudbank | dvc | Core infrastructure has been modified |
| gcp | cloudbank | elac | Core infrastructure has been modified |
| gcp | cloudbank | elcamino | Core infrastructure has been modified |
| gcp | cloudbank | evc | Core infrastructure has been modified |
| gcp | cloudbank | etsu | Core infrastructure has been modified |
| gcp | cloudbank | fresno | Core infrastructure has been modified |
| gcp | cloudbank | foothill | Core infrastructure has been modified |
| gcp | cloudbank | glendale | Core infrastructure has been modified |
| gcp | cloudbank | golden | Core infrastructure has been modified |
| gcp | cloudbank | gwu | Core infrastructure has been modified |
| gcp | cloudbank | gpu-demo | Core infrastructure has been modified |
| gcp | cloudbank | high | Core infrastructure has been modified |
| gcp | cloudbank | humboldt | Core infrastructure has been modified |
| gcp | cloudbank | kean | Core infrastructure has been modified |
| gcp | cloudbank | lacc | Core infrastructure has been modified |
| gcp | cloudbank | lahc | Core infrastructure has been modified |
| gcp | cloudbank | laney | Core infrastructure has been modified |
| gcp | cloudbank | lavc | Core infrastructure has been modified |
| gcp | cloudbank | lbcc | Core infrastructure has been modified |
| gcp | cloudbank | mendocino | Core infrastructure has been modified |
| gcp | cloudbank | merced | Core infrastructure has been modified |
| gcp | cloudbank | merritt | Core infrastructure has been modified |
| gcp | cloudbank | mmc | Core infrastructure has been modified |
| gcp | cloudbank | miracosta | Core infrastructure has been modified |
| gcp | cloudbank | mission | Core infrastructure has been modified |
| gcp | cloudbank | moreno | Core infrastructure has been modified |
| gcp | cloudbank | norco | Core infrastructure has been modified |
| gcp | cloudbank | ocu | Core infrastructure has been modified |
| gcp | cloudbank | palomar | Core infrastructure has been modified |
| gcp | cloudbank | pasadena | Core infrastructure has been modified |
| gcp | cloudbank | redwoods | Core infrastructure has been modified |
| gcp | cloudbank | reedley | Core infrastructure has been modified |
| gcp | cloudbank | riohondo | Core infrastructure has been modified |
| gcp | cloudbank | saddleback | Core infrastructure has been modified |
| gcp | cloudbank | santiago | Core infrastructure has been modified |
| gcp | cloudbank | sbcc | Core infrastructure has been modified |
| gcp | cloudbank | sbcc-dev | Core infrastructure has been modified |
| gcp | cloudbank | sierra | Core infrastructure has been modified |
| gcp | cloudbank | sjcc | Core infrastructure has been modified |
| gcp | cloudbank | sjsu | Core infrastructure has been modified |
| gcp | cloudbank | skyline | Core infrastructure has been modified |
| gcp | cloudbank | sou | Core infrastructure has been modified |
| gcp | cloudbank | spelman | Core infrastructure has been modified |
| gcp | cloudbank | srjc | Core infrastructure has been modified |
| gcp | cloudbank | tuskegee | Core infrastructure has been modified |
| gcp | cloudbank | ucsc | Core infrastructure has been modified |
| gcp | cloudbank | uchicago | Core infrastructure has been modified |
| gcp | cloudbank | umd | Core infrastructure has been modified |
| gcp | cloudbank | und | Core infrastructure has been modified |
| gcp | cloudbank | virginia | Core infrastructure has been modified |
| gcp | cloudbank | wlac | Core infrastructure has been modified |
| gcp | dubois | ephemeral | Core infrastructure has been modified |
| kubeconfig | projectpythia-binder | binderhub | Core infrastructure has been modified |
| aws | jupyter-health | prod | Core infrastructure has been modified |
| aws | strudel | prod | Core infrastructure has been modified |
| aws | strudel | workshop | Core infrastructure has been modified |
| aws | maap | prod | Core infrastructure has been modified |
| gcp | hhmi | spyglass | Core infrastructure has been modified |
| gcp | hhmi | binder | Core infrastructure has been modified |
| aws | aimatx-2i2c-hub | prod | Core infrastructure has been modified |
| gcp | leap | prod | Core infrastructure has been modified |
| gcp | leap | public | Core infrastructure has been modified |
| aws | 2i2c-aws-us | showcase | Core infrastructure has been modified |
| gcp | 2i2c-uk | lis | Core infrastructure has been modified |
| aws | bnext-bio | prod | Core infrastructure has been modified |
| gcp | 2i2c | mtu | Core infrastructure has been modified |
| aws | disasters | prod | Core infrastructure has been modified |
| aws | nasa-cryo | prod | Core infrastructure has been modified |
| aws | nasa-ghg-hub | prod | Core infrastructure has been modified |
| aws | nasa-ghg-hub | binder | Core infrastructure has been modified |
| kubeconfig | utoronto | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | r-prod | Core infrastructure has been modified |
| kubeconfig | utoronto | highmem | Core infrastructure has been modified |
| aws | reflective | prod | Core infrastructure has been modified |
| aws | reflective | workshop | Core infrastructure has been modified |
| aws | smithsonian | prod | Core infrastructure has been modified |
| aws | projectpythia | prod | Core infrastructure has been modified |
| aws | projectpythia | pythia-binder | Core infrastructure has been modified |
| aws | ucmerced | prod | Core infrastructure has been modified |
| aws | earthscope | prod | Core infrastructure has been modified |
| aws | earthscope | binder | Core infrastructure has been modified |
| aws | nmfs-openscapes | prod | Core infrastructure has been modified |
| aws | nmfs-openscapes | workshop | Core infrastructure has been modified |
| aws | nmfs-openscapes | noaa-only | Core infrastructure has been modified |
| aws | berkeley-geojupyter | prod | Core infrastructure has been modified |
| aws | victor | prod | Core infrastructure has been modified |
| gcp | awi-ciroh | prod | Core infrastructure has been modified |
| aws | openscapeshub | prod | Core infrastructure has been modified |
| aws | openscapeshub | workshop | Core infrastructure has been modified |
| aws | temple | prod | Core infrastructure has been modified |
| aws | temple | advanced | Core infrastructure has been modified |
| aws | temple | research | Core infrastructure has been modified |
| aws | opensci | sciencecore | Core infrastructure has been modified |
| aws | opensci | climaterisk | Core infrastructure has been modified |
| aws | opensci | small-binder | Core infrastructure has been modified |
| aws | opensci | big-binder | Core infrastructure has been modified |
| aws | nasa-veda | prod | Core infrastructure has been modified |
| aws | nasa-veda | binder | Core infrastructure has been modified |

sunu added 3 commits February 5, 2026 15:23

The scheduler.alpha.kubernetes.io/critical-pod annotation is deprecated
and has been replaced by priorityClassName. This daemonset already uses
priorityClassName: system-node-critical which is the proper way to mark
critical pods.

See https://kubernetes.io/docs/reference/labels-annotations-taints/#scheduler-alpha-kubernetes-io-critical-pod-deprecated

Adds a readiness probe that checks if kernel.core_pattern is still set
to |/bin/false. The pod will be marked not ready if the setting changes,
providing observability without active enforcement.
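
The check described in the readiness-probe commit can be sketched in a few lines of Python. This is only an illustration of the idea, not the probe shipped in this PR (its actual command is not shown in this conversation). The function names and the overridable `path` parameter are assumptions made for the example.

```python
from pathlib import Path

# Value the daemonset is expected to have written (taken from the commit message).
EXPECTED_CORE_PATTERN = "|/bin/false"


def core_pattern_is_disabled(
    path: str = "/proc/sys/kernel/core_pattern",
    expected: str = EXPECTED_CORE_PATTERN,
) -> bool:
    """Return True if the core_pattern file still holds the expected value."""
    # Reading the sysctl file yields the value followed by a newline.
    return Path(path).read_text().rstrip("\n") == expected


def probe_exit_code(path: str = "/proc/sys/kernel/core_pattern") -> int:
    """Map the check to an exit status: 0 means ready, 1 means not ready."""
    return 0 if core_pattern_is_disabled(path) else 1
```

A probe built this way only reports drift. It never rewrites the setting, which matches the "observability without active enforcement" wording above.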
yuvipanda (Member) commented

What else is needed to get this through?

sunu (Contributor, Author) commented Feb 12, 2026

I still need to test this on a hub but otherwise this is ready for review

sunu marked this pull request as ready for review February 12, 2026 14:19

sunu (Contributor, Author) commented Feb 13, 2026

Tested on the VEDA hub and everything is working as expected.

Before deploying the daemonset:

core_pattern was set to pipe core dumps to systemd-coredump:

cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
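
For readers less familiar with this setting: when core_pattern starts with `|`, the kernel treats the rest of the line as a command and pipes the dump to that program instead of writing a file. The small sketch below makes that distinction explicit. `describe_core_pattern` is a hypothetical helper written for this example, not part of the PR.

```python
def describe_core_pattern(pattern: str) -> dict:
    """Classify a kernel.core_pattern value as a pipe handler or a file template."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        # Everything after '|' is a command line; the first word is the program.
        parts = pattern[1:].split()
        program = parts[0] if parts else ""
        return {"kind": "pipe", "program": program, "args": parts[1:]}
    # Otherwise the value is a template for the core file name.
    return {"kind": "file", "template": pattern}
```

Applied to the value above, this reports a pipe to `/usr/lib/systemd/systemd-coredump` with seven `%` specifiers as arguments. Applied to `|/bin/false`, it reports a pipe to `/bin/false` with no arguments.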

Triggered a core dump to confirm they were landing on the host node:

(notebook) jovyan@jupyter-sunu:~$ python
>>> import os
>>> import subprocess
>>> pid = os.getpid()
>>> subprocess.run(["kill", "-SIGABRT", str(pid)])
Aborted (core dumped)
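
The same trigger can be scripted without killing the interactive interpreter. The sketch below assumes a POSIX system. It runs a throwaway child Python process that aborts itself and then checks that the child died from SIGABRT. It only confirms the signal. Whether a core dump is actually produced still depends on core_pattern and the core size limit. The helper names are made up for this example.

```python
import signal
import subprocess
import sys


def abort_child() -> int:
    """Run a child Python process that aborts itself; return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,  # hide any abort message printed by the child
    )
    return result.returncode


def died_from_sigabrt(returncode: int) -> bool:
    """On POSIX, subprocess reports death by signal as the negated signal number."""
    return returncode == -signal.SIGABRT
```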

Confirmed that the core dump file showed up in /var/lib/systemd/coredump/ on the host (mounted under /host in the debug pod):

kubectl debug node/ip-192-168-23-56.us-west-2.compute.internal -it --image=alpine
/ # ls -lh /host/var/lib/systemd/coredump/
total 1M
-rw-r-----  1 root  root  1.0M Feb 13 07:04 core.python.1000.9cc87ea62b3b426e98884096cb6dd442.9417.1770966244000000.xz
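
The file name in that listing packs several fields separated by dots. The sketch below splits it apart, assuming the order is process name, UID, boot ID, PID, and a timestamp in microseconds, followed by a compression suffix. That field order is my reading of how systemd-coredump names its files and should be treated as an assumption. `parse_coredump_name` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_coredump_name(filename: str) -> dict:
    """Split a systemd-coredump style file name into its dot-separated fields."""
    # Split from the right so a process name that contains dots stays intact.
    stem, boot_id, pid, timestamp_us, compression = filename.rsplit(".", 4)
    _prefix, comm_and_uid = stem.split(".", 1)  # drops the leading "core"
    comm, uid = comm_and_uid.rsplit(".", 1)
    return {
        "comm": comm,
        "uid": int(uid),
        "boot_id": boot_id,
        "pid": int(pid),
        # Assumed to be microseconds since the Unix epoch.
        "time": datetime.fromtimestamp(int(timestamp_us) / 1_000_000, tz=timezone.utc),
        "compression": compression,
    }
```

For the file above this gives process name `python`, UID 1000, PID 9417, and a time of 07:04:04 UTC on Feb 13, 2026, which lines up with the 07:04 modification time in the listing if the node clock is in UTC.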

Cleared that file, then deployed the daemonset.

After deploying the daemonset:

core_pattern is now set to |/bin/false:

cat /proc/sys/kernel/core_pattern
|/bin/false

Triggered another core dump the same way, then checked the host node:

/ # ls -lh /host/var/lib/systemd/coredump/
total 0

No new core dump files created. 👍🏽
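
The before/after comparison was done by eye here, but it could be automated along these lines. This is a rough sketch with hypothetical helper names. The caller supplies the directory, for example /host/var/lib/systemd/coredump inside a node debug pod.

```python
from pathlib import Path


def list_core_dumps(directory: str) -> list[str]:
    """Return the sorted names of files in `directory` that start with 'core.'."""
    return sorted(p.name for p in Path(directory).glob("core.*") if p.is_file())


def new_core_dumps(before: list[str], after: list[str]) -> list[str]:
    """Return names that appear in `after` but not in `before`."""
    return sorted(set(after) - set(before))
```

Taking one snapshot before triggering the abort and one after, an empty result from `new_core_dumps` corresponds to the "total 0" outcome above.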

agoose77 (Contributor) commented

Thanks @sunu — we will address the review request in the next week and a bit!

yuvipanda merged commit fd6fb8f into 2i2c-org:main Feb 25, 2026
41 checks passed
github-project-automation bot moved this from Up Next to Done in Product and Services Feb 25, 2026

github-actions bot (Contributor) commented

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/22414717695
