Description of problem:

crio sometimes runs with a zero umask and passes that umask along to its children, potentially causing them to create files with world-writable permissions.

Version-Release number of selected component (if applicable):

4.6.0-0.nightly-2022-01-12-101005
[4.7 untested]
4.8.0-0.nightly-2022-01-13-164749
4.9.0-0.nightly-2022-01-14-003615

How reproducible:

Only happens on some launches of crio. I've seen this on roughly one node per fresh cluster.

Steps to Reproduce:
1. Start a cluster in e.g. AWS or GCP with cluster-bot
2. In the OpenShift console, Compute > Nodes > (a node) > Terminal
3. Run `umask`

Actual results:

0000

Expected results:

0022

Additional info:

conmon appears to set an 0022 umask on its children, so processes inside containers (and the `oc debug` shell) are seemingly not affected. The OpenShift console terminal doesn't run via conmon and does inherit the zero umask. Any files written by an affected process, including config files written from the debug console, will be world-writable by default.

To list all processes with a zero umask, run this from `oc debug` or the Terminal:

  ps lp $(grep -l "Umask:[[:space:]]0000" /proc/[0-9]*/status | cut -f3 -d/) | grep -v "]$"

I have not seen this on 4.10.0-0.nightly-2022-01-13-061145 in a couple of attempts, but am not confident that it's unaffected.
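To illustrate the impact, here is a minimal check that can be run from a shell that inherited the zero umask (e.g. the console Terminal on an affected node); the file names are arbitrary placeholders:

  umask                                                  # prints 0000 on an affected node
  touch /tmp/umask-demo && ls -l /tmp/umask-demo         # -rw-rw-rw- instead of the usual -rw-r--r--
  mkdir /tmp/umask-demo.d && ls -ld /tmp/umask-demo.d    # drwxrwxrwx instead of the usual drwxr-xr-x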
Reproduced on 4.10.0-0.nightly-2022-01-13-061145.
On affected nodes, I found these items being created world-writable:

- Several direct descendants of /run/containers/storage/overlay-containers/*/userdata/ (but the userdata directory itself is still mode 700)
- Files in /run/crio/exits
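A quick way to spot them (a sketch, run from the node's host namespace; the paths and the one-level depth match the items above but are otherwise an assumption):

  # List world-writable entries directly under the affected directories.
  find /run/crio/exits /run/containers/storage/overlay-containers/*/userdata \
      -maxdepth 1 -perm -0002 -ls 2>/dev/null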
As reported in the attached support case, "oc exec pod/some-pod-on-affected-node -ti -- /bin/sh" spawns a shell with a umask of 0000.
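For a quick non-interactive check (the pod name here is the same placeholder used above):

  oc exec pod/some-pod-on-affected-node -- sh -c 'umask'   # prints 0000 when the node is affected, 0022 otherwise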
I can reproduce the issue on my local 4.10.13 cluster, where a worker node seems to run CRI-O with a zero (0000) umask:

> F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> 4 0 1 0 20 0 245012 18092 do_epo Ss ? 67:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 16
> 4 0 2881 1 20 0 5919688 196496 - Ssl ? 60:24 /usr/bin/crio
> 1 0 4490 1 20 0 143820 2432 x64_sy Ssl ? 0:00 /usr/bin/conmon -b /run/containers/storage/overlay-containers/13c1111c6f053b3b598e8e62cb2266fd77607fbb56554a3c0fb0610d3f2d1a56/userdata -c 13c1111c6f053b3b598e8e62cb2266fd77607fbb56554a3c0fb0610d3f2d1a56 --exit-dir /var/run/crio/exits -l /var/log/pods/openshift-storage_csi->
> 1 0 4886 1 20 0 143820 2308 x64_sy Ssl ? 0:00 /usr/bin/conmon -b /run/containers/storage/overlay-containers/61e9a67afec1b2c4daec17d9907257075d09e673a858d55be368a82a1f24caa1/userdata -c 61e9a67afec1b2c4daec17d9907257075d09e673a858d55be368a82a1f24caa1 --exit-dir /var/run/crio/exits -l /var/log/pods/openshift-sdn_sdn-s6qv>
> 1 0 4889 1 20 0 143820 2308 x64_sy Ssl ? 0:00 /usr/bin/conmon -b /run/containers/storage/overlay-containers/25d692e5d8f93071f85c6d5bd908aa61476a8d38463ede11123ce2c11866affb/userdata …

This means that all containers are affected as well, because CRI-O runs conmon, which does not do any umask modification (any more?). I think we want to switch to a default umask if required, which is something I propose in https://github.com/cri-o/cri-o/pull/5904
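To see the inheritance chain directly on a node, a small check along these lines can be used (a sketch; it assumes the process names are exactly "crio" and "conmon"):

  # Print the umask recorded in /proc for crio and every conmon it spawned;
  # on an affected node they all show 0000, which containers then inherit.
  for pid in $(pgrep -x crio; pgrep -x conmon); do
      echo "$(ps -o comm= -p "$pid") ($pid): $(awk '/^Umask/ {print $2}' /proc/$pid/status)"
  done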
https://github.com/cri-o/cri-o/pull/5904 has been merged into the CRI-O main branch and should be part of the 4.12 release.
Verified on 4.10.0-0.nightly-2022-06-08-150219

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-06-08-150219   True        False         107m    Cluster version is 4.10.0-0.nightly-2022-06-08-150219

From web console terminal:

sh-4.4# chroot /host
sh-4.4# unmask
sh: unmask: command not found
sh-4.4# umask
0022
sh-4.4#
I would recommend verifying that this has been fixed with the command from the first comment from Benjamin:

  ps lp $(grep -l "Umask:[[:space:]]0000" /proc/[0-9]*/status | cut -f3 -d/) | grep -v "]$"
Any update on this?
Moving back to ON_QA as the target release has now changed to 4.12. Will verify this bug on a 4.12 build.
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-08-01-151317   True        False         3h14m   Cluster version is 4.12.0-0.nightly-2022-08-01-151317

% oc get nodes
NAME                                         STATUS   ROLES                  AGE     VERSION
ip-10-0-136-15.us-east-2.compute.internal    Ready    worker                 3h29m   v1.24.0+a9d6306
ip-10-0-152-10.us-east-2.compute.internal    Ready    control-plane,master   3h39m   v1.24.0+a9d6306
ip-10-0-166-72.us-east-2.compute.internal    Ready    worker                 3h29m   v1.24.0+a9d6306
ip-10-0-179-112.us-east-2.compute.internal   Ready    control-plane,master   3h38m   v1.24.0+a9d6306
ip-10-0-200-106.us-east-2.compute.internal   Ready    control-plane,master   3h40m   v1.24.0+a9d6306
ip-10-0-206-21.us-east-2.compute.internal    Ready    worker                 3h34m   v1.24.0+a9d6306

% oc debug node/ip-10-0-136-15.us-east-2.compute.internal
...
Starting pod/ip-10-0-136-15us-east-2computeinternal-debug ...
...
sh-4.4# umask
0022
sh-4.4# ps lp $(grep -l "Umask:[[:space:]]0000" /proc/[0-9]*/status | cut -f3 -d/) | grep -v "]$"
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0     1     0  20   0 177200 15456 do_epo Ss   ?          0:48 /usr/lib/systemd/systemd --switched-root --system --deserialize 16
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399
The needinfo request[s] on this closed bug have been removed, as they have been unresolved for 120 days.