Bug 2040612

Summary: crio umask sometimes set to 0000
Product: OpenShift Container Platform
Component: Node (sub component: CRI-O)
Reporter: Benjamin Gilbert <bgilbert>
Assignee: Sascha Grunert <sgrunert>
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA
Severity: low
Priority: high
CC: abraj, apaladug, bshaw, dpateriy, hyupark, iheim, mgokhool, mzasepa, nagrawal, openshift-bugs-escalate, schoudha, sgrunert, suc, travier, vsolanki, weinliu
Version: 4.8
Target Milestone: ---
Target Release: 4.12.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 4.12
Doc Type: No Doc Update
Type: Bug
Clones: 2105159, 2106793, 2106794, 2106795
Bug Blocks: 2106795
Last Closed: 2023-01-17 19:46:49 UTC

Description Benjamin Gilbert 2022-01-14 08:51:59 UTC
Description of problem:
crio sometimes runs with a zero umask and passes that umask along to its children, potentially causing them to create files with world-writable permissions.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2022-01-12-101005
[4.7 untested]
4.8.0-0.nightly-2022-01-13-164749
4.9.0-0.nightly-2022-01-14-003615

How reproducible:
Only happens on some launches of crio.  I've seen this on roughly one node per fresh cluster.

Steps to Reproduce:
1. Start a cluster in e.g. AWS or GCP with cluster-bot
2. In the OpenShift console, go to Compute > Nodes > (a node) > Terminal
3. Run `umask`

Actual results:
0000

Expected results:
0022

Additional info:
conmon appears to set a 0022 umask on its children, so processes inside containers (and the `oc debug` shell) are seemingly not affected.  The OpenShift console terminal doesn't run via conmon and does inherit the zero umask.  Any files written by an affected process, including config files written from the debug console, will be world-writable by default.

To list all processes with a zero umask, run this from `oc debug` or the Terminal:

ps lp $(grep -l "Umask:[[:space:]]0000" /proc/[0-9]*/status | cut -f3 -d/) | grep -v "]$"

I have not seen this on 4.10.0-0.nightly-2022-01-13-061145 in a couple attempts, but am not confident that it's unaffected.

Comment 2 Benjamin Gilbert 2022-01-14 16:07:18 UTC
Reproduced on 4.10.0-0.nightly-2022-01-13-061145.

Comment 4 Benjamin Gilbert 2022-01-14 16:49:55 UTC
On affected nodes, I found these items being created world-writable:

- Several direct descendants of /run/containers/storage/overlay-containers/*/userdata/ (but the userdata directory itself is still mode 700)
- Files in /run/crio/exits

Comment 5 Benjamin Gilbert 2022-01-17 17:04:54 UTC
As reported in the attached support case, "oc exec pod/some-pod-on-affected-node -ti -- /bin/sh" spawns a shell with a umask of 0000.

Comment 7 Sascha Grunert 2022-05-23 08:23:08 UTC
I can reproduce the issue on my local 4.10.13 cluster, where a worker node seems to run CRI-O with a 0000 umask:

> F   UID     PID    PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 4     0       1       0  20   0 245012 18092 do_epo Ss   ?         67:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 16
> 4     0    2881       1  20   0 5919688 196496 -    Ssl  ?         60:24 /usr/bin/crio
> 1     0    4490       1  20   0 143820  2432 x64_sy Ssl  ?          0:00 /usr/bin/conmon -b /run/containers/storage/overlay-containers/13c1111c6f053b3b598e8e62cb2266fd77607fbb56554a3c0fb0610d3f2d1a56/userdata -c 13c1111c6f053b3b598e8e62cb2266fd77607fbb56554a3c0fb0610d3f2d1a56 --exit-dir /var/run/crio/exits -l /var/log/pods/openshift-storage_csi->
> 1     0    4886       1  20   0 143820  2308 x64_sy Ssl  ?          0:00 /usr/bin/conmon -b /run/containers/storage/overlay-containers/61e9a67afec1b2c4daec17d9907257075d09e673a858d55be368a82a1f24caa1/userdata -c 61e9a67afec1b2c4daec17d9907257075d09e673a858d55be368a82a1f24caa1 --exit-dir /var/run/crio/exits -l /var/log/pods/openshift-sdn_sdn-s6qv>
> 1     0    4889       1  20   0 143820  2308 x64_sy Ssl  ?          0:00 /usr/bin/conmon -b /run/containers/storage/overlay-containers/25d692e5d8f93071f85c6d5bd908aa61476a8d38463ede11123ce2c11866affb/userdata
…

This means that all containers are affected as well, because CRI-O runs conmon which does not do any umask modification (any more?).


I think we want to switch to a default umask if required, which is something I propose in https://github.com/cri-o/cri-o/pull/5904

Comment 8 Sascha Grunert 2022-05-25 07:09:01 UTC
https://github.com/cri-o/cri-o/pull/5904 has been merged into the CRI-O main branch and should be part of the 4.12 release.

Comment 13 Sunil Choudhary 2022-06-20 14:59:35 UTC
Verified on 4.10.0-0.nightly-2022-06-08-150219

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-06-08-150219   True        False         107m    Cluster version is 4.10.0-0.nightly-2022-06-08-150219

From web console terminal:

sh-4.4# chroot /host
sh-4.4# unmask
sh: unmask: command not found
sh-4.4# umask
0022
sh-4.4#

Comment 14 Timothée Ravier 2022-06-20 15:38:31 UTC
I would recommend verifying that this has been fixed with the command from the first comment from Benjamin:

ps lp $(grep -l "Umask:[[:space:]]0000" /proc/[0-9]*/status | cut -f3 -d/) | grep -v "]$"

Comment 20 vsolanki 2022-07-01 09:07:17 UTC
Any update on this?

Comment 46 Sunil Choudhary 2022-07-14 10:55:45 UTC
Moving back to ON_QA as the target release is now changed to 4.12. Will verify this bug on 4.12 build.

Comment 56 Sunil Choudhary 2022-08-02 14:17:59 UTC
% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-08-01-151317   True        False         3h14m   Cluster version is 4.12.0-0.nightly-2022-08-01-151317

% oc get nodes
NAME                                         STATUS   ROLES                  AGE     VERSION
ip-10-0-136-15.us-east-2.compute.internal    Ready    worker                 3h29m   v1.24.0+a9d6306
ip-10-0-152-10.us-east-2.compute.internal    Ready    control-plane,master   3h39m   v1.24.0+a9d6306
ip-10-0-166-72.us-east-2.compute.internal    Ready    worker                 3h29m   v1.24.0+a9d6306
ip-10-0-179-112.us-east-2.compute.internal   Ready    control-plane,master   3h38m   v1.24.0+a9d6306
ip-10-0-200-106.us-east-2.compute.internal   Ready    control-plane,master   3h40m   v1.24.0+a9d6306
ip-10-0-206-21.us-east-2.compute.internal    Ready    worker                 3h34m   v1.24.0+a9d6306

% oc debug node/ip-10-0-136-15.us-east-2.compute.internal
...
Starting pod/ip-10-0-136-15us-east-2computeinternal-debug ...
...

sh-4.4# umask
0022

sh-4.4# ps lp $(grep -l "Umask:[[:space:]]0000" /proc/[0-9]*/status | cut -f3 -d/) | grep -v "]$"
F   UID     PID    PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0       1       0  20   0 177200 15456 do_epo Ss   ?          0:48 /usr/lib/systemd/systemd --switched-root --system --deserialize 16

Comment 60 errata-xmlrpc 2023-01-17 19:46:49 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

Comment 61 Red Hat Bugzilla 2023-09-18 04:30:13 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days