Description of problem:
Since the 27th of July (for OpenShift 4.14), OpenShift versions running on an RHCOS based on RHEL 9.2 have been hitting the following permission-denied error when scheduling a privileged container.

Version-Release number of selected component (if applicable):

How reproducible:
1. Deploy a nightly of OpenShift 4.12, 4.13, or 4.14.
2. Run an OpenShift e2e test (via openshift-tests) that schedules a privileged container, e.g.:
$ KUBE_TEST_REPO_LIST="" KUBE_TEST_REPO="quay.io/openshift/community-e2e-images" ./openshift-tests run-test '[sig-storage] In-tree Volumes [Driver: hostPath] [Testpattern: Inline-volume (default fs)] volumes should store data [Suite:openshift/conformance/parallel] [Suite:k8s]'
3. Monitor the journal logs on the worker for SELinux errors.

Actual results:
type=AVC msg=audit(1691587536.149:914): avc: denied { execmod } for pid=335139 comm="sh" path="/bin/sh" dev="dm-4" ino=138505864 scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=file permissive=0

Expected results:
No AVC errors.
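To make the denial above easier to read, here is a minimal Python sketch (illustrative only, not part of the SELinux or audit tooling) that splits one AVC record into its denied permissions and key=value fields and extracts the SELinux type from each context. It assumes the record layout shown above; `parse_avc` and `selinux_type` are hypothetical helper names.

```python
import re

# Matches the "avc: denied { perm ... }" part of an audit record.
AVC_RE = re.compile(r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}")
# Matches key=value pairs; values may be double-quoted.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_avc(record: str) -> dict:
    """Return the denied permissions and the key=value fields of one AVC record."""
    match = AVC_RE.search(record)
    if match is None:
        raise ValueError("no 'avc: denied' section found")
    # Only parse the fields that follow the permission list.
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(record[match.end():])}
    fields["perms"] = match.group("perms").split()
    return fields


def selinux_type(context: str) -> str:
    """Return the type field of a user:role:type:level SELinux context."""
    return context.split(":")[2]


if __name__ == "__main__":
    record = (
        "type=AVC msg=audit(1691587536.149:914): avc: denied { execmod } for "
        'pid=335139 comm="sh" path="/bin/sh" dev="dm-4" ino=138505864 '
        "scontext=system_u:system_r:spc_t:s0 "
        "tcontext=system_u:object_r:container_ro_file_t:s0 tclass=file permissive=0"
    )
    avc = parse_avc(record)
    print(avc["perms"], selinux_type(avc["scontext"]), "->",
          selinux_type(avc["tcontext"]), avc["tclass"], "permissive=" + avc["permissive"])
```

For this record it reports that a process of type spc_t was denied execmod on a file of type container_ro_file_t, with permissive=0, meaning the access was actually blocked rather than only logged.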
The device labels are:

sh-5.1# ls -Z | grep dm-
system_u:object_r:fixed_disk_device_t:s0 dm-0
system_u:object_r:fixed_disk_device_t:s0 dm-1
system_u:object_r:fixed_disk_device_t:s0 dm-2
system_u:object_r:fixed_disk_device_t:s0 dm-3
system_u:object_r:fixed_disk_device_t:s0 dm-4
sh-5.1#
This issue seems to have been introduced between these two RHCOS releases:
"io.openshift.build.versions": "machine-os=414.92.202307261347-0"
"io.openshift.build.versions": "machine-os=414.92.202307270631-0"
The packages updated between those builds were:
container-selinux 3:2.208.0-2.rhaos4.13.el9 → 3:2.215.0-2.rhaos4.13.el9
cri-o 1.27.1-3.rhaos4.14.gited2afb7.el9 → 1.27.1-4.rhaos4.14.gitab7845e.el9

So I suspect the issue is in container-selinux 3:2.215.0-2.rhaos4.13.el9.
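Narrowing a regression to a package, as done above, amounts to diffing the package manifests of the two builds. A minimal Python sketch, assuming each build's packages are available as a mapping from name to epoch:version-release string; the helper names are hypothetical, and `upstream_version` is a deliberate simplification, not RPM's real version-ordering algorithm (rpmvercmp).

```python
def upstream_version(evr: str) -> tuple[int, ...]:
    """Return the numeric upstream version from an RPM '[epoch:]version-release' string.

    Simplification: ignores epoch and release, and assumes a purely numeric,
    dot-separated version. Real RPM ordering handles many more cases.
    """
    _, _, rest = evr.rpartition(":")
    version, _, _ = rest.partition("-")
    return tuple(int(part) for part in version.split("."))


def changed_packages(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (name, old_evr, new_evr) for packages present in both builds whose EVR differs."""
    return sorted(
        (name, old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    )


if __name__ == "__main__":
    # Package versions taken from the two RHCOS builds compared above.
    before = {
        "container-selinux": "3:2.208.0-2.rhaos4.13.el9",
        "cri-o": "1.27.1-3.rhaos4.14.gited2afb7.el9",
    }
    after = {
        "container-selinux": "3:2.215.0-2.rhaos4.13.el9",
        "cri-o": "1.27.1-4.rhaos4.14.gitab7845e.el9",
    }
    for name, old_evr, new_evr in changed_packages(before, after):
        print(f"{name}: {old_evr} -> {new_evr}")
        print("  upstream version changed:", upstream_version(old_evr) != upstream_version(new_evr))
```

On the two entries listed above it shows that cri-o keeps upstream version 1.27.1 (only its release string, which embeds a different git commit, changes), while container-selinux moves from 2.208.0 to 2.215.0. The same helper orders 2.215.0 < 2.219.0 < 2.221.0, the container-selinux versions tried in this bug.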
This is allowed in container-selinux 2.219:

$ audit2allow -i /tmp/t

#============= spc_t ==============

#!!!! This avc is allowed in the current policy
allow spc_t container_ro_file_t:file execmod;

$ rpm -q container-selinux
container-selinux-2.219.0-1.fc38.noarch
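audit2allow builds its suggestion from four fields of the AVC record: the source type, the target type, the object class, and the denied permissions. A minimal Python sketch of that mapping, assuming contexts in user:role:type:level form; `allow_rule` is a hypothetical helper, and unlike audit2allow it does not consult the loaded policy, so it cannot tell whether the rule is already allowed.

```python
def selinux_type(context: str) -> str:
    """Return the type field of a user:role:type:level SELinux context."""
    return context.split(":")[2]


def allow_rule(scontext: str, tcontext: str, tclass: str, perms: list[str]) -> str:
    """Build a type-enforcement allow rule in the style audit2allow prints."""
    if not perms:
        raise ValueError("at least one permission is required")
    # A single permission is written bare; several are wrapped in braces.
    perm_text = perms[0] if len(perms) == 1 else "{ " + " ".join(sorted(perms)) + " }"
    return f"allow {selinux_type(scontext)} {selinux_type(tcontext)}:{tclass} {perm_text};"


if __name__ == "__main__":
    print(allow_rule(
        "system_u:system_r:spc_t:s0",
        "system_u:object_r:container_ro_file_t:s0",
        "file",
        ["execmod"],
    ))  # allow spc_t container_ro_file_t:file execmod;
```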
I came up with a minimal pod spec to reproduce this bug:

1. oc create namespace debug-selinux
2. oc adm policy add-scc-to-group privileged system:serviceaccounts:debug-selinux
3. oc adm policy add-scc-to-group anyuid system:serviceaccounts:debug-selinux
4. oc adm policy add-scc-to-group hostmount-anyuid system:serviceaccounts:debug-selinux
5. Apply the following YAML:

apiVersion: v1
kind: Pod
metadata:
  name: hostpath-pod
  namespace: debug-selinux
spec:
  # Pin the pod to this node so it is easy to debug.
  nodeName: rdr-cicd-mon01-414-6mk65-worker-c5vk4
  containers:
  - command:
    - /bin/sh
    - -c
    - 'while true ; do sleep 2; done '
    image: quay.io/openshift/community-e2e-images:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-29-4-4zE9mRvED4RQoUxQ
    imagePullPolicy: IfNotPresent
    name: hostpath-pod
    securityContext:
      privileged: true
  restartPolicy: Always

Pod logs:
% oc logs hostpath-pod -n debug-selinux
/bin/sh: error while loading shared libraries: cannot restore segment prot after reloc: Permission denied

Audit logs:
$ ausearch -m AVC,USER_AVC,SELINUX_ERR,USER_SELINUX_ERR -ts recent -i
----
type=PROCTITLE msg=audit(08/10/23 10:36:01.111:2909) : proctitle=/bin/sh -c while true ; do sleep 2; done
type=SYSCALL msg=audit(08/10/23 10:36:01.111:2909) : arch=ppc64le syscall=mprotect success=no exit=EACCES(Permission denied) a0=0x13c060000 a1=0x130000 a2=PROT_READ|PROT_EXEC a3=0x113250 items=0 ppid=1125323 pid=1125335 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=sh exe=/bin/sh subj=system_u:system_r:spc_t:s0 key=(null)
type=AVC msg=audit(08/10/23 10:36:01.111:2909) : avc: denied { execmod } for pid=1125335 comm=sh path=/bin/sh dev="dm-4" ino=138505864 scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=file permissive=0

Observations:
- The same pod works fine in the x86 environment.
- The same pod works fine when using the quay.io/centos/centos:stream9 image.
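The pod log message "cannot restore segment prot after reloc" comes from the glibc dynamic loader: when an ELF object has text relocations, the loader temporarily makes a code segment writable, patches it, and then calls mprotect() to restore PROT_READ|PROT_EXEC, which matches the failing mprotect in the SYSCALL record above. SELinux checks execmod when a modified private file mapping is made executable again. One plausible reading (an assumption, not confirmed in this thread) is that the ppc64le busybox /bin/sh in this image carries text relocations, while the x86 build and the CentOS Stream 9 shell do not. The Python sketch below checks an ELF file's dynamic section for DT_TEXTREL or the DF_TEXTREL flag; `readelf -d` shows the same information. The script name `textrel_check.py` and the helper `has_text_relocations` are hypothetical.

```python
import struct
import sys

# Constants from the System V ELF specification.
PT_DYNAMIC = 2    # program header type of the dynamic segment
DT_NULL = 0       # terminates the dynamic array
DT_TEXTREL = 22   # relocations may modify a non-writable segment
DT_FLAGS = 30     # bit flags for the object
DF_TEXTREL = 0x4  # DT_FLAGS bit equivalent to DT_TEXTREL


def has_text_relocations(data: bytes) -> bool:
    """Return True if the ELF image in `data` declares text relocations."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]  # class: 1=32-bit, 2=64-bit; data: 1=LE, 2=BE
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    end = "<" if ei_data == 1 else ">"
    is64 = ei_class == 2

    if is64:
        (phoff,) = struct.unpack_from(end + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 54)
        phdr_fmt, dyn_fmt = end + "IIQQQQQQ", end + "qQ"
    else:
        (phoff,) = struct.unpack_from(end + "I", data, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
        phdr_fmt, dyn_fmt = end + "IIIIIIII", end + "iI"
    dyn_size = struct.calcsize(dyn_fmt)

    for i in range(phnum):
        fields = struct.unpack_from(phdr_fmt, data, phoff + i * phentsize)
        if fields[0] != PT_DYNAMIC:
            continue
        # p_offset and p_filesz sit at different positions in 32- and 64-bit headers.
        offset, filesz = (fields[2], fields[5]) if is64 else (fields[1], fields[4])
        for pos in range(offset, offset + filesz - dyn_size + 1, dyn_size):
            tag, value = struct.unpack_from(dyn_fmt, data, pos)
            if tag == DT_NULL:
                break
            if tag == DT_TEXTREL or (tag == DT_FLAGS and value & DF_TEXTREL):
                return True
    return False


if __name__ == "__main__":
    # Usage: python3 textrel_check.py PATH [PATH ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            found = has_text_relocations(handle.read())
        print(f"{path}: {'TEXTREL present' if found else 'no TEXTREL'}")
```

Run it against a copy of /bin/sh taken from each image; a TEXTREL result for the busybox image and none for the others would be consistent with the observations above, but would still need confirmation.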
Dan, can you please take a look at comment #5?
After discussion with Jeremy, I'm going to update container-selinux to 2.219.0, as Dan mentions.
OCP 4.13 and up is now updated to container-selinux-2.219.0.
Tested with the patched RPM on the affected system; the fix didn't work and I still see the issue:

----
type=PROCTITLE msg=audit(08/11/23 00:17:42.518:538) : proctitle=/bin/sh -c while true ; do sleep 2; done
type=SYSCALL msg=audit(08/11/23 00:17:42.518:538) : arch=ppc64le syscall=mprotect success=no exit=EACCES(Permission denied) a0=0x11ce10000 a1=0x130000 a2=PROT_READ|PROT_EXEC a3=0x113250 items=0 ppid=8942 pid=8954 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=sh exe=/bin/sh subj=system_u:system_r:spc_t:s0 key=(null)
type=AVC msg=audit(08/11/23 00:17:42.518:538) : avc: denied { execmod } for pid=8954 comm=sh path=/bin/sh dev="dm-4" ino=138505864 scontext=system_u:system_r:spc_t:s0 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=file permissive=0

sh-5.1# date -u
Fri Aug 11 00:18:45 UTC 2023
sh-5.1# rpm -qa | grep container-selinux
container-selinux-2.219.0-1.rhaos4.13.el9.noarch
sh-5.1# rpm-ostree status
State: idle
Deployments:
* ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a0f238c723c13d13f48231421e42b5063c92692e588a2f381572ffa33aca8d9c
                   Digest: sha256:a0f238c723c13d13f48231421e42b5063c92692e588a2f381572ffa33aca8d9c
                  Version: 414.92.202308080233-0 (2023-08-09T04:31:17Z)
           LocalOverrides: container-selinux 3:2.215.0-2.rhaos4.13.el9 -> 3:2.219.0-1.rhaos4.13.el9
sh-5.1# rpm -qi container-selinux-2.219.0-1.rhaos4.13.el9.noarch
Name        : container-selinux
Epoch       : 3
Version     : 2.219.0
Release     : 1.rhaos4.13.el9
Architecture: noarch
Install Date: Fri Aug 11 00:08:38 2023
Group       : Unspecified
Size        : 68308
License     : GPLv2
Signature   : (none)
Source RPM  : container-selinux-2.219.0-1.rhaos4.13.el9.src.rpm
Build Date  : Thu Aug 10 16:41:23 2023
Build Host  : x86-64-02.build.eng.rdu2.redhat.com
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : https://github.com/containers/container-selinux
Summary     : SELinux policies for container runtimes
Description :
SELinux policy modules for use with container runtimes.
sh-5.1#
I installed the new container-selinux RPM on all of the worker nodes:

sh-5.1# rpm -qa | grep container-selin
container-selinux-2.221.0-1.el9.noarch
sh-5.1# rpm-ostree status
State: idle
Deployments:
* ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a0f238c723c13d13f48231421e42b5063c92692e588a2f381572ffa33aca8d9c
                   Digest: sha256:a0f238c723c13d13f48231421e42b5063c92692e588a2f381572ffa33aca8d9c
                  Version: 414.92.202308080233-0 (2023-08-09T04:31:13Z)
           LocalOverrides: container-selinux 3:2.215.0-2.rhaos4.13.el9 -> 3:2.221.0-1.el9

Hiro ran the single e2e test and did not observe any errors on the command line. I went through the journal logs and did not see any "avc: denied" or related SELinux errors after the single e2e test run. This cluster is still available for investigation, with the newer container-selinux RPM installed.
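The verification described above (no `avc: denied` entries in the worker journals after the e2e run) can be scripted. A minimal Python sketch, assuming the journal or audit text from each worker has already been collected into a string; how it is collected is left open, and `denials_per_node` and the node names are hypothetical.

```python
import re

# Audit lines may contain "avc:  denied" with one or more spaces.
DENIED_RE = re.compile(r"avc:\s+denied\s+\{[^}]*\}")


def denials_per_node(logs: dict[str, str], needle: str = "") -> dict[str, int]:
    """Count AVC denial lines per node; only lines containing `needle` are counted."""
    counts = {}
    for node, text in logs.items():
        hits = sum(
            1 for line in text.splitlines()
            if DENIED_RE.search(line) and needle in line
        )
        if hits:
            counts[node] = hits
    return counts


if __name__ == "__main__":
    # Hypothetical node names; the first log holds the AVC record quoted in this bug.
    logs = {
        "worker-a": (
            "type=AVC msg=audit(1691587536.149:914): avc: denied { execmod } for "
            'pid=335139 comm="sh" path="/bin/sh" dev="dm-4" ino=138505864 '
            "scontext=system_u:system_r:spc_t:s0 "
            "tcontext=system_u:object_r:container_ro_file_t:s0 tclass=file permissive=0"
        ),
        "worker-b": "",
    }
    result = denials_per_node(logs, needle="execmod")
    print(result or "no AVC denials found")
```

An empty result corresponds to the clean run reported here; a non-empty one names the nodes whose journals still need a closer look.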
Created attachment 1983452: openshift-tests run-test output for the previously failing test case. This ran *after* all worker nodes were upgraded with `container-selinux-2.221.0-1.el9.noarch.rpm`.
Please see comments 13 and 14 as verification of the newest patch.