Description of problem (please be as detailed as possible and provide log snippets):

The subPath volume permissions are not set correctly for a CephFS volume. Inside the Pod, both directories should be owned by the fsGroup (9999), but only one of them is:

sh-4.2$ ls -l /etc/healing-controller.d/
total 0
drwxrwsr-x. 2 root 9999 0 Mar 30 01:49 critical-containers-logs
drwxrwsr-x. 2 root root 0 Mar 30 01:49 record

Version of all relevant components (if applicable):
OCP 4.12
ODF 4.12

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
The customer started to see this problem with OCP 4.12.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes. In my environment the reproduce rate is about 90%; at the customer's site it is about 50%.

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. oc adm policy add-scc-to-user privileged -z default

2. Create the Pod and the CephFS CSI PVC

$ cat /tmp/test-pv.yaml
apiVersion: v1
kind: Pod
metadata:
  name: rhel7
  labels:
    app: rhel7
spec:
  containers:
  - name: myapp-container
    image: registry.access.redhat.com/ubi7/ubi
    command: ['sh', '-c', 'mkdir /etc/healing-controller.d -p && echo The app is running! && sleep 3600']
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      seLinuxOptions:
        level: s0
    volumeMounts:
    - mountPath: /etc/healing-controller.d/record
      name: local-disks
      subPath: record
    - mountPath: /etc/healing-controller.d/critical-containers-logs
      name: local-disks
      subPath: critical-containers-logs
  volumes:
  - name: local-disks
    persistentVolumeClaim:
      claimName: local-pvc-name
  securityContext:
    fsGroup: 9999
    runAsGroup: 9999
    runAsUser: 9999
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc-name
  namespace: test-pv
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-cephfs
  volumeMode: Filesystem

3. Log in to the Pod and check the permissions under /etc/healing-controller.d/

sh-4.2$ ls -l /etc/healing-controller.d/

Actual results:
sh-4.2$ ls -l /etc/healing-controller.d/
total 0
drwxrwsr-x. 2 root 9999 0 Mar 30 01:49 critical-containers-logs
drwxrwsr-x. 2 root root 0 Mar 30 01:49 record

Expected results:
sh-4.2$ ls -l /etc/healing-controller.d/
total 0
drwxrwsr-x. 2 root 9999 0 Mar 30 01:47 critical-containers-logs
drwxrwsr-x. 2 root 9999 0 Mar 30 01:47 record

Additional info:
This issue cannot be reproduced with other CSI drivers such as gp3-csi.
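For a quick, non-interactive check of both subPath directories once the Pod is running, something like the following can be used (a minimal sketch; the stat format string and the assumption that the Pod is named rhel7 in the current namespace are mine, not part of the original report):

$ oc exec rhel7 -- stat -c '%U:%G %A %n' /etc/healing-controller.d/record /etc/healing-controller.d/critical-containers-logs
# When fsGroup is applied correctly, both lines should show group 9999 and the setgid bit (drwxrwsr-x).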
sh-4.4# ./test.sh
++ grep mon_host /etc/ceph/ceph.conf
++ awk '{print $3}'
+ mon_endpoints=172.30.227.108:6789,172.30.40.75:6789,172.30.114.250:6789
++ awk '{print $3}'
++ grep key /etc/ceph/keyring
+ my_secret=AQDHjD9kDgR6AhAAEdlX3qO3tb2PZqx/4USf5g==
+ for i in 1 2
+ ceph fs subvolume create ocs-storagecluster-cephfilesystem test1 csi
++ ceph fs subvolume getpath ocs-storagecluster-cephfilesystem test1 csi
+ path=/volumes/csi/test1/492940a4-ccbc-4c9d-a6b1-43a490efab8e
+ mkdir -p /tmp/registry1
+ ceph-fuse /tmp/registry1 -m=172.30.227.108:6789 --key=AQDHjD9kDgR6AhAAEdlX3qO3tb2PZqx/4USf5g== -n=client.admin -r /volumes/csi/test1/492940a4-ccbc-4c9d-a6b1-43a490efab8e -o nonempty --client_mds_namespace=ocs-storagecluster-cephfilesystem
2023-04-19T07:59:08.086+0000 7f3f63fbd540 -1 init, newargv = 0x56213485a800 newargc=17
ceph-fuse[1970]: starting ceph client
ceph-fuse[1970]: starting fuse
+ chgrp 9999 /tmp/registry1
+ chmod g+s,a+rwx /tmp/registry1
+ sleep 5
+ mkdir -p /tmp/registry1/a
+ mkdir -p /tmp/registry1/b
+ mkdir -p /tmp/registry1/b/x
+ ls -lrt /tmp/registry1/b/
total 1
drwxrwsrwx. 2 root 9999 0 Apr 19 07:59 x
+ mkdir /tmp/registry1/c
+ mkdir /tmp/registry1/d
+ ls -lrt /tmp/registry1
total 2
drwxrwsrwx. 2 root 9999 0 Apr 19 07:59 a
drwxrwsrwx. 3 root 9999 0 Apr 19 07:59 b
drwxrwsrwx. 2 root 9999 0 Apr 19 07:59 c
drwxrwsrwx. 2 root 9999 0 Apr 19 07:59 d
+ umount /tmp/registry1
+ rm -rf /tmp/registry1
+ ceph fs subvolume rm ocs-storagecluster-cephfilesystem test1 csi
+ for i in 1 2
+ ceph fs subvolume create ocs-storagecluster-cephfilesystem test2 csi
++ ceph fs subvolume getpath ocs-storagecluster-cephfilesystem test2 csi
+ path=/volumes/csi/test2/9cbc323e-9993-454c-8746-1b2bc4ace5c3
+ mkdir -p /tmp/registry2
+ ceph-fuse /tmp/registry2 -m=172.30.227.108:6789 --key=AQDHjD9kDgR6AhAAEdlX3qO3tb2PZqx/4USf5g== -n=client.admin -r /volumes/csi/test2/9cbc323e-9993-454c-8746-1b2bc4ace5c3 -o nonempty --client_mds_namespace=ocs-storagecluster-cephfilesystem
2023-04-19T07:59:14.063+0000 7ff6c34c3540 -1 init, newargv = 0x56163def7800 newargc=17
ceph-fuse[2093]: starting ceph client
ceph-fuse[2093]: starting fuse
+ chgrp 9999 /tmp/registry2
+ chmod g+s,a+rwx /tmp/registry2
+ sleep 5
+ mkdir -p /tmp/registry2/a
+ mkdir -p /tmp/registry2/b
+ mkdir -p /tmp/registry2/b/x
+ ls -lrt /tmp/registry2/b/
total 1
drwxrwsrwx. 2 root 9999 0 Apr 19 07:59 x
+ mkdir /tmp/registry2/c
+ mkdir /tmp/registry2/d
+ ls -lrt /tmp/registry2
total 2
drwxrwsrwx. 2 root 9999 0 Apr 19 07:59 a
drwxrwsrwx. 3 root 9999 0 Apr 19 07:59 b
drwxrwsrwx. 2 root 9999 0 Apr 19 07:59 c
drwxrwsrwx. 2 root 9999 0 Apr 19 07:59 d
+ umount /tmp/registry2
+ rm -rf /tmp/registry2
+ ceph fs subvolume rm ocs-storagecluster-cephfilesystem test2 csi
sh-4.4# vi test.sh
sh-4.4# ./test.sh
++ grep mon_host /etc/ceph/ceph.conf
++ awk '{print $3}'
+ mon_endpoints=172.30.227.108:6789,172.30.40.75:6789,172.30.114.250:6789
++ grep key /etc/ceph/keyring
++ awk '{print $3}'
+ my_secret=AQDHjD9kDgR6AhAAEdlX3qO3tb2PZqx/4USf5g==
+ for i in 1 2
+ ceph fs subvolume create ocs-storagecluster-cephfilesystem test1 csi
++ ceph fs subvolume getpath ocs-storagecluster-cephfilesystem test1 csi
+ path=/volumes/csi/test1/1251b28f-ca6a-4b0b-860f-ee3ebdbad933
+ mkdir -p /tmp/registry1
+ mount -t ceph -o mds_namespace=ocs-storagecluster-cephfilesystem,name=admin,secret=AQDHjD9kDgR6AhAAEdlX3qO3tb2PZqx/4USf5g== 172.30.227.108:6789,172.30.40.75:6789,172.30.114.250:6789://volumes/csi/test1/1251b28f-ca6a-4b0b-860f-ee3ebdbad933 /tmp/registry1
+ chgrp 9999 /tmp/registry1
+ chmod g+s,a+rwx /tmp/registry1
+ sleep 5
+ mkdir -p /tmp/registry1/a
+ mkdir -p /tmp/registry1/b
+ mkdir -p /tmp/registry1/b/x
+ ls -lrt /tmp/registry1/b/
total 0
drwxrwsrwx. 2 root 9999 0 Apr 19 08:00 x
+ mkdir /tmp/registry1/c
+ mkdir /tmp/registry1/d
+ ls -lrt /tmp/registry1
total 0
drwxrwsrwx. 2 root 9999 0 Apr 19 08:00 a
drwxrwsrwx. 3 root 9999 1 Apr 19 08:00 b
drwxrwsrwx. 2 root 9999 0 Apr 19 08:00 c
drwxrwsrwx. 2 root 9999 0 Apr 19 08:00 d
+ umount /tmp/registry1
+ rm -rf /tmp/registry1
+ ceph fs subvolume rm ocs-storagecluster-cephfilesystem test1 csi
+ for i in 1 2
+ ceph fs subvolume create ocs-storagecluster-cephfilesystem test2 csi
++ ceph fs subvolume getpath ocs-storagecluster-cephfilesystem test2 csi
+ path=/volumes/csi/test2/884662eb-86d9-421f-b853-d008334ae93b
+ mkdir -p /tmp/registry2
+ mount -t ceph -o mds_namespace=ocs-storagecluster-cephfilesystem,name=admin,secret=AQDHjD9kDgR6AhAAEdlX3qO3tb2PZqx/4USf5g== 172.30.227.108:6789,172.30.40.75:6789,172.30.114.250:6789://volumes/csi/test2/884662eb-86d9-421f-b853-d008334ae93b /tmp/registry2
+ chgrp 9999 /tmp/registry2
+ chmod g+s,a+rwx /tmp/registry2
+ sleep 5
+ mkdir -p /tmp/registry2/a
+ mkdir -p /tmp/registry2/b
+ mkdir -p /tmp/registry2/b/x
+ ls -lrt /tmp/registry2/b/
total 0
drwxrwsrwx. 2 root 9999 0 Apr 19 08:00 x
+ mkdir /tmp/registry2/c
+ mkdir /tmp/registry2/d
+ ls -lrt /tmp/registry2
total 0
drwxrwsrwx. 2 root 9999 0 Apr 19 08:00 a
drwxrwsrwx. 3 root 9999 1 Apr 19 08:00 b
drwxrwsrwx. 2 root 9999 0 Apr 19 08:00 c
drwxrwsrwx. 2 root 9999 0 Apr 19 08:00 d
+ umount /tmp/registry2
+ rm -rf /tmp/registry2
+ ceph fs subvolume rm ocs-storagecluster-cephfilesystem test2 csi
sh-4.4#

--------------------------------------------

Note: if I put a 5-second delay between the chmod and the mkdir of the first directory, it looks like the right permissions are set. Not sure it matters, but pasting here for reference.
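For reference, the traces above correspond to a reproducer of roughly the following shape (a minimal sketch reconstructed from the pasted output, not the original test.sh; it assumes a toolbox-style host with the admin keyring in /etc/ceph/keyring, shows only the ceph-fuse variant from the first run, and marks where the 5-second delay mentioned in the note sits):

#!/bin/bash
set -x
# read monitor endpoints and admin secret from the local Ceph config
mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')
for i in 1 2; do
  ceph fs subvolume create ocs-storagecluster-cephfilesystem test$i csi
  path=$(ceph fs subvolume getpath ocs-storagecluster-cephfilesystem test$i csi)
  mkdir -p /tmp/registry$i
  ceph-fuse /tmp/registry$i -m=${mon_endpoints%%,*} --key=$my_secret -n=client.admin \
    -r $path -o nonempty --client_mds_namespace=ocs-storagecluster-cephfilesystem
  # mimic the fsGroup-style ownership change on the volume root
  chgrp 9999 /tmp/registry$i
  chmod g+s,a+rwx /tmp/registry$i
  sleep 5    # the 5-second delay between chmod and the first mkdir (see note)
  mkdir -p /tmp/registry$i/a
  mkdir -p /tmp/registry$i/b
  mkdir -p /tmp/registry$i/b/x
  ls -lrt /tmp/registry$i/b/
  mkdir /tmp/registry$i/c
  mkdir /tmp/registry$i/d
  ls -lrt /tmp/registry$i
  umount /tmp/registry$i
  rm -rf /tmp/registry$i
  ceph fs subvolume rm ocs-storagecluster-cephfilesystem test$i csi
done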
Verified! Able to see the expected result in the FS.

Job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/25149/

[sdurgbun auth]$ oc get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
local-pvc-name   Bound    pvc-f35bc2e7-ed86-4b25-b524-b071df9b8c2d   1Gi        RWO            ocs-storagecluster-cephfs   15s
[sdurgbun auth]$ oc get pods
NAME    READY   STATUS              RESTARTS   AGE
rhel7   0/1     ContainerCreating   0          68s
[sdurgbun auth]$ oc get pods
NAME    READY   STATUS    RESTARTS   AGE
rhel7   1/1     Running   0          95s
[sdurgbun auth]$ oc rsh rhel7
sh-4.2$ ls -l /etc/healing-controller.d/
total 0
drwxrwsr-x. 2 root 9999 0 Jun 1 10:26 critical-containers-logs
drwxrwsr-x. 2 root 9999 0 Jun 1 10:26 record
sh-4.2$ exit
[sdurgbun auth]$ oc get csv --show-labels
No resources found in testbz namespace.
[sdurgbun auth]$ oc get csv --namespace openshift-storage --show-labels
NAME                                    DISPLAY                       VERSION        REPLACES                                PHASE       LABELS
mcg-operator.v4.12.4-rhodf              NooBaa Operator               4.12.4-rhodf   mcg-operator.v4.12.3-rhodf              Succeeded   operators.coreos.com/mcg-operator.openshift-storage=
ocs-operator.v4.12.4-rhodf              OpenShift Container Storage   4.12.4-rhodf   ocs-operator.v4.12.3-rhodf              Succeeded   full_version=4.12.4-1,operatorframework.io/arch.amd64=supported,operatorframework.io/arch.ppc64le=supported,operatorframework.io/arch.s390x=supported,operators.coreos.com/ocs-operator.openshift-storage=
odf-csi-addons-operator.v4.12.4-rhodf   CSI Addons                    4.12.4-rhodf   odf-csi-addons-operator.v4.12.3-rhodf   Succeeded   operators.coreos.com/odf-csi-addons-operator.openshift-storage=
odf-operator.v4.12.4-rhodf              OpenShift Data Foundation     4.12.4-rhodf   odf-operator.v4.12.3-rhodf              Succeeded   full_version=4.12.4-1,operatorframework.io/arch.amd64=supported,operatorframework.io/arch.ppc64le=supported,operatorframework.io/arch.s390x=supported,operators.coreos.com/odf-operator.openshift-storage=
[sdurgbun auth]$ cat test-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc-name
  namespace: testbz
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: ocs-storagecluster-cephfs
  volumeMode: Filesystem
[sdurgbun auth]$ cat test-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: rhel7
  labels:
    app: rhel7
spec:
  containers:
  - name: myapp-container
    image: registry.access.redhat.com/ubi7/ubi
    command: ['sh', '-c', 'mkdir /etc/healing-controller.d -p && echo The app is running! && sleep 3600']
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      seLinuxOptions:
        level: s0
    volumeMounts:
    - mountPath: /etc/healing-controller.d/record
      name: local-disks
      subPath: record
    - mountPath: /etc/healing-controller.d/critical-containers-logs
      name: local-disks
      subPath: critical-containers-logs
  volumes:
  - name: local-disks
    persistentVolumeClaim:
      claimName: local-pvc-name
  securityContext:
    fsGroup: 9999
    runAsGroup: 9999
    runAsUser: 9999
[sdurgbun auth]$
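As an extra sanity check of the installed build, the full_version label can be read straight off the CSV (a hypothetical one-liner, assuming the odf-operator CSV name shown in the listing above):

[sdurgbun auth]$ oc get csv odf-operator.v4.12.4-rhodf -n openshift-storage -o jsonpath='{.metadata.labels.full_version}'
# expected output for this cluster, matching the labels above: 4.12.4-1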
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.12.4 security and Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3609