Bug 1802680

Summary: CephFS RWX can only be accessed on the same node.
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Matt LeVan <matt.levan>
Component: ceph
Assignee: Mudit Agarwal <muagarwa>
Status: NEW
QA Contact: Elad <ebenahar>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.2
CC: ebenahar, jijoy, pdonnell, sagrawal, sostapov, srangana
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Flags: srangana: needinfo-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-16 19:08:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Matt LeVan 2020-02-13 17:20:07 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
With an RWX CephFS PVC in OCS 4.2.1, only pods running on a single worker node are able to access the PVC.


Version of all relevant components (if applicable):
OCP 4.2
OCS 4.2.1

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Steps to Reproduce:
1. Create an RWX PVC with storageClassName: ocs-storagecluster-cephfs
2. Create a Deployment with >=2 pods that mount the RWX PVC
3. Ensure pods run on both the same and on different worker nodes (a minimal manifest sketch follows)
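
For illustration, a minimal reproducer sketch; the PVC/Deployment names, image, and mount path are hypothetical, only the storage class and access mode are taken from the report:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-rwx-test          # hypothetical name
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: ocs-storagecluster-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cephfs-rwx-test
spec:
  replicas: 3                    # >=2 so the scheduler can place pods on different worker nodes
  selector:
    matchLabels:
      app: cephfs-rwx-test
  template:
    metadata:
      labels:
        app: cephfs-rwx-test
    spec:
      containers:
      - name: app
        image: registry.access.redhat.com/ubi8/ubi-minimal   # any image with a shell works
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: data
          mountPath: /mnt/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: cephfs-rwx-test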


Actual results:
Only pods running on the same host are able to access the volumes.

Expected results:
All pods, regardless of which worker node they run on, can access the volume.


Additional info:
Pod information and the hosts they are running on:
oc -n cp4i get pods -o wide | grep ace
ace-test-ibm-ace-dashboard-icp4i-prod-754684647d-4zctr     1/2     CrashLoopBackOff   9          20m   10.129.2.224   cluster3-w2.cluster3.storage-ocp.tuc.stglabs.ibm.com   <none>           <none>
ace-test-ibm-ace-dashboard-icp4i-prod-754684647d-5jwmd     2/2     Running            0          26h   10.131.0.43    cluster3-w1.cluster3.storage-ocp.tuc.stglabs.ibm.com   <none>           <none>
ace-test-ibm-ace-dashboard-icp4i-prod-754684647d-xhtcl     2/2     Running            0          21m   10.131.0.59    cluster3-w1.cluster3.storage-ocp.tuc.stglabs.ibm.com   <none>           <none>



Output from reading the /mnt and /mnt/data directories on 3 different pods:
========= POD  ace-test-ibm-ace-dashboard-icp4i-prod-754684647d-4zctr ===========
========= CONTAINER ace-test-ibm-ace-dashboard-icp4i-prod-content-server ===========
ls: cannot open directory '/home/contentserver/content/': Permission denied
/mnt:
total 0
drwxr-xr-x. 1 root root 18 Feb 13 17:13 .
drwxr-xr-x. 1 root root 29 Feb 13 17:13 ..
drwxrwxrwx  1 root root  1 Feb 12 15:10 data
ls: cannot open directory '/mnt/data': Permission denied
command terminated with exit code 2
========= POD  ace-test-ibm-ace-dashboard-icp4i-prod-754684647d-5jwmd ===========
========= CONTAINER ace-test-ibm-ace-dashboard-icp4i-prod-content-server ===========
/home/contentserver/content/:
total 0
drwxrwxr-x 2 contentserver contentserver 1 Feb 13 16:41 .
drwxrwxrwx 3 root          root          1 Feb 12 15:10 ..
-rw-r--r-- 1 contentserver contentserver 0 Feb 13 16:41 file

/mnt:
total 0
drwxr-xr-x. 1 root root 18 Feb 12 15:10 .
drwxr-xr-x. 1 root root 40 Feb 12 15:10 ..
drwxrwxrwx  3 root root  1 Feb 12 15:10 data

/mnt/data:
total 0
drwxrwxrwx  3 root          root           1 Feb 12 15:10 .
drwxr-xr-x. 1 root          root          18 Feb 12 15:10 ..
drwxrwxr-x  2 contentserver contentserver  1 Feb 13 16:41 content
========= POD  ace-test-ibm-ace-dashboard-icp4i-prod-754684647d-xhtcl ===========
========= CONTAINER ace-test-ibm-ace-dashboard-icp4i-prod-content-server ===========
/home/contentserver/content/:
total 0
drwxrwxr-x 2 contentserver contentserver 1 Feb 13 16:41 .
drwxrwxrwx 3 root          root          1 Feb 12 15:10 ..
-rw-r--r-- 1 contentserver contentserver 0 Feb 13 16:41 file

/mnt:
total 0
drwxr-xr-x. 1 root root 18 Feb 13 16:54 .
drwxr-xr-x. 1 root root 29 Feb 13 16:54 ..
drwxrwxrwx  3 root root  1 Feb 12 15:10 data

/mnt/data:
total 0
drwxrwxrwx  3 root          root           1 Feb 12 15:10 .
drwxr-xr-x. 1 root          root          18 Feb 13 16:54 ..
drwxrwxr-x  2 contentserver contentserver  1 Feb 13 16:41 content

PVC/PV Information:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
  creationTimestamp: "2020-02-12T15:09:44Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: ace-test
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: ibm-ace-dashboard-icp4i-prod
    helm.sh/chart: ibm-ace-dashboard-icp4i-prod
    release: ace-test
  name: ace-test-ibm-ace-dashboard-icp4i-prod-datapvc
  namespace: cp4i
  resourceVersion: "10882196"
  selfLink: /api/v1/namespaces/cp4i/persistentvolumeclaims/ace-test-ibm-ace-dashboard-icp4i-prod-datapvc
  uid: b3898951-4da9-11ea-9136-0050568bb752
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: ocs-storagecluster-cephfs
  volumeMode: Filesystem
  volumeName: pvc-b3898951-4da9-11ea-9136-0050568bb752
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  phase: Bound

---
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: openshift-storage.cephfs.csi.ceph.com
  creationTimestamp: "2020-02-12T15:09:47Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-b3898951-4da9-11ea-9136-0050568bb752
  resourceVersion: "10882192"
  selfLink: /api/v1/persistentvolumes/pvc-b3898951-4da9-11ea-9136-0050568bb752
  uid: b56adbe9-4da9-11ea-9136-0050568bb752
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: ace-test-ibm-ace-dashboard-icp4i-prod-datapvc
    namespace: cp4i
    resourceVersion: "10882127"
    uid: b3898951-4da9-11ea-9136-0050568bb752
  csi:
    driver: openshift-storage.cephfs.csi.ceph.com
    fsType: ext4
    nodeStageSecretRef:
      name: rook-csi-cephfs-node
      namespace: openshift-storage
    volumeAttributes:
      clusterID: openshift-storage
      fsName: ocs-storagecluster-cephfilesystem
      storage.kubernetes.io/csiProvisionerIdentity: 1580850549732-8081-openshift-storage.cephfs.csi.ceph.com
    volumeHandle: 0001-0011-openshift-storage-0000000000000001-b4604110-4da9-11ea-84cc-0a580a820258
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ocs-storagecluster-cephfs
  volumeMode: Filesystem
status:
  phase: Bound

Comment 2 Jilju Joy 2020-02-14 19:15:26 UTC
Couldn't reproduce this in

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.16    True        False         59m     Cluster version is 4.2.16

OCS 4.2.1



$ oc -n namespace-test-dd3b3655934941a19a985a71c65d68bd get pvc -o wide
NAME                                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
pvc-test-58ae5edb8e72430e830e542b70e3c477   Bound    pvc-66c5d813-4f57-11ea-b241-005056be6183   10Gi       RWX            ocs-storagecluster-cephfs   8m51s
(python-venv-ocsci) [jijoy@localhost ocs-ci]$ oc -n namespace-test-dd3b3655934941a19a985a71c65d68bd get pvc pvc-test-58ae5edb8e72430e830e542b70e3c477 -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
  creationTimestamp: "2020-02-14T18:25:39Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: pvc-test-58ae5edb8e72430e830e542b70e3c477
  namespace: namespace-test-dd3b3655934941a19a985a71c65d68bd
  resourceVersion: "31341"
  selfLink: /api/v1/namespaces/namespace-test-dd3b3655934941a19a985a71c65d68bd/persistentvolumeclaims/pvc-test-58ae5edb8e72430e830e542b70e3c477
  uid: 66c5d813-4f57-11ea-b241-005056be6183
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-cephfs
  volumeMode: Filesystem
  volumeName: pvc-66c5d813-4f57-11ea-b241-005056be6183
status:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 10Gi
  phase: Bound
(python-venv-ocsci) [jijoy@localhost ocs-ci]$ oc get pv pvc-66c5d813-4f57-11ea-b241-005056be6183 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: openshift-storage.cephfs.csi.ceph.com
  creationTimestamp: "2020-02-14T18:25:41Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-66c5d813-4f57-11ea-b241-005056be6183
  resourceVersion: "31338"
  selfLink: /api/v1/persistentvolumes/pvc-66c5d813-4f57-11ea-b241-005056be6183
  uid: 680c59ce-4f57-11ea-8213-005056bebd55
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 10Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc-test-58ae5edb8e72430e830e542b70e3c477
    namespace: namespace-test-dd3b3655934941a19a985a71c65d68bd
    resourceVersion: "31321"
    uid: 66c5d813-4f57-11ea-b241-005056be6183
  csi:
    driver: openshift-storage.cephfs.csi.ceph.com
    fsType: ext4
    nodeStageSecretRef:
      name: rook-csi-cephfs-node
      namespace: openshift-storage
    volumeAttributes:
      clusterID: openshift-storage
      fsName: ocs-storagecluster-cephfilesystem
      storage.kubernetes.io/csiProvisionerIdentity: 1581703138112-8081-openshift-storage.cephfs.csi.ceph.com
    volumeHandle: 0001-0011-openshift-storage-0000000000000001-677b5171-4f57-11ea-b2e6-0a580a810010
  persistentVolumeReclaimPolicy: Delete
  storageClassName: ocs-storagecluster-cephfs
  volumeMode: Filesystem
status:
  phase: Bound
(python-venv-ocsci) [jijoy@localhost ocs-ci]$ oc -n namespace-test-dd3b3655934941a19a985a71c65d68bd get pods -o wide
NAME                                                        READY   STATUS      RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
pod-test-cephfs-2e7302abf6a348ca94b8609b8928c948-1-2rtvw    1/1     Running     0          8m6s    10.130.0.33   compute-1   <none>           <none>
pod-test-cephfs-2e7302abf6a348ca94b8609b8928c948-1-bhgq9    1/1     Running     0          8m6s    10.131.0.33   compute-2   <none>           <none>
pod-test-cephfs-2e7302abf6a348ca94b8609b8928c948-1-cxkll    1/1     Running     0          8m6s    10.129.0.34   compute-0   <none>           <none>
pod-test-cephfs-2e7302abf6a348ca94b8609b8928c948-1-deploy   0/1     Completed   0          8m15s   10.130.0.32   compute-1   <none>           <none>
pod-test-cephfs-2e7302abf6a348ca94b8609b8928c948-1-tfcp9    1/1     Running     0          8m7s    10.131.0.32   compute-2   <none>           <none>
pod-test-cephfs-3b17278a28004d5ab9dec7cd0b64078e-1-2zbzb    1/1     Running     0          9m12s   10.130.0.29   compute-1   <none>           <none>
pod-test-cephfs-3b17278a28004d5ab9dec7cd0b64078e-1-deploy   0/1     Completed   0          9m20s   10.129.0.30   compute-0   <none>           <none>
pod-test-cephfs-3b17278a28004d5ab9dec7cd0b64078e-1-jdbfg    1/1     Running     0          9m12s   10.130.0.28   compute-1   <none>           <none>
pod-test-cephfs-3b17278a28004d5ab9dec7cd0b64078e-1-nllrr    1/1     Running     0          9m12s   10.131.0.30   compute-2   <none>           <none>
pod-test-cephfs-3b17278a28004d5ab9dec7cd0b64078e-1-p6qvx    1/1     Running     0          9m12s   10.129.0.31   compute-0   <none>           <none>
pod-test-cephfs-d10598795ce144b8956550bb0216d781-1-2rm2n    1/1     Running     0          10m     10.129.0.27   compute-0   <none>           <none>
pod-test-cephfs-d10598795ce144b8956550bb0216d781-1-4z5zl    1/1     Running     0          10m     10.129.0.28   compute-0   <none>           <none>
pod-test-cephfs-d10598795ce144b8956550bb0216d781-1-deploy   0/1     Completed   0          10m     10.130.0.24   compute-1   <none>           <none>
pod-test-cephfs-d10598795ce144b8956550bb0216d781-1-gld7x    1/1     Running     0          10m     10.130.0.25   compute-1   <none>           <none>
pod-test-cephfs-d10598795ce144b8956550bb0216d781-1-rt5dx    1/1     Running     0          10m     10.131.0.27   compute-2   <none>           <none>
pod-test-cephfs-ec43305aa17e43cbb08d798c603fce58-1-4dl8h    1/1     Running     0          9m46s   10.130.0.27   compute-1   <none>           <none>
pod-test-cephfs-ec43305aa17e43cbb08d798c603fce58-1-deploy   0/1     Completed   0          9m54s   10.130.0.26   compute-1   <none>           <none>
pod-test-cephfs-ec43305aa17e43cbb08d798c603fce58-1-fvd6q    1/1     Running     0          9m46s   10.131.0.28   compute-2   <none>           <none>
pod-test-cephfs-ec43305aa17e43cbb08d798c603fce58-1-g9p7n    1/1     Running     0          9m46s   10.129.0.29   compute-0   <none>           <none>
pod-test-cephfs-ec43305aa17e43cbb08d798c603fce58-1-pspmm    1/1     Running     0          9m46s   10.131.0.29   compute-2   <none>           <none>
pod-test-cephfs-f0cd8f01aee34f179df755a94540412f-1-7gcjq    1/1     Running     0          8m40s   10.130.0.31   compute-1   <none>           <none>
pod-test-cephfs-f0cd8f01aee34f179df755a94540412f-1-deploy   0/1     Completed   0          8m48s   10.130.0.30   compute-1   <none>           <none>
pod-test-cephfs-f0cd8f01aee34f179df755a94540412f-1-dwh9h    1/1     Running     0          8m40s   10.131.0.31   compute-2   <none>           <none>
pod-test-cephfs-f0cd8f01aee34f179df755a94540412f-1-gr8j2    1/1     Running     0          8m40s   10.129.0.32   compute-0   <none>           <none>
pod-test-cephfs-f0cd8f01aee34f179df755a94540412f-1-t28gr    1/1     Running     0          8m40s   10.129.0.33   compute-0   <none>           <none>
pod-test-cephfs-f282572fca0a4b99b5130b68f89d17c4-1-bdmsg    1/1     Running     0          11m     10.131.0.26   compute-2   <none>           <none>
pod-test-cephfs-f282572fca0a4b99b5130b68f89d17c4-1-deploy   0/1     Completed   0          11m     10.129.0.25   compute-0   <none>           <none>
pod-test-cephfs-f282572fca0a4b99b5130b68f89d17c4-1-jqfz5    1/1     Running     0          11m     10.130.0.23   compute-1   <none>           <none>
pod-test-cephfs-f282572fca0a4b99b5130b68f89d17c4-1-p2qnp    1/1     Running     0          11m     10.130.0.22   compute-1   <none>           <none>
pod-test-cephfs-f282572fca0a4b99b5130b68f89d17c4-1-tn5gz    1/1     Running     0          11m     10.129.0.26   compute-0   <none>           <none>
(python-venv-ocsci) [jijoy@localhost ocs-ci]$ oc rsh pod-test-cephfs-2e7302abf6a348ca94b8609b8928c948-1-2rtvw
sh-5.0# cd /mnt/
sh-5.0# ls data/
f1
sh-5.0# exit
exit
(python-venv-ocsci) [jijoy@localhost ocs-ci]$ oc rsh pod-test-cephfs-2e7302abf6a348ca94b8609b8928c948-1-bhgq9
sh-5.0# cd /mnt/
sh-5.0# ls data/
f1




Read/write permission is granted to all the pods, and writing from different pods works.

Comment 3 Matt LeVan 2020-02-14 19:43:50 UTC
Forgot one important detail.  This is a mixed RHEL/RHCOS installation on premises.

Node cluster3-w1 is RHCOS.
Nodes cluster3-w2 and cluster3-w3 are RHEL.

oc get nodes -o wide
NAME                                                   STATUS   ROLES                                     AGE   VERSION             INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                   KERNEL-VERSION                CONTAINER-RUNTIME
cluster3-m1.cluster3.storage-ocp.tuc.stglabs.ibm.com   Ready    master                                    23d   v1.14.6+8e46c0036   9.11.221.51   9.11.221.51   Red Hat Enterprise Linux CoreOS 42.81.20191223.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.14.11-4.dev.rhaos4.2.git179ea6b.el8
cluster3-m2.cluster3.storage-ocp.tuc.stglabs.ibm.com   Ready    master                                    23d   v1.14.6+8e46c0036   9.11.221.52   9.11.221.52   Red Hat Enterprise Linux CoreOS 42.81.20191223.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.14.11-4.dev.rhaos4.2.git179ea6b.el8
cluster3-m3.cluster3.storage-ocp.tuc.stglabs.ibm.com   Ready    master                                    23d   v1.14.6+8e46c0036   9.11.221.53   9.11.221.53   Red Hat Enterprise Linux CoreOS 42.81.20191223.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.14.11-4.dev.rhaos4.2.git179ea6b.el8
cluster3-w1.cluster3.storage-ocp.tuc.stglabs.ibm.com   Ready    cp-management,cp-master,cp-proxy,worker   9d    v1.14.6+8e46c0036   9.11.221.54   9.11.221.54   Red Hat Enterprise Linux CoreOS 42.81.20191223.0 (Ootpa)   4.18.0-147.3.1.el8_1.x86_64   cri-o://1.14.11-4.dev.rhaos4.2.git179ea6b.el8
cluster3-w2.cluster3.storage-ocp.tuc.stglabs.ibm.com   Ready    cp-management,cp-master,cp-proxy,worker   22d   v1.14.6+33ddc76e4   9.11.221.55   9.11.221.55   OpenShift Enterprise                                       3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-7.dev.rhaos4.2.git627b85c.el7
cluster3-w3.cluster3.storage-ocp.tuc.stglabs.ibm.com   Ready    cp-management,cp-master,cp-proxy,worker   21d   v1.14.6+33ddc76e4   9.11.221.56   9.11.221.56   OpenShift Enterprise                                       3.10.0-1062.9.1.el7.x86_64    cri-o://1.14.11-7.dev.rhaos4.2.git627b85c.el7

Comment 4 Shyamsundar 2020-02-15 03:29:48 UTC
The default mounter used by the CSI plugins for CephFS is the kernel CephFS client. The kernel version on the w2 and w3 instances seems older (3.10.0-1062), but it still supports CephFS quotas [1], so the kernel mounter should still have been selected as the default. On RHCOS (kernel 4.18.0) the CSI drivers would also have defaulted to the kernel mounter.

Just a thought: this looks like an interaction issue between different CephFS kernel client versions, or one of the instances possibly defaulting to the FUSE-based client and causing the said problem. @rraja or @patrick, any known issues around CephFS access across such different kernel client versions?

@matt.levan Requesting logs from all the CSI nodeplugin containers to understand which mounter was used. Even better, if we could get OCS must-gather output from the cluster [2], it would contain other logs of interest to troubleshoot what may be going wrong.

[1] CephFS kernel mounter decision: https://github.com/ceph/ceph-csi/blob/72ac53b6b0adbe3b19b9e58aaafce07c98fb4b99/pkg/cephfs/volumemounter.go#L80-L83
[2] OCS must-gather: https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.2/html/troubleshooting_openshift_container_storage/downloading-log-files-and-diagnostic-information_rhocs
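
For reference, one quick way to check which mounter was actually used; the label, container name, and grep patterns below assume a typical Rook/OCS deployment and may differ:

# List the CephFS CSI nodeplugin pods (label assumed from a standard Rook/OCS install)
oc -n openshift-storage get pods -l app=csi-cephfsplugin -o wide

# Inspect a nodeplugin's log for the mount decision (container name assumed)
oc -n openshift-storage logs <csi-cephfsplugin-pod> -c csi-cephfsplugin | grep -i mount

# On a worker node, a kernel mount shows up as "type ceph", a FUSE mount as "fuse.ceph-fuse"
oc debug node/<node-name> -- chroot /host sh -c 'mount | grep -E "type (ceph|fuse\.ceph-fuse)"'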

Comment 5 Matt LeVan 2020-02-15 18:15:29 UTC
Here is a link to the must-gather output on IBM Enterprise Box:  https://ibm.box.com/s/qxojtbtfwy7fwots1sj31pfg81cl7pk7

Feedback on the ocs-must-gather documentation: when the image tag is not specified, pulling the :latest tag fails.
oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather --dest-dir=ocs-must-gather
[must-gather      ] OUT Using must-gather plugin-in image: quay.io/rhceph-dev/ocs-must-gather
[must-gather      ] OUT namespace/openshift-must-gather-7g4qg created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-t55j6 created
[must-gather      ] OUT pod for plug-in image quay.io/rhceph-dev/ocs-must-gather created
[must-gather-lq9gm] OUT gather did not start: unable to pull image: ErrImagePull: rpc error: code = Unknown desc = Error reading manifest latest in quay.io/rhceph-dev/ocs-must-gather: unknown: Tag latest was deleted or has expired. To pull, revive via time machine


Had to go to quay.io and find that the appropriate tag is --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.2.
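
For anyone hitting the same pull error, the working invocation with the tag noted above would be:

oc adm must-gather --image=quay.io/rhceph-dev/ocs-must-gather:latest-4.2 --dest-dir=ocs-must-gather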

Comment 6 Sidhant Agrawal 2020-02-15 19:35:12 UTC
@Matt
Since cluster3-w2 and w3 are RHEL-based nodes, has container access to CephFS been enabled in SELinux on those nodes by following the documentation [1]?
Please ignore if this has already been done.


[1] https://access.redhat.com/documentation/en-us/red_hat_openshift_container_storage/4.2/html/deploying_openshift_container_storage/deploying-openshift-container-storage#enabling-file-system-access-for-containers-on-red-hat-enterprise-linux-based-nodes_rhocs

Comment 7 Matt LeVan 2020-02-16 14:22:49 UTC
@sagrawal
You are correct, I had not enabled CephFS access in SELinux on the RHEL nodes. Enabling it resolved the problem. Please close as user error.
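
For reference, the SELinux change described by the linked documentation amounts to enabling CephFS access for containers on each RHEL-based worker node; the boolean name below is the standard container-selinux one and is an assumption here rather than a quote from the doc:

# Run on each RHEL-based worker node (persistently enables CephFS access for containers)
setsebool -P container_use_cephfs on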

Comment 8 Sridhar Venkat (IBM) 2022-03-14 12:46:55 UTC
I am reopening this bug. Adarshdeep Cheema is able to reproduce this problem:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myfsclaim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 75Gi
  storageClassName: rook-cephfs

---
apiVersion: v1
kind: Pod
metadata:
  name: aaruni-demo-pod-fs2
spec:
  replicas: 2
  nodeName: worker1.nazare-test.os.fyre.ibm.com
  containers:
    - env:
      name: web-server
      image: quay.io/ocsci/nginx:latest
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: myfsclaim
        readOnly: false

---
apiVersion: v1
kind: Pod
metadata:
  name: aaruni-demo-pod-fs1
spec:
  replicas: 2
  nodeName: worker1.nazare-test.os.fyre.ibm.com
  containers:
    - env:
      name: web-server
      image: quay.io/ocsci/nginx:latest
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: myfsclaim
        readOnly: false
OUTPUT:
adarshdeepsinghcheema@Adarshdeeps-MacBook-Pro playbooks % oc get pod
NAME                  READY   STATUS    RESTARTS   AGE
aaruni-demo-pod-fs1   1/1     Running   0          23s
aaruni-demo-pod-fs2   1/1     Running   0          24s
adarshdeepsinghcheema@Adarshdeeps-MacBook-Pro playbooks % kubectl exec --stdin --tty aaruni-demo-pod-fs2 -- /bin/bash 
root@aaruni-demo-pod-fs2:/# cd /var/lib/www/html
bash: cd: /var/lib/www/html: Permission denied
root@aaruni-demo-pod-fs2:/# exit
exit
command terminated with exit code 1
adarshdeepsinghcheema@Adarshdeeps-MacBook-Pro playbooks % kubectl exec --stdin --tty aaruni-demo-pod-fs1 -- /bin/bash 
root@aaruni-demo-pod-fs1:/# cd /var/lib/www/html
root@aaruni-demo-pod-fs1:/var/lib/www/html# exit
exit
adarshdeepsinghcheema@Adarshdeeps-MacBook-Pro playbooks % oc get pvc                                                  
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
cephfs-pvc   Bound    pvc-a2689039-6146-4e01-829c-f486683d349b   1Gi        RWO            rook-cephfs       245d
myfsclaim    Bound    pvc-051251ed-c6a3-4d18-9e37-0248b603109c   75Gi       RWX            rook-cephfs       2m43s
rbd-pvc      Bound    pvc-9a805f87-c648-47d4-974b-d54cd36a50d9   1Gi        RWO            rook-ceph-block   245d

Comment 9 Sridhar Venkat (IBM) 2022-03-14 12:48:48 UTC
Adarshdeep feels that the SELinux explanation for the original resolution is not correct.

Comment 10 Sridhar Venkat (IBM) 2022-03-14 14:58:32 UTC
I was asked to open a new bug. I opened one: https://bugzilla.redhat.com/show_bug.cgi?id=2063881