Bug 2139031

Summary: Pods that are using ocs-storagecluster-cephfs StorageClass failing with Error: failed to resolve symlink "/var/lib/kubelet/pods" permission denied
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Ashwini M. Khaire <akhaire>
Component: ceph
Assignee: Scott Ostapovicz <sostapov>
Status: NEW ---
QA Contact: Neha Berry <nberry>
Severity: high
Docs Contact:
Priority: unspecified
Version: unspecified
CC: bniver, sostapov
Target Milestone: ---
Flags: akhaire: needinfo? (sostapov), mrajanna: needinfo? (akhaire)
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ashwini M. Khaire 2022-11-01 03:36:16 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Pods that use storage from the `ocs-storagecluster-cephfs` StorageClass get stuck in CreateContainerError after a restart and fail with permission denied errors on one specific node.

~~~
Warning  Failed                  5m11s (x12 over 7m16s)  kubelet                  Error: failed to resolve symlink "/var/lib/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx/volumes/kubernetes.io~csi/pvc-xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx/mount": lstat /var/lib/kubelet/pods/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx/volumes/kubernetes.io~csi/pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx/mount: permission denied
~~~
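For triage, the event above can be split into the pod UID and the CSI volume name, which together identify the mount directory to inspect on the affected node. The following is a minimal sketch, assuming the event text follows the path layout shown in the log. The function name `parse_symlink_error` is hypothetical and is not part of any OpenShift tooling.

```python
import re
from typing import Optional

# Path layout as it appears in the kubelet event in this report:
# /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<volume>/mount
_MOUNT_RE = re.compile(
    r'failed to resolve symlink '
    r'"/var/lib/kubelet/pods/(?P<pod_uid>[^/]+)'
    r'/volumes/kubernetes\.io~csi/(?P<volume>[^/]+)/mount"'
)


def parse_symlink_error(event: str) -> Optional[dict]:
    """Return the pod UID and CSI volume name from the event, or None.

    Hypothetical helper for illustration only.
    """
    match = _MOUNT_RE.search(event)
    if match is None:
        return None
    return {"pod_uid": match.group("pod_uid"), "volume": match.group("volume")}
```

Calling `parse_symlink_error` on a line copied from `oc describe pod` output returns a small dictionary when the line matches and `None` otherwise.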

Version of all relevant components (if applicable):

OpenShift Cluster Version: v4.10
OpenShift Container version: v4.10

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

The issue is impacting the customer's production workload.

Is there any workaround available to the best of your knowledge?

Re-scheduling the pod to another node seems to resolve the issue.
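The workaround can be sketched as a short sequence of `oc` commands. This is an illustration only, with several assumptions: the `oc` CLI is on the PATH and logged in to the cluster, the pod is owned by a controller (for example a Deployment or StatefulSet) that recreates it after deletion, and the node, pod, and namespace names are supplied by the caller. The helper names `reschedule_commands` and `reschedule` are hypothetical.

```python
import subprocess
from typing import List


def reschedule_commands(node: str, pod: str, namespace: str) -> List[List[str]]:
    """Build the oc invocations that move a pod off the affected node."""
    return [
        # Mark the node unschedulable so the replacement pod lands elsewhere.
        ["oc", "adm", "cordon", node],
        # Delete the stuck pod; its controller is assumed to recreate it.
        ["oc", "delete", "pod", pod, "-n", namespace],
        # List pods with their nodes to confirm where the new pod was placed.
        ["oc", "get", "pods", "-n", namespace, "-o", "wide"],
    ]


def reschedule(node: str, pod: str, namespace: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order."""
    for cmd in reschedule_commands(node, pod, namespace):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence if any command fails.
            subprocess.run(cmd, check=True)
```

The node stays cordoned after this sequence, so it needs `oc adm uncordon <node>` once the underlying permission problem has been investigated.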

Actual results:

The pod is failing with the permission denied error while using the `ocs-storagecluster-cephfs` StorageClass.

Expected results:

The pod should start and run normally.