Description of problem (please be as detailed as possible and provide log snippets):

Pods are stuck in CreateContainerError with the message:

  Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/#ib_16384_0.dblwr: read-only file system

Version of all relevant components (if applicable):

OCP version: 4.10.0-0.nightly-2022-05-26-102501
ODF version: 4.10.3-4
Ceph version: ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy the RDR cluster.
2. Deploy IO and keep it running for a long time.
3. After some days, pods are seen in the CreateContainerError state.

Actual results:

  Events:
    Type     Reason          Age                    From               Message
    ----     ------          ----                   ----               -------
    Normal   Scheduled       93m                    default-scheduler  Successfully assigned busybox-workloads-1/mysql-68f8bf78fd-s5gmz to compute-1
    Normal   AddedInterface  93m                    multus             Add eth0 [10.128.2.142/23] from openshift-sdn
    Normal   Pulled          93m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.237571543s
    Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.515721369s
    Warning  Failed          92m                    kubelet            Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/ibtmp1: read-only file system
    Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.319901196s
    Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.49214362s
    Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.274508132s
    Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.412132002s
    Normal   Pulled          91m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.294925949s
    Warning  Failed          91m (x7 over 93m)      kubelet            Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/#ib_16384_0.dblwr: read-only file system
    Normal   Pulled          91m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.472227825s
    Normal   Pulled          18m (x302 over 91m)    kubelet            (combined from similar events): Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.436491104s
    Normal   Pulling         2m59s (x372 over 93m)  kubelet            Pulling image "quay.io/prsurve/mysql:latest"

Expected results:

Pods should start normally; the SELinux relabel of the mounted volume should not fail with a read-only file system error.

Additional info:
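As additional triage context, here is a minimal sketch of commands to confirm the symptom and check whether the affected node's RBD client was blocklisted by Ceph (the suspected cause of the read-only mount, per the alert discussion in the comments below). Pod and namespace names are taken from the events above; the toolbox invocation assumes the standard rook-ceph-tools deployment in openshift-storage.

  # Re-check the failing event stream for the stuck pod
  # (names are examples from this report).
  oc -n busybox-workloads-1 describe pod mysql-68f8bf78fd-s5gmz | tail -n 30

  # Find all pods stuck in CreateContainerError cluster-wide.
  oc get pods -A | grep CreateContainerError

  # From the Rook-Ceph toolbox, list blocklisted clients; an entry
  # matching the affected node's IP would explain why the RBD-backed
  # filesystem went read-only.
  oc -n openshift-storage rsh deploy/rook-ceph-tools ceph osd blocklist ls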
Mostly a Ceph fix; will wait for Yati's update. Not a 4.11 blocker, moving it out.
*** Bug 2136416 has been marked as a duplicate of this bug. ***
Moving to 4.13.z for verification purposes
The ODFRBDClientBlocked alert is triggered when an RBD client is blocked by Ceph on a specific node within the Kubernetes cluster. This occurs when the metric ocs_rbd_client_blocklisted reports a value of 1 for the node, indicating that it has been blocklisted. Additionally, the alert requires pods on the same node to be in a CreateContainerError state: from the ODF side we cannot identify whether the blocklisted client is the krbd client or some other client, so the CreateContainerError condition is checked as well before raising the alert.

https://issues.redhat.com/browse/OCSDOCS-1112 might be tracking the documentation effort.
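For illustration, a minimal sketch of checking the alert's two conditions by hand against the Prometheus HTTP API. PROM_URL and TOKEN are placeholders for an in-cluster Prometheus endpoint and a bearer token with query access; the waiting-reason metric is the standard kube_pod_container_status_waiting_reason from kube-state-metrics. The exact expression shipped in the ODF alerting rule may differ.

  # Condition 1: the node's RBD client is blocklisted.
  curl -sk -H "Authorization: Bearer $TOKEN" \
    "$PROM_URL/api/v1/query" \
    --data-urlencode 'query=ocs_rbd_client_blocklisted == 1'

  # Condition 2: pods on the same node are stuck in CreateContainerError.
  curl -sk -H "Authorization: Bearer $TOKEN" \
    "$PROM_URL/api/v1/query" \
    --data-urlencode 'query=kube_pod_container_status_waiting_reason{reason="CreateContainerError"} > 0'

If both queries return results for the same node, that matches the situation the alert is designed to catch.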