Description of problem (please be as detailed as possible and provide log snippets):
Pods are stuck in the CreateContainerError state with the message:
Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/#ib_16384_0.dblwr: read-only file system
Version of all relevant components (if applicable):
OCP version:- 4.10.0-0.nightly-2022-05-26-102501
ODF version:- 4.10.3-4
CEPH version:- ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
Is this issue reproducible?
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Deploy the RDR cluster
2. Deploy IO workloads and keep them running for an extended period
3. After a few days, pods enter the CreateContainerError state
Actual results:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 93m default-scheduler Successfully assigned busybox-workloads-1/mysql-68f8bf78fd-s5gmz to compute-1
Normal AddedInterface 93m multus Add eth0 [10.128.2.142/23] from openshift-sdn
Normal Pulled 93m kubelet Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.237571543s
Normal Pulled 92m kubelet Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.515721369s
Warning Failed 92m kubelet Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/ibtmp1: read-only file system
Normal Pulled 92m kubelet Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.319901196s
Normal Pulled 92m kubelet Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.49214362s
Normal Pulled 92m kubelet Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.274508132s
Normal Pulled 92m kubelet Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.412132002s
Normal Pulled 91m kubelet Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.294925949s
Warning Failed 91m (x7 over 93m) kubelet Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/#ib_16384_0.dblwr: read-only file system
Normal Pulled 91m kubelet Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.472227825s
Normal Pulled 18m (x302 over 91m) kubelet (combined from similar events): Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.436491104s
Normal Pulling 2m59s (x372 over 93m) kubelet Pulling image "quay.io/prsurve/mysql:latest"
Expected results:
Pods should start and run without relabel errors.
Additional info:
The ODFRBDClientBlocked alert is triggered when an RBD client gets blocked by Ceph on a specific node within the Kubernetes cluster. This occurs when the metric ocs_rbd_client_blocklisted reports a value of 1 for the node, indicating that it has been blocklisted. Additionally, the alert is triggered if there are pods in a CreateContainerError state on the same node.
From the ODF side we cannot tell whether the blocklisted client is the krbd client or some other client, so the alert also checks for pods in the CreateContainerError state before firing.
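A sketch of how such an alerting rule might be expressed as a PrometheusRule. This is not the exact rule shipped with ODF; the rule name, the join on the node label, and the duration are illustrative assumptions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: odf-rbd-client-blocked   # illustrative name
spec:
  groups:
    - name: odf-storage-client
      rules:
        - alert: ODFRBDClientBlocked
          # Fire only when a node reports a blocklisted RBD client AND at
          # least one pod on that node is stuck in CreateContainerError;
          # joining the two series on "node" is an assumed label layout.
          expr: |
            (ocs_rbd_client_blocklisted == 1)
            and on (node)
            (kube_pod_container_status_waiting_reason{reason="CreateContainerError"} > 0)
          for: 5m
          labels:
            severity: warning
          annotations:
            description: >-
              An RBD client on node {{ $labels.node }} may be blocklisted by
              Ceph; pods on the node are in CreateContainerError.
```

Combining both conditions avoids firing on a blocklist entry that belongs to a non-krbd client with no workload impact.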
The documentation effort might be tracked in https://issues.redhat.com/browse/OCSDOCS-1112