Bug 2094320 - Pods are stuck in CreateContainerError because of blocklisting
Summary: Pods are stuck in CreateContainerError because of blocklisting
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: yati padia
QA Contact: Pratik Surve
URL:
Whiteboard:
Duplicates: 2136416
Depends On:
Blocks: 2107226 2138210
 
Reported: 2022-06-07 11:38 UTC by Pratik Surve
Modified: 2023-08-25 06:04 UTC
CC List: 16 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-25 06:04:25 UTC
Embargoed:



Description Pratik Surve 2022-06-07 11:38:50 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Pods are stuck in CreateContainerError with msg Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/#ib_16384_0.dblwr: read-only file system
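For context (not part of the original report), a minimal sketch of how the read-only symptom could be confirmed on the affected node. The node and PVC names are taken from this report; the commands are an assumption about a typical ODF setup, not steps quoted from the bug:

# The kernel log on the node typically shows the RBD-backed filesystem being
# remounted read-only once the client is blocklisted (hedged check).
oc debug node/compute-1 -- chroot /host dmesg | grep -iE 'rbd|read-only'

# The mount options for the affected PVC should show "ro" if the filesystem
# was remounted read-only.
oc debug node/compute-1 -- chroot /host mount | grep pvc-86e7da91-29f9-4418-80a7-4ae7610bb613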

Version of all relevant components (if applicable):

OCP version:- 4.10.0-0.nightly-2022-05-26-102501
ODF version:- 4.10.3-4
CEPH version:- ceph version 16.2.7-112.el8cp (e18db2ff03ac60c64a18f3315c032b9d5a0a3b8f) pacific (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an RDR (Regional Disaster Recovery) cluster.
2. Deploy an IO workload and keep it running for a long time (a sketch follows below).
3. After a few days, pods appear in the CreateContainerError state.
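A hedged sketch of step 2: the report used a mysql workload on an RBD-backed PVC, and this is just a generic long-running writer. The claim name test-pvc is hypothetical and must already exist on the CephRBD StorageClass in the current project:

# Hypothetical long-running IO workload; claimName "test-pvc" is an assumption.
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: io-writer
spec:
  containers:
  - name: writer
    image: registry.access.redhat.com/ubi8/ubi
    command: ["sh", "-c", "while true; do dd if=/dev/zero of=/data/io.bin bs=1M count=100; sync; sleep 10; done"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc
EOF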
 

Actual results:

Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       93m                    default-scheduler  Successfully assigned busybox-workloads-1/mysql-68f8bf78fd-s5gmz to compute-1
  Normal   AddedInterface  93m                    multus             Add eth0 [10.128.2.142/23] from openshift-sdn
  Normal   Pulled          93m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.237571543s
  Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.515721369s
  Warning  Failed          92m                    kubelet            Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/ibtmp1: read-only file system
  Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.319901196s
  Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.49214362s
  Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.274508132s
  Normal   Pulled          92m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.412132002s
  Normal   Pulled          91m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.294925949s
  Warning  Failed          91m (x7 over 93m)      kubelet            Error: relabel failed /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount: lsetxattr /var/lib/kubelet/pods/cb27938e-f66f-401d-85f0-9eb5cf565ace/volumes/kubernetes.io~csi/pvc-86e7da91-29f9-4418-80a7-4ae7610bb613/mount/#ib_16384_0.dblwr: read-only file system
  Normal   Pulled          91m                    kubelet            Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.472227825s
  Normal   Pulled          18m (x302 over 91m)    kubelet            (combined from similar events): Successfully pulled image "quay.io/prsurve/mysql:latest" in 1.436491104s
  Normal   Pulling         2m59s (x372 over 93m)  kubelet            Pulling image "quay.io/prsurve/mysql:latest"


Expected results:
Pods should not get stuck in CreateContainerError; the RBD-backed volumes should remain writable.

Additional info:

Comment 4 Mudit Agarwal 2022-06-21 13:27:05 UTC
Most likely a Ceph fix; will wait for Yati's update.
Not a 4.11 blocker, moving it out.

Comment 23 Shyamsundar 2023-01-20 14:00:51 UTC
*** Bug 2136416 has been marked as a duplicate of this bug. ***

Comment 37 Elad 2023-06-19 06:02:59 UTC
Moving to 4.13.z for verification purposes

Comment 39 Divyansh Kamboj 2023-06-20 06:44:46 UTC
The ODFRBDClientBlocked alert is triggered when an RBD client is blocked by Ceph on a specific node in the Kubernetes cluster. This happens when the metric ocs_rbd_client_blocklisted reports a value of 1 for the node, indicating that it has been blocklisted. In addition, the alert requires pods on that node to be in the CreateContainerError state.
From the ODF side we cannot tell whether the blocklisted client is the krbd client or some other client, so we also check for CreateContainerError pods before raising the alert; a sketch of both checks follows.
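A minimal sketch of how the two conditions could be checked manually. The rook-ceph-tools toolbox deployment and the node name compute-1 are assumptions based on a typical ODF install and the events above, not taken from the alert definition itself:

# 1. Ceph's view of currently blocklisted client addresses (assumes the
#    rook-ceph-tools toolbox is deployed in openshift-storage).
oc rsh -n openshift-storage deploy/rook-ceph-tools ceph osd blocklist ls

# 2. Pods stuck in CreateContainerError on the same node.
oc get pods -A -o wide --field-selector spec.nodeName=compute-1 | grep CreateContainerError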

https://issues.redhat.com/browse/OCSDOCS-1112
might be tracking the documentation effort

