Description of problem (please be as detailed as possible and provide log snippets):

[Stretch cluster] Arbiter node replacement results in pod restarts and an intermittently unresponsive cluster.

Version of all relevant components (if applicable):
OCP 4.13.0-0.nightly-2023-05-25-001936 and ODF 4.13.0-203

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced? Yes

Can this issue be reproduced from the UI? NA

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install an OCP cluster with 6 worker and 4 master nodes.
2. Install the Local Storage Operator and add disks on the 6 worker nodes.
3. Label the worker nodes for data zones 1 and 2, label one master node as arbiter, and label the remaining two masters for data zones 1 and 2.
4. Install ODF and create the storage system, choosing the 6 worker nodes.
5. Scale down the mon and the other ODF pods on the arbiter node.
6. Delete the arbiter node through the oc command and also delete the VM.
7. Label another control-plane node as arbiter and also add the OpenShift Data Foundation label to it.
8. Wait for the new mon to come up on the newly added arbiter node.
(A rough sketch of the oc commands for steps 3-7 appears after the pod listings below.)

Actual results:
The mon takes around 10-15 minutes to come up on the newly added arbiter node, and the following pods are restarted during the process:

csi-addons-controller-manager-5b94568cf8-f4f9g (7 restarts)
csi-cephfsplugin-provisioner-554f966f47-pgr8x (10 restarts)
csi-cephfsplugin-provisioner-554f966f47-zks9h (2 restarts)
csi-rbdplugin-provisioner-7999676974-fqf77 (11 restarts)
csi-rbdplugin-provisioner-7999676974-wk27s (1 restart)
noobaa-operator-5946d77759-d9dl8 (10 restarts)
ocs-operator-776f898d4-82r4w (7 restarts)
odf-operator-controller-manager-7958d76ddc-j4k9r (3 restarts)

Also, during the process the cluster intermittently becomes inaccessible while running oc commands:

rook-ceph-osd-prepare-63b2f492a51089f46df627dd0cfee0ba-fqkvf 0/1 Completed 0 22m 10.131.2.18 compute-3 <none> <none>
rook-ceph-osd-prepare-d0fd414b7d2c277941386eceb4897361-zsxl2 0/1 Completed 0 22m 10.129.4.25 compute-4 <none> <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5769db78f7vt 2/2 Running 0 21m 10.128.2.29 compute-0 <none> <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5769db7jwp49 2/2 Running 0 21m 10.129.2.19 compute-5 <none> <none>
rook-ceph-tools-5845b7c568-fvsz8 1/1 Running 0 23m 10.128.4.27 compute-1 <none> <none>

[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-addons-controller-manager-5b94568cf8-f4f9g 2/2 Running 5 (6m40s ago) 33m 10.128.4.22 compute-1 <none> <none>
csi-cephfsplugin-6ms62 2/2 Running 0 27m 10.1.161.85 compute-4 <none> <none>
csi-cephfsplugin-9gxlz 2/2 Running 0 27m 10.1.160.245 compute-0 <none> <none>
csi-cephfsplugin-g4nwv 2/2 Running 0 27m 10.1.161.82 compute-2 <none> <none>
csi-cephfsplugin-jtspt 2/2 Running 0 27m 10.1.161.16 compute-3 <none> <none>
csi-cephfsplugin-provisioner-554f966f47-pgr8x 5/5 Running 4 (8m10s ago) 27m 10.129.2.12 compute-5 <none> <none>
csi-cephfsplugin-provisioner-554f966f47-zks9h 5/5 Running 0 27m 10.128.2.27 compute-0 <none> <none>
csi-cephfsplugin-sh9bh 2/2 Running 0 27m 10.1.161.21 compute-5 <none> <none>
csi-cephfsplugin-wfqn7 2/2 Running 0 27m 10.1.160.253 compute-1 <none> <none>
csi-rbdplugin-8cx7v 3/3 Running 0 27m 10.1.161.82 compute-2 <none> <none>
csi-rbdplugin-9q742 3/3 Running 0 27m 10.1.161.85 compute-4 <none> <none>
csi-rbdplugin-fztvp 3/3 Running 0 27m 10.1.161.21 compute-5 <none> <none>
csi-rbdplugin-gfhqq 3/3 Running 0 27m 10.1.160.253 compute-1 <none> <none>
csi-rbdplugin-lg6rq 3/3 Running 0 27m 10.1.161.16 compute-3 <none> <none>
csi-rbdplugin-provisioner-7999676974-fqf77 6/6 Running 4 (8m1s ago) 27m 10.129.4.20 compute-4 <none> <none>
csi-rbdplugin-provisioner-7999676974-wk27s 6/6 Running 0 27m 10.128.4.26 compute-1 <none> <none>
csi-rbdplugin-zq8jj 3/3 Running 0 27m 10.1.160.245 compute-0 <none> <none>
noobaa-core-0 1/1 Running 0 24m 10.128.2.32 compute-0 <none> <none>
noobaa-db-pg-0 [remaining output truncated]

Also, the node topology in the StorageCluster YAML changes after the arbiter node replacement.

Before arbiter node replacement:

  kmsServerConnection: {}
  nodeTopologies:
    labels:
      kubernetes.io/hostname:
      - compute-0
      - compute-1
      - compute-2
      - compute-3
      - compute-4
      - compute-5
      topology.kubernetes.io/zone:
      - data-1
      - data-2
  phase: Ready
  relatedObjects:
  - apiVersion: ceph.rook.io/v1
    kind: CephCluster
    name: ocs-storagecluster-cephcluster

After arbiter node replacement:

  kmsServerConnection: {}
  nodeTopologies:
    labels:
      kubernetes.io/hostname:
      - compute-0
      - compute-1
      - compute-2
      - compute-3
      - compute-4
      - compute-5
      - control-plane-3
      topology.kubernetes.io/zone:
      - arbiter
      - data-1
      - data-2
  phase: Ready
  relatedObjects:
  - apiVersion: ceph.rook.io/v1
    kind: CephCluster

Expected results:
Restarts should not be seen and the cluster should remain accessible.

Additional info:
Logs of all pods that were restarted are placed in http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-arbiternode/

[jopinto@jopinto nodes]$ oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-addons-controller-manager-5b94568cf8-f4f9g 2/2 Running 7 (12m ago) 49m 10.128.4.22 compute-1 <none> <none>
csi-cephfsplugin-6ms62 2/2 Running 0 43m 10.1.161.85 compute-4 <none> <none>
csi-cephfsplugin-9gxlz 2/2 Running 0 43m 10.1.160.245 compute-0 <none> <none>
csi-cephfsplugin-g4nwv 2/2 Running 0 43m 10.1.161.82 compute-2 <none> <none>
csi-cephfsplugin-jtspt 2/2 Running 0 43m 10.1.161.16 compute-3 <none> <none>
csi-cephfsplugin-provisioner-554f966f47-pgr8x 5/5 Running 10 (10m ago) 43m 10.129.2.12 compute-5 <none> <none>
csi-cephfsplugin-provisioner-554f966f47-zks9h 5/5 Running 2 (10m ago) 43m 10.128.2.27 compute-0 <none> <none>
csi-cephfsplugin-sh9bh 2/2 Running 0 43m 10.1.161.21 compute-5 <none> <none>
csi-cephfsplugin-wfqn7 2/2 Running 0 43m 10.1.160.253 compute-1 <none> <none>
csi-rbdplugin-8cx7v 3/3 Running 0 43m 10.1.161.82 compute-2 <none> <none>
csi-rbdplugin-9q742 3/3 Running 0 43m 10.1.161.85 compute-4 <none> <none>
csi-rbdplugin-fztvp 3/3 Running 0 43m 10.1.161.21 compute-5 <none> <none>
csi-rbdplugin-gfhqq 3/3 Running 0 43m 10.1.160.253 compute-1 <none> <none>
csi-rbdplugin-lg6rq 3/3 Running 0 43m 10.1.161.16 compute-3 <none> <none>
csi-rbdplugin-provisioner-7999676974-fqf77 6/6 Running 11 (10m ago) 43m 10.129.4.20 compute-4 <none> <none>
csi-rbdplugin-provisioner-7999676974-wk27s 6/6 Running 1 (16m ago) 43m 10.128.4.26 compute-1 <none> <none>
csi-rbdplugin-zq8jj 3/3 Running 0 43m 10.1.160.245 compute-0 <none> <none>
noobaa-core-0 1/1 Running 0 40m 10.128.2.32 compute-0 <none> <none>
noobaa-db-pg-0 1/1 Running 0 40m 10.131.2.21 compute-3 <none> <none>
noobaa-endpoint-69bdfdcc8-9x58x 1/1 Running 0 39m 10.128.2.36 compute-0 <none> <none>
noobaa-operator-5946d77759-d9dl8 1/1 Running 7 (11m ago) 50m 10.131.2.11 compute-3 <none> <none>
ocs-metrics-exporter-988d5648-gb22d 1/1 Running 0 49m 10.128.4.21 compute-1 <none> <none>
ocs-operator-776f898d4-82r4w 1/1 Running 7 (12m ago) 49m 10.128.4.20 compute-1 <none> <none>
odf-console-85c7c76fb-7k8f5 1/1 Running 0 49m 10.129.4.16 compute-4 <none> <none>
odf-operator-controller-manager-7958d76ddc-j4k9r 2/2 Running 3 (12m ago) 49m 10.128.2.24 compute-0 <none> <none>
rook-ceph-crashcollector-compute-0-548ffc87cd-twhw4 1/1 Running 0 40m 10.128.2.30 compute-0 <none> <none>
rook-ceph-crashcollector-compute-1-7f84777ff-qb7rf 1/1 Running 0 41m 10.128.4.29 compute-1 <none> <none>
rook-ceph-crashcollector-compute-2-666849d6b4-plk9c 1/1 Running 0 40m 10.130.2.27 compute-2 <none> <none>
rook-ceph-crashcollector-compute-3-74f749855f-xv767 1/1 Running 0 41m 10.131.2.16 compute-3 <none> <none>
rook-ceph-crashcollector-compute-4-5ff6d6bc88-5t6tz 1/1 Running 0 41m 10.129.4.23 compute-4 <none> <none>
rook-ceph-crashcollector-compute-5-7c674b8559-zcxsn 1/1 Running 0 40m 10.129.2.20 compute-5 <none> <none>
rook-ceph-crashcollector-control-plane-3-5cf7f6bbc9-7djlx 1/1 Running 0 3m44s 10.131.0.8 control-plane-3 <none> <none>
rook-ceph-exporter-compute-0-59db8bf49f-pvf5n 1/1 Running 0 40m 10.128.2.31 compute-0 <none> <none>
rook-ceph-exporter-compute-1-59566cb48b-c9w2g 1/1 Running 0 41m 10.128.4.30 compute-1 <none> <none>
rook-ceph-exporter-compute-2-56f55dd8f7-rqcd5 1/1 Running 0 40m 10.130.2.28 compute-2 <none> <none>
rook-ceph-exporter-compute-3-5bbb487f5d-9gghn 1/1 Running 0 41m 10.131.2.17 compute-3 <none> <none>
rook-ceph-exporter-compute-4-7b4dcf57d9-9gvgx 1/1 Running 0 41m 10.129.4.24 compute-4 <none> <none>
rook-ceph-exporter-compute-5-65c98c4446-mdgx2 1/1 Running 0 40m 10.129.2.21 compute-5 <none> <none>
rook-ceph-exporter-control-plane-3-854bc78f4-z7lq6 1/1 Running 0 3m44s 10.131.0.9 control-plane-3 <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7c4df569vfd5n 2/2 Running 0 40m 10.128.4.33 compute-1 <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6b9bddd9hjch9 2/2 Running 0 40m 10.130.2.26 compute-2 <none> <none>
rook-ceph-mgr-a-6df8cf9876-l8t4f 3/3 Running 0 41m 10.130.2.19 compute-2 <none> <none>
rook-ceph-mgr-b-7885b7cc9f-sxldl 3/3 Running 0 41m 10.128.4.28 compute-1 <none> <none>
rook-ceph-mon-a-85df56bb6f-4zqnv 2/2 Running 0 42m 10.129.4.22 compute-4 <none> <none>
rook-ceph-mon-b-865c9fcbb-r5f7m 2/2 Running 0 42m 10.130.2.18 compute-2 <none> <none>
rook-ceph-mon-c-d9f4f5777-cgxgj 2/2 Running 0 42m 10.131.2.15 compute-3 <none> <none>
rook-ceph-mon-d-55d95f6f6b-pbgsh 2/2 Running 0 42m 10.129.2.14 compute-5 <none> <none>
rook-ceph-mon-f-75c8cb5d89-xfdvb 2/2 Running 0 4m14s 10.131.0.7 control-plane-3 <none> <none>
rook-ceph-operator-7477d84999-ndfgh 1/1 Running 0 43m 10.129.4.19 compute-4 <none> <none>
rook-ceph-osd-0-86f4cfcf76-6hrqp 2/2 Running 0 41m 10.129.2.18 compute-5 <none> <none>
rook-ceph-osd-1-f59f54664-xwk78 2/2 Running 0 41m 10.130.2.23 compute-2 <none> <none>
rook-ceph-osd-2-988566f77-jc2j9 2/2 Running 0 41m 10.131.2.19 compute-3 <none> <none>
rook-ceph-osd-3-f4bb5b998-f45xh 2/2 Running 0 41m 10.129.4.26 compute-4 <none> <none>
rook-ceph-osd-prepare-28572a8ce9ae9bb114424170be420dfc-92rrq 0/1 Completed 0 41m 10.130.2.22 compute-2 <none> <none>
rook-ceph-osd-prepare-455dc68f9e2aa40ba2f2c0c2c20c8ac6-cztb9 0/1 Completed 0 41m 10.129.2.17 compute-5 <none> <none>
rook-ceph-osd-prepare-63b2f492a51089f46df627dd0cfee0ba-fqkvf 0/1 Completed 0 41m 10.131.2.18 compute-3 <none> <none>
rook-ceph-osd-prepare-d0fd414b7d2c277941386eceb4897361-zsxl2 0/1 Completed 0 41m 10.129.4.25 compute-4 <none> <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5769db78f7vt 2/2 Running 0 40m 10.128.2.29 compute-0 <none> <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5769db7jwp49 2/2 Running 0 40m 10.129.2.19 compute-5 <none> <none>
rook-ceph-tools-5845b7c568-fvsz8 1/1 Running 0 42m 10.128.4.27 compute-1 <none> <none>

[jopinto@jopinto nodes]$ oc rsh rook-ceph-tools-5845b7c568-fvsz8
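For reference, the labeling, scale-down, and node deletion in steps 3-7 would look something like the commands below. This is only a sketch: the node-to-zone assignment, the original arbiter node name, and the mon deployment name are assumptions inferred from the topology labels and pod names shown above, not the exact commands that were run.

```
# Step 3 (sketch): zone labels for the data zones and the arbiter master.
oc label node compute-0 compute-1 compute-2 topology.kubernetes.io/zone=data-1
oc label node compute-3 compute-4 compute-5 topology.kubernetes.io/zone=data-2
oc label node control-plane-2 topology.kubernetes.io/zone=arbiter      # original arbiter; example name

# Step 5 (sketch): scale down the mon deployment scheduled on the arbiter node.
oc -n openshift-storage scale deployment rook-ceph-mon-e --replicas=0  # mon "e" per the Rook logs below

# Step 6 (sketch): remove the node from the cluster, then delete the backing VM.
oc delete node control-plane-2

# Step 7 (sketch): make the replacement master the new arbiter and an ODF node.
oc label node control-plane-3 topology.kubernetes.io/zone=arbiter
oc label node control-plane-3 cluster.ocs.openshift.io/openshift-storage=""
```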
Please add the ODF must-gather logs to this BZ.
From the Rook logs:

```
2023-05-30 10:50:06.213851 W | op-mon: mon "e" not found in quorum, waiting for timeout (554 seconds left) before failover
2023-05-30 10:50:51.624839 W | op-mon: mon "e" not found in quorum, waiting for timeout (509 seconds left) before failover
2023-05-30 10:51:37.032913 W | op-mon: mon "e" not found in quorum, waiting for timeout (463 seconds left) before failover
2023-05-30 10:52:22.437686 W | op-mon: mon "e" not found in quorum, waiting for timeout (418 seconds left) before failover
2023-05-30 10:53:57.299028 E | op-osd: failed to update cluster "ocs-storagecluster-cephcluster" Storage. failed to update object "openshift-storage/ocs-storagecluster-cephcluster" status: Timeout: request did not complete within requested timeout - context deadline exceeded
2023-05-30 10:54:07.442420 W | op-mon: failed to check mon health. failed to check for mons to skip reconcile: failed to query mons to skip reconcile: the server was unable to return a response in the time allotted, but may still be processing the request (get deployments.apps)
W0530 11:02:19.705258 1 reflector.go:424] github.com/kube-object-storage/lib-bucket-provisioner/pkg/client/informers/externalversions/factory.go:117: failed to list *v1alpha1.ObjectBucket: Get "https://172.30.0.1:443/apis/objectbucket.io/v1alpha1/objectbuckets?resourceVersion=2459685": dial tcp 172.30.0.1:443: connect: connection refused
E0530 11:02:19.705314 1 reflector.go:140] github.com/kube-object-storage/lib-bucket-provisioner/pkg/client/informers/externalversions/factory.go:117: Failed to watch *v1alpha1.ObjectBucket: failed to list *v1alpha1.ObjectBucket: Get "https://172.30.0.1:443/apis/objectbucket.io/v1alpha1/objectbuckets?resourceVersion=2459685": dial tcp 172.30.0.1:443: connect: connection refused
W0530 11:02:20.528321 1 reflector.go:424] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262: failed to list *v1.Service: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-storage/services?resourceVersion=2459665": dial tcp 172.30.0.1:443: connect: connection refused
```

Most likely some issue with the environment.
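The roughly 10-minute countdown in the log above looks like Rook's default mon failover timeout, which would explain why the replacement mon takes 10-15 minutes to appear. If we want to confirm or temporarily shorten that window for a test run, something like the following should work; the ConfigMap name and the ROOK_MON_OUT_TIMEOUT key are assumptions based on upstream Rook defaults, not something verified against this ODF build.

```
# Check whether a mon failover timeout override is set (empty output means the
# upstream default of 10 minutes is in effect) -- assumed setting name.
oc -n openshift-storage get configmap rook-ceph-operator-config \
  -o jsonpath='{.data.ROOK_MON_OUT_TIMEOUT}'

# For a test run only: shorten the failover wait to 5 minutes.
oc -n openshift-storage patch configmap rook-ceph-operator-config \
  --type merge -p '{"data":{"ROOK_MON_OUT_TIMEOUT":"5m"}}'
```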
What pods are running on the arbiter that is brought down? Since the kube api isn't responding, I wonder if the master node is running the api service, and then it isn't responding. Are the other two OCP master nodes healthy? I wouldn't think taking down the arbiter should affect the api server like this, but it fundamentally does appear to be some OCP issue, and not an ODF issue.
(In reply to Travis Nielsen from comment #7)
> What pods are running on the arbiter that is brought down? Since the kube
> api isn't responding, I wonder if the master node is running the api
> service, and then it isn't responding.

The kube-apiserver was running on the master node:

kube-apiserver-control-plane-0 5/5 Running 0 13h 10.1.160.78 control-plane-0 <none> <none>
kube-apiserver-guard-control-plane-0 1/1 Running 0 13h 10.129.0.23 control-plane-0 <none> <none>

> Are the other two OCP master nodes healthy?

Yes, the other OCP master nodes were healthy.

> I wouldn't think taking down the arbiter should affect the api
> server like this, but it fundamentally does appear to be some OCP issue, and
> not an ODF issue.
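For the next reproduction it would help to capture the control-plane view while the arbiter is down. A minimal set of checks, using only standard oc commands (nothing ODF-specific):

```
# Node and cluster-operator view of the control plane during the outage.
oc get nodes -l node-role.kubernetes.io/master -o wide
oc get clusteroperators kube-apiserver etcd

# Where the API server and etcd pods are actually running, and their state.
oc -n openshift-kube-apiserver get pods -o wide
oc -n openshift-etcd get pods -o wide
```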
Moving out of 4.13 since this is a new scenario being tested, and replacing the arbiter is not a common case. The arbiter would normally be brought back online instead of being replaced. We still need to find the RCA, but I am proposing this should not be a 4.13 blocker: it is not a regression, the intermittent failures do not affect the data plane, and the cluster health is not affected permanently. The Rook and Ceph pods are all healthy; only the CSI driver and the odf/ocs/noobaa operators are affected.

Several questions:

1. What pods exactly did you scale down? Why scale them down instead of just deleting the VM? Pods aren't normally scaled down in the node-loss scenario.
2. When an OCP master dies, there are necessary steps to recover, for example [1]. Were any steps taken to handle the lost node from the OCP master perspective? I would have expected the OCP cluster to continue operating even with the loss of a single master, but I want to understand all the procedures related to OCP as well. (A sketch of the etcd checks from [1] follows below.)
3. At what point exactly did the intermittent issues stop occurring? Whether the arbiter mon is online wouldn't cause the intermittent issues described, so I'm wondering what else was happening in the cluster.

[1] https://docs.openshift.com/container-platform/4.13/backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.html
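Regarding question 2, the relevant checks from [1] reduce to verifying the etcd membership from a surviving master. A sketch (the etcd pod name is an example; actually removing a member should only be done by following the documented procedure):

```
# List etcd pods and pick one on a healthy master (name below is an example).
oc -n openshift-etcd get pods -o wide
oc -n openshift-etcd rsh etcd-control-plane-1

# Inside the etcd pod: check membership and endpoint health.
etcdctl member list -w table
etcdctl endpoint health --cluster

# If the deleted master is still listed as a member, [1] removes it with:
etcdctl member remove <member-id>
```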
Created attachment 1972330 [details] Pods with restarts
In the pods that were restarted (see attached), the vast majority are not related to ODF/Rook. They are OCP pods that do not have any dependency on ODF or the arbiter mon.

When you delete and replace the arbiter node, OCP must be going through a transition phase, and the resulting instability lasts approximately as long as it takes Rook to bring up the new arbiter mon.

This instability is outside of ODF's control. Please perform this test again, deleting the OCP arbiter node without the stretch cluster installed, and see if it results in the same instability. Then we can move this to the OCP team.
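Something along these lines would be enough for that re-test; the node name is an example, and the polling loop is just a quick way to see how long the API stays unreachable:

```
NODE=control-plane-2   # example: the master/arbiter node being removed

# Record what was scheduled on the node before deleting it.
oc get pods -A -o wide --field-selector spec.nodeName=${NODE}

# Delete the node (and the VM), then watch API availability while OCP reconverges.
oc delete node ${NODE}
while true; do
  date
  oc get --raw /readyz || echo "API not responding"
  sleep 10
done
```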
Hi @jopinto, are there any new updates on this BZ?
(In reply to Travis Nielsen from comment #15)
> In the pods that were restarted (see attached), the vast majority are not
> related to ODF/Rook. They are OCP pods that do not have any dependency on
> ODF or the arbiter mon.
>
> When you delete and replace the arbiter node, OCP must be going through a
> transition phase, and the resulting instability lasts approximately as long
> as it takes Rook to bring up the new arbiter mon.
>
> This instability is outside of ODF's control. Please perform this test
> again, deleting the OCP arbiter node without the stretch cluster installed,
> and see if it results in the same instability. Then we can move this to the
> OCP team.

Similar behaviour was seen with a 3M-6W (3 master, 6 worker) UPI LSO cluster without the stretch cluster configuration installed.

OCP build: 4.13.0-0.nightly-2023-07-27-013427
ODF build: 4.13.0-rhodf provided by Red Hat

Upon replacing the control-plane-0 node, similar behaviour was seen:

(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
error: the server doesn't have a resource type "pods"
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-694d8c5c99-m64bn 1/2 CrashLoopBackOff 6 (4m6s ago) 33m
csi-cephfsplugin-8lkjp 2/2 Running 0 27m
csi-cephfsplugin-fnbrz 2/2 Running 0 27m
csi-cephfsplugin-gz55v 2/2 Running 0 27m
csi-cephfsplugin-h2gcr 2/2 Running 0 27m
csi-cephfsplugin-provisioner-ffb445b7b-cr2cl 5/5 Running 7 (2m17s ago) 27m
csi-cephfsplugin-provisioner-ffb445b7b-drxwc 5/5 Running 5 (2m29s ago) 27m
csi-cephfsplugin-vkt9n 2/2 Running 0 27m
csi-cephfsplugin-zmkg2 2/2 Running 0 27m
csi-rbdplugin-d7fp2 3/3 Running 0 27m
csi-rbdplugin-fd8fd 3/3 Running 0 27m
csi-rbdplugin-kxcpm 3/3 Running 0 27m
csi-rbdplugin-lzwqm 3/3 Running 0 27m
csi-rbdplugin-provisioner-679cbbbb45-6v7vw 6/6 Running 1 (7m54s ago) 27m
csi-rbdplugin-provisioner-679cbbbb45-gpnzm 6/6 Running 10 (2m5s ago) 27m
csi-rbdplugin-wg5zv 3/3 Running 0 27m
csi-rbdplugin-z5djs 3/3 Running 0 27m
noobaa-core-0 1/1 Running 0 23m
noobaa-db-pg-0 1/1 Running 0 23m
noobaa-endpoint-5949fc9f8c-zzvm7 1/1 Running 0 22m
noobaa-operator-7cb695f787-h647t 1/1 Running 5 (102s ago) 34m
ocs-metrics-exporter-568779cf5-n8tjr 1/1 Running 0 34m
ocs-operator-84c64c4886-vs5gr 0/1 CrashLoopBackOff 6 (4m6s ago) 34m
odf-console-b99979f76-gkf9b 1/1 Running 0 34m
odf-operator-controller-manager-fbc65c8bd-pwswm 1/2 CrashLoopBackOff 3 (4m6s ago) 34m
rook-ceph-crashcollector-compute-0-5ff97b4bfb-h67dg 1/1 Running 0 23m
rook-ceph-crashcollector-compute-1-7b48b7b95-8xlfw 1/1 Running 0 25m
rook-ceph-crashcollector-compute-2-6f55d5b9b4-bvpgz 1/1 Running 0 24m
rook-ceph-crashcollector-compute-3-6f68545489-h5jcl 1/1 Running 0 24m
rook-ceph-crashcollector-compute-4-5cd98b758d-wqqmd 1/1 Running 0 23m
rook-ceph-crashcollector-compute-5-66fc646f89-bhcgf 1/1 Running 0 23m
rook-ceph-exporter-compute-0-6b9b6c8679-g7qx9 1/1 Running 0 23m
rook-ceph-exporter-compute-1-8c74d8668-c7g4d 1/1 Running 0 25m
rook-ceph-exporter-compute-2-749bdcc57f-66t7g 1/1 Running 0 24m
rook-ceph-exporter-compute-3-6757b5df78-szc85 1/1 Running 0 24m
rook-ceph-exporter-compute-4-7bc59766cf-g54w6 1/1 Running 0 23m
rook-ceph-exporter-compute-5-bdc7496f4-5csdw 1/1 Running 0 23m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5d5898455656p 2/2 Running 0 23m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5d4b7d8dt7c2w 2/2 Running 0 23m
rook-ceph-mgr-a-6f56c7f994-kzlsw 2/2 Running 0 25m
rook-ceph-mon-a-5d46b67459-72sm5 2/2 Running 0 27m
rook-ceph-mon-b-7db7659648-2lrqq 2/2 Running 0 26m
rook-ceph-mon-c-d86c8b68-pcnhc 2/2 Running 0 26m
rook-ceph-operator-657954c756-m7p4d 1/1 Running 0 27m
rook-ceph-osd-0-6c6f6f86f7-mb67p 2/2 Running 0 24m
rook-ceph-osd-1-b5cc9bccd-rn7m5 2/2 Running 0 24m
rook-ceph-osd-2-5d7585c9b-l8fjg 2/2 Running 0 24m
rook-ceph-osd-3-d7b978cf5-d24jc 2/2 Running 0 24m
rook-ceph-osd-4-5758f767fb-brqsb 2/2 Running 0 24m
rook-ceph-osd-5-845fcdcdf4-mxq6m 2/2 Running 0 24m
rook-ceph-osd-prepare-119a090453ccd1d897b95b96544caa64-tbzqz 0/1 Completed 0 24m
rook-ceph-osd-prepare-3ce9d82fea32a6497eef32b57672fbbb-76l52 0/1 Completed 0 24m
rook-ceph-osd-prepare-9f173feba00b33a0c445fbea151f0646-rbhlb 0/1 Completed 0 24m
rook-ceph-osd-prepare-cf614c03f16ad7a4f399501ffc38e3c6-97pq4 0/1 Completed 0 24m
rook-ceph-osd-prepare-e4d8a76474772f7e1a49f8512ff4c725-6rcbf 0/1 Completed 0 24m
rook-ceph-osd-prepare-f1b57a643155c4b5384bd1c21faa0985-tk2xm 0/1 Completed 0 24m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5fcddb7tzpl8 2/2 Running 0 23m
rook-ceph-tools-78f4964698-pkdlc 1/1 Running 0 24m

Eventually (after 5-10 minutes) the cluster started responding and came back to a normal state.
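To confirm what pushed the operators into CrashLoopBackOff, the previous-container logs should show the failure (most likely the same API connection errors seen in the Rook operator log above). The pod names below are taken from the listing above; the container name for the odf-operator manager is an assumption:

```
oc -n openshift-storage logs ocs-operator-84c64c4886-vs5gr --previous
oc -n openshift-storage logs odf-operator-controller-manager-fbc65c8bd-pwswm -c manager --previous   # container name assumed
oc -n openshift-storage describe pod csi-addons-controller-manager-694d8c5c99-m64bn
```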
What are the next steps for this?
I'll take a look at this soon.