Bug 2210242 - [Stretch cluster] Arbiter node replacement results in pod restarts and intermittent unresponsive cluster [NEEDINFO]
Summary: [Stretch cluster] Arbiter node replacement results in pod restarts and interm...
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Santosh Pillai
QA Contact: Neha Berry
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-26 09:13 UTC by Joy John Pinto
Modified: 2023-08-16 23:00 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
muagarwa: needinfo? (sapillai)


Attachments
Pods with restarts (18.53 KB, text/plain)
2023-06-23 21:34 UTC, Travis Nielsen

Description Joy John Pinto 2023-05-26 09:13:15 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

[Stretch cluster] Arbiter node replacement results in pod restarts and intermittent unresponsive cluster

Version of all relevant components (if applicable):
OCP 4.13.0-0.nightly-2023-05-25-001936 and ODF 4.13.0-203

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Install an OCP cluster with 6 worker and 4 master nodes.
2. Install the Local Storage Operator and add disks on the 6 worker nodes.
3. Label the worker nodes across data zones 1 and 2, label one master node as arbiter, and label the remaining two masters for data zones 1 and 2.
4. Install ODF and create a storage system choosing the 6 worker nodes.
5. Scale down the mon and other pods on the arbiter node.
6. Delete the arbiter node through an oc command and also delete the VM.
7. Label another control-plane node as arbiter and also add the OpenShift Data Foundation label to it (see the command sketch after these steps).
8. Wait for the new mon to come up on the newly added arbiter node.
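
For reference, a minimal sketch of the labeling and node-replacement commands implied by steps 3-8. The zone values and node names mirror this cluster (compute-0..5, control-plane-3, zones data-1/data-2/arbiter), but which workers go into which zone and the old arbiter's name are assumptions, not a transcript of the test run:

```
# Step 3: spread the workers across the two data zones and mark one master as arbiter
oc label node compute-0 compute-1 compute-2 topology.kubernetes.io/zone=data-1
oc label node compute-3 compute-4 compute-5 topology.kubernetes.io/zone=data-2
oc label node <old-arbiter-master> topology.kubernetes.io/zone=arbiter

# Step 6: after deleting the VM, remove the node object
oc delete node <old-arbiter-master>

# Step 7: promote another control-plane node to arbiter and add the ODF storage label
oc label node control-plane-3 topology.kubernetes.io/zone=arbiter
oc label node control-plane-3 cluster.ocs.openshift.io/openshift-storage=""
```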


Actual results:
The mon takes around 10-15 minutes to come up on the newly added arbiter node, and the following pods are restarted during the process:
csi-addons-controller-manager-5b94568cf8-f4f9g (7 restarts)
csi-cephfsplugin-provisioner-554f966f47-pgr8x (10 restarts)
csi-cephfsplugin-provisioner-554f966f47-zks9h (2 restarts)
csi-rbdplugin-provisioner-7999676974-fqf77 (11 restarts)
csi-rbdplugin-provisioner-7999676974-wk27s (1 restart)
noobaa-operator-5946d77759-d9dl8 (10 restarts)
ocs-operator-776f898d4-82r4w (7 restarts)
odf-operator-controller-manager-7958d76ddc-j4k9r (3 restarts)

Also, during the process the cluster intermittently becomes inaccessible while running oc commands:

rook-ceph-osd-prepare-63b2f492a51089f46df627dd0cfee0ba-fqkvf      0/1     Completed   0               22m   10.131.2.18    compute-3   <none>           <none>
rook-ceph-osd-prepare-d0fd414b7d2c277941386eceb4897361-zsxl2      0/1     Completed   0               22m   10.129.4.25    compute-4   <none>           <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5769db78f7vt   2/2     Running     0               21m   10.128.2.29    compute-0   <none>           <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5769db7jwp49   2/2     Running     0               21m   10.129.2.19    compute-5   <none>           <none>
rook-ceph-tools-5845b7c568-fvsz8                                  1/1     Running     0               23m   10.128.4.27    compute-1   <none>           <none>
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
[jopinto@jopinto nodes]$ oc get pods -o wide
NAME                                                              READY   STATUS      RESTARTS        AGE   IP             NODE        NOMINATED NODE   READINESS GATES
csi-addons-controller-manager-5b94568cf8-f4f9g                    2/2     Running     5 (6m40s ago)   33m   10.128.4.22    compute-1   <none>           <none>
csi-cephfsplugin-6ms62                                            2/2     Running     0               27m   10.1.161.85    compute-4   <none>           <none>
csi-cephfsplugin-9gxlz                                            2/2     Running     0               27m   10.1.160.245   compute-0   <none>           <none>
csi-cephfsplugin-g4nwv                                            2/2     Running     0               27m   10.1.161.82    compute-2   <none>           <none>
csi-cephfsplugin-jtspt                                            2/2     Running     0               27m   10.1.161.16    compute-3   <none>           <none>
csi-cephfsplugin-provisioner-554f966f47-pgr8x                     5/5     Running     4 (8m10s ago)   27m   10.129.2.12    compute-5   <none>           <none>
csi-cephfsplugin-provisioner-554f966f47-zks9h                     5/5     Running     0               27m   10.128.2.27    compute-0   <none>           <none>
csi-cephfsplugin-sh9bh                                            2/2     Running     0               27m   10.1.161.21    compute-5   <none>           <none>
csi-cephfsplugin-wfqn7                                            2/2     Running     0               27m   10.1.160.253   compute-1   <none>           <none>
csi-rbdplugin-8cx7v                                               3/3     Running     0               27m   10.1.161.82    compute-2   <none>           <none>
csi-rbdplugin-9q742                                               3/3     Running     0               27m   10.1.161.85    compute-4   <none>           <none>
csi-rbdplugin-fztvp                                               3/3     Running     0               27m   10.1.161.21    compute-5   <none>           <none>
csi-rbdplugin-gfhqq                                               3/3     Running     0               27m   10.1.160.253   compute-1   <none>           <none>
csi-rbdplugin-lg6rq                                               3/3     Running     0               27m   10.1.161.16    compute-3   <none>           <none>
csi-rbdplugin-provisioner-7999676974-fqf77                        6/6     Running     4 (8m1s ago)    27m   10.129.4.20    compute-4   <none>           <none>
csi-rbdplugin-provisioner-7999676974-wk27s                        6/6     Running     0               27m   10.128.4.26    compute-1   <none>           <none>
csi-rbdplugin-zq8jj                                               3/3     Running     0               27m   10.1.160.245   compute-0   <none>           <none>
noobaa-core-0                                                     1/1     Running     0               24m   10.128.2.32    compute-0   <none>           <none>
noobaa-db-pg-0                                             

Also, the node topology in the storage cluster YAML changes after the arbiter node replacement.
Before arbiter node replacement:
    kmsServerConnection: {}
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - compute-0
        - compute-1
        - compute-2
        - compute-3
        - compute-4
        - compute-5
        topology.kubernetes.io/zone:
        - data-1
        - data-2
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster

After arbiter node replacement:
    kmsServerConnection: {}
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - compute-0
        - compute-1
        - compute-2
        - compute-3
        - compute-4
        - compute-5
        - control-plane-3
        topology.kubernetes.io/zone:
        - arbiter
        - data-1
        - data-2
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster


Expected results:
No pod restarts should be seen, and the cluster should remain accessible.

Additional info:
Logs of all pods that were restarted are placed in http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-arbiternode/

[jopinto@jopinto nodes]$ oc get pods -o wide
NAME                                                              READY   STATUS      RESTARTS       AGE     IP             NODE              NOMINATED NODE   READINESS GATES
csi-addons-controller-manager-5b94568cf8-f4f9g                    2/2     Running     7 (12m ago)    49m     10.128.4.22    compute-1         <none>           <none>
csi-cephfsplugin-6ms62                                            2/2     Running     0              43m     10.1.161.85    compute-4         <none>           <none>
csi-cephfsplugin-9gxlz                                            2/2     Running     0              43m     10.1.160.245   compute-0         <none>           <none>
csi-cephfsplugin-g4nwv                                            2/2     Running     0              43m     10.1.161.82    compute-2         <none>           <none>
csi-cephfsplugin-jtspt                                            2/2     Running     0              43m     10.1.161.16    compute-3         <none>           <none>
csi-cephfsplugin-provisioner-554f966f47-pgr8x                     5/5     Running     10 (10m ago)   43m     10.129.2.12    compute-5         <none>           <none>
csi-cephfsplugin-provisioner-554f966f47-zks9h                     5/5     Running     2 (10m ago)    43m     10.128.2.27    compute-0         <none>           <none>
csi-cephfsplugin-sh9bh                                            2/2     Running     0              43m     10.1.161.21    compute-5         <none>           <none>
csi-cephfsplugin-wfqn7                                            2/2     Running     0              43m     10.1.160.253   compute-1         <none>           <none>
csi-rbdplugin-8cx7v                                               3/3     Running     0              43m     10.1.161.82    compute-2         <none>           <none>
csi-rbdplugin-9q742                                               3/3     Running     0              43m     10.1.161.85    compute-4         <none>           <none>
csi-rbdplugin-fztvp                                               3/3     Running     0              43m     10.1.161.21    compute-5         <none>           <none>
csi-rbdplugin-gfhqq                                               3/3     Running     0              43m     10.1.160.253   compute-1         <none>           <none>
csi-rbdplugin-lg6rq                                               3/3     Running     0              43m     10.1.161.16    compute-3         <none>           <none>
csi-rbdplugin-provisioner-7999676974-fqf77                        6/6     Running     11 (10m ago)   43m     10.129.4.20    compute-4         <none>           <none>
csi-rbdplugin-provisioner-7999676974-wk27s                        6/6     Running     1 (16m ago)    43m     10.128.4.26    compute-1         <none>           <none>
csi-rbdplugin-zq8jj                                               3/3     Running     0              43m     10.1.160.245   compute-0         <none>           <none>
noobaa-core-0                                                     1/1     Running     0              40m     10.128.2.32    compute-0         <none>           <none>
noobaa-db-pg-0                                                    1/1     Running     0              40m     10.131.2.21    compute-3         <none>           <none>
noobaa-endpoint-69bdfdcc8-9x58x                                   1/1     Running     0              39m     10.128.2.36    compute-0         <none>           <none>
noobaa-operator-5946d77759-d9dl8                                  1/1     Running     7 (11m ago)    50m     10.131.2.11    compute-3         <none>           <none>
ocs-metrics-exporter-988d5648-gb22d                               1/1     Running     0              49m     10.128.4.21    compute-1         <none>           <none>
ocs-operator-776f898d4-82r4w                                      1/1     Running     7 (12m ago)    49m     10.128.4.20    compute-1         <none>           <none>
odf-console-85c7c76fb-7k8f5                                       1/1     Running     0              49m     10.129.4.16    compute-4         <none>           <none>
odf-operator-controller-manager-7958d76ddc-j4k9r                  2/2     Running     3 (12m ago)    49m     10.128.2.24    compute-0         <none>           <none>
rook-ceph-crashcollector-compute-0-548ffc87cd-twhw4               1/1     Running     0              40m     10.128.2.30    compute-0         <none>           <none>
rook-ceph-crashcollector-compute-1-7f84777ff-qb7rf                1/1     Running     0              41m     10.128.4.29    compute-1         <none>           <none>
rook-ceph-crashcollector-compute-2-666849d6b4-plk9c               1/1     Running     0              40m     10.130.2.27    compute-2         <none>           <none>
rook-ceph-crashcollector-compute-3-74f749855f-xv767               1/1     Running     0              41m     10.131.2.16    compute-3         <none>           <none>
rook-ceph-crashcollector-compute-4-5ff6d6bc88-5t6tz               1/1     Running     0              41m     10.129.4.23    compute-4         <none>           <none>
rook-ceph-crashcollector-compute-5-7c674b8559-zcxsn               1/1     Running     0              40m     10.129.2.20    compute-5         <none>           <none>
rook-ceph-crashcollector-control-plane-3-5cf7f6bbc9-7djlx         1/1     Running     0              3m44s   10.131.0.8     control-plane-3   <none>           <none>
rook-ceph-exporter-compute-0-59db8bf49f-pvf5n                     1/1     Running     0              40m     10.128.2.31    compute-0         <none>           <none>
rook-ceph-exporter-compute-1-59566cb48b-c9w2g                     1/1     Running     0              41m     10.128.4.30    compute-1         <none>           <none>
rook-ceph-exporter-compute-2-56f55dd8f7-rqcd5                     1/1     Running     0              40m     10.130.2.28    compute-2         <none>           <none>
rook-ceph-exporter-compute-3-5bbb487f5d-9gghn                     1/1     Running     0              41m     10.131.2.17    compute-3         <none>           <none>
rook-ceph-exporter-compute-4-7b4dcf57d9-9gvgx                     1/1     Running     0              41m     10.129.4.24    compute-4         <none>           <none>
rook-ceph-exporter-compute-5-65c98c4446-mdgx2                     1/1     Running     0              40m     10.129.2.21    compute-5         <none>           <none>
rook-ceph-exporter-control-plane-3-854bc78f4-z7lq6                1/1     Running     0              3m44s   10.131.0.9     control-plane-3   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-7c4df569vfd5n   2/2     Running     0              40m     10.128.4.33    compute-1         <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6b9bddd9hjch9   2/2     Running     0              40m     10.130.2.26    compute-2         <none>           <none>
rook-ceph-mgr-a-6df8cf9876-l8t4f                                  3/3     Running     0              41m     10.130.2.19    compute-2         <none>           <none>
rook-ceph-mgr-b-7885b7cc9f-sxldl                                  3/3     Running     0              41m     10.128.4.28    compute-1         <none>           <none>
rook-ceph-mon-a-85df56bb6f-4zqnv                                  2/2     Running     0              42m     10.129.4.22    compute-4         <none>           <none>
rook-ceph-mon-b-865c9fcbb-r5f7m                                   2/2     Running     0              42m     10.130.2.18    compute-2         <none>           <none>
rook-ceph-mon-c-d9f4f5777-cgxgj                                   2/2     Running     0              42m     10.131.2.15    compute-3         <none>           <none>
rook-ceph-mon-d-55d95f6f6b-pbgsh                                  2/2     Running     0              42m     10.129.2.14    compute-5         <none>           <none>
rook-ceph-mon-f-75c8cb5d89-xfdvb                                  2/2     Running     0              4m14s   10.131.0.7     control-plane-3   <none>           <none>
rook-ceph-operator-7477d84999-ndfgh                               1/1     Running     0              43m     10.129.4.19    compute-4         <none>           <none>
rook-ceph-osd-0-86f4cfcf76-6hrqp                                  2/2     Running     0              41m     10.129.2.18    compute-5         <none>           <none>
rook-ceph-osd-1-f59f54664-xwk78                                   2/2     Running     0              41m     10.130.2.23    compute-2         <none>           <none>
rook-ceph-osd-2-988566f77-jc2j9                                   2/2     Running     0              41m     10.131.2.19    compute-3         <none>           <none>
rook-ceph-osd-3-f4bb5b998-f45xh                                   2/2     Running     0              41m     10.129.4.26    compute-4         <none>           <none>
rook-ceph-osd-prepare-28572a8ce9ae9bb114424170be420dfc-92rrq      0/1     Completed   0              41m     10.130.2.22    compute-2         <none>           <none>
rook-ceph-osd-prepare-455dc68f9e2aa40ba2f2c0c2c20c8ac6-cztb9      0/1     Completed   0              41m     10.129.2.17    compute-5         <none>           <none>
rook-ceph-osd-prepare-63b2f492a51089f46df627dd0cfee0ba-fqkvf      0/1     Completed   0              41m     10.131.2.18    compute-3         <none>           <none>
rook-ceph-osd-prepare-d0fd414b7d2c277941386eceb4897361-zsxl2      0/1     Completed   0              41m     10.129.4.25    compute-4         <none>           <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5769db78f7vt   2/2     Running     0              40m     10.128.2.29    compute-0         <none>           <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5769db7jwp49   2/2     Running     0              40m     10.129.2.19    compute-5         <none>           <none>
rook-ceph-tools-5845b7c568-fvsz8                                  1/1     Running     0              42m     10.128.4.27    compute-1         <none>           <none>
[jopinto@jopinto nodes]$ oc rsh rook-ceph-tools-5845b7c568-fvsz8
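
(A sketch of the checks typically run from the toolbox at this point to confirm mon quorum and spot the missing arbiter mon; the original session output was not captured in this report.)

```
# Inside the rook-ceph-tools pod
ceph status
ceph mon stat
ceph quorum_status -f json-pretty
```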

Comment 3 Santosh Pillai 2023-05-30 06:52:37 UTC
Please add the ODF must-gather to this BZ.

Comment 6 Santosh Pillai 2023-05-30 16:15:48 UTC
From rook logs:

```
2023-05-30 10:50:06.213851 W | op-mon: mon "e" not found in quorum, waiting for timeout (554 seconds left) before failover
2023-05-30 10:50:51.624839 W | op-mon: mon "e" not found in quorum, waiting for timeout (509 seconds left) before failover
2023-05-30 10:51:37.032913 W | op-mon: mon "e" not found in quorum, waiting for timeout (463 seconds left) before failover
2023-05-30 10:52:22.437686 W | op-mon: mon "e" not found in quorum, waiting for timeout (418 seconds left) before failover
2023-05-30 10:53:57.299028 E | op-osd: failed to update cluster "ocs-storagecluster-cephcluster" Storage. failed to update object "openshift-storage/ocs-storagecluster-cephcluster" status: Timeout: request did not complete within requested timeout - context deadline exceeded
2023-05-30 10:54:07.442420 W | op-mon: failed to check mon health. failed to check for mons to skip reconcile: failed to query mons to skip reconcile: the server was unable to return a response in the time allotted, but may still be processing the request (get deployments.apps)

W0530 11:02:19.705258       1 reflector.go:424] github.com/kube-object-storage/lib-bucket-provisioner/pkg/client/informers/externalversions/factory.go:117: failed to list *v1alpha1.ObjectBucket: Get "https://172.30.0.1:443/apis/objectbucket.io/v1alpha1/objectbuckets?resourceVersion=2459685": dial tcp 172.30.0.1:443: connect: connection refused
E0530 11:02:19.705314       1 reflector.go:140] github.com/kube-object-storage/lib-bucket-provisioner/pkg/client/informers/externalversions/factory.go:117: Failed to watch *v1alpha1.ObjectBucket: failed to list *v1alpha1.ObjectBucket: Get "https://172.30.0.1:443/apis/objectbucket.io/v1alpha1/objectbuckets?resourceVersion=2459685": dial tcp 172.30.0.1:443: connect: connection refused
W0530 11:02:20.528321       1 reflector.go:424] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262: failed to list *v1.Service: Get "https://172.30.0.1:443/api/v1/namespaces/openshift-storage/services?resourceVersion=2459665": dial tcp 172.30.0.1:443: connect: connection refused
```

Most likely this is an issue with the environment.
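
The countdown in the log corresponds to Rook's mon failover timeout (600s by default). A hedged sketch for checking it, and for shortening it during testing so the replacement mon is scheduled sooner; the ConfigMap name and the ROOK_MON_OUT_TIMEOUT key are the standard Rook/ODF ones and should be verified against this build:

```
# Show the current mon failover timeout (empty output means the 600s default)
oc -n openshift-storage get configmap rook-ceph-operator-config \
  -o jsonpath='{.data.ROOK_MON_OUT_TIMEOUT}{"\n"}'

# For testing only: fail over an out-of-quorum mon after 5 minutes instead of 10
oc -n openshift-storage patch configmap rook-ceph-operator-config \
  --type merge -p '{"data":{"ROOK_MON_OUT_TIMEOUT":"5m"}}'
```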

Comment 7 Travis Nielsen 2023-06-01 23:21:21 UTC
What pods are running on the arbiter node that is brought down? Since the kube API isn't responding, I wonder if that master node was running the API service and it is now unresponsive. Are the other two OCP master nodes healthy? I wouldn't expect taking down the arbiter to affect the API server like this, but it fundamentally appears to be an OCP issue, not an ODF issue.

Comment 8 Joy John Pinto 2023-06-05 04:27:33 UTC
(In reply to Travis Nielsen from comment #7)
> What pods are running on the arbiter that is brought down? Since the kube
> api isn't responding, I wonder if the master node is running the api
> service, and then it isn't responding. 
Kube api service was running on master node. 
kube-apiserver-control-plane-0         5/5     Running     0          13h   10.1.160.78   control-plane-0   <none>           <none>
kube-apiserver-guard-control-plane-0   1/1     Running     0          13h   10.129.0.23   control-plane-0   <none>           <none>


Are the other two OCP master nodes healthy? -> Yes other OCP nodes were healthy

I wouldn't think taking down the arbiter should affect the api
> server like this, but it fundamentally does appear to be some OCP issue, and
> not an ODF issue.
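
(For reference, the standard checks behind these answers; plain oc commands against the usual OCP namespaces:)

```
# Which node hosts each kube-apiserver instance, and are they all running?
oc get pods -n openshift-kube-apiserver -o wide

# Health of the remaining control-plane nodes and the relevant cluster operators
oc get nodes -l node-role.kubernetes.io/master
oc get co kube-apiserver etcd
```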

Comment 9 Travis Nielsen 2023-06-05 19:16:10 UTC
Moving out of 4.13 since this is a new scenario being tested, and replacing the arbiter is not a common case; the arbiter would normally be brought back online instead of replaced. We still need to find the RCA, but I am proposing this should not be a 4.13 blocker: it is not a regression, the intermittent failures do not affect the data plane, and the cluster health is not permanently affected. The Rook and Ceph pods are all healthy; only the CSI driver and the odf/ocs/noobaa operators are affected.

Several questions:
1. What pods exactly did you scale down? Why scale them down instead of just deleting the VM? Pods aren't normally scaled down in a node-loss scenario.
2. When an OCP master dies, there are necessary steps to recover, for example [1]. Were any steps taken to handle the lost node from the OCP master perspective? I would have expected the OCP cluster to continue operating even with the loss of a single master, but I want to understand all of the related OCP procedures as well (a condensed sketch of [1] follows below).
3. At what point exactly did the intermittent issues stop occurring? The arbiter mon being offline wouldn't cause the intermittent issues described, so I'm wondering what else was happening in the cluster.

[1] https://docs.openshift.com/container-platform/4.13/backup_and_restore/control_plane_backup_and_restore/replacing-unhealthy-etcd-member.html
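
A condensed sketch of the etcd-member replacement flow from [1], relating to question 2; pod and node names are placeholders and the full documented procedure should be followed:

```
# From a surviving etcd pod, identify and remove the lost member
oc rsh -n openshift-etcd etcd-<healthy-master>
#   (inside the rsh session)
#   etcdctl member list -w table
#   etcdctl member remove <ID-of-the-lost-member>

# Back outside: list the lost node's old etcd secrets so they can be removed,
# allowing the operator to add a replacement member on the new node
oc get secrets -n openshift-etcd | grep <lost-master-name>
```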

Comment 14 Travis Nielsen 2023-06-23 21:34:53 UTC
Created attachment 1972330 [details]
Pods with restarts

Comment 15 Travis Nielsen 2023-06-23 21:40:32 UTC
Of the pods that were restarted (see attached), the vast majority are not related to ODF/Rook. They are OCP pods that have no dependency on ODF or the arbiter mon.

When you delete and replace the arbiter node, OCP must be going through a transition phase that causes the instability, and it lasts approximately as long as it takes Rook to bring up the new arbiter mon.

This instability is outside of ODF's control.
Please perform this test again, deleting the OCP arbiter node without the stretch cluster installed, and see if it results in the same instability. Then we can move this to the OCP team.
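
A sketch of that re-test on a plain (non-stretch) cluster, assuming the same kind of environment where the node's VM is deleted first; the node name is a placeholder:

```
# After powering off / deleting the master's VM, remove its node object
oc delete node <master-node>

# Watch for the same intermittent API errors while the control plane settles
oc get clusteroperators etcd kube-apiserver -w
oc get pods -n openshift-etcd -o wide
```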

Comment 16 Santosh Pillai 2023-07-18 10:09:05 UTC
@jopinto
Hi, any new updates on this BZ?

Comment 17 Joy John Pinto 2023-07-27 16:34:09 UTC
(In reply to Travis Nielsen from comment #15)
> Of the pods that were restarted (see attached), the vast majority are not
> related to ODF/Rook. They are OCP pods that have no dependency on ODF or
> the arbiter mon.
> 
> When you delete and replace the arbiter node, OCP must be going through a
> transition phase that causes the instability, and it lasts approximately as
> long as it takes Rook to bring up the new arbiter mon.
> 
> This instability is outside of ODF's control.
> Please perform this test again, deleting the OCP arbiter node without the
> stretch cluster installed, and see if it results in the same instability.
> Then we can move this to the OCP team.

Similar behaviour was seen on a 3M-6W UPI LSO cluster without the stretch cluster installed.

OCP build: 4.13.0-0.nightly-2023-07-27-013427
ODF build: 4.13.0-rhodf provided by Red Hat

Upon replacing the control-plane-0 node, similar behaviour was seen:

(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get pods)
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
error: the server doesn't have a resource type "pods"
(venv) [jopinto@jopinto auth]$ oc get pods -n openshift-storage
NAME                                                              READY   STATUS             RESTARTS        AGE
csi-addons-controller-manager-694d8c5c99-m64bn                    1/2     CrashLoopBackOff   6 (4m6s ago)    33m
csi-cephfsplugin-8lkjp                                            2/2     Running            0               27m
csi-cephfsplugin-fnbrz                                            2/2     Running            0               27m
csi-cephfsplugin-gz55v                                            2/2     Running            0               27m
csi-cephfsplugin-h2gcr                                            2/2     Running            0               27m
csi-cephfsplugin-provisioner-ffb445b7b-cr2cl                      5/5     Running            7 (2m17s ago)   27m
csi-cephfsplugin-provisioner-ffb445b7b-drxwc                      5/5     Running            5 (2m29s ago)   27m
csi-cephfsplugin-vkt9n                                            2/2     Running            0               27m
csi-cephfsplugin-zmkg2                                            2/2     Running            0               27m
csi-rbdplugin-d7fp2                                               3/3     Running            0               27m
csi-rbdplugin-fd8fd                                               3/3     Running            0               27m
csi-rbdplugin-kxcpm                                               3/3     Running            0               27m
csi-rbdplugin-lzwqm                                               3/3     Running            0               27m
csi-rbdplugin-provisioner-679cbbbb45-6v7vw                        6/6     Running            1 (7m54s ago)   27m
csi-rbdplugin-provisioner-679cbbbb45-gpnzm                        6/6     Running            10 (2m5s ago)   27m
csi-rbdplugin-wg5zv                                               3/3     Running            0               27m
csi-rbdplugin-z5djs                                               3/3     Running            0               27m
noobaa-core-0                                                     1/1     Running            0               23m
noobaa-db-pg-0                                                    1/1     Running            0               23m
noobaa-endpoint-5949fc9f8c-zzvm7                                  1/1     Running            0               22m
noobaa-operator-7cb695f787-h647t                                  1/1     Running            5 (102s ago)    34m
ocs-metrics-exporter-568779cf5-n8tjr                              1/1     Running            0               34m
ocs-operator-84c64c4886-vs5gr                                     0/1     CrashLoopBackOff   6 (4m6s ago)    34m
odf-console-b99979f76-gkf9b                                       1/1     Running            0               34m
odf-operator-controller-manager-fbc65c8bd-pwswm                   1/2     CrashLoopBackOff   3 (4m6s ago)    34m
rook-ceph-crashcollector-compute-0-5ff97b4bfb-h67dg               1/1     Running            0               23m
rook-ceph-crashcollector-compute-1-7b48b7b95-8xlfw                1/1     Running            0               25m
rook-ceph-crashcollector-compute-2-6f55d5b9b4-bvpgz               1/1     Running            0               24m
rook-ceph-crashcollector-compute-3-6f68545489-h5jcl               1/1     Running            0               24m
rook-ceph-crashcollector-compute-4-5cd98b758d-wqqmd               1/1     Running            0               23m
rook-ceph-crashcollector-compute-5-66fc646f89-bhcgf               1/1     Running            0               23m
rook-ceph-exporter-compute-0-6b9b6c8679-g7qx9                     1/1     Running            0               23m
rook-ceph-exporter-compute-1-8c74d8668-c7g4d                      1/1     Running            0               25m
rook-ceph-exporter-compute-2-749bdcc57f-66t7g                     1/1     Running            0               24m
rook-ceph-exporter-compute-3-6757b5df78-szc85                     1/1     Running            0               24m
rook-ceph-exporter-compute-4-7bc59766cf-g54w6                     1/1     Running            0               23m
rook-ceph-exporter-compute-5-bdc7496f4-5csdw                      1/1     Running            0               23m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5d5898455656p   2/2     Running            0               23m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-5d4b7d8dt7c2w   2/2     Running            0               23m
rook-ceph-mgr-a-6f56c7f994-kzlsw                                  2/2     Running            0               25m
rook-ceph-mon-a-5d46b67459-72sm5                                  2/2     Running            0               27m
rook-ceph-mon-b-7db7659648-2lrqq                                  2/2     Running            0               26m
rook-ceph-mon-c-d86c8b68-pcnhc                                    2/2     Running            0               26m
rook-ceph-operator-657954c756-m7p4d                               1/1     Running            0               27m
rook-ceph-osd-0-6c6f6f86f7-mb67p                                  2/2     Running            0               24m
rook-ceph-osd-1-b5cc9bccd-rn7m5                                   2/2     Running            0               24m
rook-ceph-osd-2-5d7585c9b-l8fjg                                   2/2     Running            0               24m
rook-ceph-osd-3-d7b978cf5-d24jc                                   2/2     Running            0               24m
rook-ceph-osd-4-5758f767fb-brqsb                                  2/2     Running            0               24m
rook-ceph-osd-5-845fcdcdf4-mxq6m                                  2/2     Running            0               24m
rook-ceph-osd-prepare-119a090453ccd1d897b95b96544caa64-tbzqz      0/1     Completed          0               24m
rook-ceph-osd-prepare-3ce9d82fea32a6497eef32b57672fbbb-76l52      0/1     Completed          0               24m
rook-ceph-osd-prepare-9f173feba00b33a0c445fbea151f0646-rbhlb      0/1     Completed          0               24m
rook-ceph-osd-prepare-cf614c03f16ad7a4f399501ffc38e3c6-97pq4      0/1     Completed          0               24m
rook-ceph-osd-prepare-e4d8a76474772f7e1a49f8512ff4c725-6rcbf      0/1     Completed          0               24m
rook-ceph-osd-prepare-f1b57a643155c4b5384bd1c21faa0985-tk2xm      0/1     Completed          0               24m
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-5fcddb7tzpl8   2/2     Running            0               23m
rook-ceph-tools-78f4964698-pkdlc                                  1/1     Running            0               24m

Eventually (after 5-10 minutes) the cluster started responding and came back to a normal state.

Comment 19 Mudit Agarwal 2023-08-08 05:33:36 UTC
What are the next steps for this?

Comment 20 Santosh Pillai 2023-08-10 05:06:00 UTC
I'll take a look at this soon.

