Bug 1944284 - [Arbiter] When zone=a is powered off and powered back on, 3 mon pods (zone=b,c) go into CLBO after node power off and 2 OSDs (zone=a) go into CLBO after node power on
Keywords:
Status: CLOSED DUPLICATE of bug 1944611
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 5.1
Assignee: Greg Farnum
QA Contact: Manohar Murthy
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-29 17:01 UTC by Greg Farnum
Modified: 2022-02-21 17:58 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-31 16:11:58 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-3209 0 None None None 2022-02-21 17:58:49 UTC

Description Greg Farnum 2021-03-29 17:01:45 UTC
This bug was initially created as a copy of Bug #1943596

I am copying this bug because: 



Description of problem (please be detailed as possible and provide log
snippets):
When zone=a is powered off and powered back on, 3 mon pods (zone=b,c) go into CLBO after the node power off and 2 OSDs (zone=a) go into CLBO after the node power on.

Zone=c is the arbiter zone and it runs on the master node.
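
For reference, the zone placement of nodes and mon pods can be checked with standard commands along these lines (a minimal sketch; the openshift-storage namespace and the topology.kubernetes.io/zone node label are the usual defaults for an arbiter deployment, adjust if your setup differs):

  # show which zone each node belongs to
  oc get nodes -L topology.kubernetes.io/zone

  # show where the mon pods are scheduled
  oc -n openshift-storage get pods -l app=rook-ceph-mon -o wide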

Version of all relevant components (if applicable):

OCP version:- 4.7.0-0.nightly-2021-03-25-225737
OCS version:- ocs-operator.v4.7.0-322.ci
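
(A sketch of how these versions can be gathered, assuming the default openshift-storage namespace:
  oc version
  oc -n openshift-storage get csv)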

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
yes

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy OCP 4.7 over BM UPI
2. Deploy OCS with arbiter enabled
3. Write some data using app pods
4. Power off the nodes in zone=a and leave them down for a long time
5. Check the mon pod status
6. Power on the nodes
7. Check the OSD pod status (see the command sketch below)
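
A minimal sketch of the status checks in steps 5 and 7, assuming the default openshift-storage namespace and the standard Rook pod labels:

  # step 5: mon pod status while the zone=a nodes are powered off
  oc -n openshift-storage get pods -l app=rook-ceph-mon -o wide

  # step 7: OSD pod status after the nodes are powered back on
  oc -n openshift-storage get pods -l app=rook-ceph-osd -o wide

  # look for CrashLoopBackOff across all pods, as in the output below
  oc -n openshift-storage get pods -o wide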


Actual results:
oc get pods -o wide                                                                                                                                                               localhost.localdomain: Fri Mar 26 20:05:07 2021

NAME                                                              READY   STATUS             RESTARTS   AGE     IP             NODE                      NOMINATED NODE   READINESS GATES
csi-cephfsplugin-2l657                                            3/3     Running            0          7h14m   10.8.128.205   argo005.ceph.redhat.com   <none>           <none>
csi-cephfsplugin-72wz4                                            3/3     Running            0          7h14m   10.8.128.209   argo009.ceph.redhat.com   <none>           <none>
csi-cephfsplugin-94hf6                                            3/3     Running            0          7h14m   10.8.128.207   argo007.ceph.redhat.com   <none>           <none>
csi-cephfsplugin-hl79c                                            3/3     Running            0          7h14m   10.8.128.206   argo006.ceph.redhat.com   <none>           <none>
csi-cephfsplugin-provisioner-5d8dd85f69-9glbq                     6/6     Running            0          7h14m   10.131.0.17    argo008.ceph.redhat.com   <none>           <none>
csi-cephfsplugin-provisioner-5d8dd85f69-z4pv5                     6/6     Running            0          3h2m    10.128.2.22    argo007.ceph.redhat.com   <none>           <none>
csi-cephfsplugin-t7x8b                                            3/3     Running            0          7h14m   10.8.128.208   argo008.ceph.redhat.com   <none>           <none>
csi-rbdplugin-9f8vf                                               3/3     Running            0          7h14m   10.8.128.207   argo007.ceph.redhat.com   <none>           <none>
csi-rbdplugin-9xsj9                                               3/3     Running            0          7h14m   10.8.128.209   argo009.ceph.redhat.com   <none>           <none>
csi-rbdplugin-bjbnp                                               3/3     Running            0          7h14m   10.8.128.205   argo005.ceph.redhat.com   <none>           <none>
csi-rbdplugin-ljlcn                                               3/3     Running            0          7h14m   10.8.128.206   argo006.ceph.redhat.com   <none>           <none>
csi-rbdplugin-mk7bw                                               3/3     Running            0          7h14m   10.8.128.208   argo008.ceph.redhat.com   <none>           <none>
csi-rbdplugin-provisioner-7b8b6fc678-5c4nc                        6/6     Running            0          3h3m    10.131.0.28    argo008.ceph.redhat.com   <none>           <none>
csi-rbdplugin-provisioner-7b8b6fc678-8s2sd                        6/6     Running            0          7h14m   10.128.2.14    argo007.ceph.redhat.com   <none>           <none>
noobaa-core-0                                                     1/1     Running            0          33m     10.131.0.35    argo008.ceph.redhat.com   <none>           <none>
noobaa-db-pg-0                                                    1/1     Running            0          7h11m   10.128.2.21    argo007.ceph.redhat.com   <none>           <none>
noobaa-endpoint-78486d6bdb-c6qlx                                  1/1     Running            0          3h3m    10.131.0.27    argo008.ceph.redhat.com   <none>           <none>
noobaa-operator-84f54fccbc-95m6z                                  1/1     Running            0          3h2m    10.131.0.34    argo008.ceph.redhat.com   <none>           <none>
ocs-metrics-exporter-75f96b94b7-7grz9                             1/1     Running            0          3h2m    10.131.0.33    argo008.ceph.redhat.com   <none>           <none>
ocs-operator-c95fddb85-fwd96                                      1/1     Running            0          3h2m    10.131.0.30    argo008.ceph.redhat.com   <none>           <none>
rook-ceph-crashcollector-argo004.ceph.redhat.com-687c49477xd89g   1/1     Running            0          7h13m   10.129.0.19    argo004.ceph.redhat.com   <none>           <none>
rook-ceph-crashcollector-argo005.ceph.redhat.com-646ccbc98rwjmv   1/1     Running            0          32m     10.129.2.11    argo005.ceph.redhat.com   <none>           <none>
rook-ceph-crashcollector-argo006.ceph.redhat.com-775cff6d6ljz7j   1/1     Running            0          33m     10.131.2.12    argo006.ceph.redhat.com   <none>           <none>
rook-ceph-crashcollector-argo007.ceph.redhat.com-5cdf8dd85kgcvm   1/1     Running            0          7h13m   10.128.2.17    argo007.ceph.redhat.com   <none>           <none>
rook-ceph-crashcollector-argo008.ceph.redhat.com-5b66bdbdb9fqkv   1/1     Running            0          7h13m   10.131.0.22    argo008.ceph.redhat.com   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-cc497cc927wh9   2/2     Running            0          3h2m    10.131.2.11    argo006.ceph.redhat.com   <none>           <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-866744f66d8tc   2/2     Running            0          7h11m   10.128.2.20    argo007.ceph.redhat.com   <none>           <none>
rook-ceph-mgr-a-54649df9d8-dl68s                                  2/2     Running            2          3h2m    10.131.0.32    argo008.ceph.redhat.com   <none>           <none>
rook-ceph-mon-a-768bd6ddf4-qbnd2                                  2/2     Running            0          3h3m    10.131.2.9     argo006.ceph.redhat.com   <none>           <none>
rook-ceph-mon-b-84864978bf-scg6d                                  2/2     Running            0          3h2m    10.129.2.12    argo005.ceph.redhat.com   <none>           <none>
rook-ceph-mon-c-585dcfdb86-xhkjn                                  2/2     Running            23         7h13m   10.131.0.19    argo008.ceph.redhat.com   <none>           <none>
rook-ceph-mon-d-545775c64f-ch6xb                                  1/2     CrashLoopBackOff   23         7h13m   10.128.2.16    argo007.ceph.redhat.com   <none>           <none>
rook-ceph-mon-e-5f68b597c4-cm28h                                  2/2     Running            27         7h13m   10.129.0.18    argo004.ceph.redhat.com   <none>           <none>
rook-ceph-operator-6b87cc65f9-jpvmc                               1/1     Running            0          7h29m   10.130.2.16    argo009.ceph.redhat.com   <none>           <none>
rook-ceph-osd-0-b54d97cd6-zzgbc                                   2/2     Running            1          7h12m   10.131.0.21    argo008.ceph.redhat.com   <none>           <none>
rook-ceph-osd-1-75bb8cb584-6h9x6                                  2/2     CrashLoopBackOff   13         3h2m    10.129.2.10    argo005.ceph.redhat.com   <none>           <none>
rook-ceph-osd-2-5848bcd794-grs2v                                  2/2     Running            1          7h12m   10.128.2.19    argo007.ceph.redhat.com   <none>           <none>
rook-ceph-osd-3-597f9c66df-q8pct                                  1/2     CrashLoopBackOff   11         3h2m    10.131.2.10    argo006.ceph.redhat.com   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-localblock-1-data-0l24799sx   0/1     Completed          0          7h12m   10.131.0.20    argo008.ceph.redhat.com   <none>           <none>
rook-ceph-osd-prepare-ocs-deviceset-localblock-2-data-04p5w4d47   0/1     Completed          0          7h12m   10.128.2.18    argo007.ceph.redhat.com   <none>           <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-568cc9d9g4cd   2/2     Running            0          3h2m    10.131.2.13    argo006.ceph.redhat.com   <none>           <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-8cd59754n7sq   2/2     Running            0          7h10m   10.131.0.23    argo008.ceph.redhat.com   <none>           <none>
rook-ceph-tools-5cb7b9df75-pmhdv                                  1/1     Running            0          7h5m    10.8.128.208   argo008.ceph.redhat.com   <none>           <none>

Expected results:
No pod should be in CLBO; the mon and OSD pods should recover after the power off and power on.
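
A quick way to confirm recovery after power on is to check Ceph health from the rook-ceph-tools pod (a minimal sketch; the openshift-storage namespace and the app=rook-ceph-tools label are the usual defaults, adjust for your deployment):

  # find the toolbox pod and query cluster health
  TOOLS=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name)
  oc -n openshift-storage rsh $TOOLS ceph status
  # confirm all OSDs are back up and in
  oc -n openshift-storage rsh $TOOLS ceph osd tree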


Additional info:

As per bz https://bugzilla.redhat.com/show_bug.cgi?id=1939617, no new mons were created, which is the expected behavior.

Comment 1 RHEL Program Management 2021-03-29 17:01:51 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 2 Greg Farnum 2021-03-29 17:03:00 UTC
This is just a bad assert I'll remove in a short patch.

Comment 3 Greg Farnum 2021-03-31 16:11:58 UTC

*** This bug has been marked as a duplicate of bug 1944611 ***

