[Arbiter] When zone=a is powered off and powered back on, 3 mon pods (zones b,c) go into CLBO after node power off and 2 OSDs (zone=a) go into CLBO after node power on
This bug was initially created as a copy of Bug #1943596
I am copying this bug because:
Description of problem (please be as detailed as possible and provide log
snippets):
When the zone=a nodes were powered off and then powered back on, 3 mon pods (zones b and c) went into CrashLoopBackOff (CLBO) after the node power off, and 2 OSDs (zone=a) went into CLBO after the node power on.
Zone=c is the arbiter zone and runs on the master node.
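A quick way to spot the affected pods and map them to failure domains (a sketch; it assumes the default openshift-storage namespace and the standard topology.kubernetes.io/zone node label):

# List the Rook mon/OSD pods and the nodes they run on; look for CrashLoopBackOff
oc -n openshift-storage get pods -o wide | grep -E 'rook-ceph-(mon|osd)-'
# Show each node's zone label to confirm which zone a crashing pod belongs to
oc get nodes -L topology.kubernetes.io/zone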
Version of all relevant components (if applicable):
OCP version:- 4.7.0-0.nightly-2021-03-25-225737
OCS version:- ocs-operator.v4.7.0-322.ci
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
yes
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4
Is this issue reproducible?
Yes
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Deploy OCP 4.7 over BM UPI
2. Deploy OCS with arbiter enabled
3. Write some data using app pods
4. Power off the nodes in zone=a for an extended period (a command sketch for steps 4-7 follows this list)
5. Check the mon pod status
6. Power the nodes back on
7. Check the OSD pod status
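Rough command sketch for steps 4-7 (the zone label, the openshift-storage namespace, and the app= pod labels are assumptions based on standard Rook/OCS conventions; the power off/on itself is done out of band on the bare-metal hosts, e.g. via the BMC):

# Step 4: identify the zone=a nodes, then power their hosts off via the BMC
oc get nodes -l topology.kubernetes.io/zone=a
# e.g. ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> chassis power off
# Step 5: watch the mon pods while the zone is down
oc -n openshift-storage get pods -l app=rook-ceph-mon -o wide -w
# Step 6: power the hosts back on via the BMC
# e.g. ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> chassis power on
# Step 7: watch the OSD pods as the zone rejoins
oc -n openshift-storage get pods -l app=rook-ceph-osd -o wide -w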
Actual results:
oc get pods -o wide (localhost.localdomain, Fri Mar 26 20:05:07 2021):
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
csi-cephfsplugin-2l657 3/3 Running 0 7h14m 10.8.128.205 argo005.ceph.redhat.com <none> <none>
csi-cephfsplugin-72wz4 3/3 Running 0 7h14m 10.8.128.209 argo009.ceph.redhat.com <none> <none>
csi-cephfsplugin-94hf6 3/3 Running 0 7h14m 10.8.128.207 argo007.ceph.redhat.com <none> <none>
csi-cephfsplugin-hl79c 3/3 Running 0 7h14m 10.8.128.206 argo006.ceph.redhat.com <none> <none>
csi-cephfsplugin-provisioner-5d8dd85f69-9glbq 6/6 Running 0 7h14m 10.131.0.17 argo008.ceph.redhat.com <none> <none>
csi-cephfsplugin-provisioner-5d8dd85f69-z4pv5 6/6 Running 0 3h2m 10.128.2.22 argo007.ceph.redhat.com <none> <none>
csi-cephfsplugin-t7x8b 3/3 Running 0 7h14m 10.8.128.208 argo008.ceph.redhat.com <none> <none>
csi-rbdplugin-9f8vf 3/3 Running 0 7h14m 10.8.128.207 argo007.ceph.redhat.com <none> <none>
csi-rbdplugin-9xsj9 3/3 Running 0 7h14m 10.8.128.209 argo009.ceph.redhat.com <none> <none>
csi-rbdplugin-bjbnp 3/3 Running 0 7h14m 10.8.128.205 argo005.ceph.redhat.com <none> <none>
csi-rbdplugin-ljlcn 3/3 Running 0 7h14m 10.8.128.206 argo006.ceph.redhat.com <none> <none>
csi-rbdplugin-mk7bw 3/3 Running 0 7h14m 10.8.128.208 argo008.ceph.redhat.com <none> <none>
csi-rbdplugin-provisioner-7b8b6fc678-5c4nc 6/6 Running 0 3h3m 10.131.0.28 argo008.ceph.redhat.com <none> <none>
csi-rbdplugin-provisioner-7b8b6fc678-8s2sd 6/6 Running 0 7h14m 10.128.2.14 argo007.ceph.redhat.com <none> <none>
noobaa-core-0 1/1 Running 0 33m 10.131.0.35 argo008.ceph.redhat.com <none> <none>
noobaa-db-pg-0 1/1 Running 0 7h11m 10.128.2.21 argo007.ceph.redhat.com <none> <none>
noobaa-endpoint-78486d6bdb-c6qlx 1/1 Running 0 3h3m 10.131.0.27 argo008.ceph.redhat.com <none> <none>
noobaa-operator-84f54fccbc-95m6z 1/1 Running 0 3h2m 10.131.0.34 argo008.ceph.redhat.com <none> <none>
ocs-metrics-exporter-75f96b94b7-7grz9 1/1 Running 0 3h2m 10.131.0.33 argo008.ceph.redhat.com <none> <none>
ocs-operator-c95fddb85-fwd96 1/1 Running 0 3h2m 10.131.0.30 argo008.ceph.redhat.com <none> <none>
rook-ceph-crashcollector-argo004.ceph.redhat.com-687c49477xd89g 1/1 Running 0 7h13m 10.129.0.19 argo004.ceph.redhat.com <none> <none>
rook-ceph-crashcollector-argo005.ceph.redhat.com-646ccbc98rwjmv 1/1 Running 0 32m 10.129.2.11 argo005.ceph.redhat.com <none> <none>
rook-ceph-crashcollector-argo006.ceph.redhat.com-775cff6d6ljz7j 1/1 Running 0 33m 10.131.2.12 argo006.ceph.redhat.com <none> <none>
rook-ceph-crashcollector-argo007.ceph.redhat.com-5cdf8dd85kgcvm 1/1 Running 0 7h13m 10.128.2.17 argo007.ceph.redhat.com <none> <none>
rook-ceph-crashcollector-argo008.ceph.redhat.com-5b66bdbdb9fqkv 1/1 Running 0 7h13m 10.131.0.22 argo008.ceph.redhat.com <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-cc497cc927wh9 2/2 Running 0 3h2m 10.131.2.11 argo006.ceph.redhat.com <none> <none>
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-866744f66d8tc 2/2 Running 0 7h11m 10.128.2.20 argo007.ceph.redhat.com <none> <none>
rook-ceph-mgr-a-54649df9d8-dl68s 2/2 Running 2 3h2m 10.131.0.32 argo008.ceph.redhat.com <none> <none>
rook-ceph-mon-a-768bd6ddf4-qbnd2 2/2 Running 0 3h3m 10.131.2.9 argo006.ceph.redhat.com <none> <none>
rook-ceph-mon-b-84864978bf-scg6d 2/2 Running 0 3h2m 10.129.2.12 argo005.ceph.redhat.com <none> <none>
rook-ceph-mon-c-585dcfdb86-xhkjn 2/2 Running 23 7h13m 10.131.0.19 argo008.ceph.redhat.com <none> <none>
rook-ceph-mon-d-545775c64f-ch6xb 1/2 CrashLoopBackOff 23 7h13m 10.128.2.16 argo007.ceph.redhat.com <none> <none>
rook-ceph-mon-e-5f68b597c4-cm28h 2/2 Running 27 7h13m 10.129.0.18 argo004.ceph.redhat.com <none> <none>
rook-ceph-operator-6b87cc65f9-jpvmc 1/1 Running 0 7h29m 10.130.2.16 argo009.ceph.redhat.com <none> <none>
rook-ceph-osd-0-b54d97cd6-zzgbc 2/2 Running 1 7h12m 10.131.0.21 argo008.ceph.redhat.com <none> <none>
rook-ceph-osd-1-75bb8cb584-6h9x6 2/2 CrashLoopBackOff 13 3h2m 10.129.2.10 argo005.ceph.redhat.com <none> <none>
rook-ceph-osd-2-5848bcd794-grs2v 2/2 Running 1 7h12m 10.128.2.19 argo007.ceph.redhat.com <none> <none>
rook-ceph-osd-3-597f9c66df-q8pct 1/2 CrashLoopBackOff 11 3h2m 10.131.2.10 argo006.ceph.redhat.com <none> <none>
rook-ceph-osd-prepare-ocs-deviceset-localblock-1-data-0l24799sx 0/1 Completed 0 7h12m 10.131.0.20 argo008.ceph.redhat.com <none> <none>
rook-ceph-osd-prepare-ocs-deviceset-localblock-2-data-04p5w4d47 0/1 Completed 0 7h12m 10.128.2.18 argo007.ceph.redhat.com <none> <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-568cc9d9g4cd 2/2 Running 0 3h2m 10.131.2.13 argo006.ceph.redhat.com <none> <none>
rook-ceph-rgw-ocs-storagecluster-cephobjectstore-b-8cd59754n7sq 2/2 Running 0 7h10m 10.131.0.23 argo008.ceph.redhat.com <none> <none>
rook-ceph-tools-5cb7b9df75-pmhdv 1/1 Running 0 7h5m 10.8.128.208 argo008.ceph.redhat.com <none> <none>
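To dig into the crash-looping pods, commands along these lines can be used (a sketch; pod names are taken from the listing above, the namespace is assumed to be openshift-storage, and the "osd" container name is an assumption based on the usual Rook pod spec):

# Events and last termination state of the crash-looping mon
oc -n openshift-storage describe pod rook-ceph-mon-d-545775c64f-ch6xb
# Logs from the previous (crashed) container instance of an affected OSD
oc -n openshift-storage logs rook-ceph-osd-3-597f9c66df-q8pct -c osd --previous
# Ceph-side view of quorum and OSD state via the toolbox pod
oc -n openshift-storage exec rook-ceph-tools-5cb7b9df75-pmhdv -- ceph status
oc -n openshift-storage exec rook-ceph-tools-5cb7b9df75-pmhdv -- ceph osd tree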
Expected results:
There should be no pod in CLBO; the mon and OSD pods should recover after the power off and power on.
Additional info:
As per bz https://bugzilla.redhat.com/show_bug.cgi?id=1939617, no new mon was created, which is the expected behavior.
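One way to confirm that no extra mon was created (a sketch; the app=rook-ceph-mon label and the openshift-storage namespace are assumed from standard Rook conventions):

# Should still show only the original mon deployments (a-e), with no new mon added
oc -n openshift-storage get deployments -l app=rook-ceph-mon
# Quorum membership from the Ceph side
oc -n openshift-storage exec rook-ceph-tools-5cb7b9df75-pmhdv -- ceph mon stat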
Comment 1 - RHEL Program Management - 2021-03-29 17:01:51 UTC