Description of problem (please be as detailed as possible and provide log
snippets):
[DR] Volumes get stuck in split-brain after Failover Action is initiated
Version of all relevant components (if applicable):
ODF: odr-cluster-operator.v4.9.0-164.ci
OCP: 4.9.0-0.nightly-2021-10-01-202059
Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
Yes. After failover, the application's RBD images remain stuck in a split-brain state and replication does not recover.
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2
Is this issue reproducible?
Yes
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Deploy DR over two OCP clusters
2. Deploy an application
3. Perform the failover action
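For reference, the failover in step 3 is normally triggered by setting the failover action on the application's DRPlacementControl; a minimal sketch, assuming the standard Ramen DR API, with the DRPC name, namespace, and target cluster as placeholders rather than values from this report:

# Hypothetical names; substitute the application's actual DRPC, namespace, and surviving cluster
oc patch drpc <app-drpc-name> -n <app-namespace> --type merge \
  -p '{"spec":{"action":"Failover","failoverCluster":"<target-cluster>"}}'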
Actual results:
RBD images are in a split-brain state after failover
Expected results:
The RBD images should not end up in split-brain after failover
Additional info:
RBD mirror image status summary:
{
  "lastChecked": "2021-10-06T06:21:49Z",
  "summary": {
    "daemon_health": "OK",
    "health": "ERROR",
    "image_health": "ERROR",
    "states": {
      "error": 6
    }
  }
}
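The JSON above looks like the mirroring summary that Rook publishes on the block pool; assuming it was taken from the CephBlockPool CR in a default ODF install, it can be read back with something like:

# Assumes the default ODF pool and namespace; adjust if the deployment differs
oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage \
  -o jsonpath='{.status.mirroringStatus}'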
bash-4.4$ rbd mirror image status ocs-storagecluster-cephblockpool/csi-vol-e21a2369-25e1-11ec-94bc-0a580a8301c5
csi-vol-e21a2369-25e1-11ec-94bc-0a580a8301c5:
  global_id:   b8bb1ba4-7d03-4a88-aa57-2424112aa2b0
  state:       up+error
  description: split-brain
  service:     a on vmware-dccp-one-f84rh-worker-hkg99
  last_update: 2021-10-06 06:22:04
  peer_sites:
    name: afc6aaac-199c-472e-bf35-390eb2799b3e
    state: up+stopped
    description: local image is primary
    last_update: 2021-10-06 06:21:44
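Not a root-cause fix, but for completeness: a split-brain RBD image is normally cleared by forcing a resync of the stale copy. A hedged sketch, run from the Ceph toolbox on the cluster whose copy should be discarded, using the pool/image from the output above:

# Request a full resync of the stale (non-primary) copy
rbd mirror image resync ocs-storagecluster-cephblockpool/csi-vol-e21a2369-25e1-11ec-94bc-0a580a8301c5
# Re-check until the state leaves up+error / split-brain
rbd mirror image status ocs-storagecluster-cephblockpool/csi-vol-e21a2369-25e1-11ec-94bc-0a580a8301c5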