Bug 2111485

Summary: [RDR] rbd mirror image status keeps changing between up+starting_replay and up+stopped
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Pratik Surve <prsurve>
Component: ceph
Assignee: Ilya Dryomov <idryomov>
ceph sub component: RBD-Mirror
QA Contact: Elad <ebenahar>
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
CC: amagrawa, bniver, dupadhya, madam, mmuench, mrajanna, muagarwa, ocs-bugs, odf-bz-bot, srangana, vashastr
Version: 4.11   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-08-16 02:40:34 UTC
Type: Bug

Description Pratik Surve 2022-07-27 11:26:57 UTC
Description of problem (please be as detailed as possible and provide log snippets):

[RDR] The rbd mirror image status keeps changing between up+starting_replay and up+stopped.

Version of all relevant components (if applicable):
OCP version: 4.11.0-0.nightly-2022-07-19-104004
ODF version: 4.11.0-123
CEPH version: ceph version 16.2.8-79.el8cp (b49680b5658a09188897100c9224ae968e0b6c5b) pacific (stable)


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Deploy an RDR cluster.
2. Keep the workload running for 3-4 days.
3. Perform a failover of some workloads.
4. After some time, check the rbd mirror image status.
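Step 4 can be scripted so the status is sampled repeatedly rather than checked once, which makes a flap easier to catch. The Python sketch below is a minimal illustration, assuming it runs where the rbd CLI can reach the cluster (for example the toolbox pod) and that "rbd mirror image status <pool>/<image> --format json" prints a JSON object with a top-level "state" field such as "up+replaying". The function names, sampling interval, and pool/image names are hypothetical.

```python
import json
import subprocess
import time
from typing import List, Tuple


def parse_mirror_state(json_text: str) -> str:
    """Return the combined state string (e.g. "up+stopped").

    Assumption: the JSON output of `rbd mirror image status --format json`
    is an object with a top-level "state" key.
    """
    return json.loads(json_text)["state"]


def sample_mirror_state(pool: str, image: str, samples: int = 10,
                        interval_s: float = 5.0) -> List[Tuple[float, str]]:
    """Poll the image status `samples` times, `interval_s` seconds apart.

    Requires the rbd CLI and cluster access (hypothetical helper).
    Returns (unix_timestamp, state) pairs in collection order.
    """
    results = []
    for i in range(samples):
        # check=True raises CalledProcessError if the rbd command fails.
        out = subprocess.run(
            ["rbd", "mirror", "image", "status", f"{pool}/{image}",
             "--format", "json"],
            check=True, capture_output=True, text=True,
        ).stdout
        results.append((time.time(), parse_mirror_state(out)))
        if i < samples - 1:
            time.sleep(interval_s)
    return results
```

For example, sample_mirror_state("mypool", "myimage", samples=12, interval_s=5.0) (hypothetical names) covers roughly one minute; printing each (timestamp, state) pair makes an up+starting_replay / up+stopped alternation easy to spot.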


Actual results:

Output from the cephblockpool:
C1:- http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/rbd_mirror_flap/27-07-2022_16-40-39/c1_image_status
C2:- http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/rbd_mirror_flap/27-07-2022_16-40-37/c2_image_status


I also collected the output of the rbd mirror image status command:

C1:- http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/rbd_mirror_flap/27-07-2022_16-40-39/c1_image_status_toolbox_rbd_cmd
C2:- http://rhsqe-repo.lab.eng.blr.redhat.com/ocs4qe/pratik/bz/rbd_mirror_flap/27-07-2022_16-40-37/c2_image_status_toolbox_rbd_cmd

Expected results:
The image status should not flap between states every few seconds.
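The expected result can be expressed as a check over timestamped (unix_time, state) samples, such as those gathered while checking the status in step 4: find the points where the state changes and flag the image if too many changes fall inside a short window. This is a hypothetical sketch; the 60-second window and the limit of 2 transitions are illustrative thresholds, not values from this report, and the samples are assumed to be sorted by time.

```python
from typing import List, Tuple

# A sample is (unix_timestamp, state), e.g. (0.0, "up+stopped").
Sample = Tuple[float, str]


def transition_times(samples: List[Sample]) -> List[float]:
    """Timestamps at which the state differs from the previous sample."""
    return [cur_ts for (_, prev), (cur_ts, cur) in zip(samples, samples[1:])
            if cur != prev]


def is_flapping(samples: List[Sample], window_s: float = 60.0,
                max_transitions: int = 2) -> bool:
    """Return True if more than max_transitions state changes fall within
    any window_s-second span. Thresholds are hypothetical defaults.
    Assumes samples are sorted by timestamp."""
    times = transition_times(samples)
    start = 0
    # Sliding window over the sorted transition timestamps.
    for end in range(len(times)):
        while times[end] - times[start] > window_s:
            start += 1
        if end - start + 1 > max_transitions:
            return True
    return False
```

With samples alternating between up+starting_replay and up+stopped every 5 seconds, three transitions land within 10 seconds and is_flapping returns True; a steady up+replaying sequence returns False.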


Additional info: