Bug 2211883

Summary: [MDR] After zone failure(c1+h1 cluster) and hub recovery, c1 apps peer ready status is in "Unknown" state
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Parikshith <pbyregow>
Component: odf-dr Assignee: Benamar Mekhissi <bmekhiss>
odf-dr sub component: ramen QA Contact: krishnaram Karthick <kramdoss>
Status: ASSIGNED --- Docs Contact:
Severity: low    
Priority: unspecified CC: hnallurv, muagarwa, odf-bz-bot
Version: 4.13   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
After the zone failure and hub recovery, the Peer Ready status of the subscription and appset applications in their disaster recovery placement control (DRPC) is occasionally shown as `Unknown`. This is a cosmetic issue; it does not affect the regular functionality of Ramen and is limited to the visual appearance of the DRPC output when viewed with the `oc` command.

Workaround: Use the YAML output to determine the correct status:

----
$ oc get drpc -o yaml
----
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2154341    

Description Parikshith 2023-06-02 13:19:44 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
After shutting down the zone hosting the c1 and h1 clusters and performing hub recovery to h2, the Peer Ready status of the subscription and appset apps in their DRPC is in the "Unknown" state.

NAMESPACE          NAME                             AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION      START TIME             DURATION       PEER READY
b-sub-1            b-sub-1-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
b-sub-2            b-sub-2-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
cronjob-sub-1      cronjob-sub-1-placement-1-drpc   59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   b-app-1-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   b-app-2-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   cronjob-app-1-placement-drpc     59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   job-app-1-placement-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
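For reference, the value behind the PEER READY column can also be read from the full status rather than the default columns. A minimal sketch of such a check, assuming the DRPC status conditions include a type named `PeerReady` (the condition name is an assumption, not taken from this report):

# List each DRPC with the status of its PeerReady condition (assumed condition type)
$ oc get drpc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="PeerReady")].status}{"\n"}{end}'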

Version of all relevant components (if applicable):
ocp: 4.13.0-0.nightly-2023-05-30-074322
odf/mco: 4.13.0-207
ACM: 2.7.4

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
no

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create 4 OCP clusters: 2 hub clusters and 2 managed clusters, plus one stretched RHCS cluster.
   Deploy the clusters so that:
	zone a: arbiter ceph node
	zone b: c1, h1, 3 ceph nodes
	zone c: c2, h2, 3 ceph nodes
2. Configure MDR and deploy applications (appset and subscription) on each managed cluster. Apply the DRPolicy to all apps.
3. Initiate a backup so that the active and passive hubs are in sync.
4. Bring zone b down, i.e. c1, h1, and the 3 ceph nodes.
5. Initiate the restore process on h2.
6. Verify that the restore succeeds on the new hub and that the DRPolicy on h2 is in the Validated state.
7. Check the DRPC of the c1 apps (see the sketch after these steps).
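A minimal sketch of the checks in steps 6 and 7, assuming the DRPolicy exposes a `Validated` condition and the DRPC a `PeerReady` condition (both names are assumptions based on the Ramen API, not confirmed in this report):

# Confirm the restored DRPolicy is validated on h2 (assumed condition type)
$ oc get drpolicy -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Validated")].status}{"\n"}{end}'
# Then inspect the c1 apps' DRPC conditions directly from the YAML output
$ oc get drpc -A -o yaml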


Actual results:
Peer ready status of c1 apps is Unknown

Expected results:
Peer Ready status of c1 apps should be True

Additional info:

Comment 4 Mudit Agarwal 2023-06-05 11:47:49 UTC
Not a 4.13 blocker, moving it out

Comment 5 Benamar Mekhissi 2023-06-12 01:31:44 UTC
PR is here: https://github.com/RamenDR/ramen/pull/920