Bug 2211883

Summary: [MDR] After zone failure(c1+h1 cluster) and hub recovery, c1 apps peer ready status is in "Unknown" state
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Parikshith <pbyregow>
Component: odf-dr Assignee: Benamar Mekhissi <bmekhiss>
odf-dr sub component: ramen QA Contact: krishnaram Karthick <kramdoss>
Status: ASSIGNED --- Docs Contact:
Severity: low    
Priority: unspecified CC: hnallurv, muagarwa, odf-bz-bot
Version: 4.13   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
After the zone failure and hub recovery, the Peer Ready status of the subscription and appset applications in their disaster recovery placement control (DRPC) is occasionally shown as `Unknown`. This is a cosmetic issue; it does not affect the regular functionality of Ramen and is limited to the visual appearance of the DRPC output when viewed with the `oc` command.

Workaround: Use the YAML output to determine the correct status:

----
$ oc get drpc -o yaml
----
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2154341    

Description Parikshith 2023-06-02 13:19:44 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
After shutting down the zone hosting the c1 and h1 clusters and performing hub recovery to h2, the Peer Ready status of the subscription and appset apps in their DRPC is in the "Unknown" state.

NAMESPACE          NAME                             AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION      START TIME             DURATION       PEER READY
b-sub-1            b-sub-1-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
b-sub-2            b-sub-2-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
cronjob-sub-1      cronjob-sub-1-placement-1-drpc   59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   b-app-1-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   b-app-2-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   cronjob-app-1-placement-drpc     59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   job-app-1-placement-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
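For reference, the value behind the PEER READY column can also be read from the full status rather than the default columns. A minimal sketch of such a check, assuming the DRPC status conditions include a type named `PeerReady` (the condition name is an assumption, not taken from this report):

# List each DRPC with the status of its PeerReady condition (assumed condition type)
$ oc get drpc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="PeerReady")].status}{"\n"}{end}'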

Version of all relevant components (if applicable):
ocp: 4.13.0-0.nightly-2023-05-30-074322
odf/mco: 4.13.0-207
ACM: 2.7.4

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
no

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create 4 OCP clusters: 2 hub clusters and 2 managed clusters, plus one stretched RHCS cluster.
   Deploy the clusters so that:
	zone a: arbiter ceph node
	zone b: c1, h1, 3 ceph nodes
	zone c: c2, h2, 3 ceph nodes
2. Configure MDR and deploy applications (appset and subscription) on each managed cluster. Apply the DRPolicy to all apps.
3. Initiate a backup so that the active and passive hubs are in sync.
4. Bring zone b down, i.e. c1, h1, and the 3 ceph nodes.
5. Initiate the restore process on h2.
6. Verify that the restore succeeds on the new hub and that the DRPolicy on h2 is in the Validated state.
7. Check the DRPC of the c1 apps (see the sketch after these steps).
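A minimal sketch of the checks in steps 6 and 7, assuming the DRPolicy exposes a `Validated` condition and the DRPC a `PeerReady` condition (both names are assumptions based on the Ramen API, not confirmed in this report):

# Confirm the restored DRPolicy is validated on h2 (assumed condition type)
$ oc get drpolicy -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Validated")].status}{"\n"}{end}'
# Then inspect the c1 apps' DRPC conditions directly from the YAML output
$ oc get drpc -A -o yaml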


Actual results:
Peer ready status of c1 apps is Unknown

Expected results:
Peer Ready status of c1 apps should be True

Additional info:

Comment 4 Mudit Agarwal 2023-06-05 11:47:49 UTC
Not a 4.13 blocker, moving it out

Comment 5 Benamar Mekhissi 2023-06-12 01:31:44 UTC
PR is here: https://github.com/RamenDR/ramen/pull/920