Bug 2211883 - [MDR] After zone failure(c1+h1 cluster) and hub recovery, c1 apps peer ready status is in "Unknown" state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Benamar Mekhissi
QA Contact: Shrivaibavi Raghaventhiran
URL:
Whiteboard:
Depends On:
Blocks: 2154341
 
Reported: 2023-06-02 13:19 UTC by Parikshith
Modified: 2024-04-06 04:25 UTC
CC: 5 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
After the zone failure and hub recovery, occasionally, the Peer Ready status of the subscription and appset applications in their disaster recovery placement control (DRPC) is shown as `Unknown`. This is a cosmetic issue: it does not impact the regular functionality of Ramen and is limited to the visual appearance of the DRPC output when viewed using the `oc` command.
Workaround: Use the YAML output to see the correct status:
----
$ oc get drpc -o yaml
----
Clone Of:
Environment:
Last Closed: 2023-12-07 17:19:20 UTC
Embargoed:




Links
Github RamenDR ramen pull 920 (Merged): Ordered initialization for the DRPC condition array (last updated 2023-09-07 12:24:28 UTC)

Description Parikshith 2023-06-02 13:19:44 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
After shutting down the zone hosting the c1 and h1 clusters and performing hub recovery to h2, the Peer Ready status of the subscription and appset apps in their drpc is "Unknown".

NAMESPACE          NAME                             AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION      START TIME             DURATION       PEER READY
b-sub-1            b-sub-1-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
b-sub-2            b-sub-2-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
cronjob-sub-1      cronjob-sub-1-placement-1-drpc   59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   b-app-1-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   b-app-2-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   cronjob-app-1-placement-drpc     59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   job-app-1-placement-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
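
Note that the PEER READY value above is only a printed column; the underlying condition can also be read by type directly from the DRPC status. A minimal check, assuming the DRPC status exposes a condition of type "PeerReady" (the jsonpath query below is illustrative, not taken from this report):

$ oc get drpc -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{" PeerReady="}{.status.conditions[?(@.type=="PeerReady")].status}{"\n"}{end}'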

Version of all relevant components (if applicable):
ocp: 4.13.0-0.nightly-2023-05-30-074322
odf/mco: 4.13.0-207
ACM: 2.7.4

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
no

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create 4 OCP clusters: 2 hubs and 2 managed clusters, plus one stretched RHCS cluster.
   Deploy the clusters such that:
	zone a: arbiter ceph node
	zone b: c1, h1, 3 ceph nodes
	zone c: c2, h2, 3 ceph nodes
2. Configure MDR and deploy applications (appset and subscription) on each managed cluster. Apply the drpolicy to all apps.
3. Initiate a backup process so that the active and passive hubs are in sync.
4. Bring zone b down, i.e. c1, h1 and the 3 ceph nodes.
5. Initiate the restore process on h2.
6. Verify the restore succeeds on the new hub and the drpolicy on h2 is in Validated state (see the check after these steps).
7. Check the drpc of the c1 apps.
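
For step 6, the drpolicy state can be queried directly; a sketch under the assumption that the DRPolicy status carries a condition of type "Validated" (as step 6's "validated state" suggests):

$ oc get drpolicy -o jsonpath='{range .items[*]}{.metadata.name}{" Validated="}{.status.conditions[?(@.type=="Validated")].status}{"\n"}{end}'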


Actual results:
Peer ready status of c1 apps is Unknown

Expected results:
Peer Ready status of c1 apps should be True

Additional info:

Comment 4 Mudit Agarwal 2023-06-05 11:47:49 UTC
Not a 4.13 blocker, moving it out

Comment 5 Benamar Mekhissi 2023-06-12 01:31:44 UTC
PR is here: https://github.com/RamenDR/ramen/pull/920
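
Per the PR title ("Ordered initialization for the DRPC condition array"), the fix makes the order in which the DRPC conditions array is initialized deterministic. That plausibly matters because the PEER READY printer column in the DRPC CRD addresses the conditions array, so an unexpected ordering after hub recovery can surface as Unknown. One way to inspect the column definitions, assuming the standard Ramen CRD name (the exact jsonPath values are not captured in this report):

$ oc get crd drplacementcontrols.ramendr.openshift.io -o jsonpath='{range .spec.versions[*].additionalPrinterColumns[*]}{.name}{"\t"}{.jsonPath}{"\n"}{end}'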

Comment 8 Karolin Seeger 2023-09-07 12:26:14 UTC
PR is merged, moving it to ON_QA.

Comment 9 Shrivaibavi Raghaventhiran 2023-10-20 12:04:32 UTC
Tested versions:
----------------
OCP - 4.14.0-0.nightly-2023-10-08-220853
ODF - 4.14.0-146.stable
ACM - 2.9.0-180

Test Steps:
------------
1. Create 4 OCP clusters: 2 hubs and 2 managed clusters, plus one stretched RHCS cluster.
   Deploy the clusters such that:
	zone a: arbiter ceph node
	zone b: c1, h1, 3 ceph nodes
	zone c: c2, h2, 3 ceph nodes
2. Configure MDR and deploy applications (appset and subscription) on each managed cluster. Apply the drpolicy to all apps.
3. Initiate a backup process so that the active and passive hubs are in sync.
4. Bring zone b down, i.e. c1, h1 and the 3 ceph nodes.
5. Initiate the restore process on h2.
6. Verify the restore succeeds on the new hub and the drpolicy on h2 is in Validated state.
7. Check the drpc of the c1 apps.

Validation:
------------
Peer Ready status of apps is displayed as True/False, not Unknown, after hub recovery.

DRPC O/P:
---------
sraghave:~$ oc get drpc -A -o wide
NAMESPACE          NAME                             AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION           PEER READY
cephfs1            cephfs1-placement-3-drpc         18h   sraghave-c1-oct    sraghave-c2-oct   Failover       FailedOver     Completed     2023-10-19T19:00:01Z   24m4.988864425s    True
cephfs2            cephfs2-placement-3-drpc         18h   sraghave-c2-oct                                     Deployed       Completed                                               True
daemonset1         daemonset1-placement-3-drpc      18h   sraghave-c1-oct    sraghave-c2-oct   Failover       FailedOver     Completed     2023-10-19T19:00:24Z   22m42.043971686s   True
deployment1        deployment1-placement-3-drpc     16h   sraghave-c1-oct                                     Deployed       Completed                                               True
openshift-gitops   cephfs-appset1-placement-drpc    18h   sraghave-c1-oct    sraghave-c2-oct   Failover       FailedOver     Completed                                               True
openshift-gitops   cephfs-placement-drpc            18h   sraghave-c2-oct                                     Deployed       Completed                                               True
openshift-gitops   cephfs1-app-placement-drpc       18h   sraghave-c1-oct                                     Deployed       Completed                                               True
openshift-gitops   cephfs2-app-placement-drpc       18h   sraghave-c2-oct                                     Deployed       Completed                                               True
openshift-gitops   deployment1-app-placement-drpc   18h   sraghave-c1-oct                                     Deployed       Completed                                               True
openshift-gitops   deployment2-app-placement-drpc   18h   sraghave-c2-oct                                     Deployed       Completed                                               True
openshift-gitops   hello-appsets1-placement-drpc    18h   sraghave-c1-oct    sraghave-c2-oct   Failover       FailedOver     Completed                                               True
openshift-gitops   hello1-app-placement-drpc        18h   sraghave-c1-oct                                     Deployed       Completed                                               True
openshift-gitops   hello2-app-placement-drpc        18h   sraghave-c2-oct                                     Deployed       Completed                                               True
openshift-gitops   helloworld-placement-drpc        18h   sraghave-c2-oct                                     Deployed       Completed                                               True
openshift-gitops   rbd-appset1-placement-drpc       18h   sraghave-c1-oct    sraghave-c2-oct   Failover       FailedOver     Completed                                               True
openshift-gitops   rbd-placement-drpc               18h   sraghave-c2-oct                                     Deployed       Completed                                               True
openshift-gitops   rbd-sample-placement-drpc        18h   sraghave-c1-oct    sraghave-c2-oct   Failover       FailedOver     Cleaning Up   2023-10-20T08:31:45Z                      False
openshift-gitops   rbd2-app-placement-drpc          18h   sraghave-c2-oct                                     Deployed       Completed                                               True


With the above observations, moving the BZ to Verified.

Comment 11 Red Hat Bugzilla 2024-04-06 04:25:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

