Bug 2258351 - [RDR] [Hub recovery] [Co-situated] Failover did not progress after site failure
Summary: [RDR] [Hub recovery] [Co-situated] Failover did not progress after site failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: Benamar Mekhissi
QA Contact: Aman Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-01-14 16:38 UTC by Aman Agrawal
Modified: 2024-07-18 04:25 UTC
CC List: 4 users

Fixed In Version: 4.15.0-125
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:31:22 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github RamenDR ramen pull 1179 0 None open Fix Failover Confusion in DRPC Action Post Hub Recovery 2024-01-17 23:25:33 UTC
Github red-hat-storage ramen pull 175 0 None open Bug 2258351: Fix Failover Confusion in DRPC Action Post Hub Recovery 2024-01-23 17:39:44 UTC
Red Hat Product Errata RHSA-2024:1383 0 None None None 2024-03-19 15:31:23 UTC

Comment 4 Shrivaibavi Raghaventhiran 2024-01-17 06:52:43 UTC
The MDR team also hits this issue.

versions:
---------
RHCS - 7.0
OCP - 4.15.0-0.nightly-2024-01-10-101042
ODF - 4.15.0-113
ACM - 2.9.1

Steps to reproduce:
--------------------
1. On an MDR hub recovery setup, deploy subscription apps and ApplicationSet apps,
and ensure that a few apps are moved to the FailedOver and Relocated states
2. Ensure that a few apps are left deployed without assigning any DRPolicy to them
3. Ensure that a backup is taken on both the active and the passive hub
4. Bring zone b down (ceph nodes 0, 1, and 2, the C1 cluster, and the active hub cluster)
5. Restore the backup on the passive hub to make it the active hub
6. After importing the secrets of the C2 cluster, check that the DRPolicy is in the Validated state
7. Now check the DRPC status:
$ oc get drpc --all-namespaces -o wide
NAMESPACE          NAME                                AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME   DURATION   PEER READY
cephfs-sub1        cephfs-sub1-placement-1-drpc        67m   sraghave-c1-jan    sraghave-c2-jan   Failover                                                            True
openshift-gitops   cephfs-appset1-placement-drpc       67m   sraghave-c1-jan    sraghave-c2-jan   Relocate                                                            True
openshift-gitops   helloworld-appset1-placement-drpc   67m   sraghave-c1-jan    sraghave-c2-jan   Failover                                                            True
openshift-gitops   rbd-appset1-placement-drpc          67m   sraghave-c1-jan    sraghave-c2-jan   Relocate                                                            True
rbd-sub1           rbd-sub1-placement-1-drpc           67m   sraghave-c1-jan    sraghave-c2-jan   Relocate                                                            True
8. Now assign a DRPolicy to the apps that were already installed on the clusters, and check the DRPC statuses:
$ oc get drpc --all-namespaces -o wide
NAMESPACE          NAME                                AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION        PEER READY
cephfs-sub1        cephfs-sub1-placement-1-drpc        70m     sraghave-c1-jan    sraghave-c2-jan   Failover       FailedOver     Cleaning Up                                          False
cephfs-sub2        cephfs-sub2-placement-1-drpc        101s    sraghave-c2-jan                                     Deployed       Completed     2024-01-16T16:35:06Z   22.043807167s   True
openshift-gitops   cephfs-appset1-placement-drpc       70m     sraghave-c1-jan    sraghave-c2-jan   Relocate                      Paused                                               True
openshift-gitops   cephfs-appset2-placement-drpc       3m      sraghave-c2-jan                                     Deployed       Completed     2024-01-16T16:34:08Z   1.043287787s    True
openshift-gitops   helloworld-appset1-placement-drpc   70m     sraghave-c1-jan    sraghave-c2-jan   Failover       FailedOver     Cleaning Up                                          False
openshift-gitops   helloworld-appset2-placement-drpc   2m35s   sraghave-c2-jan                                     Deployed       Completed     2024-01-16T16:34:33Z   1.048046697s    True
openshift-gitops   rbd-appset1-placement-drpc          70m     sraghave-c1-jan    sraghave-c2-jan   Relocate                      Paused                                               True
openshift-gitops   rbd-appset2-placement-drpc          2m14s   sraghave-c2-jan                                     Deployed       Completed     2024-01-16T16:34:41Z   15.040241914s   True
rbd-sub1           rbd-sub1-placement-1-drpc           70m     sraghave-c1-jan    sraghave-c2-jan   Relocate                      Paused                                               True
rbd-sub2           rbd-sub2-placement-1-drpc           62s     sraghave-c2-jan                                     Deployed       Completed     2024-01-16T16:36:06Z   1.04207343s     True

Please note: zone b has not been recovered yet, and no failover/relocate was performed after hub recovery.
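For context, the DESIREDSTATE column above mirrors the `spec.action` field of each DRPlacementControl resource, which is what the linked fix ("Fix Failover Confusion in DRPC Action Post Hub Recovery") adjusts. A minimal DRPC sketch for one of the affected apps, with the DRPolicy name and PVC label chosen purely for illustration, would look roughly like:

```yaml
apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  name: cephfs-sub1-placement-1-drpc
  namespace: cephfs-sub1
spec:
  drPolicyRef:
    name: my-drpolicy            # illustrative DRPolicy name
  placementRef:
    kind: PlacementRule
    name: cephfs-sub1-placement-1
  preferredCluster: sraghave-c1-jan
  failoverCluster: sraghave-c2-jan
  action: Failover               # surfaces as DESIREDSTATE in `oc get drpc -o wide`
  pvcSelector:
    matchLabels:
      appname: cephfs-sub1       # illustrative PVC label
```

Since the apps from step 1 had already been failed over or relocated before the site failure, the restored DRPCs still carry that last requested action, which would explain why they show Failover/Relocate as the desired state after hub recovery even though no new action was issued.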

Comment 12 errata-xmlrpc 2024-03-19 15:31:22 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

Comment 13 Red Hat Bugzilla 2024-07-18 04:25:26 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

