Created attachment 1966396 [details]
screenshot 1

Description of problem (please be detailed as possible and provide log snippets):

Post hub recovery, cannot initiate failover of appset apps from the c1 to the c2 managed cluster (screenshot 1).

test-busy-app was deployed on pbyregow-c1. Since pbyregow-c1 was down due to zone failure (screenshot 2), the target cluster should be chosen as pbyregow-c2 during failover.

drpc of test-busy-app:

$ oc get drpc -A -o wide | grep test-busy-app-placement-drpc
openshift-gitops   test-busy-app-placement-drpc   46s   pbyregow-c1

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  resourceVersion: '396061'
  name: test-busy-app-placement-drpc
  uid: 91e7191e-1d9a-452d-9e73-18ae47c434ff
  creationTimestamp: '2023-05-23T08:27:45Z'
  generation: 1
  namespace: openshift-gitops
  finalizers:
    - drpc.ramendr.openshift.io/finalizer
  labels:
    cluster.open-cluster-management.io/backup: resource
    velero.io/backup-name: acm-resources-generic-schedule-20230523080037
    velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20230523080037
spec:
  drPolicyRef:
    apiVersion: ramendr.openshift.io/v1alpha1
    kind: DRPolicy
    name: odr-policy
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: test-busy-app-placement
    namespace: openshift-gitops
  preferredCluster: pbyregow-c1
  pvcSelector:
    matchLabels:
      appname: busybox-rbd
status:
  actionStartTime: '2023-05-23T10:44:07Z'
  conditions:
    - lastTransitionTime: '2023-05-23T08:40:06Z'
      message: Initial deployment completed
      observedGeneration: 1
      reason: Deployed
      status: 'True'
      type: Available

Version of all relevant components (if applicable):
ODF/MCO: 4.13.0-199
OCP: 4.13.0-0.nightly-2023-05-10-112355
ACM: 2.7.3

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, can't fail over apps after hub recovery.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
1/1

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Deploy and configure an MDR cluster.
   zone a: c1, h1
   zone b: c2, h2
2. Create subscription- and appset-based apps on c1 and c2. Apply the DRPolicy to all apps.
3. Shut down zone a (c1 and h1).
4. Perform hub recovery to hub h2.
5. Fence c1.
6. Fail over appset apps from c1 to c2.

Actual results:
Not able to initiate failover of appset apps.

Expected results:
Should be able to initiate failover of appset apps post hub recovery.

Additional info:
Subscription-based apps can be failed over from c1 to c2 (screenshot 3).
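For reference, a failover can also be requested directly against the DRPlacementControl resource rather than through the UI. The fragment below is a minimal sketch, assuming the ramendr.openshift.io/v1alpha1 API shown in the dump above; the exact field names and the patch command should be verified against the Ramen version in use before relying on this.

```
# Illustrative patch against the existing DRPC (not a verified procedure):
# oc patch drpc test-busy-app-placement-drpc -n openshift-gitops \
#   --type merge -p '{"spec":{"action":"Failover","failoverCluster":"pbyregow-c2"}}'
spec:
  action: Failover              # request a failover of the protected app
  failoverCluster: pbyregow-c2  # the surviving cluster as the target
```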
Adding it as a known issue for 4.13.0. @gshanmug, please fill in the doc text.
QE considers this a blocker for 4.13.0, as it breaks failover from the UI; for this case alone, users have to rely on the CLI. As mentioned in comment 12, this seems to be a minor fix. QE can test this fix in 4.13.0 to make the user experience better and uniform.
As per https://bugzilla.redhat.com/show_bug.cgi?id=2209288#c12, this needs to be fixed in 4.13.z. Moving it to 4.14 for now (as it is currently proposed for 4.13.0). Once we have a flag for 4.13.z, we will propose it back for that release.
(In reply to Sanjal Katiyar from comment #21) > as per https://bugzilla.redhat.com/show_bug.cgi?id=2209288#c12 this needs to > be fixed in 4.13.z, moving it to 4.14 right now (as it is proposed for > 4.13.0 currently). When we will have flag for 4.13.z, we will propose it > back for that release. QE wants this to be fixed in 4.13.0 for the reasons mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2209288#c18. Karolin, could you please check and let us know?
I met with Gowtham and Chandan, and we reached a conclusion regarding the necessary changes to the UI. The updated UI will function as follows:

1. In the event of a hub recovery with the entire zone down (including both the hub and the managed cluster), the UI will automatically choose the surviving cluster as the failover target.

2. If both clusters are operational, the UI will look at the PlacementDecision to determine the target cluster. If the PlacementDecision is unavailable, the UI will present a dropdown menu of the available clusters, and the user can select the desired target cluster for the action.

In v4.14, Ramen (DRPC) will retrieve the last known primary when it rebuilds its status; the last known primary will be obtained from the s3 store. If the last known primary is empty, the UI will revert to the behavior described above.

@gshanmug Correct me if I missed anything.
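The target-selection rules above can be sketched as follows. This is a hypothetical illustration, not the actual odf-console code; the function name, parameters, and data shapes are invented for the example.

```python
def choose_failover_target(clusters, placement_decision):
    """Sketch of the UI's target-cluster selection after hub recovery.

    Hypothetical inputs:
    - clusters: maps cluster name -> availability (True if reachable).
    - placement_decision: the decided cluster name from the
      PlacementDecision, or None if it is unavailable.

    Returns the preselected target cluster, or None to indicate the UI
    should show a dropdown of available clusters for the user to pick.
    """
    available = [name for name, up in clusters.items() if up]

    # Case 1: the entire zone is down (hub + managed cluster), so only
    # one cluster survives -- preselect it automatically.
    if len(available) == 1:
        return available[0]

    # Case 2: both clusters are operational -- follow the PlacementDecision.
    if placement_decision in available:
        return placement_decision

    # PlacementDecision unavailable: fall back to a user-facing dropdown.
    return None
```

The design keeps the automatic choice narrow (only when exactly one cluster survives) so the user stays in control whenever there is genuine ambiguity.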
*** Bug 2183153 has been marked as a duplicate of this bug. ***
The Ramen change is now merged; what's the plan for this BZ?
Tested versions:
----------------
OCP: 4.14.0-0.nightly-2023-10-08-220853
ODF: 4.14.0-146.stable
ACM: 2.9.0-180

I was able to initiate failover of appset apps that were in the Deployed state via the UI.

Moving this BZ to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6832