Created attachment 1966396 [details]
screenshot 1

Description of problem (please be detailed as possible and provide log snippets):

Post hub recovery, cannot initiate failover of appset apps from the c1 to the c2 managed cluster (screenshot 1).

test-busy-app was deployed on pbyregow-c1. Since pbyregow-c1 was down due to zone failure (screenshot 2), the target cluster should be chosen as pbyregow-c2 during failover.

drpc of test-busy-app:

$ oc get drpc -A -o wide | grep test-busy-app-placement-drpc
openshift-gitops   test-busy-app-placement-drpc   46s   pbyregow-c1

apiVersion: ramendr.openshift.io/v1alpha1
kind: DRPlacementControl
metadata:
  resourceVersion: '396061'
  name: test-busy-app-placement-drpc
  uid: 91e7191e-1d9a-452d-9e73-18ae47c434ff
  creationTimestamp: '2023-05-23T08:27:45Z'
  generation: 1
  namespace: openshift-gitops
  finalizers:
    - drpc.ramendr.openshift.io/finalizer
  labels:
    cluster.open-cluster-management.io/backup: resource
    velero.io/backup-name: acm-resources-generic-schedule-20230523080037
    velero.io/restore-name: restore-acm-acm-resources-generic-schedule-20230523080037
spec:
  drPolicyRef:
    apiVersion: ramendr.openshift.io/v1alpha1
    kind: DRPolicy
    name: odr-policy
  placementRef:
    apiVersion: cluster.open-cluster-management.io/v1beta1
    kind: Placement
    name: test-busy-app-placement
    namespace: openshift-gitops
  preferredCluster: pbyregow-c1
  pvcSelector:
    matchLabels:
      appname: busybox-rbd
status:
  actionStartTime: '2023-05-23T10:44:07Z'
  conditions:
    - lastTransitionTime: '2023-05-23T08:40:06Z'
      message: Initial deployment completed
      observedGeneration: 1
      reason: Deployed
      status: 'True'
      type: Available

Version of all relevant components (if applicable):
ODF/MCO: 4.13.0-199
OCP: 4.13.0-0.nightly-2023-05-10-112355
ACM: 2.7.3

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, can't fail over apps after hub recovery.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
1/1

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
No

Steps to Reproduce:
1. Deploy and configure an MDR cluster.
   zone a: c1, h1
   zone b: c2, h2
2. Create subscription- and appset-based apps on c1 and c2. Apply the DRPolicy to all apps.
3. Shut down zone a (c1 and h1).
4. Perform hub recovery to hub h2.
5. Fence c1.
6. Fail over appset apps from c1 to c2.

Actual results:
Not able to initiate failover of appset apps.

Expected results:
Should be able to initiate failover of appset apps post hub recovery.

Additional info:
Subscription-based apps can be failed over from c1 to c2 (screenshot 3).
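For reference, a failover can also be requested directly against the DRPlacementControl resource rather than through the UI. The fragment below is a minimal sketch, assuming the ramendr.openshift.io/v1alpha1 API shown in the dump above; the exact field names and the patch command should be verified against the Ramen version in use before relying on this.

```
# Illustrative patch against the existing DRPC (not a verified procedure):
# oc patch drpc test-busy-app-placement-drpc -n openshift-gitops \
#   --type merge -p '{"spec":{"action":"Failover","failoverCluster":"pbyregow-c2"}}'
spec:
  action: Failover              # request a failover of the protected app
  failoverCluster: pbyregow-c2  # the surviving cluster as the target
```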
Adding it as a known issue for 4.13.0. @gshanmug, please fill in the doc text.
QE considers this a blocker for 4.13.0, as it breaks failover from the UI; for this case alone, users have to rely on the CLI. As mentioned in comment 12, this seems to be a minor fix. QE can test this fix in 4.13.0 to make the user experience better and uniform.
As per https://bugzilla.redhat.com/show_bug.cgi?id=2209288#c12, this needs to be fixed in 4.13.z. Moving it to 4.14 for now (as it is currently proposed for 4.13.0). Once we have a flag for 4.13.z, we will propose it back for that release.
(In reply to Sanjal Katiyar from comment #21) > as per https://bugzilla.redhat.com/show_bug.cgi?id=2209288#c12 this needs to > be fixed in 4.13.z, moving it to 4.14 right now (as it is proposed for > 4.13.0 currently). When we will have flag for 4.13.z, we will propose it > back for that release. QE wants this to be fixed in 4.13.0 for the reasons mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2209288#c18. Karolin, could you please check and let us know?
I met with Gowtham and Chandan, and we reached a conclusion regarding the necessary changes to the UI. The updated UI will function as follows:

1. In the event of a hub recovery with the entire zone down (including both the hub and the managed cluster), the UI will automatically choose the surviving cluster as the failover target.

2. If both clusters are operational, the UI will look at the PlacementDecision to determine the target cluster. If the PlacementDecision is unavailable, the UI will present a dropdown menu of the available clusters, and the user can select the desired target cluster for the action.

In v4.14, Ramen (DRPC) will retrieve the last known primary when it rebuilds its status; the last known primary will be obtained from the s3 store. If the last known primary is empty, the UI will revert to the behavior described above.

@gshanmug Correct me if I missed anything.
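The target-selection rules above can be sketched as follows. This is a hypothetical illustration, not the actual odf-console code; the function name, parameters, and data shapes are invented for the example.

```python
def choose_failover_target(clusters, placement_decision):
    """Sketch of the UI's target-cluster selection after hub recovery.

    Hypothetical inputs:
    - clusters: maps cluster name -> availability (True if reachable).
    - placement_decision: the decided cluster name from the
      PlacementDecision, or None if it is unavailable.

    Returns the preselected target cluster, or None to indicate the UI
    should show a dropdown of available clusters for the user to pick.
    """
    available = [name for name, up in clusters.items() if up]

    # Case 1: the entire zone is down (hub + managed cluster), so only
    # one cluster survives -- preselect it automatically.
    if len(available) == 1:
        return available[0]

    # Case 2: both clusters are operational -- follow the PlacementDecision.
    if placement_decision in available:
        return placement_decision

    # PlacementDecision unavailable: fall back to a user-facing dropdown.
    return None
```

The design keeps the automatic choice narrow (only when exactly one cluster survives) so the user stays in control whenever there is genuine ambiguity.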
*** Bug 2183153 has been marked as a duplicate of this bug. ***
The Ramen change is now merged; what's the plan for this BZ?
Tested versions:
----------------
OCP: 4.14.0-0.nightly-2023-10-08-220853
ODF: 4.14.0-146.stable
ACM: 2.9.0-180

I was able to initiate failover of appset apps that were in the Deployed state via the UI.

Moving this BZ to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6832