Description of problem (please be as detailed as possible and provide log snippets):

This bug is being raised after the discussion in Bug 2138855.

On an RDR setup, after a Relocate action is initiated, the UI allows a Failover action to be triggered to the same peer cluster until PeerReady is set to False. At that point, both PreferredCluster and FailoverCluster point to the same cluster. This misconfiguration causes the Failover to get stuck in the WaitForFencing progression. See https://bugzilla.redhat.com/show_bug.cgi?id=2138855#c43 for more details.

Version of all relevant components (if applicable):
OCP: 4.14.0-ec.4
ODF: 4.14.0-134

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Failover gets stuck in WaitForFencing.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. On an RDR setup, deploy an ApplicationSet-based application.

NAMESPACE          NAME                       AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME   DURATION   PEER READY
openshift-gitops   busybox-1-placement-drpc   9m20s   sagrawal-nc1                                        Deployed       Completed                             True

2. Relocate from C1 to C2 and wait for it to complete.

NAMESPACE          NAME                       AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION          PEER READY
openshift-gitops   busybox-1-placement-drpc   13m   sagrawal-nc2                         Relocate       Relocated     Completed     2023-09-27T06:48:27Z   4m12.200921726s   True

3.
Again initiate Relocate from C2 to C1, and when the DRPC shows progression RunningFinalSync, initiate a Failover action from C2 to C1.

Wed Sep 27 06:53:28 UTC 2023
NAMESPACE          NAME                       AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION          START TIME             DURATION   PEER READY
openshift-gitops   busybox-1-placement-drpc   14m   sagrawal-nc1                         Relocate       Initiating     PreparingFinalSync   2023-09-27T06:53:28Z              True

Wed Sep 27 06:53:37 UTC 2023
NAMESPACE          NAME                       AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION        START TIME             DURATION   PEER READY
openshift-gitops   busybox-1-placement-drpc   14m   sagrawal-nc1                         Relocate       Relocating     RunningFinalSync   2023-09-27T06:53:28Z              True

Wed Sep 27 06:53:40 UTC 2023
NAMESPACE          NAME                       AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION      START TIME             DURATION   PEER READY
openshift-gitops   busybox-1-placement-drpc   14m   sagrawal-nc1       sagrawal-nc1      Failover       FailingOver    WaitForFencing   2023-09-27T06:53:28Z              False

Actual results:
The Failover operation remains stuck forever in WaitForFencing.

Expected results:
The Failover operation should proceed and complete successfully.
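The window in step 3 exists because PeerReady still reads True while the final sync is running, so a guard based on PeerReady alone is not enough. As a minimal sketch (a hypothetical helper, not part of Ramen or ODF tooling), a pre-flight check before submitting a Failover would also have to look at the DRPC phase:

```python
# Hypothetical pre-flight check (NOT part of ODF/Ramen): refuse to
# submit a Failover while a Relocate is still in flight, even though
# the PeerReady condition may still report "True" at that point --
# the exact window this bug describes.

def failover_allowed(drpc_status):
    """drpc_status is the .status dict of a DRPlacementControl."""
    peer_ready = any(
        c.get("type") == "PeerReady" and c.get("status") == "True"
        for c in drpc_status.get("conditions", [])
    )
    # During RunningFinalSync the phase is "Relocating" but PeerReady
    # is still True, so the phase must be checked as well.
    action_in_flight = drpc_status.get("phase") in (
        "Initiating", "Relocating", "FailingOver",
    )
    return peer_ready and not action_in_flight
```

Fed the statuses captured above, this returns False for the RunningFinalSync snapshot (phase Relocating) and True only once the Relocate has completed.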
Additional info:

DRPC yaml output when Failover is stuck:

---
$ oc get drpc -A -o yaml
apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: sagrawal-nc2
    creationTimestamp: "2023-09-27T06:39:07Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 4
    labels:
      cluster.open-cluster-management.io/backup: resource
    name: busybox-1-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: busybox-1-placement
      uid: 90edda56-75e9-409a-9fb1-c81fbd389966
    resourceVersion: "2615018"
    uid: fd1d105d-6415-4606-aa29-524d9ca693e2
  spec:
    action: Failover
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: odr-policy-10m
    failoverCluster: sagrawal-nc1
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: busybox-1-placement
      namespace: openshift-gitops
    preferredCluster: sagrawal-nc1
    pvcSelector:
      matchLabels:
        appname: busybox_app1
  status:
    actionStartTime: "2023-09-27T06:53:28Z"
    conditions:
    - lastTransitionTime: "2023-09-27T06:53:39Z"
      message: current home cluster sagrawal-nc1 is not fenced
      observedGeneration: 4
      reason: FailingOver
      status: "False"
      type: Available
    - lastTransitionTime: "2023-09-27T06:53:39Z"
      message: Started failover to cluster "sagrawal-nc1"
      observedGeneration: 4
      reason: NotStarted
      status: "False"
      type: PeerReady
    lastGroupSyncBytes: 6316032
    lastGroupSyncDuration: 0s
    lastGroupSyncTime: "2023-09-27T06:40:00Z"
    lastUpdateTime: "2023-09-27T06:53:39Z"
    phase: FailingOver
    preferredDecision:
      clusterName: sagrawal-nc1
      clusterNamespace: sagrawal-nc1
    progression: WaitForFencing
    resourceConditions:
      conditions:
      - lastTransitionTime: "2023-09-27T06:51:57Z"
        message: PVCs in the VolumeReplicationGroup are ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2023-09-27T06:51:48Z"
        message: VolumeReplicationGroup is replicating
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2023-09-27T06:51:41Z"
        message: Restored cluster data
        observedGeneration: 1
        reason: Restored
        status: "True"
        type: ClusterDataReady
      - lastTransitionTime: "2023-09-27T06:51:57Z"
        message: Cluster data of all PVs are protected
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      resourceMeta:
        generation: 1
        kind: VolumeReplicationGroup
        name: busybox-1-placement-drpc
        namespace: busybox-1
        protectedpvcs:
        - busybox-pvc-1
        - busybox-pvc-10
        - busybox-pvc-11
        - busybox-pvc-12
        - busybox-pvc-13
        - busybox-pvc-14
        - busybox-pvc-15
        - busybox-pvc-16
        - busybox-pvc-17
        - busybox-pvc-18
        - busybox-pvc-19
        - busybox-pvc-2
        - busybox-pvc-20
        - busybox-pvc-3
        - busybox-pvc-4
        - busybox-pvc-5
        - busybox-pvc-6
        - busybox-pvc-7
        - busybox-pvc-8
        - busybox-pvc-9
kind: List
metadata:
  resourceVersion: ""
---
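In the yaml above, the telltale misconfiguration is that spec.failoverCluster and spec.preferredCluster both name sagrawal-nc1. A rough sketch of a scan for that state (a hypothetical helper, assuming the List document shape produced by `oc get drpc -A -o json`):

```python
# Hypothetical helper: flag DRPCs whose failoverCluster equals
# preferredCluster while a Failover is requested -- the
# misconfiguration that leaves the Failover stuck in WaitForFencing.
# Field names follow the DRPlacementControl yaml shown above.

def find_conflicting_drpcs(drpc_list):
    """Return (namespace, name) tuples of conflicting DRPCs.

    drpc_list is the parsed List document from `oc get drpc -A -o json`.
    """
    conflicts = []
    for drpc in drpc_list.get("items", []):
        spec = drpc.get("spec", {})
        if (spec.get("action") == "Failover"
                and spec.get("failoverCluster")
                and spec.get("failoverCluster") == spec.get("preferredCluster")):
            meta = drpc.get("metadata", {})
            conflicts.append((meta.get("namespace"), meta.get("name")))
    return conflicts
```

Run against the dump above, this would report `("openshift-gitops", "busybox-1-placement-drpc")`.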
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383