Description of problem (please be as detailed as possible and provide log snippets):

Version of all relevant components (if applicable):
OCP 4.14.0-0.nightly-2023-10-30-170011
advanced-cluster-management.v2.9.0-188
ODF 4.14.0-157
ceph version 17.2.6-148.el9cp (badc1d27cb07762bea48f6554ad4f92b9d3fbb6b) quincy (stable)
ACM 2.9.0-DOWNSTREAM-2023-10-18-17-59-25
Submariner brew.registry.redhat.io/rh-osbs/iib:607438

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. On a hub recovery RDR setup, ensure backups are being created on the active and passive hub clusters. Failover and relocate different workloads so that each is running on the primary managed cluster after the failover or relocate operation completes. Ensure the latest backups are taken and that no action on any of the workloads (CephFS or RBD, ApplicationSet or Subscription type) is in progress.
2. Collect the drpc status. Bring the primary managed cluster down, and then bring the active hub down.
3. Ensure the secondary managed cluster is properly imported on the passive hub and that the DRPolicy gets validated.
4. Check the drpc status from the passive hub and compare it with the output taken from the active hub while it was up.

We notice that post hub recovery, a sanity check is run for all the workloads which were failed over or relocated: the same action that was performed from the active hub is performed again on those workloads, which marks Peer Ready as False for them.
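The `drpc` shorthand used in the captures below is not defined in this report; a minimal sketch of what it presumably wraps, assuming the standard `oc` CLI (the exact alias on the test setup is an assumption):

```shell
# Hypothetical helper matching the `drpc` shorthand seen in the outputs below.
# `-o wide` adds the PROGRESSION, START TIME, DURATION and PEER READY columns
# to the DRPlacementControl listing across all namespaces.
drpc() {
    oc get drpc -o wide --all-namespaces "$@"
}
```

With something like this in place, running `drpc` on a hub prints one row per DRPlacementControl resource, which is the shape of the outputs pasted below.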
From the active hub:

NAMESPACE             NAME                                   AGE   PREFERREDCLUSTER    FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION                           START TIME             DURATION             PEER READY
busybox-workloads-2   subscription-cephfs-placement-1-drpc   9h    amagrawa-31o-prim   amagrawa-passivee   Relocate       Relocated      Completed                             2023-11-01T17:54:21Z   30.282249722s        True
busybox-workloads-5   subscription-rbd1-placement-1-drpc     9h    amagrawa-31o-prim   amagrawa-31o-prim   Failover       FailedOver     Completed                             2023-11-01T13:57:37Z   47m3.364814169s      True
busybox-workloads-6   subscription-rbd2-placement-1-drpc     9h    amagrawa-31o-prim   amagrawa-passivee   Relocate       Relocated      Completed                             2023-11-01T14:16:28Z   3h17m50.318760845s   True
openshift-gitops      appset-cephfs-placement-drpc           9h    amagrawa-31o-prim   amagrawa-passivee   Failover       FailedOver     Completed                             2023-11-01T13:20:45Z   5m59.4021061s        True
openshift-gitops      appset-rbd1-placement-drpc             9h    amagrawa-31o-prim   amagrawa-31o-prim   Failover       FailedOver     Completed                             2023-11-01T14:15:30Z   41m2.588884417s      True
openshift-gitops      appset-rbd2-placement-drpc             9h    amagrawa-passivee                                      Deployed       Completed                                                                         True

From the passive hub:

amagrawa:~$ drpc
NAMESPACE             NAME                                   AGE   PREFERREDCLUSTER    FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION                           START TIME             DURATION   PEER READY
busybox-workloads-2   subscription-cephfs-placement-1-drpc   57m   amagrawa-31o-prim   amagrawa-passivee   Relocate       Relocating                                           2023-11-01T18:59:35Z              False
busybox-workloads-5   subscription-rbd1-placement-1-drpc     57m   amagrawa-31o-prim   amagrawa-31o-prim   Failover       FailingOver    WaitForStorageMaintenanceActivation   2023-11-01T18:59:36Z              False
busybox-workloads-6   subscription-rbd2-placement-1-drpc     57m   amagrawa-31o-prim   amagrawa-passivee   Relocate                                                                                             True
openshift-gitops      appset-cephfs-placement-drpc           57m   amagrawa-31o-prim   amagrawa-passivee   Failover       FailedOver     EnsuringVolSyncSetup                                                   True
openshift-gitops      appset-rbd1-placement-drpc             57m   amagrawa-31o-prim   amagrawa-31o-prim   Failover       FailingOver    FailingOverToCluster                  2023-11-01T18:59:36Z              False
openshift-gitops      appset-rbd2-placement-drpc             57m   amagrawa-passivee                                      Deployed       Completed                                                              True
Since Peer Ready is now marked as False due to the sanity check, subscription-cephfs-placement-1-drpc, subscription-rbd1-placement-1-drpc and appset-rbd1-placement-drpc cannot be failed over in this example. This sanity check is required per the Kubernetes recommended guidelines, and we should not back up the CurrentState of the workloads (as confirmed by @bmekhiss), so the issue will always persist. As of now, the only option is to trigger a failover by editing the drpc YAML from the CLI; hence a force-failover option is needed in the UI for this case, with a caution that it may cause data loss/data corruption, which would need to be tested. Currently it is blocked by BZ 2246084, which we were able to reproduce again; this bug will be updated later.

Actual results:
The current UI cannot initiate a failover of workloads which were in any state other than Deployed before hub recovery was performed.

Expected results:
Allow a force failover of workloads post hub recovery where Peer Ready is False.

Additional info:
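For reference, triggering the failover from the CLI means editing the DRPlacementControl spec directly (e.g. via `oc edit drpc <name> -n <namespace>`). A minimal sketch of the relevant fields, assuming the Ramen DRPlacementControl API; the angle-bracket value is a placeholder, not a real cluster name:

```yaml
# Illustrative DRPlacementControl spec fragment; <...> is a placeholder.
spec:
  # Setting action to Failover requests a failover even when the UI
  # refuses to offer one because Peer Ready is False.
  action: Failover
  # The surviving managed cluster to fail the workload over to.
  failoverCluster: <secondary-managed-cluster>
```

This is exactly the kind of forced action the requested UI option would expose, which is why it needs the data loss/data corruption caution mentioned above.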
Tested with:
OCP 4.15.0-0.nightly-2024-01-03-015912
ACM GA'ed 2.9.1
ODF 4.15.0-104
ceph version 17.2.6-167.el9cp (5ef1496ea3e9daaa9788809a172bd5a1c3192cf7) quincy (stable)
Active hub co-situated with the primary managed cluster

After the site failure and the move to the passive hub, Peer Ready for all the workloads was set to True, which means we could trigger a failover operation via the ACM console.

From the active hub:

amagrawa:hub$ drpc
NAMESPACE              NAME                                      AGE     PREFERREDCLUSTER   FAILOVERCLUSTER    DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION          PEER READY
busybox-workloads-1    cephfs-sub-busybox1-placement-1-drpc      8d      amagrawa-c1-3jan   amagrawa-c1-3jan   Failover       FailedOver     Completed     2024-01-12T12:18:52Z   2m55.286783085s   True
busybox-workloads-10   rbd-sub-busybox10-placement-1-drpc        7h11m   amagrawa-c1-3jan                                     Deployed       Completed     2024-01-12T09:17:04Z   2.056054748s      True
busybox-workloads-11   rbd-sub-busybox11-placement-1-drpc        7h10m   amagrawa-c1-3jan                                     Deployed       Completed     2024-01-12T09:17:49Z   21.043985378s     True
busybox-workloads-12   rbd-sub-busybox12-placement-1-drpc        7h8m    amagrawa-c2-3jan                                     Deployed       Completed     2024-01-12T09:19:19Z   88.077165ms       True
busybox-workloads-15   cephfs-sub-busybox15-placement-1-drpc     7h4m    amagrawa-c2-3jan                                     Deployed       Completed     2024-01-12T09:23:37Z   52.119470621s     True
busybox-workloads-2    rbd-sub-busybox2-placement-1-drpc         8d      amagrawa-c1-3jan   amagrawa-c1-3jan   Failover       FailedOver     Completed     2024-01-12T12:19:42Z   6m5.284279162s    True
busybox-workloads-5    rbd-sub-busybox5-placement-1-drpc         4d5h    amagrawa-c1-3jan                      Relocate       Relocated      Completed     2024-01-12T12:19:49Z   2m58.152248084s   True
busybox-workloads-6    cephfs-sub-busybox6-placement-1-drpc      7h18m   amagrawa-c1-3jan                      Relocate       Relocated      Completed     2024-01-12T12:18:59Z   2m53.478551336s   True
busybox-workloads-7    cephfs-sub-busybox7-placement-1-drpc      7h16m   amagrawa-c1-3jan                                     Deployed       Completed     2024-01-12T09:11:18Z   32.120615573s     True
openshift-gitops       cephfs-appset-busybox16-placement-drpc    7h3m    amagrawa-c2-3jan   amagrawa-c1-3jan   Failover       FailedOver     Completed     2024-01-12T12:19:33Z   2m37.50568738s    True
openshift-gitops       cephfs-appset-busybox3-placement-drpc     4d6h    amagrawa-c2-3jan   amagrawa-c2-3jan   Failover       FailedOver     Completed     2024-01-12T12:22:54Z   2m36.257186541s   True
openshift-gitops       cephfs-appset-busybox8-placement-drpc     7h14m   amagrawa-c1-3jan                      Relocate       Relocated      Completed     2024-01-12T12:19:20Z   4m10.339668753s   True
openshift-gitops       cephfs-appset-busybox9-placement-drpc     7h13m   amagrawa-c1-3jan                                     Deployed       Completed     2024-01-12T09:15:06Z   32.175780774s     True
openshift-gitops       rbd-appset-busybox13-placement-drpc       7h6m    amagrawa-c1-3jan                      Relocate       Relocated      Completed     2024-01-12T12:20:02Z   6m7.188328151s    True
openshift-gitops       rbd-appset-busybox14-placement-drpc       7h5m    amagrawa-c2-3jan   amagrawa-c1-3jan   Failover       FailedOver     Completed     2024-01-12T12:20:12Z   5m29.864938194s   True
openshift-gitops       rbd-appset-busybox17-placement-drpc       4h3m    amagrawa-c2-3jan                                     Deployed       Completed     2024-01-12T12:24:43Z   15.046600381s     True
openshift-gitops       rbd-appset-busybox4-placement-drpc        4d6h    amagrawa-c2-3jan   amagrawa-c1-3jan   Failover       FailedOver     Completed     2024-01-12T12:01:19Z   8m27.272353624s   True

From the passive hub (when the active hub and the primary managed cluster are down after the site failure):

amagrawa:acm$ drpc
NAMESPACE              NAME                                      AGE   PREFERREDCLUSTER   FAILOVERCLUSTER    DESIREDSTATE   CURRENTSTATE   PROGRESSION            START TIME             DURATION       PEER READY
busybox-workloads-1    cephfs-sub-busybox1-placement-1-drpc      39m   amagrawa-c1-3jan   amagrawa-c1-3jan   Failover                      Paused                                                       True
busybox-workloads-10   rbd-sub-busybox10-placement-1-drpc        39m   amagrawa-c1-3jan                                                    Paused                                                       True
busybox-workloads-11   rbd-sub-busybox11-placement-1-drpc        39m   amagrawa-c1-3jan                                                    Paused                                                       True
busybox-workloads-12   rbd-sub-busybox12-placement-1-drpc        39m   amagrawa-c2-3jan                                     Deployed       Completed              2024-01-12T16:52:33Z   963.058163ms   True
busybox-workloads-15   cephfs-sub-busybox15-placement-1-drpc     39m   amagrawa-c2-3jan                                     Deployed       EnsuringVolSyncSetup   2024-01-12T16:53:31Z                  True
busybox-workloads-2    rbd-sub-busybox2-placement-1-drpc         39m   amagrawa-c1-3jan   amagrawa-c1-3jan   Failover                      Paused                                                       True
busybox-workloads-5    rbd-sub-busybox5-placement-1-drpc         39m   amagrawa-c1-3jan                      Relocate                      Paused                                                       True
busybox-workloads-6    cephfs-sub-busybox6-placement-1-drpc      39m   amagrawa-c1-3jan                      Relocate                      Paused                                                       True
busybox-workloads-7    cephfs-sub-busybox7-placement-1-drpc      39m   amagrawa-c1-3jan                                                    Paused                                                       True
openshift-gitops       cephfs-appset-busybox16-placement-drpc    39m   amagrawa-c2-3jan   amagrawa-c1-3jan   Failover                      Paused                                                       True
openshift-gitops       cephfs-appset-busybox3-placement-drpc     39m   amagrawa-c2-3jan   amagrawa-c2-3jan   Failover       FailedOver     Cleaning Up                                                  True
openshift-gitops       cephfs-appset-busybox8-placement-drpc     39m   amagrawa-c1-3jan                      Relocate                      Paused                                                       True
openshift-gitops       cephfs-appset-busybox9-placement-drpc     39m   amagrawa-c1-3jan                                                    Paused                                                       True
openshift-gitops       rbd-appset-busybox13-placement-drpc       39m   amagrawa-c1-3jan                      Relocate                      Paused                                                       True
openshift-gitops       rbd-appset-busybox14-placement-drpc       39m   amagrawa-c2-3jan   amagrawa-c1-3jan   Failover                      Paused                                                       True
openshift-gitops       rbd-appset-busybox17-placement-drpc       39m   amagrawa-c2-3jan                                     Deployed       Completed              2024-01-12T16:52:32Z   1.263215882s   True
openshift-gitops       rbd-appset-busybox4-placement-drpc        39m   amagrawa-c2-3jan   amagrawa-c1-3jan   Failover                      Paused                                                       True

Benamar, do you think we can close this bug based upon this observation?
Not a 4.15 blocker
@amagrawa we can close this one.
Closing based upon the observation in Comment 6 and the confirmation in Comment 11.