Description of problem (please be as detailed as possible and provide log snippets):

Upgrade one of the managed clusters, say c1 (both OCP and ODF), then perform failover of applications from c2 to c1 (that is, perform failover of an application while the managed clusters are on different versions of OCP and ODF).

Here the c2 managed cluster is on OCP 4.12-nightly-build and ODF 4.12.3, while c1 has been upgraded to OCP 4.13-nightly-build and ODF 4.13-latest-rc.

When trying to fail over applications (helloworld-c2, cronjob-c2, bs-1) from c2 to c1, the failover hangs in the "FailingOver" state with the error message below:

"Failed to restore PVs (failed to restore ClusterData for VolRep (failed to restore PVs and PVCs using profile list ([s3profile-akrai-c1-ocs-external-storagecluster s3profile-akrai-c2-ocs-external-storagecluster]): unable to ListKeys of type v1.PersistentVolume keyPrefix helloworld-c2/helloworld-c2-placement-1-drpc/v1.PersistentVolume/, failed to list objects in bucket odrbucket-67670dd10b7c:helloworld-c2/helloworld-c2-placement-1-drpc/v1.PersistentVolume/, InternalError: We encountered an internal error. Please try again.\n\tstatus code: 500, request id: lisp29p4-4gcur5-6o1, host id: lisp29p4-4gcur5-6o1))"
observedGeneration: 1

$ date; date --utc; oc get drpc -A -owide | grep -i FailingOver
Monday 12 June 2023 04:05:53 PM IST
Monday 12 June 2023 10:35:53 AM UTC
bs-1            bs-1-placement-1-drpc            4h43m   akrai-c2   akrai-c1   Failover   FailingOver   WaitingForPVRestore   2023-06-12T10:00:40Z   False
cronjob-c2      cronjob-c2-placement-1-drpc      4h43m   akrai-c2   akrai-c1   Failover   FailingOver   WaitingForPVRestore   2023-06-12T10:00:52Z   False
helloworld-c2   helloworld-c2-placement-1-drpc   4h42m   akrai-c2   akrai-c1   Failover   FailingOver   WaitingForPVRestore   2023-06-12T10:01:05Z   False

Version of all relevant components (if applicable):

c1 managed cluster (upgraded):
OCP: 4.13.0-0.nightly-2023-06-09-152551
ODF: 4.13.0-rhodf (latest rc build)

c2 and hub clusters:
OCP: 4.12.0-0.nightly-2023-06-08-063126
ODF: 4.12.3-rhodf
ACM: 2.7.4
Ceph: 17.2.6-70.el9cp (fe62dcdbb2c6e05782a3e2b67d025b84ff5047cc) quincy (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, application failover cannot be performed.

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced?
1/1

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create 4 OCP clusters (2 hub and 2 managed clusters) and one stretched RHCS cluster, deployed across zones as follows:
   zone a: arbiter ceph node
   zone b: c1, h1, 3 ceph nodes
   zone c: c2, h2, 3 ceph nodes
   Deployed versions:
   OCP: 4.12.0-0.nightly-2023-06-08-063126
   ODF: 4.12.3-rhodf
2. Configure MDR and deploy 10 applications on each managed cluster.
3. Upgrade the c1 managed cluster: OCP to 4.13.0-0.nightly-2023-06-09-152551 and ODF to 4.13.0-rhodf (latest rc build).
4. Perform failover and failback of applications from c1 to c2, which succeeds.
5. Perform failover of applications from c2 to c1, which hangs in the "FailingOver" state.

Actual results:
Application failover hangs in the "FailingOver" state when the managed clusters are on different versions of OCP and ODF.

Expected results:
Application failover and failback should succeed.

Additional info:
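A possible diagnostic step (a sketch only, not something attempted in this report) is to retry the same S3 ListObjects call that Ramen reports failing with the 500 InternalError, using the bucket and key prefix from the error message. The config map name/namespace and the endpoint placeholder below are assumptions; the real endpoint and credentials come from the s3StoreProfiles in the ramen operator config and the secrets they reference.

$ # Inspect the S3 profile definitions (config map name and namespace are assumptions):
$ oc get configmap ramen-hub-operator-config -n openshift-operators -o yaml
$ # Retry the listing that failed, against the bucket and key prefix from the error:
$ aws s3api list-objects-v2 \
    --endpoint-url <s3-endpoint-from-profile> \
    --bucket odrbucket-67670dd10b7c \
    --prefix helloworld-c2/helloworld-c2-placement-1-drpc/v1.PersistentVolume/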
Proposing as a blocker for 4.13.0 due to the recent news that Metro DR functionalities, such as failover and failback, are broken post upgrade to 4.13.0.
Retaining the original severity and removing the blocker flag, as comment 3 is now tracked in the new bz https://bugzilla.redhat.com/show_bug.cgi?id=2215462, which is a blocker for 4.13.0. We would like to retain this bz for the original issue: application failover hangs in the "FailingOver" state when the managed clusters are on different versions of OCP and ODF.