4.12.z clone of https://bugzilla.redhat.com/show_bug.cgi?id=2211643

Description of problem (please be as detailed as possible and provide log snippets):

After shutting down the zone hosting c1 and h1 and performing hub recovery to h2, the apps (Subscription and ApplicationSet) deployed on the c2 managed cluster are deleted automatically.

DRPC of apps before hub recovery:
---------------------------------
oc get drpc -A -o wide
NAMESPACE          NAME                             AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION          PEER READY
b-sub-1            b-sub-1-placement-1-drpc         23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:34Z   2m5.010223347s    True
b-sub-2            b-sub-2-placement-1-drpc         23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:46Z   1m31.99599474s    True
b-sub-3            b-sub-3-placement-1-drpc         23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:24Z   2m16.995339752s   True
b-sub-4            b-sub-4-placement-1-drpc         23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:39Z   18m7.137529589s   True
cronjob-sub-1      cronjob-sub-1-placement-1-drpc   23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:59Z   1m59.004503184s   True
cronjob-sub-2      cronjob-sub-2-placement-1-drpc   23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:59Z   2m13.949180869s   True
job-sub-1          job-sub-1-placement-1-drpc       23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:54:34Z   1m40.973458756s   True
job-sub-2          job-sub-2-placement-1-drpc       23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:49:16Z   2m24.759375791s   True
new-sub-1          new-sub-1-placement-1-drpc       3h11m   pbyregow-c1                                         Deployed       Completed     2023-06-01T05:58:49Z   2.032041553s      True
new-sub-2          new-sub-2-placement-1-drpc       3h11m   pbyregow-c2                                         Deployed       Completed     2023-06-01T05:58:38Z   19.043831186s     True
openshift-gitops   b-app-1-placement-drpc           23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:20Z   1m41.015799768s   True
openshift-gitops   b-app-2-placement-drpc           23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:24Z   5m0.009409233s    True
openshift-gitops   b-app-3-placement-drpc           23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:10Z   5m30.927637297s   True
openshift-gitops   b-app-4-placement-drpc           23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:15Z   4m55.943941384s   True
openshift-gitops   cronjob-app-1-placement-drpc     23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:52Z   1m49.956000232s   True
openshift-gitops   cronjob-app-2-placement-drpc     23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:50Z   1m53.3372577s     True
openshift-gitops   job-app-1-placement-drpc         23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:54:05Z   1m57.943351562s   True
openshift-gitops   job-app-2-placement-drpc         23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:49:07Z   2m3.812597752s    True
openshift-gitops   new-app-1-placement-drpc         3h11m   pbyregow-c1                                         Deployed       Completed     2023-06-01T05:58:37Z   29.01439644s      True
openshift-gitops   new-app-2-placement-drpc         3h10m   pbyregow-c2                                         Deployed       Completed     2023-06-01T05:59:14Z   1.013864968s      True

DRPC of apps after recovery:
----------------------------
oc get drpc -A -o wide
NAMESPACE          NAME                             AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION      START TIME             DURATION       PEER READY
b-sub-1            b-sub-1-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate       Unknown
b-sub-2            b-sub-2-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate       Unknown
b-sub-3            b-sub-3-placement-1-drpc         59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
b-sub-4            b-sub-4-placement-1-drpc         59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
cronjob-sub-1      cronjob-sub-1-placement-1-drpc   59m   pbyregow-c1        pbyregow-c2       Relocate       Unknown
cronjob-sub-2      cronjob-sub-2-placement-1-drpc   59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
job-sub-1          job-sub-1-placement-1-drpc       59m   pbyregow-c1        pbyregow-c2       Relocate       Unknown
job-sub-2          job-sub-2-placement-1-drpc       59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
new-sub-1          new-sub-1-placement-1-drpc       59m   pbyregow-c1                                         Deployed       UpdatingPlRule   2023-06-01T10:21:16Z                  True
new-sub-2          new-sub-2-placement-1-drpc       59m   pbyregow-c2                                         Deployed       Completed        2023-06-01T09:40:40Z   116.057682ms   True
openshift-gitops   b-app-1-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate       Unknown
openshift-gitops   b-app-2-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate       Unknown
openshift-gitops   b-app-3-placement-drpc           59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
openshift-gitops   b-app-4-placement-drpc           59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
openshift-gitops   cronjob-app-1-placement-drpc     59m   pbyregow-c1        pbyregow-c2       Relocate       Unknown
openshift-gitops   cronjob-app-2-placement-drpc     59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up      2023-06-01T09:40:39Z                  True
openshift-gitops   job-app-1-placement-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate       Unknown
openshift-gitops   job-app-2-placement-drpc         59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
openshift-gitops   new-app-1-placement-drpc         59m   pbyregow-c1                                         Deployed       UpdatingPlRule   2023-06-01T10:21:20Z                  True
openshift-gitops   new-app-2-placement-drpc         59m   pbyregow-c2                                         Deployed       Completed        2023-06-01T09:40:41Z   376.606272ms   True

Version of all relevant components (if applicable):
ocp: 4.13.0-0.nightly-2023-05-30-074322
odf/mco: 4.13.0-207
ACM: 2.7.4

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, apps on a running managed cluster should not be deleted.

Is there any workaround available to the best of your knowledge?
No.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible? Not sure.

Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Yes.

Steps to Reproduce:
1. Create 4 OCP clusters: 2 hubs (h1, h2) and 2 managed clusters (c1, c2), plus one stretched RHCS cluster. Deploy the clusters across zones as follows:
   zone a: arbiter ceph node
   zone b: c1, h1, 3 ceph nodes
   zone c: c2, h2, 3 ceph nodes
2. Configure MDR and deploy 20 applications on each managed cluster.
3. Initiate a backup process, so that the active and passive hubs are in sync.
4. Bring zone b down, i.e. c1, h1 and the 3 ceph nodes.
5. Initiate the restore process on h2.
6. Restore succeeds on the new hub; the DR policy on h2 is in Validated state.
7. Check the applications on the c2 cluster.

Actual results:
Applications present on the c2 managed cluster are deleted after hub recovery.

Expected results:
Applications on the c2 managed cluster should be present and in Running state after hub recovery.

Additional info:
Validated the status of the apps on c2 shortly before hub recovery:
------------------------------------------------------------------------
$ for i in {b-sub-3,b-sub-4,b-app-3,b-app-4,cronjob-sub-2,job-sub-2,cronjob-app-2,job-app-2,new-app-2,new-sub-2}; do oc get pod,pvc -n $i; done
NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-5f46b79479-h8pdn   1/1     Running   0          19h

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-3800a556-9776-4216-b8b1-c48b4989308e   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   19h

NAME                                  READY   STATUS    RESTARTS   AGE
pod/busybox-cephfs-7bd55bcb67-9qn6c   1/1     Running   0          19h

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                         AGE
persistentvolumeclaim/busybox-cephfs-pvc   Bound    pvc-8fe28974-f5ae-4848-95d9-763a9e7457e5   5Gi        RWO            ocs-external-storagecluster-cephfs   19h

NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-5f46b79479-n5jj8   1/1     Running   0          19h

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-27f7ab03-88c8-47e2-b362-e6886ae4ad22   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   19h

NAME                                  READY   STATUS    RESTARTS   AGE
pod/busybox-cephfs-7bd55bcb67-282cz   1/1     Running   0          19h

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                         AGE
persistentvolumeclaim/busybox-cephfs-pvc   Bound    pvc-28174687-31a2-4b65-947a-368655475774   5Gi        RWO            ocs-external-storagecluster-cephfs   19h

NAME                                        READY   STATUS      RESTARTS   AGE
pod/hello-world-job-cephfs-28093508-p7hkv   0/1     Completed   0          2m14s
pod/hello-world-job-cephfs-28093509-skd98   0/1     Completed   0          74s
pod/hello-world-job-cephfs-28093510-kgv9w   0/1     Completed   0          14s
pod/hello-world-job-rbd-28093508-wdxrn      0/1     Completed   0          2m14s
pod/hello-world-job-rbd-28093509-vzsg6      0/1     Completed   0          74s
pod/hello-world-job-rbd-28093510-xjkxx      0/1     Completed   0          14s

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/hello-world-cephfs   Bound    pvc-c9fe7e99-cd5a-4777-8711-42dcdeacb3c2   10Gi       RWO            ocs-external-storagecluster-cephfs     19h
persistentvolumeclaim/hello-world-rbd      Bound    pvc-7c0ee582-284c-4397-a2aa-57c7cf769568   10Gi       RWO            ocs-external-storagecluster-ceph-rbd   19h

NAME                         READY   STATUS    RESTARTS   AGE
pod/countdown-cephfs-sdvg5   1/1     Running   0          19h
pod/countdown-cephfs-sztzq   1/1     Running   0          19h
pod/countdown-cephfs-wctxf   1/1     Running   0          19h
pod/countdown-rbd-2cqd6      1/1     Running   0          19h
pod/countdown-rbd-g5mnl      1/1     Running   0          19h
pod/countdown-rbd-jbflw      1/1     Running   0          19h

NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/job-cephfspvc   Bound    pvc-2e71aa9f-5eb2-4098-80e0-6fe602c5a0b4   5Gi        RWO            ocs-external-storagecluster-cephfs     19h
persistentvolumeclaim/job-rbdpvc      Bound    pvc-8b3af01c-b138-4882-94bd-742b7656c699   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   19h

NAME                                        READY   STATUS      RESTARTS   AGE
pod/hello-world-job-cephfs-28093508-9zs4k   0/1     Completed   0          2m17s
pod/hello-world-job-cephfs-28093509-lvcp2   0/1     Completed   0          77s
pod/hello-world-job-cephfs-28093510-5fr6p   0/1     Completed   0          17s
pod/hello-world-job-rbd-28093508-v6bql      0/1     Completed   0          2m17s
pod/hello-world-job-rbd-28093509-kvgtr      0/1     Completed   0          77s
pod/hello-world-job-rbd-28093510-xsskr      0/1     Completed   0          17s

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/hello-world-cephfs   Bound    pvc-c840a012-b91c-4be0-aeef-bbe6c976e52e   10Gi       RWO            ocs-external-storagecluster-cephfs     19h
persistentvolumeclaim/hello-world-rbd      Bound    pvc-e9f70091-2bad-4f66-90ac-ac09927f9aa7   10Gi       RWO            ocs-external-storagecluster-ceph-rbd   19h

NAME                         READY   STATUS    RESTARTS   AGE
pod/countdown-cephfs-l2wqv   1/1     Running   0          19h
pod/countdown-cephfs-mxxct   1/1     Running   0          19h
pod/countdown-cephfs-wzt87   1/1     Running   0          19h
pod/countdown-rbd-89hb6      1/1     Running   0          19h
pod/countdown-rbd-gghfs      1/1     Running   0          19h
pod/countdown-rbd-l6h5v      1/1     Running   0          19h

NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/job-cephfspvc   Bound    pvc-f533982a-9613-4aad-93f1-7ae1be45d4b3   5Gi        RWO            ocs-external-storagecluster-cephfs     19h
persistentvolumeclaim/job-rbdpvc      Bound    pvc-e463888c-04e4-4278-969c-9bbea4148294   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   19h

NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-5f46b79479-52mht   1/1     Running   0          3h12m

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-37dc7b45-96b8-430d-b1c6-44b72b489805   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   3h12m

NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-5f46b79479-zlwj5   1/1     Running   0          3h12m

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-0e1d7acd-c676-4b10-9b08-cae9c4847426   5Gi        RWO

After hub recovery:
-----------------
$ for i in {b-sub-3,b-sub-4,b-app-3,b-app-4,cronjob-sub-2,job-sub-2,cronjob-app-2,job-app-2,new-app-2,new-sub-2}; do oc get pod,pvc -n $i; done
No resources found in b-sub-3 namespace.
No resources found in b-sub-4 namespace.
No resources found in b-app-3 namespace.
No resources found in b-app-4 namespace.
No resources found in cronjob-sub-2 namespace.
No resources found in job-sub-2 namespace.
No resources found in cronjob-app-2 namespace.
No resources found in job-app-2 namespace.
No resources found in new-app-2 namespace.
No resources found in new-sub-2 namespace.
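The verification loop above can be wrapped so it fails fast when any application namespace comes back empty, as every namespace did after this hub recovery. This is a minimal sketch, not part of any product tooling: the function name `check_namespaces` and the `OC` override (which lets the snippet run offline against a stub) are assumptions of this sketch.

```shell
# Hypothetical wrapper around the verification loop used in this report: check a
# list of application namespaces and print any that come back empty. OC can be
# overridden with a stub command for offline testing.
OC=${OC:-oc}

check_namespaces() {
  local failed=0 ns
  for ns in "$@"; do
    # `oc get` reports "No resources found in <ns> namespace." when nothing matches.
    if "$OC" get pod,pvc -n "$ns" 2>&1 | grep -q "No resources found"; then
      echo "EMPTY: $ns"
      failed=1
    fi
  done
  return "$failed"
}
```

Against a live cluster this could be invoked as `check_namespaces b-sub-3 b-sub-4 ... || echo "apps missing after recovery"`.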
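The post-recovery `oc get drpc -A -o wide` output earlier in this report is the other quick signal: every DRPC whose workload lived on the lost zone shows a CURRENTSTATE of Unknown. A small triage sketch (the helper name and the word-match approach are assumptions, not a product CLI) that pulls those rows out:

```shell
# Hypothetical triage helper: print NAMESPACE/NAME for each DRPC row whose
# CURRENTSTATE reads "Unknown". Blank columns shift fields in kubectl-style
# wide output, so we match the literal word instead of a fixed field index.
flag_unknown_drpc() {
  awk 'NR > 1 && / Unknown( |$)/ { print $1 "/" $2 }'
}

# Sample rows taken from the post-recovery output in this report:
flag_unknown_drpc <<'EOF'
NAMESPACE  NAME                      AGE  PREFERREDCLUSTER  FAILOVERCLUSTER  DESIREDSTATE  CURRENTSTATE  PROGRESSION
b-sub-1    b-sub-1-placement-1-drpc  59m  pbyregow-c1       pbyregow-c2      Relocate      Unknown
b-sub-3    b-sub-3-placement-1-drpc  59m  pbyregow-c2       pbyregow-c1     Relocate      Relocated     Cleaning Up
EOF
# → b-sub-1/b-sub-1-placement-1-drpc
```

On a live hub this would be piped as `oc get drpc -A -o wide | flag_unknown_drpc`.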
Bug has been excluded from 4.12.4 because of a CVP issue.
(In reply to Karolin Seeger from comment #10)
> Bug has been excluded from 4.12.4 because of a CVP issue.

This fix did not cause the CVP issue; the patch is to be included in 4.12.5.
Bug was still in ASSIGNED state at the time of 4.12.5 finalization. Moving the bug to 4.12.6.
Verification of this bug on 4.14 is still not complete. Moving the bug to the 4.12.10 release for verification.
We also need the fix for https://issues.redhat.com/browse/ACM-7115 to verify this bug, and that fix has not been backported to the ACM 2.7 z-streams. Moving the bug out to 4.12.11.
4.12.11 content finalization was complete and this fix was not part of it. Moving the bug to the next z-stream.
Harish - given that hub recovery won't be supported in z-streams for MDR, can we close this bug?
(In reply to krishnaram Karthick from comment #28)
> Harish - Given that hub recovery won't be supported in z streams for MDR,
> Can we close this bug?

Closing this bug; we can reopen it if there is an ask from PM.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.