Description of problem (please be as detailed as possible and provide log snippets):

After shutting down the zone hosting the c1 managed cluster and the h1 hub, and performing hub recovery to h2, applications (subscription and appset) deployed on the c2 managed cluster are deleted automatically.

DRPC of apps before hub recovery:
---------------------------------
oc get drpc -A -o wide
NAMESPACE   NAME   AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME   DURATION   PEER READY
b-sub-1   b-sub-1-placement-1-drpc   23h   pbyregow-c1   pbyregow-c2   Relocate   Relocated   Completed   2023-05-31T11:53:34Z   2m5.010223347s   True
b-sub-2   b-sub-2-placement-1-drpc   23h   pbyregow-c1   pbyregow-c2   Relocate   Relocated   Completed   2023-05-31T11:53:46Z   1m31.99599474s   True
b-sub-3   b-sub-3-placement-1-drpc   23h   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Completed   2023-05-31T13:48:24Z   2m16.995339752s   True
b-sub-4   b-sub-4-placement-1-drpc   23h   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Completed   2023-05-31T13:48:39Z   18m7.137529589s   True
cronjob-sub-1   cronjob-sub-1-placement-1-drpc   23h   pbyregow-c1   pbyregow-c2   Relocate   Relocated   Completed   2023-05-31T11:53:59Z   1m59.004503184s   True
cronjob-sub-2   cronjob-sub-2-placement-1-drpc   23h   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Completed   2023-05-31T13:48:59Z   2m13.949180869s   True
job-sub-1   job-sub-1-placement-1-drpc   23h   pbyregow-c1   pbyregow-c2   Relocate   Relocated   Completed   2023-05-31T11:54:34Z   1m40.973458756s   True
job-sub-2   job-sub-2-placement-1-drpc   23h   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Completed   2023-05-31T13:49:16Z   2m24.759375791s   True
new-sub-1   new-sub-1-placement-1-drpc   3h11m   pbyregow-c1   Deployed   Completed   2023-06-01T05:58:49Z   2.032041553s   True
new-sub-2   new-sub-2-placement-1-drpc   3h11m   pbyregow-c2   Deployed   Completed   2023-06-01T05:58:38Z   19.043831186s   True
openshift-gitops   b-app-1-placement-drpc   23h   pbyregow-c1   pbyregow-c2   Relocate   Relocated   Completed   2023-05-31T11:53:20Z   1m41.015799768s   True
openshift-gitops   b-app-2-placement-drpc   23h   pbyregow-c1   pbyregow-c2   Relocate   Relocated   Completed   2023-05-31T11:53:24Z   5m0.009409233s   True
openshift-gitops   b-app-3-placement-drpc   23h   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Completed   2023-05-31T13:48:10Z   5m30.927637297s   True
openshift-gitops   b-app-4-placement-drpc   23h   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Completed   2023-05-31T13:48:15Z   4m55.943941384s   True
openshift-gitops   cronjob-app-1-placement-drpc   23h   pbyregow-c1   pbyregow-c2   Relocate   Relocated   Completed   2023-05-31T11:53:52Z   1m49.956000232s   True
openshift-gitops   cronjob-app-2-placement-drpc   23h   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Completed   2023-05-31T13:48:50Z   1m53.3372577s   True
openshift-gitops   job-app-1-placement-drpc   23h   pbyregow-c1   pbyregow-c2   Relocate   Relocated   Completed   2023-05-31T11:54:05Z   1m57.943351562s   True
openshift-gitops   job-app-2-placement-drpc   23h   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Completed   2023-05-31T13:49:07Z   2m3.812597752s   True
openshift-gitops   new-app-1-placement-drpc   3h11m   pbyregow-c1   Deployed   Completed   2023-06-01T05:58:37Z   29.01439644s   True
openshift-gitops   new-app-2-placement-drpc   3h10m   pbyregow-c2   Deployed   Completed   2023-06-01T05:59:14Z   1.013864968s   True

DRPC of apps after hub recovery:
--------------------------------
oc get drpc -A -o wide
NAMESPACE   NAME   AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME   DURATION   PEER READY
b-sub-1   b-sub-1-placement-1-drpc   59m   pbyregow-c1   pbyregow-c2   Relocate   Unknown
b-sub-2   b-sub-2-placement-1-drpc   59m   pbyregow-c1   pbyregow-c2   Relocate   Unknown
b-sub-3   b-sub-3-placement-1-drpc   59m   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Cleaning Up   True
b-sub-4   b-sub-4-placement-1-drpc   59m   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Cleaning Up   True
cronjob-sub-1   cronjob-sub-1-placement-1-drpc   59m   pbyregow-c1   pbyregow-c2   Relocate   Unknown
cronjob-sub-2   cronjob-sub-2-placement-1-drpc   59m   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Cleaning Up   True
job-sub-1   job-sub-1-placement-1-drpc   59m   pbyregow-c1   pbyregow-c2   Relocate   Unknown
job-sub-2   job-sub-2-placement-1-drpc   59m   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Cleaning Up   True
new-sub-1   new-sub-1-placement-1-drpc   59m   pbyregow-c1   Deployed   UpdatingPlRule   2023-06-01T10:21:16Z   True
new-sub-2   new-sub-2-placement-1-drpc   59m   pbyregow-c2   Deployed   Completed   2023-06-01T09:40:40Z   116.057682ms   True
openshift-gitops   b-app-1-placement-drpc   59m   pbyregow-c1   pbyregow-c2   Relocate   Unknown
openshift-gitops   b-app-2-placement-drpc   59m   pbyregow-c1   pbyregow-c2   Relocate   Unknown
openshift-gitops   b-app-3-placement-drpc   59m   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Cleaning Up   True
openshift-gitops   b-app-4-placement-drpc   59m   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Cleaning Up   True
openshift-gitops   cronjob-app-1-placement-drpc   59m   pbyregow-c1   pbyregow-c2   Relocate   Unknown
openshift-gitops   cronjob-app-2-placement-drpc   59m   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Cleaning Up   2023-06-01T09:40:39Z   True
openshift-gitops   job-app-1-placement-drpc   59m   pbyregow-c1   pbyregow-c2   Relocate   Unknown
openshift-gitops   job-app-2-placement-drpc   59m   pbyregow-c2   pbyregow-c1   Relocate   Relocated   Cleaning Up   True
openshift-gitops   new-app-1-placement-drpc   59m   pbyregow-c1   Deployed   UpdatingPlRule   2023-06-01T10:21:20Z   True
openshift-gitops   new-app-2-placement-drpc   59m   pbyregow-c2   Deployed   Completed   2023-06-01T09:40:41Z   376.606272ms   True

Version of all relevant components (if applicable):
ocp: 4.13.0-0.nightly-2023-05-30-074322
odf/mco: 4.13.0-207
ACM: 2.7.4

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, apps on the running managed cluster should not be deleted.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
4

Can this issue be reproduced?
Not sure

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Create 4 OCP clusters (2 hubs and 2 managed clusters) and one stretched RHCS cluster, deployed across zones as follows:
   zone a: arbiter ceph node
   zone b: c1, h1, 3 ceph nodes
   zone c: c2, h2, 3 ceph nodes
2. Configure MDR and deploy 20 applications on each managed cluster
3. Initiate a backup process so that the active and passive hubs are in sync
4. Bring zone b down, i.e. c1, h1 and the 3 ceph nodes
5. Initiate the restore process on h2 (a sketch of the Restore resource is shown after the expected results below)
6. Restore succeeds on the new hub; the DR policy on h2 is in Validated state
7. Check the applications on the c2 cluster

Actual results:
Applications present on the c2 managed cluster are deleted after hub recovery.

Expected results:
Applications on the c2 managed cluster should be present and in Running state after hub recovery.
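For reference, hub recovery in this scenario is driven by the ACM cluster backup and restore operator. A minimal sketch of the Restore resource that step 5 refers to could look like the following; the resource name and the use of "latest" backup names are illustrative assumptions and are not taken from this report:

```
# Hypothetical Restore CR applied on the passive hub (h2); names and values are illustrative.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  name: restore-acm-passive                 # assumed name
  namespace: open-cluster-management-backup
spec:
  cleanupBeforeRestore: CleanupRestored     # clean up resources left by a previous restore
  veleroManagedClustersBackupName: latest   # restore managed cluster resources
  veleroCredentialsBackupName: latest
  veleroResourcesBackupName: latest
```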
Additional info:
Had validated the status of apps on c2 some time before hub recovery:
------------------------------------------------------------------------
$ for i in {b-sub-3,b-sub-4,b-app-3,b-app-4,cronjob-sub-2,job-sub-2,cronjob-app-2,job-app-2,new-app-2,new-sub-2}; do oc get pod,pvc -n $i; done

NAME   READY   STATUS   RESTARTS   AGE
pod/busybox-rbd-5f46b79479-h8pdn   1/1   Running   0   19h
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound   pvc-3800a556-9776-4216-b8b1-c48b4989308e   5Gi   RWO   ocs-external-storagecluster-ceph-rbd   19h

NAME   READY   STATUS   RESTARTS   AGE
pod/busybox-cephfs-7bd55bcb67-9qn6c   1/1   Running   0   19h
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/busybox-cephfs-pvc   Bound   pvc-8fe28974-f5ae-4848-95d9-763a9e7457e5   5Gi   RWO   ocs-external-storagecluster-cephfs   19h

NAME   READY   STATUS   RESTARTS   AGE
pod/busybox-rbd-5f46b79479-n5jj8   1/1   Running   0   19h
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound   pvc-27f7ab03-88c8-47e2-b362-e6886ae4ad22   5Gi   RWO   ocs-external-storagecluster-ceph-rbd   19h

NAME   READY   STATUS   RESTARTS   AGE
pod/busybox-cephfs-7bd55bcb67-282cz   1/1   Running   0   19h
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/busybox-cephfs-pvc   Bound   pvc-28174687-31a2-4b65-947a-368655475774   5Gi   RWO   ocs-external-storagecluster-cephfs   19h

NAME   READY   STATUS   RESTARTS   AGE
pod/hello-world-job-cephfs-28093508-p7hkv   0/1   Completed   0   2m14s
pod/hello-world-job-cephfs-28093509-skd98   0/1   Completed   0   74s
pod/hello-world-job-cephfs-28093510-kgv9w   0/1   Completed   0   14s
pod/hello-world-job-rbd-28093508-wdxrn   0/1   Completed   0   2m14s
pod/hello-world-job-rbd-28093509-vzsg6   0/1   Completed   0   74s
pod/hello-world-job-rbd-28093510-xjkxx   0/1   Completed   0   14s
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/hello-world-cephfs   Bound   pvc-c9fe7e99-cd5a-4777-8711-42dcdeacb3c2   10Gi   RWO   ocs-external-storagecluster-cephfs   19h
persistentvolumeclaim/hello-world-rbd   Bound   pvc-7c0ee582-284c-4397-a2aa-57c7cf769568   10Gi   RWO   ocs-external-storagecluster-ceph-rbd   19h

NAME   READY   STATUS   RESTARTS   AGE
pod/countdown-cephfs-sdvg5   1/1   Running   0   19h
pod/countdown-cephfs-sztzq   1/1   Running   0   19h
pod/countdown-cephfs-wctxf   1/1   Running   0   19h
pod/countdown-rbd-2cqd6   1/1   Running   0   19h
pod/countdown-rbd-g5mnl   1/1   Running   0   19h
pod/countdown-rbd-jbflw   1/1   Running   0   19h
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/job-cephfspvc   Bound   pvc-2e71aa9f-5eb2-4098-80e0-6fe602c5a0b4   5Gi   RWO   ocs-external-storagecluster-cephfs   19h
persistentvolumeclaim/job-rbdpvc   Bound   pvc-8b3af01c-b138-4882-94bd-742b7656c699   5Gi   RWO   ocs-external-storagecluster-ceph-rbd   19h

NAME   READY   STATUS   RESTARTS   AGE
pod/hello-world-job-cephfs-28093508-9zs4k   0/1   Completed   0   2m17s
pod/hello-world-job-cephfs-28093509-lvcp2   0/1   Completed   0   77s
pod/hello-world-job-cephfs-28093510-5fr6p   0/1   Completed   0   17s
pod/hello-world-job-rbd-28093508-v6bql   0/1   Completed   0   2m17s
pod/hello-world-job-rbd-28093509-kvgtr   0/1   Completed   0   77s
pod/hello-world-job-rbd-28093510-xsskr   0/1   Completed   0   17s
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/hello-world-cephfs   Bound   pvc-c840a012-b91c-4be0-aeef-bbe6c976e52e   10Gi   RWO   ocs-external-storagecluster-cephfs   19h
persistentvolumeclaim/hello-world-rbd   Bound   pvc-e9f70091-2bad-4f66-90ac-ac09927f9aa7   10Gi   RWO   ocs-external-storagecluster-ceph-rbd   19h

NAME   READY   STATUS   RESTARTS   AGE
pod/countdown-cephfs-l2wqv   1/1   Running   0   19h
pod/countdown-cephfs-mxxct   1/1   Running   0   19h
pod/countdown-cephfs-wzt87   1/1   Running   0   19h
pod/countdown-rbd-89hb6   1/1   Running   0   19h
pod/countdown-rbd-gghfs   1/1   Running   0   19h
pod/countdown-rbd-l6h5v   1/1   Running   0   19h
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/job-cephfspvc   Bound   pvc-f533982a-9613-4aad-93f1-7ae1be45d4b3   5Gi   RWO   ocs-external-storagecluster-cephfs   19h
persistentvolumeclaim/job-rbdpvc   Bound   pvc-e463888c-04e4-4278-969c-9bbea4148294   5Gi   RWO   ocs-external-storagecluster-ceph-rbd   19h

NAME   READY   STATUS   RESTARTS   AGE
pod/busybox-rbd-5f46b79479-52mht   1/1   Running   0   3h12m
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound   pvc-37dc7b45-96b8-430d-b1c6-44b72b489805   5Gi   RWO   ocs-external-storagecluster-ceph-rbd   3h12m

NAME   READY   STATUS   RESTARTS   AGE
pod/busybox-rbd-5f46b79479-zlwj5   1/1   Running   0   3h12m
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound   pvc-0e1d7acd-c676-4b10-9b08-cae9c4847426   5Gi   RWO

After hub recovery:
-------------------
$ for i in {b-sub-3,b-sub-4,b-app-3,b-app-4,cronjob-sub-2,job-sub-2,cronjob-app-2,job-app-2,new-app-2,new-sub-2}; do oc get pod,pvc -n $i; done
No resources found in b-sub-3 namespace.
No resources found in b-sub-4 namespace.
No resources found in b-app-3 namespace.
No resources found in b-app-4 namespace.
No resources found in cronjob-sub-2 namespace.
No resources found in job-sub-2 namespace.
No resources found in cronjob-app-2 namespace.
No resources found in job-app-2 namespace.
No resources found in new-app-2 namespace.
No resources found in new-sub-2 namespace.
Hi Benamar,

Does this fix cover the upgrade scenario that we discussed in yesterday's meeting? If not, what are the steps to cover it manually?

Please let us know.

Harish
@hnallurv To update the namespace ManifestWork after upgrade, follow these steps:

0. Find where the application is running:
```
oc get drpc -A
NAMESPACE        NAME           AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE
busybox-sample   busybox-drpc   68d   c1                 c2                Failover       FailedOver
```

1. Find the ManifestWork for the namespace:
```
oc get manifestwork -n c2 | grep ns
NAME                                AGE
busybox-drpc-busybox-sample-ns-mw   69d
```

2. Identify the namespace ManifestWork for the application. It is named based on the format "%1-%2-%3-mw", where:
   - %1: name of the application
   - %2: namespace of the application
   - %3: the word 'ns'
   Example: busybox-drpc-busybox-sample-ns-mw --> [busybox-drpc]-[busybox-sample]-ns-mw

3. Edit the ManifestWork:
```
oc edit manifestwork -n c2 busybox-drpc-busybox-sample-ns-mw
```

4. Add the following label to the .spec.workload.manifests section (see also the patch sketch at the end of this comment):
```
labels:
  cluster.open-cluster-management.io/backup: resource
```

5. Here is an example:
```
apiVersion: work.open-cluster-management.io/v1
kind: ManifestWork
metadata:
  annotations:
    drplacementcontrol.ramendr.openshift.io/drpc-name: busybox-drpc
    drplacementcontrol.ramendr.openshift.io/drpc-namespace: busybox-sample
  creationTimestamp: "2023-03-30T19:50:25Z"
  finalizers:
  - cluster.open-cluster-management.io/manifest-work-cleanup
  generation: 2
  name: busybox-drpc-busybox-sample-ns-mw
  namespace: c2
  resourceVersion: "910332"
  uid: 788ff2c3-4d2e-49dc-b222-61f581131866
spec:
  workload:
    manifests:
    - apiVersion: v1
      kind: Namespace
      labels:
        cluster.open-cluster-management.io/backup: resource
      metadata:
        name: busybox-sample
      spec: {}
      status: {}
```
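If editing each namespace ManifestWork by hand is impractical (for example, with many applications), the same label can be added with a JSON patch. This is a sketch under the assumption that the Namespace manifest is the first entry (index 0) under .spec.workload.manifests, which matches the example above:

```
# Equivalent to the manual edit above; assumes the Namespace manifest is at index 0.
oc patch manifestwork busybox-drpc-busybox-sample-ns-mw -n c2 --type=json \
  -p '[{"op":"add","path":"/spec/workload/manifests/0/labels","value":{"cluster.open-cluster-management.io/backup":"resource"}}]'
```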
ACM issue https://issues.redhat.com/browse/ACM-5795 is fixed in ACM 2.7.7.
*** Bug 2222706 has been marked as a duplicate of this bug. ***
Tested versions:
----------------
OCP - 4.14.0-0.nightly-2023-10-08-220853
ODF - 4.14.0-146.stable
ACM - 2.9.0-180

Steps performed:
----------------
1. Configured a 4.14 MetroDR setup with ACM 2.9.0:
   zone-a: c1, hub-active
   zone-b: c2, hub-passive
2. Deployed subscription and appset apps on both managed clusters (c1, c2)
3. Applied the DR policy to the apps and had apps in Deployed, FailedOver, and Relocated states
4. Created a backup
5. Brought down zone-a (c1, hub-active, ceph nodes)
6. Restored on hub-passive

Observations:
-------------
1. Post restore, had to manually import the c1 managed cluster (using the auto-import-secret; a sketch of such a secret is at the end of this comment).
2. After a few minutes, the DR policy reached the Validated state. All applications were running and were not cleaned up on the managed clusters.
3. The openshift-storage namespace and other app resources were intact.

With the above observations, moving the BZ to Verified state.
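Regarding observation 1: re-importing a managed cluster on the recovered hub is done by creating an auto-import-secret in that cluster's namespace on the hub. A minimal sketch, assuming token-based import (the token and server values are placeholders, not taken from this verification run):

```
# Hypothetical auto-import secret for re-importing c1 on the recovered hub.
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: c1                      # namespace of the managed cluster on the hub
stringData:
  autoImportRetry: "5"
  token: <c1-serviceaccount-token>   # placeholder
  server: <c1-api-server-url>        # placeholder
type: Opaque
```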
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.14.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6832
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days