2213472 – [MDR][4.12.z clone] After zone failure(c1+h1 cluster) and hub recovery, apps on c2 cluster are cleaned up


Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat OpenShift Data Foundation
Classification:	Red Hat Storage
Component:	odf-dr
Sub Component:
Version:	4.12
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	Benamar Mekhissi
QA Contact:	Shrivaibavi Raghaventhiran
Docs Contact:
URL:
Whiteboard:
Depends On:	2211643
Blocks:

Reported:	2023-06-08 09:03 UTC by Karolin Seeger
Modified:	2024-06-29 04:25 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	Known Issue
Doc Text:
Clone Of:
Environment:
Last Closed:	2024-02-22 05:40:49 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	red-hat-storage ramen pull 113	0	None	Merged	Bug 2213472: Backup application namespace manifestwork	2023-06-09 12:49:44 UTC

Description Karolin Seeger 2023-06-08 09:03:47 UTC

4.12.z clone of https://bugzilla.redhat.com/show_bug.cgi?id=2211643

Description of problem (please be as detailed as possible and provide log
snippets):
After shutting down the zone hosting the c1 and h1 clusters and performing hub recovery to h2, the apps (subscription and appset) deployed on the c2 managed cluster are deleted automatically.

DRPC of apps before hub recovery:
---------------------------------
oc get drpc -A -o wide
NAMESPACE          NAME                             AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION          PEER READY
b-sub-1            b-sub-1-placement-1-drpc         23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:34Z   2m5.010223347s    True
b-sub-2            b-sub-2-placement-1-drpc         23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:46Z   1m31.99599474s    True
b-sub-3            b-sub-3-placement-1-drpc         23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:24Z   2m16.995339752s   True
b-sub-4            b-sub-4-placement-1-drpc         23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:39Z   18m7.137529589s   True
cronjob-sub-1      cronjob-sub-1-placement-1-drpc   23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:59Z   1m59.004503184s   True
cronjob-sub-2      cronjob-sub-2-placement-1-drpc   23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:59Z   2m13.949180869s   True
job-sub-1          job-sub-1-placement-1-drpc       23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:54:34Z   1m40.973458756s   True
job-sub-2          job-sub-2-placement-1-drpc       23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:49:16Z   2m24.759375791s   True
new-sub-1          new-sub-1-placement-1-drpc       3h11m   pbyregow-c1                                         Deployed       Completed     2023-06-01T05:58:49Z   2.032041553s      True
new-sub-2          new-sub-2-placement-1-drpc       3h11m   pbyregow-c2                                         Deployed       Completed     2023-06-01T05:58:38Z   19.043831186s     True
openshift-gitops   b-app-1-placement-drpc           23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:20Z   1m41.015799768s   True
openshift-gitops   b-app-2-placement-drpc           23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:24Z   5m0.009409233s    True
openshift-gitops   b-app-3-placement-drpc           23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:10Z   5m30.927637297s   True
openshift-gitops   b-app-4-placement-drpc           23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:15Z   4m55.943941384s   True
openshift-gitops   cronjob-app-1-placement-drpc     23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:53:52Z   1m49.956000232s   True
openshift-gitops   cronjob-app-2-placement-drpc     23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:48:50Z   1m53.3372577s     True
openshift-gitops   job-app-1-placement-drpc         23h     pbyregow-c1        pbyregow-c2       Relocate       Relocated      Completed     2023-05-31T11:54:05Z   1m57.943351562s   True
openshift-gitops   job-app-2-placement-drpc         23h     pbyregow-c2        pbyregow-c1       Relocate       Relocated      Completed     2023-05-31T13:49:07Z   2m3.812597752s    True
openshift-gitops   new-app-1-placement-drpc         3h11m   pbyregow-c1                                         Deployed       Completed     2023-06-01T05:58:37Z   29.01439644s      True
openshift-gitops   new-app-2-placement-drpc         3h10m   pbyregow-c2                                         Deployed       Completed     2023-06-01T05:59:14Z   1.013864968s      True

DRPC of apps after recovery:
----------------------------
oc get drpc -A -o wide
NAMESPACE          NAME                             AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION      START TIME             DURATION       PEER READY
b-sub-1            b-sub-1-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
b-sub-2            b-sub-2-placement-1-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
b-sub-3            b-sub-3-placement-1-drpc         59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
b-sub-4            b-sub-4-placement-1-drpc         59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
cronjob-sub-1      cronjob-sub-1-placement-1-drpc   59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
cronjob-sub-2      cronjob-sub-2-placement-1-drpc   59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
job-sub-1          job-sub-1-placement-1-drpc       59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
job-sub-2          job-sub-2-placement-1-drpc       59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
new-sub-1          new-sub-1-placement-1-drpc       59m   pbyregow-c1                                         Deployed       UpdatingPlRule   2023-06-01T10:21:16Z                  True
new-sub-2          new-sub-2-placement-1-drpc       59m   pbyregow-c2                                         Deployed       Completed        2023-06-01T09:40:40Z   116.057682ms   True
openshift-gitops   b-app-1-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   b-app-2-placement-drpc           59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   b-app-3-placement-drpc           59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
openshift-gitops   b-app-4-placement-drpc           59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
openshift-gitops   cronjob-app-1-placement-drpc     59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   cronjob-app-2-placement-drpc     59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up      2023-06-01T09:40:39Z                  True
openshift-gitops   job-app-1-placement-drpc         59m   pbyregow-c1        pbyregow-c2       Relocate                                                                             Unknown
openshift-gitops   job-app-2-placement-drpc         59m   pbyregow-c2        pbyregow-c1       Relocate       Relocated      Cleaning Up                                            True
openshift-gitops   new-app-1-placement-drpc         59m   pbyregow-c1                                         Deployed       UpdatingPlRule   2023-06-01T10:21:20Z                  True
openshift-gitops   new-app-2-placement-drpc         59m   pbyregow-c2                                         Deployed       Completed        2023-06-01T09:40:41Z   376.606272ms   True
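
The DRPCs with an empty CURRENTSTATE column (PEER READY Unknown in the listing above) can be picked out of saved `oc get drpc -A -o wide` output with a small filter. This is only a minimal sketch: the function name `drpc_missing_state` is hypothetical, and it assumes the column layout shown above, i.e. a header row in which CURRENTSTATE is immediately followed by PROGRESSION.

```shell
# Hypothetical helper: print "namespace/name" for every DRPC row whose
# CURRENTSTATE column is blank. Reads `oc get drpc -A -o wide` output on stdin.
drpc_missing_state() {
  awk '
    NR == 1 {
      # Derive the column boundaries from the header row.
      start = index($0, "CURRENTSTATE")
      stop  = index($0, "PROGRESSION")
      next
    }
    {
      # Cut out the CURRENTSTATE column by position and drop the padding.
      state = substr($0, start, stop - start)
      gsub(/ /, "", state)
      if (state == "") print $1 "/" $2
    }
  '
}
```

Possible usage on the hub: `oc get drpc -A -o wide | drpc_missing_state`. Cutting by position rather than by whitespace-separated field is deliberate, because blank columns would otherwise shift the field numbers.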

Version of all relevant components (if applicable):
ocp: 4.13.0-0.nightly-2023-05-30-074322
odf/mco: 4.13.0-207
ACM: 2.7.4

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
yes, apps on running managed cluster should not be deleted

Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
4

Is this issue reproducible?
not sure

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:
yes

Steps to Reproduce:
1. Create 4 OCP clusters (2 hubs and 2 managed clusters) and one stretched RHCS cluster.
   Deploy the clusters such that:
	zone a: arbiter ceph node
	zone b: c1, h1, 3 ceph nodes
	zone c: c2, h2, 3 ceph nodes
2. Configure MDR and deploy 20 applications on each managed cluster.
3. Initiate a backup process so that the active and passive hubs are in sync.
4. Bring zone b down, i.e. c1, h1 and 3 ceph nodes.
5. Initiate the restore process on h2.
6. Restore succeeded on the new hub; the DR policy on h2 is in validated state.
7. Check the applications on the c2 cluster.


Actual results:
Applications present on the c2 managed cluster are deleted after hub recovery.

Expected results:
Applications on the c2 managed cluster should be present and in running state after hub recovery.


Additional info:
Validated the status of the apps on c2 some time before hub recovery:
------------------------------------------------------------------------

$ for i in {b-sub-3,b-sub-4,b-app-3,b-app-4,cronjob-sub-2,job-sub-2,cronjob-app-2,job-app-2,new-app-2,new-sub-2}; do oc get pod,pvc -n $i; done
NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-5f46b79479-h8pdn   1/1     Running   0          19h

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-3800a556-9776-4216-b8b1-c48b4989308e   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   19h
NAME                                  READY   STATUS    RESTARTS   AGE
pod/busybox-cephfs-7bd55bcb67-9qn6c   1/1     Running   0          19h

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                         AGE
persistentvolumeclaim/busybox-cephfs-pvc   Bound    pvc-8fe28974-f5ae-4848-95d9-763a9e7457e5   5Gi        RWO            ocs-external-storagecluster-cephfs   19h
NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-5f46b79479-n5jj8   1/1     Running   0          19h

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-27f7ab03-88c8-47e2-b362-e6886ae4ad22   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   19h
NAME                                  READY   STATUS    RESTARTS   AGE
pod/busybox-cephfs-7bd55bcb67-282cz   1/1     Running   0          19h

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                         AGE
persistentvolumeclaim/busybox-cephfs-pvc   Bound    pvc-28174687-31a2-4b65-947a-368655475774   5Gi        RWO            ocs-external-storagecluster-cephfs   19h
NAME                                        READY   STATUS      RESTARTS   AGE
pod/hello-world-job-cephfs-28093508-p7hkv   0/1     Completed   0          2m14s
pod/hello-world-job-cephfs-28093509-skd98   0/1     Completed   0          74s
pod/hello-world-job-cephfs-28093510-kgv9w   0/1     Completed   0          14s
pod/hello-world-job-rbd-28093508-wdxrn      0/1     Completed   0          2m14s
pod/hello-world-job-rbd-28093509-vzsg6      0/1     Completed   0          74s
pod/hello-world-job-rbd-28093510-xjkxx      0/1     Completed   0          14s

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/hello-world-cephfs   Bound    pvc-c9fe7e99-cd5a-4777-8711-42dcdeacb3c2   10Gi       RWO            ocs-external-storagecluster-cephfs     19h
persistentvolumeclaim/hello-world-rbd      Bound    pvc-7c0ee582-284c-4397-a2aa-57c7cf769568   10Gi       RWO            ocs-external-storagecluster-ceph-rbd   19h
NAME                         READY   STATUS    RESTARTS   AGE
pod/countdown-cephfs-sdvg5   1/1     Running   0          19h
pod/countdown-cephfs-sztzq   1/1     Running   0          19h
pod/countdown-cephfs-wctxf   1/1     Running   0          19h
pod/countdown-rbd-2cqd6      1/1     Running   0          19h
pod/countdown-rbd-g5mnl      1/1     Running   0          19h
pod/countdown-rbd-jbflw      1/1     Running   0          19h

NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/job-cephfspvc   Bound    pvc-2e71aa9f-5eb2-4098-80e0-6fe602c5a0b4   5Gi        RWO            ocs-external-storagecluster-cephfs     19h
persistentvolumeclaim/job-rbdpvc      Bound    pvc-8b3af01c-b138-4882-94bd-742b7656c699   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   19h
NAME                                        READY   STATUS      RESTARTS   AGE
pod/hello-world-job-cephfs-28093508-9zs4k   0/1     Completed   0          2m17s
pod/hello-world-job-cephfs-28093509-lvcp2   0/1     Completed   0          77s
pod/hello-world-job-cephfs-28093510-5fr6p   0/1     Completed   0          17s
pod/hello-world-job-rbd-28093508-v6bql      0/1     Completed   0          2m17s
pod/hello-world-job-rbd-28093509-kvgtr      0/1     Completed   0          77s
pod/hello-world-job-rbd-28093510-xsskr      0/1     Completed   0          17s

NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/hello-world-cephfs   Bound    pvc-c840a012-b91c-4be0-aeef-bbe6c976e52e   10Gi       RWO            ocs-external-storagecluster-cephfs     19h
persistentvolumeclaim/hello-world-rbd      Bound    pvc-e9f70091-2bad-4f66-90ac-ac09927f9aa7   10Gi       RWO            ocs-external-storagecluster-ceph-rbd   19h
NAME                         READY   STATUS    RESTARTS   AGE
pod/countdown-cephfs-l2wqv   1/1     Running   0          19h
pod/countdown-cephfs-mxxct   1/1     Running   0          19h
pod/countdown-cephfs-wzt87   1/1     Running   0          19h
pod/countdown-rbd-89hb6      1/1     Running   0          19h
pod/countdown-rbd-gghfs      1/1     Running   0          19h
pod/countdown-rbd-l6h5v      1/1     Running   0          19h

NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/job-cephfspvc   Bound    pvc-f533982a-9613-4aad-93f1-7ae1be45d4b3   5Gi        RWO            ocs-external-storagecluster-cephfs     19h
persistentvolumeclaim/job-rbdpvc      Bound    pvc-e463888c-04e4-4278-969c-9bbea4148294   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   19h
NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-5f46b79479-52mht   1/1     Running   0          3h12m

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-37dc7b45-96b8-430d-b1c6-44b72b489805   5Gi        RWO            ocs-external-storagecluster-ceph-rbd   3h12m
NAME                               READY   STATUS    RESTARTS   AGE
pod/busybox-rbd-5f46b79479-zlwj5   1/1     Running   0          3h12m

NAME                                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
persistentvolumeclaim/busybox-rbd-pvc   Bound    pvc-0e1d7acd-c676-4b10-9b08-cae9c4847426   5Gi        RWO 

After hub recovery:
-----------------  
$ for i in {b-sub-3,b-sub-4,b-app-3,b-app-4,cronjob-sub-2,job-sub-2,cronjob-app-2,job-app-2,new-app-2,new-sub-2}; do oc get pod,pvc -n $i; done
No resources found in b-sub-3 namespace.
No resources found in b-sub-4 namespace.
No resources found in b-app-3 namespace.
No resources found in b-app-4 namespace.
No resources found in cronjob-sub-2 namespace.
No resources found in job-sub-2 namespace.
No resources found in cronjob-app-2 namespace.
No resources found in job-app-2 namespace.
No resources found in new-app-2 namespace.
No resources found in new-sub-2 namespace.
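
For repeated checks, the same loop can be turned into a per-namespace count, so that a cleaned-up namespace shows up as 0. This is a rough sketch, not part of the original report: the function name `count_workload_objects` is hypothetical, and it relies on `oc get ... -o name` printing one line per object while the "No resources found" message goes to stderr.

```shell
# Hypothetical helper: for each namespace given as an argument, print the
# namespace and the number of pods plus PVCs found in it.
count_workload_objects() {
  for ns in "$@"; do
    # -o name emits one "kind/name" line per object; errors and the
    # "No resources found" notice are discarded.
    n=$(oc get pod,pvc -n "$ns" -o name 2>/dev/null | wc -l)
    # Arithmetic expansion strips any padding that wc adds.
    echo "$ns $((n))"
  done
}
```

Possible usage on c2: `count_workload_objects b-sub-3 b-sub-4 b-app-3 b-app-4 cronjob-sub-2 job-sub-2 cronjob-app-2 job-app-2 new-app-2 new-sub-2`. A count of 0 for a namespace that held running pods before hub recovery matches the failure described here.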

Comment 10 Karolin Seeger 2023-06-14 11:59:42 UTC

Bug has been excluded from 4.12.4 because of a CVP issue.

Comment 15 Karolin Seeger 2023-06-27 14:24:23 UTC

(In reply to Karolin Seeger from comment #10)
> Bug has been excluded from 4.12.4 because of a CVP issue.

This fix did not cause the CVP issue; the patch is to be included in 4.12.5.

Comment 16 krishnaram Karthick 2023-06-28 06:07:35 UTC

Bug was still in assigned state at the time of 4.12.5 finalization. Moving the bug to 4.12.6.

Comment 24 krishnaram Karthick 2023-09-28 07:21:01 UTC

Verification of this bug at 4.14 is still not complete. 
Moving the bug to 4.12.10 release for verification.

Comment 25 krishnaram Karthick 2023-10-30 09:42:33 UTC

We also need the fix for https://issues.redhat.com/browse/ACM-7115 to verify this bug, and that fix has not been backported to the ACM 2.7 z-streams.
Moving the bug out to 4.12.11.

Comment 26 krishnaram Karthick 2024-01-02 14:01:59 UTC

4.12.11 content finalization is complete and this fix was not part of it.
Moving the bug to the next z-stream.

Comment 28 krishnaram Karthick 2024-02-15 11:48:10 UTC

Harish - Given that hub recovery won't be supported in z-streams for MDR, can we close this bug?

Comment 29 krishnaram Karthick 2024-02-22 05:40:49 UTC

(In reply to krishnaram Karthick from comment #28)
> Harish - Given that hub recovery won't be supported in z-streams for MDR,
> can we close this bug?

Closing this bug; we can reopen it if there is an ask from PM.

Comment 30 Red Hat Bugzilla 2024-06-29 04:25:06 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days
