Bug 2216676 - [RDR][ACM-Tracker] Cleanup of primary cluster remains stuck for app-set when failover is performed
Summary: [RDR][ACM-Tracker] Cleanup of primary cluster remains stuck for app-set when failover is performed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: documentation
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Karolin Seeger
QA Contact: Aman Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-22 08:03 UTC by Karolin Seeger
Modified: 2023-12-07 17:19 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-07 17:19:58 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker ACM-5200 0 None None None 2023-06-29 07:56:26 UTC

Description Karolin Seeger 2023-06-22 08:03:24 UTC
Bug to track the same issue for MDR: https://bugzilla.redhat.com/show_bug.cgi?id=2185953


Description of problem (please be as detailed as possible and provide log
snippets):
In continuation of this comment: https://bugzilla.redhat.com/show_bug.cgi?id=2184748#c3
The cluster had been in the same state for 4 days, with workloads running on C2. All pre-checks were then performed and a failover operation was carried out from C2 to C1.
The steps to reproduce follow below.

Version of all relevant components (if applicable):
ACM 2.7.2
ODF 4.13.0-121.stable

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Run IOs (app-set based workloads) on C2 for a few days, as mentioned in the Description above.
2. Shut down all master nodes of C2.
3. Scale down the rbd-mirror daemon pod on C1.
4. Edit the DRPC YAML from the hub and trigger failover to C1 (see the sketch after this list).
5. Scale the rbd-mirror daemon pod on C1 back up once failover completes.
6. Bring the C2 master nodes back up after a few hours (2-3 hrs).
7. Observe C2 and wait for cleanup to complete.
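
A minimal sketch of steps 3-5, assuming the rbd-mirror daemon runs as the rook-ceph-rbd-mirror-a deployment in the openshift-storage namespace (both names are assumptions; the DRPC name and namespace are placeholders to be taken from the actual environment):

# On C1: scale down the rbd-mirror daemon (deployment name is an assumption)
$ oc scale deployment rook-ceph-rbd-mirror-a -n openshift-storage --replicas=0

# On the hub: locate the DRPC for the app-set and trigger failover
$ oc get drpc -A
$ oc edit drpc <drpc-name> -n <drpc-namespace>
#   set spec.action: Failover and spec.failoverCluster to the C1 cluster name

# On C1, once failover completes: scale the rbd-mirror daemon back up
$ oc scale deployment rook-ceph-rbd-mirror-a -n openshift-storage --replicas=1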


Actual results: Cleanup of the primary cluster remains stuck for the app-set when failover is performed.

C2:

The VRs and the VRG were cleaned up in this case, but the Pods/PVCs remain stuck indefinitely.

amagrawa:~$ oc get pods,pvc,vr,vrg
NAME                              READY   STATUS    RESTARTS   AGE
pod/busybox-41-6b687497df-25zdg   1/1     Running   0          4d20h
pod/busybox-42-5479f6d5dc-4xz5t   1/1     Running   0          4d20h
pod/busybox-43-6d57d9d898-lcf2c   1/1     Running   0          4d20h
pod/busybox-44-6985f98f44-sjtv6   1/1     Running   0          4d20h
pod/busybox-45-7879f49d7b-jh5fd   1/1     Running   0          4d20h
pod/busybox-46-54bc657fc4-s8v5w   1/1     Running   0          4d20h
pod/busybox-47-5bfdc6d579-qwscg   1/1     Running   0          4d20h
pod/busybox-48-58dd4fc4b4-wcp89   1/1     Running   0          4d20h
pod/busybox-49-799ddc584-hxm8w    1/1     Running   0          4d20h
pod/busybox-50-58588b9ffb-dxn2b   1/1     Running   0          4d20h
pod/busybox-51-54868dd48d-8q8hh   1/1     Running   0          4d20h
pod/busybox-52-5b64fb9cff-9g28m   1/1     Running   0          4d20h
pod/busybox-53-699dff5bd4-k5mqr   1/1     Running   0          4d20h
pod/busybox-54-788744468c-drwss   1/1     Running   0          4d20h
pod/busybox-55-6bc89678b4-4rckw   1/1     Running   0          4d20h
pod/busybox-56-db586d8c8-z4qzt    1/1     Running   0          4d20h
pod/busybox-57-759979888c-kx462   1/1     Running   0          4d20h
pod/busybox-58-84fb689c4f-bm6cp   1/1     Running   0          4d20h
pod/busybox-59-59b77d856c-jj5xq   1/1     Running   0          4d20h
pod/busybox-60-57d4ff68d-hq9cd    1/1     Running   0          4d20h

NAME                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
persistentvolumeclaim/busybox-pvc-41   Bound    pvc-d0b72be5-22f5-45ba-bbf6-2281fdebefbf   42Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-42   Bound    pvc-9b9cf55a-a75c-4d30-97d7-4b9ff2722431   81Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-43   Bound    pvc-dcf4c325-adc5-48bd-8419-e09e6a787a39   28Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-44   Bound    pvc-e00a4ac9-e813-4a51-8c22-56b8731e4bb7   118Gi      RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-45   Bound    pvc-13f42be9-36c4-414f-b492-0c0340b29afa   19Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-46   Bound    pvc-03235a82-37e8-4d6c-ad48-470e8e98fdd7   129Gi      RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-47   Bound    pvc-c59917a7-1d32-46f6-a4b1-59855ea47070   43Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-48   Bound    pvc-5a4f302b-2cad-470f-9b8c-108150635fdf   57Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-49   Bound    pvc-e28d6929-9446-4b02-bd41-5a24f0a28d2d   89Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-50   Bound    pvc-8eedf9cb-e49b-487d-a4e1-97a41a30099c   124Gi      RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-51   Bound    pvc-f18cf24a-5b40-490c-994d-b12ed9600a45   95Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-52   Bound    pvc-5560146e-8134-4e2c-b5b6-368d5294f30c   129Gi      RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-53   Bound    pvc-dc442fe7-d45f-4d95-86d0-2139b71a5e04   51Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-54   Bound    pvc-36da406f-42c9-4761-8139-0131ea9d951b   30Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-55   Bound    pvc-ad09eb08-cac7-4902-ba2c-b22acc1c7586   102Gi      RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-56   Bound    pvc-d879c820-f6c2-4a3c-b7e8-2112b703e936   40Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-57   Bound    pvc-dfbd0f5f-4f7b-4700-80fc-58ae403abc42   146Gi      RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-58   Bound    pvc-86da9780-49c6-4bbf-b7de-053b9886568d   63Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-59   Bound    pvc-11f13a73-f21c-467f-b678-52e168471e66   118Gi      RWO            ocs-storagecluster-ceph-rbd   4d20h
persistentvolumeclaim/busybox-pvc-60   Bound    pvc-776748d4-ff0e-4a14-9598-31a9ef8019ab   25Gi       RWO            ocs-storagecluster-ceph-rbd   4d20h


No events are seen on the pods/PVCs running on C2.
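
For reference, a sketch of commands that could be used to confirm the stuck state on C2; the namespace is a placeholder, and the finalizer check is a general diagnostic, not a confirmed root cause:

# Check for events and for finalizers holding the stuck resources
$ oc get events -n <app-namespace>
$ oc get pvc busybox-pvc-41 -n <app-namespace> -o jsonpath='{.metadata.finalizers}'
$ oc describe pod busybox-41-6b687497df-25zdg -n <app-namespace>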

Expected results: Cleanup of the primary cluster should complete.


Additional info:

Comment 3 Karolin Seeger 2023-06-29 07:58:36 UTC
Documentation is available here and has been successfully tested for MDR: https://bugzilla.redhat.com/show_bug.cgi?id=2185953#c24.
Moving bug to ON_QA.

Comment 4 Aman Agrawal 2023-07-11 05:52:47 UTC
Verification of this bug is blocked by https://issues.redhat.com/browse/ACM-5796, as the Submariner connectivity issue occurs consistently.

