Bug 2184748 - [RDR] Cleanup of primary cluster remains stuck forever after Failover operation
Summary: [RDR] Cleanup of primary cluster remains stuck forever after Failover operation
Keywords:
Status: CLOSED DUPLICATE of bug 2185953
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-dr
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Karolin Seeger
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-05 15:16 UTC by Benamar Mekhissi
Modified: 2023-08-09 17:00 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2151493
Environment:
Last Closed: 2023-05-03 14:47:17 UTC
Embargoed:


Description Benamar Mekhissi 2023-04-05 15:16:39 UTC
+++ This bug was initially created as a clone of Bug #2151493 +++

--- Additional comment from Aman Agrawal on 2023-04-05 06:33:05 UTC ---

Hi Shyam/Madhu,

Yes, I am hitting this bug during 4.13 testing, and it remains a 4.13 blocker now that we have reproduced it.

I reported it at https://bugzilla.redhat.com/show_bug.cgi?id=2160034#c62, thinking it was a different issue, but everything looks good at the RBD level.
Please check that bug and the comments that follow for further updates.

Reverting the needinfo. Please note that we have a live cluster in the same state for debugging, so we won't be able to work on reproducing this bug again in the future.

All the logs are attached to the same comment.

I am providing the details of C1, where the cleanup is stuck.

C1-  
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/amagrawa-c1/amagrawa-c1_20230327T145717/openshift-cluster-dir/auth/kubeconfig
Web Console: https://console-openshift-console.apps.amagrawa-c1.qe.rh-ocs.com
Login: kubeadmin
Password: Q8fUJ-SfpZc-5Uf3b-HfaEL

--- Additional comment from Benamar Mekhissi on 2023-04-05 14:55:04 UTC ---

It looks like the ApplicationSet operator didn't clean up the App when requested. The workload remained on the old primary, waiting for the App to be deleted.
The PlacementDecision was updated as expected:
```
oc get placementdecision -n openshift-gitops admin-placement-decision-1 -o yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: PlacementDecision
metadata:
  creationTimestamp: "2023-04-03T12:28:03Z"
  generation: 1
  labels:
    cluster.open-cluster-management.io/placement: admin-placement
  name: admin-placement-decision-1
  namespace: openshift-gitops
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Placement
    name: admin-placement
    uid: 1dd4d1f0-932e-4690-8eba-8bcd88e0b290
  resourceVersion: "14326299"
  uid: b4517c31-b35b-4c45-be2d-3e8ba54f67a9
status:
  decisions:
  - clusterName: amagrawa-c2
    reason: ""

```
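
To make the "updated as expected" check concrete, here is a minimal sketch that pulls the decided cluster names out of a PlacementDecision object. It assumes the object has already been fetched as JSON (for example with `-o json` instead of `-o yaml`); the helper name `decided_clusters` and the trimmed `sample` object are illustrative only and are not part of any tool mentioned in this bug.

```python
import json


def decided_clusters(placement_decision: dict) -> list:
    """Return the cluster names listed under status.decisions."""
    decisions = placement_decision.get("status", {}).get("decisions") or []
    return [d["clusterName"] for d in decisions if "clusterName" in d]


# Trimmed copy of the object shown above, in JSON form (hypothetical sample).
sample = json.loads("""
{
  "kind": "PlacementDecision",
  "metadata": {"name": "admin-placement-decision-1",
               "namespace": "openshift-gitops"},
  "status": {"decisions": [{"clusterName": "amagrawa-c2", "reason": ""}]}
}
""")

print(decided_clusters(sample))  # ['amagrawa-c2']
```

With the failover cluster being amagrawa-c2, a result that contains only that name is what one would expect after the failover.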

--- Additional comment from Benamar Mekhissi on 2023-04-05 14:59:31 UTC ---

The DRPC is waiting for cleanup:
```
oc get drpc -A -o wide                                                                  
NAMESPACE          NAME                   AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION   PEER READY
openshift-gitops   admin-placement-drpc   2d    amagrawa-c1        amagrawa-c2       Failover       FailedOver     Cleaning Up   2023-04-03T19:42:34Z              True
```
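
The stuck state above can also be picked out of captured CLI output by script. The following is only a sketch: it assumes the table is column-aligned under its header line, as in the output above, and the function names `parse_table` and `stuck_in_cleanup` are hypothetical helpers, not part of `oc` or Ramen.

```python
import re


def parse_table(text: str) -> list:
    """Parse column-aligned CLI output into a list of dicts keyed by header.

    Column boundaries are taken from the header line, so cells that are
    empty (DURATION) or contain a space ("Cleaning Up") stay intact.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0]
    # Header names may contain single spaces ("START TIME"); columns are
    # assumed to be separated by two or more spaces.
    cols = [(m.start(), m.group())
            for m in re.finditer(r"\S+(?: \S+)*", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (start, name) in enumerate(cols):
            end = cols[i + 1][0] if i + 1 < len(cols) else None
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def stuck_in_cleanup(rows: list) -> list:
    """Names of DRPCs that are FailedOver but still report Cleaning Up."""
    return [
        r["NAME"] for r in rows
        if r.get("CURRENTSTATE") == "FailedOver"
        and r.get("PROGRESSION") == "Cleaning Up"
    ]
```

The text captured from `oc get drpc -A -o wide` can be passed to `parse_table`, and the resulting rows to `stuck_in_cleanup`. Splitting on whitespace alone would not work here, because the DURATION cell is empty and the PROGRESSION value contains a space.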

The App is still running on the old primary (C1):
```
oc get pods -n app-busybox-3
NAMESPACE                                          NAME                                                              READY   STATUS                   RESTARTS       AGE
app-busybox-3                                      busybox-41-6b687497df-92hg8                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-42-5479f6d5dc-f5tlp                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-43-6d57d9d898-n869f                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-44-6985f98f44-dg7rg                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-45-7879f49d7b-qrv5d                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-46-54bc657fc4-fxjnq                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-47-5bfdc6d579-dhjrk                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-48-58dd4fc4b4-cqdcb                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-49-799ddc584-hzbg9                                        1/1     Running                  1              2d2h
app-busybox-3                                      busybox-50-58588b9ffb-qjvv5                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-51-54868dd48d-brxlw                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-52-5b64fb9cff-cwkz9                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-53-699dff5bd4-cr8gb                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-54-788744468c-j8smz                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-55-6bc89678b4-mmhfv                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-56-db586d8c8-r6bfh                                        1/1     Running                  1              2d2h
app-busybox-3                                      busybox-57-759979888c-g2cw6                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-58-84fb689c4f-6r7pm                                       1/1     Running                  1              2d2h
app-busybox-3                                      busybox-59-59b77d856c-4jtfd                                       1/1     Running                  1              2d2h
```

The ApplicationSet status shows an error:
```
status:
    conditions:
    - lastTransitionTime: "2023-04-03T12:33:30Z"
      message: Successfully generated parameters for all Applications
      reason: ApplicationSetUpToDate
      status: "False"
      type: ErrorOccurred
    - lastTransitionTime: "2023-04-03T12:33:30Z"
      message: Successfully generated parameters for all Applications
      reason: ParametersGenerated
      status: "True"
      type: ParametersGenerated
    - lastTransitionTime: "2023-04-03T12:33:30Z"
      message: ApplicationSet up to date
      reason: ApplicationSetUpToDate
      status: "True"
      type: ResourcesUpToDate
```
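
For reference when reading these conditions, here is a small sketch that evaluates them mechanically. It assumes the usual Kubernetes convention that a condition is active only when its `status` is `"True"`; the helper names are hypothetical, and the list is the three conditions above reduced to type, status, and reason. Under that assumption, the `ErrorOccurred` condition with status `"False"` would mean the ApplicationSet is not itself reporting an error, which fits with the need to look at the operator logs.

```python
def condition_status(conditions: list, cond_type: str):
    """Return the status string of the condition with the given type, or None."""
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def error_reported(conditions: list) -> bool:
    """True only when an ErrorOccurred condition exists with status "True"."""
    return condition_status(conditions, "ErrorOccurred") == "True"


# The three conditions pasted above, reduced to type/status/reason.
conditions = [
    {"type": "ErrorOccurred", "status": "False",
     "reason": "ApplicationSetUpToDate"},
    {"type": "ParametersGenerated", "status": "True",
     "reason": "ParametersGenerated"},
    {"type": "ResourcesUpToDate", "status": "True",
     "reason": "ApplicationSetUpToDate"},
]

print(error_reported(conditions))  # False
```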

The next step is to check the ApplicationSet/Application operator logs to figure out what happened.

Comment 5 Karolin Seeger 2023-04-16 12:35:03 UTC
Cannot be reproduced -> moving out of 4.13

Comment 8 Mudit Agarwal 2023-05-03 07:35:21 UTC
Benamar, can this be looked at with reference to https://bugzilla.redhat.com/show_bug.cgi?id=2185953#c5?

