+++ This bug was initially created as a clone of Bug #2151493 +++

--- Additional comment from Aman Agrawal on 2023-04-05 06:33:05 UTC ---

Hi Shyam/Madhu,

Yes, I am hitting this bug during 4.13 testing, and it remains a 4.13 blocker as we have now reproduced it.

I reported it here https://bugzilla.redhat.com/show_bug.cgi?id=2160034#c62 thinking it's a different issue, but everything looks good at the RBD level. Please check that bug and the following comments on it for further updates.

Reverting needinfo. Please note that we have a live cluster in the same state for debugging, so we won't be able to work on reproducing this bug again in the future.

All the logs are attached to the same comment. I am providing details for C1, where the cleanup is stuck.

C1- http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/amagrawa-c1/amagrawa-c1_20230327T145717/openshift-cluster-dir/auth/kubeconfig

Web Console: https://console-openshift-console.apps.amagrawa-c1.qe.rh-ocs.com
Login: kubeadmin
Password: Q8fUJ-SfpZc-5Uf3b-HfaEL

--- Additional comment from Benamar Mekhissi on 2023-04-05 14:55:04 UTC ---

It looks like the ApplicationSet operator didn't clean up the App when requested. The workload remained in the old primary waiting for the App to be deleted.
The PlacementDecision was updated as expected:
```
oc get placementdecision -n openshift-gitops admin-placement-decision-1 -o yaml
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: PlacementDecision
metadata:
  creationTimestamp: "2023-04-03T12:28:03Z"
  generation: 1
  labels:
    cluster.open-cluster-management.io/placement: admin-placement
  name: admin-placement-decision-1
  namespace: openshift-gitops
  ownerReferences:
  - apiVersion: cluster.open-cluster-management.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Placement
    name: admin-placement
    uid: 1dd4d1f0-932e-4690-8eba-8bcd88e0b290
  resourceVersion: "14326299"
  uid: b4517c31-b35b-4c45-be2d-3e8ba54f67a9
status:
  decisions:
  - clusterName: amagrawa-c2
    reason: ""
```

--- Additional comment from Benamar Mekhissi on 2023-04-05 14:59:31 UTC ---

DRPC waiting for cleanup:
```
oc get drpc -A -o wide
NAMESPACE          NAME                   AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION   PEER READY
openshift-gitops   admin-placement-drpc   2d    amagrawa-c1        amagrawa-c2       Failover       FailedOver     Cleaning Up   2023-04-03T19:42:34Z              True
```

The App is still running on the old primary (C1):
```
oc get pods -n app-busybox-3
NAMESPACE       NAME                          READY   STATUS    RESTARTS   AGE
app-busybox-3   busybox-41-6b687497df-92hg8   1/1     Running   1          2d2h
app-busybox-3   busybox-42-5479f6d5dc-f5tlp   1/1     Running   1          2d2h
app-busybox-3   busybox-43-6d57d9d898-n869f   1/1     Running   1          2d2h
app-busybox-3   busybox-44-6985f98f44-dg7rg   1/1     Running   1          2d2h
app-busybox-3   busybox-45-7879f49d7b-qrv5d   1/1     Running   1          2d2h
app-busybox-3   busybox-46-54bc657fc4-fxjnq   1/1     Running   1          2d2h
app-busybox-3   busybox-47-5bfdc6d579-dhjrk   1/1     Running   1          2d2h
app-busybox-3   busybox-48-58dd4fc4b4-cqdcb   1/1     Running   1          2d2h
app-busybox-3   busybox-49-799ddc584-hzbg9    1/1     Running   1          2d2h
app-busybox-3   busybox-50-58588b9ffb-qjvv5   1/1     Running   1          2d2h
app-busybox-3   busybox-51-54868dd48d-brxlw   1/1     Running   1          2d2h
app-busybox-3   busybox-52-5b64fb9cff-cwkz9   1/1     Running   1          2d2h
app-busybox-3   busybox-53-699dff5bd4-cr8gb   1/1     Running   1          2d2h
app-busybox-3   busybox-54-788744468c-j8smz   1/1     Running   1          2d2h
app-busybox-3   busybox-55-6bc89678b4-mmhfv   1/1     Running   1          2d2h
app-busybox-3   busybox-56-db586d8c8-r6bfh    1/1     Running   1          2d2h
app-busybox-3   busybox-57-759979888c-g2cw6   1/1     Running   1          2d2h
app-busybox-3   busybox-58-84fb689c4f-6r7pm   1/1     Running   1          2d2h
app-busybox-3   busybox-59-59b77d856c-4jtfd   1/1     Running   1          2d2h
```

The ApplicationSet status shows an error:
```
status:
  conditions:
  - lastTransitionTime: "2023-04-03T12:33:30Z"
    message: Successfully generated parameters for all Applications
    reason: ApplicationSetUpToDate
    status: "False"
    type: ErrorOccurred
  - lastTransitionTime: "2023-04-03T12:33:30Z"
    message: Successfully generated parameters for all Applications
    reason: ParametersGenerated
    status: "True"
    type: ParametersGenerated
  - lastTransitionTime: "2023-04-03T12:33:30Z"
    message: ApplicationSet up to date
    reason: ApplicationSetUpToDate
    status: "True"
    type: ResourcesUpToDate
```

The next step is to check the ApplicationSet/Application operator logs to figure out what happened...
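
For anyone triaging a similar cluster, here is a minimal sketch of how the ApplicationSet conditions quoted above could be summarized programmatically. It is only an illustration: the condition values are copied by hand from the status in this comment, and the helper names `summarize_conditions` and `has_error` are hypothetical, not part of any operator or CLI.

```python
# Conditions copied by hand from the ApplicationSet status quoted in this comment.
# Kubernetes-style conditions carry their status as the strings "True" / "False".
conditions = [
    {
        "type": "ErrorOccurred",
        "status": "False",
        "reason": "ApplicationSetUpToDate",
        "message": "Successfully generated parameters for all Applications",
    },
    {
        "type": "ParametersGenerated",
        "status": "True",
        "reason": "ParametersGenerated",
        "message": "Successfully generated parameters for all Applications",
    },
    {
        "type": "ResourcesUpToDate",
        "status": "True",
        "reason": "ApplicationSetUpToDate",
        "message": "ApplicationSet up to date",
    },
]


def summarize_conditions(conds):
    """Map each condition type to a boolean derived from its string status."""
    return {c["type"]: c["status"] == "True" for c in conds}


def has_error(conds):
    """Return True only if an ErrorOccurred condition is present and set to "True"."""
    return summarize_conditions(conds).get("ErrorOccurred", False)


if __name__ == "__main__":
    for cond_type, is_true in summarize_conditions(conditions).items():
        print(f"{cond_type}: {is_true}")
    print("error reported:", has_error(conditions))
```

With the values quoted here, `has_error(conditions)` returns `False`, so the conditions by themselves do not explain the stuck cleanup and the operator logs are still the place to look.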
Cannot be reproduced -> moving out of 4.13
Benamar, can this be looked at with reference to https://bugzilla.redhat.com/show_bug.cgi?id=2185953#c5?