Description of problem:

When deleting the KubeDescheduler CR, the Deployment and ConfigMap it generated are not deleted, so the configuration from the CR continues to run and rebalance pods.

Version-Release number of selected component (if applicable):

OpenShift: 4.7.8
Kube Descheduler Operator: 4.7.0-202104142050.p0

How reproducible:

Always

Steps to Reproduce:
1. oc new-project test
2. oc create deployment testapp --image=registry.redhat.io/rhel8/httpd-24 --replicas=3
3. oc edit deploy/testapp and add the following block to the pod template's spec:
```
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - testapp
              topologyKey: kubernetes.io/hostname
            weight: 100
```
4. Install the Descheduler: https://docs.openshift.com/container-platform/4.7/nodes/scheduling/nodes-descheduler.html#nodes-descheduler-installing_nodes-descheduler
5. Create a KubeDescheduler CR with the TopologyAndDuplicates profile and an interval of 30 seconds (a sample CR is sketched under Additional info below). At this point, the pods will get rebalanced if need be.
6. Delete the KubeDescheduler CR.

Actual results:

The pods continue to get rebalanced even though the CR has been deleted.

Expected results:

Deleting the CR should also remove the Deployment and ConfigMap it created; otherwise those resources keep rebalancing pods even after the CR is removed.

Additional info:
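For reference, a minimal KubeDescheduler CR matching step 5 might look like the sketch below. This is an illustration, not taken verbatim from the report: the field names and the required name/namespace are copied from the `oc get -o yaml kubedescheduler/cluster` output quoted in the next comment, and the 30-second interval is only there to make the repro fast.
```
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  managementState: Managed
  # short interval purely for reproducing the issue quickly
  deschedulingIntervalSeconds: 30
  profiles:
  - TopologyAndDuplicates
```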
In the operator, we do set the descheduler cluster Deployment to have an ownerRef referring to the CR: https://github.com/openshift/cluster-kube-descheduler-operator/blob/8903d09/pkg/operator/target_config_reconciler.go#L270-L276

And in my own testing, the ownerRef UID set on the Deployment does match the UID of the CR (see output below). But deleting the CR still does not delete the Deployment. I wonder if this is a limitation of ownerRefs with CRDs, or if there's some other way we need to configure it.

I also noticed that after deleting the CR, if you recreate it, the existing Deployment's ownerRefs do not get updated to the new CR's UID. However, this would be irrelevant if the Deployment were actually deleted when the CR was.

(Note: we actually first discovered this in https://bugzilla.redhat.com/show_bug.cgi?id=1913821, but it got forgotten when focus shifted to documenting how to fully uninstall the Descheduler.)

> $ oc get -o yaml kubedescheduler/cluster
> apiVersion: operator.openshift.io/v1
> kind: KubeDescheduler
> metadata:
>   creationTimestamp: "2021-05-05T13:59:54Z"
>   generation: 1
>   name: cluster
>   namespace: openshift-kube-descheduler-operator
>   resourceVersion: "38161"
>   uid: b2fe822a-736b-4a9f-9f73-5552087433a3
> spec:
>   deschedulingIntervalSeconds: 3600
>   logLevel: Normal
>   managementState: Managed
>   operatorLogLevel: Normal
>   profiles:
>   - AffinityAndTaints
> status:
>   generations:
>   - group: apps
>     hash: ""
>     lastGeneration: 1
>     name: cluster
>     namespace: openshift-kube-descheduler-operator
>     resource: deployments
>   readyReplicas: 0
>
> $ oc get -o yaml deployment.apps/cluster
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   annotations:
>     deployment.kubernetes.io/revision: "1"
>     operator.openshift.io/pull-spec: quay.io/openshift/origin-descheduler:4.7
>   creationTimestamp: "2021-05-05T14:06:28Z"
>   generation: 1
>   labels:
>     app: descheduler
>   name: cluster
>   namespace: openshift-kube-descheduler-operator
>   ownerReferences:
>   - apiVersion: v1
>     kind: KubeDescheduler
>     name: cluster
>     uid: b2fe822a-736b-4a9f-9f73-5552087433a3
>   resourceVersion: "38170"
>   uid: 4909677a-1dc4-4030-8aa6-3d5ef2b845c1
> ...
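One detail that stands out in the Deployment output above (an observation, not a confirmed root cause): the ownerReference carries apiVersion: v1, while the KubeDescheduler CR is served under operator.openshift.io/v1. The Kubernetes garbage collector resolves an owner by its apiVersion and kind, so an ownerRef pointing at a group/version the CR is not actually served under can prevent cascading deletion. If that is what's happening here, a resolvable reference would look something like this sketch:
```
ownerReferences:
- apiVersion: operator.openshift.io/v1  # must match the group/version the CR is served under
  kind: KubeDescheduler
  name: cluster
  uid: b2fe822a-736b-4a9f-9f73-5552087433a3
```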
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
I believe this is still relevant as it will be very visible to customers and requires a non-intuitive workaround to fix.
The LifecycleStale keyword was removed because the needinfo? flag was reset and the bug got commented on recently. The bug assignee was notified.
Verified the bug with the payload below, and I see that the Deployment and ConfigMap for the KubeDescheduler CR are now removed when the CR is deleted.

[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                 DISPLAY                     VERSION               REPLACES   PHASE
clusterkubedescheduleroperator.4.11.0-202206171549   Kube Descheduler Operator   4.11.0-202206171549              Succeeded

[knarra@knarra ~]$ oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
descheduler-operator-58949895ff-jcwsh   1/1     Running   0          4m19s

[knarra@knarra ~]$ oc get deployment -n openshift-kube-descheduler-operator
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
descheduler-operator   1/1     1            1           4m29s

[knarra@knarra ~]$ oc get configmap -n openshift-kube-descheduler-operator
NAME                                               DATA   AGE
kube-root-ca.crt                                   1      5m21s
openshift-cluster-kube-descheduler-operator-lock   0      4m33s
openshift-service-ca.crt                           1      5m21s

Tried to reproduce the issue with the 4.10 kube descheduler operator and was able to do so:
========================================================================================

[knarra@knarra ~]$ oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
cluster-5c5cb96c59-27tcc                1/1     Running   0          24s
descheduler-operator-7775565778-4gwdm   1/1     Running   0          8m37s

[knarra@knarra ~]$ oc get deployment -n openshift-kube-descheduler-operator
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
cluster                1/1     1            1           39s
descheduler-operator   1/1     1            1           8m46s

[knarra@knarra ~]$ oc get configmap -n openshift-kube-descheduler-operator
NAME                                               DATA   AGE
cluster                                            1      61s
kube-root-ca.crt                                   1      9m49s
openshift-cluster-kube-descheduler-operator-lock   0      9m
openshift-service-ca.crt                           1      9m49s

[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                 DISPLAY                     VERSION               REPLACES   PHASE
clusterkubedescheduleroperator.4.10.0-202206010417   Kube Descheduler Operator   4.10.0-202206010417              Succeeded

Based on the above, moving the bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069