Description of problem:

When deleting the KubeDescheduler CR, the Deployment and ConfigMap it generated are not deleted, so the configuration from the CR continues to run and rebalance pods.

Version-Release number of selected component (if applicable):

OpenShift: 4.7.8
Kube Descheduler Operator: 4.7.0-202104142050.p0

How reproducible:

Always

Steps to Reproduce:
1. oc new-project test
2. oc create deployment testapp --image=registry.redhat.io/rhel8/httpd-24 --replicas=3
3. oc edit deploy/testapp and add the following block to the pod template's spec:
```
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - testapp
              topologyKey: kubernetes.io/hostname
            weight: 100
```
4. Install the Descheduler: https://docs.openshift.com/container-platform/4.7/nodes/scheduling/nodes-descheduler.html#nodes-descheduler-installing_nodes-descheduler
5. Create a KubeDescheduler CR with the TopologyAndDuplicates profile and an interval of 30 seconds (a sample CR is sketched under Additional info below). At this point, the pods will get rebalanced if need be.
6. Delete the KubeDescheduler CR.

Actual results:

The pods continue to get rebalanced even though the CR has been deleted.

Expected results:

Deleting the CR should also remove the Deployment and ConfigMap it created; otherwise those resources keep rebalancing pods even after the CR is removed.

Additional info:
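For reference, a minimal KubeDescheduler CR matching step 5 might look like the sketch below. This is an illustration, not taken verbatim from the report: the field names and the required name/namespace are copied from the `oc get -o yaml kubedescheduler/cluster` output quoted in the next comment, and the 30-second interval is only there to make the repro fast.
```
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  managementState: Managed
  # short interval purely for reproducing the issue quickly
  deschedulingIntervalSeconds: 30
  profiles:
  - TopologyAndDuplicates
```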
In the operator, we do set the descheduler cluster Deployment to have an ownerRef referring to the CR: https://github.com/openshift/cluster-kube-descheduler-operator/blob/8903d09/pkg/operator/target_config_reconciler.go#L270-L276

And in my own testing, the ownerRef UID set on the Deployment does match the UID of the CR (see output below). But deleting the CR still does not delete the Deployment. I wonder if this is a limitation of ownerRefs with CRDs, or if there's some other way we need to configure it.

I also noticed that after deleting the CR, if you recreate it, the existing Deployment's ownerRefs do not get updated to the new CR's UID. However, this would be irrelevant if the Deployment were actually deleted when the CR was.

(Note: we actually first discovered this in https://bugzilla.redhat.com/show_bug.cgi?id=1913821, but it got forgotten when focus shifted to documenting how to fully uninstall the Descheduler.)

> $ oc get -o yaml kubedescheduler/cluster
> apiVersion: operator.openshift.io/v1
> kind: KubeDescheduler
> metadata:
>   creationTimestamp: "2021-05-05T13:59:54Z"
>   generation: 1
>   name: cluster
>   namespace: openshift-kube-descheduler-operator
>   resourceVersion: "38161"
>   uid: b2fe822a-736b-4a9f-9f73-5552087433a3
> spec:
>   deschedulingIntervalSeconds: 3600
>   logLevel: Normal
>   managementState: Managed
>   operatorLogLevel: Normal
>   profiles:
>   - AffinityAndTaints
> status:
>   generations:
>   - group: apps
>     hash: ""
>     lastGeneration: 1
>     name: cluster
>     namespace: openshift-kube-descheduler-operator
>     resource: deployments
>   readyReplicas: 0
>
> $ oc get -o yaml deployment.apps/cluster
> apiVersion: apps/v1
> kind: Deployment
> metadata:
>   annotations:
>     deployment.kubernetes.io/revision: "1"
>     operator.openshift.io/pull-spec: quay.io/openshift/origin-descheduler:4.7
>   creationTimestamp: "2021-05-05T14:06:28Z"
>   generation: 1
>   labels:
>     app: descheduler
>   name: cluster
>   namespace: openshift-kube-descheduler-operator
>   ownerReferences:
>   - apiVersion: v1
>     kind: KubeDescheduler
>     name: cluster
>     uid: b2fe822a-736b-4a9f-9f73-5552087433a3
>   resourceVersion: "38170"
>   uid: 4909677a-1dc4-4030-8aa6-3d5ef2b845c1
> ...
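One detail that stands out in the Deployment output above (an observation, not a confirmed root cause): the ownerReference carries apiVersion: v1, while the KubeDescheduler CR is served under operator.openshift.io/v1. The Kubernetes garbage collector resolves an owner by its apiVersion and kind, so an ownerRef pointing at a group/version the CR is not actually served under can prevent cascading deletion. If that is what's happening here, a resolvable reference would look something like this sketch:
```
ownerReferences:
- apiVersion: operator.openshift.io/v1  # must match the group/version the CR is served under
  kind: KubeDescheduler
  name: cluster
  uid: b2fe822a-736b-4a9f-9f73-5552087433a3
```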
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it, otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen into Keywords if you think this bug should never be marked as stale. Please consult with bug assignee before you do that.
I believe this is still relevant as it will be very visible to customers and requires a non-intuitive workaround to fix.
The LifecycleStale keyword was removed because the needinfo? flag was reset and the bug got commented on recently. The bug assignee was notified.
Verified the bug with the payload below, and I see that the Deployment and ConfigMap for the KubeDescheduler CR are now removed when the CR is deleted.

[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                 DISPLAY                     VERSION               REPLACES   PHASE
clusterkubedescheduleroperator.4.11.0-202206171549   Kube Descheduler Operator   4.11.0-202206171549              Succeeded

[knarra@knarra ~]$ oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
descheduler-operator-58949895ff-jcwsh   1/1     Running   0          4m19s

[knarra@knarra ~]$ oc get deployment -n openshift-kube-descheduler-operator
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
descheduler-operator   1/1     1            1           4m29s

[knarra@knarra ~]$ oc get configmap -n openshift-kube-descheduler-operator
NAME                                               DATA   AGE
kube-root-ca.crt                                   1      5m21s
openshift-cluster-kube-descheduler-operator-lock   0      4m33s
openshift-service-ca.crt                           1      5m21s

Tried to reproduce the issue with the 4.10 kube descheduler operator and was able to do so:
========================================================================================

[knarra@knarra ~]$ oc get pods -n openshift-kube-descheduler-operator
NAME                                    READY   STATUS    RESTARTS   AGE
cluster-5c5cb96c59-27tcc                1/1     Running   0          24s
descheduler-operator-7775565778-4gwdm   1/1     Running   0          8m37s

[knarra@knarra ~]$ oc get deployment -n openshift-kube-descheduler-operator
NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
cluster                1/1     1            1           39s
descheduler-operator   1/1     1            1           8m46s

[knarra@knarra ~]$ oc get configmap -n openshift-kube-descheduler-operator
NAME                                               DATA   AGE
cluster                                            1      61s
kube-root-ca.crt                                   1      9m49s
openshift-cluster-kube-descheduler-operator-lock   0      9m
openshift-service-ca.crt                           1      9m49s

[knarra@knarra ~]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                 DISPLAY                     VERSION               REPLACES   PHASE
clusterkubedescheduleroperator.4.10.0-202206010417   Kube Descheduler Operator   4.10.0-202206010417              Succeeded

Based on the above, moving the bug to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069