Bug 1915416 - [Descheduler] descheduler evicts pod which does not have any ownerRef or descheduler evict annotation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Mike Dame
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-12 15:56 UTC by RamaKasturi
Modified: 2021-02-24 15:52 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:52:28 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift/descheduler pull 52 (closed): Bug 1915416: Fix TopologySpread bug that evicts non-evictable pods (last updated 2021-01-19 05:30:34 UTC)
Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:52:45 UTC)

Description RamaKasturi 2021-01-12 15:56:42 UTC
Description of problem:
The descheduler evicts a pod that does not have any ownerRef or the descheduler.alpha.kubernetes.io/evict annotation in its metadata.

Version-Release number of selected component (if applicable):
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2021-01-10-070949]$ ./oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-fc.2   True        False         9h      Cluster version is 4.7.0-fc.2


How reproducible:
Always

Steps to Reproduce:
1. Install the descheduler operator on a 4.7 cluster
2. Label the worker nodes using the commands below
oc label node node1 knarra-zone=zoneA
oc label node node2 knarra-zone=zoneB
oc label node node3 knarra-zone=zoneC
3. Cordon all worker nodes except node1
4. Create a pod using the yaml file below
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2021-01-10-070949]$ cat /tmp/constrained-pod.yaml 
kind: Pod
apiVersion: v1
metadata:
  name: mypod-constrained
  labels:
    foo: bar
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: knarra-zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        foo: bar
  containers:
  - name: pause
    image: quay.io/openshifttest/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e
5. Create two more pods on node1 using the yaml file below
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2021-01-10-070949]$ cat /tmp/demo-pod.yaml 
kind: Pod
apiVersion: v1
metadata:
  generateName: mypod
  labels:
    foo: bar
spec:
  containers:
  - name: pause
    image: quay.io/openshifttest/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e
6. Now cordon node1 and uncordon node2.
7. Create a pod on node2 using the yaml file from step 5.
8. Edit the kubedescheduler resource named cluster and change the descheduling interval (IntervalSeconds) to 60, as sketched below.
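
For reference, a minimal sketch of step 8, assuming the KubeDescheduler resource is named cluster in the openshift-kube-descheduler-operator namespace and that the interval field is spec.deschedulingIntervalSeconds (field name assumed from the 4.7 operator API; verify with oc explain kubedescheduler.spec). The profile that enables the PodTopologySpread strategy also needs to be enabled in the same spec for the evictions shown below to happen.

$ oc patch kubedescheduler cluster -n openshift-kube-descheduler-operator \
    --type merge -p '{"spec":{"deschedulingIntervalSeconds":60}}'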

Actual results:
I see that the pod with topologySpreadConstraints set is evicted:
[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2021-01-10-070949]$ ./oc logs -f cluster-65d59dc468-2hbhk -n openshift-kube-descheduler-operator
I0112 14:15:15.486932       1 node.go:46] "Node lister returned empty list, now fetch directly"
I0112 14:15:15.579853       1 topologyspreadconstraint.go:109] "Processing namespaces for topology spread constraints"
I0112 14:15:15.771832       1 evictions.go:117] "Evicted pod" pod="default/mypod-constrained" reason=" (PodTopologySpread)"


Expected results:
The pods created have neither an ownerRef nor the eviction annotation, so they should not be evicted.
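
A quick way to confirm this for the pods created above (illustrative commands; the first should print nothing, and the second should not list descheduler.alpha.kubernetes.io/evict):

$ oc get pod mypod-constrained -o jsonpath='{.metadata.ownerReferences}'
$ oc get pod mypod-constrained -o jsonpath='{.metadata.annotations}'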

Additional info:

Comment 1 Mike Dame 2021-01-12 20:29:31 UTC
Upstream PR to fix this: https://github.com/kubernetes-sigs/descheduler/pull/484

Comment 3 RamaKasturi 2021-01-19 12:30:08 UTC
Verified the bug with the payload below. When the "descheduler.alpha.kubernetes.io/evict": "" annotation is not present, pods do not get evicted; when the annotation is present, pods get evicted to maintain the topology spread constraint.
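
For reference, a minimal sketch of the annotated pod used for the second run below, reusing the image and labels from the yaml files in the description; only the annotation is new:

kind: Pod
apiVersion: v1
metadata:
  generateName: mypod
  labels:
    foo: bar
  annotations:
    descheduler.alpha.kubernetes.io/evict: ""
spec:
  containers:
  - name: pause
    image: quay.io/openshifttest/hello-openshift@sha256:aaea76ff622d2f8bcb32e538e7b3cd0ef6d291953f3e7c9f556c1ba5baf47e2e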

[knarra@knarra openshift-misc]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                   DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.7.0-202101160343.p0   Kube Descheduler Operator   4.7.0-202101160343.p0              Succeeded

[knarra@knarra openshift-misc]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-18-000316   True        False         5h19m   Cluster version is 4.7.0-0.nightly-2021-01-18-000316

Descheduler log when the descheduler annotation is not present on the pods:
===========================================================================
[knarra@knarra verification-tests]$ oc logs -f cluster-68fd5d5976-xllt2 -n openshift-kube-descheduler-operator
I0119 12:16:07.974680       1 node.go:46] "Node lister returned empty list, now fetch directly"
I0119 12:16:08.169078       1 topologyspreadconstraint.go:109] "Processing namespaces for topology spread constraints"
I0119 12:16:08.376539       1 duplicates.go:94] "Processing node" node="compute-0"
I0119 12:16:08.457438       1 duplicates.go:94] "Processing node" node="compute-1"
I0119 12:16:08.482481       1 duplicates.go:94] "Processing node" node="control-plane-0"
I0119 12:16:08.509079       1 duplicates.go:94] "Processing node" node="control-plane-1"
I0119 12:16:08.532517       1 duplicates.go:94] "Processing node" node="control-plane-2"

I0119 12:17:08.557448       1 node.go:46] "Node lister returned empty list, now fetch directly"
I0119 12:17:08.669790       1 topologyspreadconstraint.go:109] "Processing namespaces for topology spread constraints"
I0119 12:17:08.770414       1 duplicates.go:94] "Processing node" node="compute-0"
I0119 12:17:08.794727       1 duplicates.go:94] "Processing node" node="compute-1"
I0119 12:17:08.857392       1 duplicates.go:94] "Processing node" node="control-plane-0"
I0119 12:17:08.881861       1 duplicates.go:94] "Processing node" node="control-plane-1"
I0119 12:17:08.903355       1 duplicates.go:94] "Processing node" node="control-plane-2"

Descheduler log when the descheduler annotation is present on the pod:
========================================================================
[knarra@knarra verification-tests]$ oc logs -f cluster-55b679cd94-bf65m -n openshift-kube-descheduler-operator
I0119 12:19:47.873990       1 node.go:46] "Node lister returned empty list, now fetch directly"
I0119 12:19:48.059908       1 topologyspreadconstraint.go:109] "Processing namespaces for topology spread constraints"
I0119 12:19:48.297513       1 evictions.go:117] "Evicted pod" pod="knarra/mypodsv4sp" reason=" (PodTopologySpread)"
I0119 12:19:48.297657       1 duplicates.go:94] "Processing node" node="compute-0"
I0119 12:19:48.324803       1 duplicates.go:94] "Processing node" node="compute-1"
I0119 12:19:48.347550       1 duplicates.go:94] "Processing node" node="control-plane-0"
I0119 12:19:48.381212       1 duplicates.go:94] "Processing node" node="control-plane-1"
I0119 12:19:48.457461       1 duplicates.go:94] "Processing node" node="control-plane-2"


I0119 12:20:48.497487       1 node.go:46] "Node lister returned empty list, now fetch directly"
I0119 12:20:48.507606       1 duplicates.go:94] "Processing node" node="compute-0"
I0119 12:20:48.527653       1 duplicates.go:94] "Processing node" node="compute-1"
I0119 12:20:48.548893       1 duplicates.go:94] "Processing node" node="control-plane-0"
I0119 12:20:48.567381       1 duplicates.go:94] "Processing node" node="control-plane-1"
I0119 12:20:48.589618       1 duplicates.go:94] "Processing node" node="control-plane-2"
I0119 12:20:48.659071       1 topologyspreadconstraint.go:109] "Processing namespaces for topology spread constraints"
I0119 12:20:48.668271       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:knarra-zone WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{foo: bar,},MatchExpressions:[]LabelSelectorRequirement{},}}

I0119 12:21:48.902531       1 node.go:46] "Node lister returned empty list, now fetch directly"
I0119 12:21:48.958797       1 topologyspreadconstraint.go:109] "Processing namespaces for topology spread constraints"
I0119 12:21:48.968011       1 topologyspreadconstraint.go:183] "Skipping topology constraint because it is already balanced" constraint={MaxSkew:1 TopologyKey:knarra-zone WhenUnsatisfiable:DoNotSchedule LabelSelector:&LabelSelector{MatchLabels:map[string]string{foo: bar,},MatchExpressions:[]LabelSelectorRequirement{},}}
I0119 12:21:48.980690       1 duplicates.go:94] "Processing node" node="compute-0"
I0119 12:21:49.007296       1 duplicates.go:94] "Processing node" node="compute-1"
I0119 12:21:49.030910       1 duplicates.go:94] "Processing node" node="control-plane-0"
I0119 12:21:49.083342       1 duplicates.go:94] "Processing node" node="control-plane-1"
I0119 12:21:49.129958       1 duplicates.go:94] "Processing node" node="control-plane-2"

Based on the above, moving the bug to VERIFIED.

Comment 6 errata-xmlrpc 2021-02-24 15:52:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

