Description of problem:
After an install with the descheduler enabled, the descheduler cronjob evicts its own pod, because that pod is not on the eviction blacklist.

Version-Release number of selected component (if applicable):
# oc version
oc v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-18-14-232.ec2.internal:8443
openshift v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8

How reproducible:
Always

Steps to Reproduce:
1. Set up OpenShift with the following inventory parameters (see the policy sketch below for what these strategies translate to):
openshift_descheduler_install: true
openshift_descheduler_image_prefix: xxxx:443/openshift3/ose-
openshift_descheduler_image_version: v3.10
openshift_descheduler_strategies_dict: "{'remove_duplicates': True, 'remove_pods_violating_inter_pod_anti_affinity': True, 'low_node_utilization': True}"
openshift_descheduler_cronjob_node_selector: ""
openshift_descheduler_cronjob_schedule: "*/5 * * * *"
2. Create enough pods that the low_node_utilization strategy has something to rebalance (see the load sketch below)
3. Check the descheduler log

Actual results:
The descheduler evicts its own pod, descheduler-cronjob-1526540100-tjkdd (note the lownodeutilization.go:230 line for that pod below):

# oc logs -f descheduler-cronjob-1526540100-tjkdd
I0517 06:55:10.340245 1 reflector.go:202] Starting reflector *v1.Node (1h0m0s) from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:84
I0517 06:55:10.340368 1 reflector.go:240] Listing and watching *v1.Node from github.com/kubernetes-incubator/descheduler/pkg/descheduler/node/node.go:84
I0517 06:55:10.440492 1 duplicates.go:50] Processing node: "ip-172-18-4-140.ec2.internal"
I0517 06:55:10.496043 1 duplicates.go:54] "ReplicationController/hello-1"
I0517 06:55:10.496074 1 duplicates.go:65] Evicted pod: "hello-1-4q9zs" (<nil>)
I0517 06:55:10.496080 1 duplicates.go:65] Evicted pod: "hello-1-54xhw" (<nil>)
I0517 06:55:10.496085 1 duplicates.go:65] Evicted pod: "hello-1-5xv2m" (<nil>)
I0517 06:55:10.496089 1 duplicates.go:65] Evicted pod: "hello-1-8f85b" (<nil>)
I0517 06:55:10.496094 1 duplicates.go:65] Evicted pod: "hello-1-94bdz" (<nil>)
I0517 06:55:10.496098 1 duplicates.go:65] Evicted pod: "hello-1-bl7k7" (<nil>)
I0517 06:55:10.496104 1 duplicates.go:65] Evicted pod: "hello-1-n5htm" (<nil>)
I0517 06:55:10.496108 1 duplicates.go:65] Evicted pod: "hello-1-rr28r" (<nil>)
I0517 06:55:10.496113 1 duplicates.go:65] Evicted pod: "hello-1-vkvfw" (<nil>)
I0517 06:55:10.496117 1 duplicates.go:65] Evicted pod: "hello-1-vvvt9" (<nil>)
I0517 06:55:10.496121 1 duplicates.go:65] Evicted pod: "hello-1-x5rk8" (<nil>)
I0517 06:55:10.496126 1 duplicates.go:65] Evicted pod: "hello-1-xq4kj" (<nil>)
I0517 06:55:10.496130 1 duplicates.go:50] Processing node: "ip-172-18-14-232.ec2.internal"
I0517 06:55:10.504325 1 duplicates.go:54] "ReplicationController/hello-1"
I0517 06:55:10.504353 1 duplicates.go:65] Evicted pod: "hello-1-hswrv" (<nil>)
I0517 06:55:10.504360 1 duplicates.go:65] Evicted pod: "hello-1-jtw9g" (<nil>)
I0517 06:55:10.504365 1 duplicates.go:65] Evicted pod: "hello-1-kbznn" (<nil>)
I0517 06:55:10.504369 1 duplicates.go:65] Evicted pod: "hello-1-kn5m2" (<nil>)
I0517 06:55:10.504374 1 duplicates.go:65] Evicted pod: "hello-1-m67q9" (<nil>)
I0517 06:55:10.504378 1 duplicates.go:65] Evicted pod: "hello-1-sn5r8" (<nil>)
I0517 06:55:10.504383 1 duplicates.go:65] Evicted pod: "hello-1-zsl2q" (<nil>)
I0517 06:55:10.504388 1 duplicates.go:50] Processing node: "ip-172-18-8-195.ec2.internal"
I0517 06:55:10.512867 1 duplicates.go:54] "ReplicationController/hello-1"
I0517 06:55:10.512897 1 duplicates.go:65] Evicted pod: "hello-1-9pxrm" (<nil>)
I0517 06:55:10.512904 1 duplicates.go:65] Evicted pod: "hello-1-b8qtn" (<nil>)
I0517 06:55:10.512908 1 duplicates.go:65] Evicted pod: "hello-1-jqdbk" (<nil>)
I0517 06:55:10.512913 1 duplicates.go:65] Evicted pod: "hello-1-l5wsj" (<nil>)
I0517 06:55:10.512917 1 duplicates.go:65] Evicted pod: "hello-1-mq87n" (<nil>)
I0517 06:55:10.512922 1 duplicates.go:65] Evicted pod: "hello-1-s2qkz" (<nil>)
I0517 06:55:10.512926 1 duplicates.go:65] Evicted pod: "hello-1-sptzm" (<nil>)
I0517 06:55:10.512931 1 duplicates.go:65] Evicted pod: "hello-1-vfhfq" (<nil>)
I0517 06:55:10.542384 1 lownodeutilization.go:144] Node "ip-172-18-4-140.ec2.internal" is over utilized with usage: api.ResourceThresholds{"cpu":40, "memory":25.873227362806837, "pods":7.6}
I0517 06:55:10.542436 1 lownodeutilization.go:149] allPods:19, nonRemovablePods:3, bePods:2, bPods:14, gPods:0
I0517 06:55:10.542523 1 lownodeutilization.go:141] Node "ip-172-18-14-232.ec2.internal" is under utilized with usage: api.ResourceThresholds{"cpu":27.5, "memory":16.775785028577992, "pods":7.6}
I0517 06:55:10.542543 1 lownodeutilization.go:149] allPods:19, nonRemovablePods:9, bePods:1, bPods:9, gPods:0
I0517 06:55:10.542625 1 lownodeutilization.go:144] Node "ip-172-18-8-195.ec2.internal" is over utilized with usage: api.ResourceThresholds{"cpu":32.5, "memory":27.495062326909224, "pods":6.4}
I0517 06:55:10.542643 1 lownodeutilization.go:149] allPods:16, nonRemovablePods:5, bePods:0, bPods:11, gPods:0
I0517 06:55:10.542649 1 lownodeutilization.go:65] Criteria for a node under utilization: CPU: 28, Mem: 20, Pods: 8
I0517 06:55:10.542707 1 lownodeutilization.go:72] Total number of underutilized nodes: 1
I0517 06:55:10.542712 1 lownodeutilization.go:89] Criteria for a node above target utilization: CPU: 30, Mem: 25, Pods: 7
I0517 06:55:10.542717 1 lownodeutilization.go:91] Total number of nodes above target utilization: 2
I0517 06:55:10.542731 1 lownodeutilization.go:183] Total capacity to be moved: CPU:100, Mem:1.3612236799999998e+09, Pods:-1.4999999999999991
I0517 06:55:10.542745 1 lownodeutilization.go:184] ********Number of pods evicted from each node:***********
I0517 06:55:10.542751 1 lownodeutilization.go:191] evicting pods from node "ip-172-18-4-140.ec2.internal" with usage: api.ResourceThresholds{"cpu":40, "memory":25.873227362806837, "pods":7.6}
I0517 06:55:10.542763 1 lownodeutilization.go:230] Evicted pod: "asb-1-42nq7" (<nil>)
I0517 06:55:10.542769 1 lownodeutilization.go:244] updated node usage: api.ResourceThresholds{"cpu":40, "memory":25.873227362806837, "pods":7.199999999999999}
I0517 06:55:10.542778 1 lownodeutilization.go:230] Evicted pod: "descheduler-cronjob-1526540100-tjkdd" (<nil>)
I0517 06:55:10.542784 1 lownodeutilization.go:244] updated node usage: api.ResourceThresholds{"cpu":40, "memory":25.873227362806837, "pods":6.799999999999999}
I0517 06:55:10.542792 1 lownodeutilization.go:230] Evicted pod: "hello-1-2gbpx" (<nil>)
I0517 06:55:10.542798 1 lownodeutilization.go:244] updated node usage: api.ResourceThresholds{"cpu":37.5, "memory":24.251399202944313, "pods":6.399999999999999}
I0517 06:55:10.542806 1 lownodeutilization.go:230] Evicted pod: "hello-1-4q9zs" (<nil>)
I0517 06:55:10.542811 1 lownodeutilization.go:244] updated node usage: api.ResourceThresholds{"memory":22.62957104308179, "pods":5.999999999999998, "cpu":35}
I0517 06:55:10.542818 1 lownodeutilization.go:230] Evicted pod: "hello-1-54xhw" (<nil>)
I0517 06:55:10.542824 1 lownodeutilization.go:244] updated node usage: api.ResourceThresholds{"cpu":32.5, "memory":21.007742883219265, "pods":5.599999999999998}
I0517 06:55:10.542832 1 lownodeutilization.go:230] Evicted pod: "hello-1-5xv2m" (<nil>)
I0517 06:55:10.542838 1 lownodeutilization.go:244] updated node usage: api.ResourceThresholds{"cpu":30, "memory":19.38591472335674, "pods":5.1999999999999975}
I0517 06:55:10.542847 1 lownodeutilization.go:202] 18 pods evicted from node "ip-172-18-4-140.ec2.internal" with usage map[memory:19.38591472335674 pods:5.1999999999999975 cpu:30]
I0517 06:55:10.542869 1 lownodeutilization.go:191] evicting pods from node "ip-172-18-8-195.ec2.internal" with usage: api.ResourceThresholds{"cpu":32.5, "memory":27.495062326909224, "pods":6.4}
I0517 06:55:10.542880 1 lownodeutilization.go:230] Evicted pod: "hello-1-4rmtt" (<nil>)
I0517 06:55:10.542885 1 lownodeutilization.go:244] updated node usage: api.ResourceThresholds{"cpu":30, "memory":25.873233765690614, "pods":6}
I0517 06:55:10.542893 1 lownodeutilization.go:230] Evicted pod: "hello-1-9pxrm" (<nil>)
I0517 06:55:10.542898 1 lownodeutilization.go:244] updated node usage: api.ResourceThresholds{"cpu":27.5, "memory":24.251405204472004, "pods":5.6}
I0517 06:55:10.542907 1 lownodeutilization.go:202] 10 pods evicted from node "ip-172-18-8-195.ec2.internal" with usage map[cpu:27.5 memory:24.251405204472004 pods:5.6]
I0517 06:55:10.542916 1 lownodeutilization.go:94] Total number of pods evicted: 28
I0517 06:55:10.542923 1 pod_antiaffinity.go:45] Processing node: "ip-172-18-4-140.ec2.internal"
I0517 06:55:10.550841 1 pod_antiaffinity.go:45] Processing node: "ip-172-18-14-232.ec2.internal"
I0517 06:55:10.558162 1 pod_antiaffinity.go:45] Processing node: "ip-172-18-8-195.ec2.internal"

Expected results:
The descheduler pod should not itself be descheduled.
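Policy sketch: the three strategies enabled in step 1 render to an upstream descheduler policy roughly like the one below. This is a sketch in the descheduler v1alpha1 policy format, with the utilization thresholds taken from the "Criteria for a node ..." log lines above; the exact policy the installer generates may differ:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
  "RemovePodsViolatingInterPodAntiAffinity":
    enabled: true
  "LowNodeUtilization":
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        # "Criteria for a node under utilization" in the log
        thresholds:
          "cpu": 28
          "memory": 20
          "pods": 8
        # "Criteria for a node above target utilization" in the log
        targetThresholds:
          "cpu": 30
          "memory": 25
          "pods": 7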
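Load sketch: for step 2, any workload that pushes some nodes over the utilization thresholds will do. A minimal example (the "hello" name matches the ReplicationController/hello-1 pods in the log; the image and replica count are illustrative):

# oc run hello --image=openshift/hello-openshift
# oc scale dc/hello --replicas=30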
PR https://github.com/openshift/openshift-ansible/pull/8397
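The fix amounts to marking the descheduler's own pod as critical so that the eviction logic skips it. An illustrative sketch of what that looks like on the CronJob's pod template (not a verbatim excerpt of the PR; the image and schedule follow the inventory above):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: descheduler-cronjob
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          annotations:
            # critical pods are treated as non-removable by the descheduler
            scheduler.alpha.kubernetes.io/critical-pod: ""
        spec:
          restartPolicy: Never
          containers:
          - name: descheduler
            image: xxxx:443/openshift3/ose-descheduler:v3.10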
Fix is available in openshift-ansible-3.10.0-0.48.0
Checked with openshift-ansible-3.10.0-0.50.0; the descheduler pod now has the annotation scheduler.alpha.kubernetes.io/critical-pod.
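A quick way to confirm the annotation on a live cluster (run from the project where the descheduler is installed; the cronjob name is taken from the pod names in the log above):

# oc get cronjob descheduler-cronjob -o yaml | grep critical-pod

This should show scheduler.alpha.kubernetes.io/critical-pod on the job's pod template.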
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1816