Description of problem:
The descheduler CR does not copy thresholdPriorityClassName to the LowNodeUtilization strategy.

Version-Release number of selected component (if applicable):
[root@dhcp-140-138 ~]# oc get csv
NAME                                                    DISPLAY                     VERSION                 REPLACES                                                PHASE
clusterkubedescheduleroperator.4.6.0-202009221732.p0    Kube Descheduler Operator   4.6.0-202009221732.p0   clusterkubedescheduleroperator.4.6.0-202009192030.p0   Succeeded

How reproducible:
Always

Steps to Reproduce:
1. Create a PriorityClass as follows:
[zhouying@dhcp-140-138 tmp]$ cat priority.yaml
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: priorityclass1
value: 99
globalDefault: false
description: "This priority class should be used for XYZ service pods only."

2. Configure the priority class in the descheduler LowNodeUtilization strategy:
strategies:
- name: LowNodeUtilization
  params:
  - name: cputhreshold
    value: "60"
  - name: memorythreshold
    value: "60"
  - name: podsthreshold
    value: "60"
  - name: memorytargetthreshold
    value: "80"
  - name: cputargetthreshold
    value: "80"
  - name: podstargetthreshold
    value: "80"
  - name: nodes
    value: "3"
  - name: thresholdPriorityClassName
    value: priorityclass1

3. Check that the descheduler CR copies the configuration to the configmap.

Actual results:
3. The CR did not copy thresholdPriorityClassName to the configmap correctly, and did not trigger a new update for the cluster pod.

Expected results:
3. The CR should copy thresholdPriorityClassName to the configmap correctly.

Additional info:
The descheduler operator pod has logs:
E0923 05:32:54.844907       1 target_config_reconciler.go:467] key failed with : strconv.Atoi: parsing "priorityclass1": invalid syntax
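A note on the failure mode: the strconv.Atoi error above suggests the reconciler was treating every strategy parameter value as an integer, including the string-valued thresholdPriorityClassName. A minimal Go sketch of the distinction the fix has to draw (hypothetical types and names, not the actual operator code):

package main

import (
	"fmt"
	"strconv"
)

// Param mirrors the name/value pairs under a strategy's params in the CR
// (hypothetical type; the real operator's types differ).
type Param struct {
	Name  string
	Value string
}

// applyParam sketches per-parameter handling: numeric thresholds go through
// strconv.Atoi, while thresholdPriorityClassName is copied verbatim.
func applyParam(p Param, cfg map[string]interface{}) error {
	switch p.Name {
	case "thresholdPriorityClassName":
		// A string value; feeding it to strconv.Atoi yields exactly the
		// "invalid syntax" error seen in the operator log.
		cfg[p.Name] = p.Value
	case "cputhreshold", "memorythreshold", "podsthreshold",
		"cputargetthreshold", "memorytargetthreshold",
		"podstargetthreshold", "nodes":
		n, err := strconv.Atoi(p.Value)
		if err != nil {
			return fmt.Errorf("param %s: %w", p.Name, err)
		}
		cfg[p.Name] = n
	}
	return nil
}

func main() {
	cfg := map[string]interface{}{}
	params := []Param{
		{Name: "cputhreshold", Value: "60"},
		{Name: "thresholdPriorityClassName", Value: "priorityclass1"},
	}
	for _, p := range params {
		if err := applyParam(p, cfg); err != nil {
			fmt.Println("reconcile failed:", err)
		}
	}
	fmt.Println(cfg) // map[cputhreshold:60 thresholdPriorityClassName:priorityclass1]
}

With handling like this, only genuinely numeric params hit the Atoi path, and the class name is carried into the policy configmap unchanged.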
Verified with the payload below, and I see that thresholdPriorityClassName gets copied to the configmap for LowNodeUtilization, where the priority class can be either user-created or one that already exists in the system.

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc get csv
NAME                                                    DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202009231847.p0    Kube Descheduler Operator   4.6.0-202009231847.p0              Succeeded

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-09-24-015627
Server Version: 4.6.0-0.nightly-2020-09-24-015627
Kubernetes Version: v1.19.0+fff8183

User-created priorityclass:
===========================
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ oc get configmap cluster -o yaml
apiVersion: v1
data:
  policy.yaml: |
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          namespaces: null
          nodeResourceUtilizationThresholds:
            numberOfNodes: 3
            targetThresholds:
              cpu: 80
              memory: 80
              pods: 80
            thresholds:
              cpu: 60
              memory: 60
              pods: 60
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
kind: ConfigMap

Already existing in the system:
===============================
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ oc get configmap cluster -o yaml
apiVersion: v1
data:
  policy.yaml: |
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          namespaces: null
          nodeResourceUtilizationThresholds:
            numberOfNodes: 3
            targetThresholds:
              cpu: 80
              memory: 80
              pods: 80
            thresholds:
              cpu: 60
              memory: 60
              pods: 60
          thresholdPriority: null
          thresholdPriorityClassName: system-cluster-critical
kind: ConfigMap

No errors in the descheduler operator logs or the cluster pod logs.

Tested with all strategies, and thresholdPriorityClassName works for each:

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ oc get configmap cluster -o yaml
apiVersion: v1
data:
  policy.yaml: |
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          namespaces: null
          nodeResourceUtilizationThresholds:
            numberOfNodes: 3
            targetThresholds:
              cpu: 80
              memory: 80
              pods: 80
            thresholds:
              cpu: 60
              memory: 60
              pods: 60
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      PodLifeTime:
        enabled: true
        params:
          maxPodLifeTimeSeconds: 3600
          namespaces:
            exclude: null
            include: null
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemoveDuplicates:
        enabled: true
        params:
          namespaces: null
          removeDuplicates: {}
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemovePodsHavingTooManyRestarts:
        enabled: true
        params:
          namespaces:
            exclude: null
            include: null
          podsHavingTooManyRestarts:
            podRestartThreshold: 4
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemovePodsViolatingInterPodAntiAffinity:
        enabled: true
        params:
          namespaces:
            exclude: null
            include: null
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemovePodsViolatingNodeAffinity:
        enabled: true
        params:
          namespaces:
            exclude: null
            include: null
          nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemovePodsViolatingNodeTaints:
        enabled: true
        params:
          namespaces:
            exclude: null
            include: null
          thresholdPriority: null
          thresholdPriorityClassName: priorityh

Based on the above, moving the bug to verified state.
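A note for future verification: rather than scanning the full configmap dump, the rendered field can be spot-checked with a jsonpath query (assuming, as in the commands above, that the current project is the descheduler operator's namespace):

$ oc get configmap cluster -o jsonpath='{.data.policy\.yaml}' | grep thresholdPriorityClassName

This prints one thresholdPriorityClassName line per enabled strategy.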
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196