Bug 1881869 - The descheduler CR can't copy thresholdPriorityClassName to the LowNodeUtilization strategy
Summary: The descheduler CR can't copy thresholdPriorityClassName to lowNodeutilizatio...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-09-23 09:17 UTC by zhou ying
Modified: 2020-10-27 16:44 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:44:06 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub openshift/cluster-kube-descheduler-operator pull 140 (closed): bug 1881869: LowNodeUtilization: don't Atoi thresholdPriority params as other params (last updated 2020-10-30 13:07:50 UTC)
- Red Hat Product Errata RHBA-2020:4196 (last updated 2020-10-27 16:44:25 UTC)

Description zhou ying 2020-09-23 09:17:11 UTC
Description of problem:
The descheduler CR can't copy thresholdPriorityClassName to the LowNodeUtilization strategy.

Version-Release number of selected component (if applicable):
[root@dhcp-140-138 ~]#  oc get csv
NAME                                                   DISPLAY                     VERSION                 REPLACES                                               PHASE
clusterkubedescheduleroperator.4.6.0-202009221732.p0   Kube Descheduler Operator   4.6.0-202009221732.p0   clusterkubedescheduleroperator.4.6.0-202009192030.p0   Succeeded

How reproducible:
Always

Steps to Reproduce:
1. Create a priority class as follows:
[zhouying@dhcp-140-138 tmp]$ cat priority.yaml 
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: priorityclass1
value: 99
globalDefault: false
description: "This priority class should be used for XYZ service pods only."

2. Configure the priority class in the descheduler LowNodeUtilization strategy:

  strategies:
  - name: LowNodeUtilization
    params:
    - name: cputhreshold
      value: "60"
    - name: memorythreshold
      value: "60"
    - name: podsthreshold
      value: "60"
    - name: memorytargetthreshold
      value: "80"
    - name: cputargetthreshold
      value: "80"
    - name: podstargetthreshold
      value: "80"
    - name: nodes
      value: "3"
    - name: thresholdPriorityClassName
      value: priorityclass1

3. Check that the operator copies the configuration from the descheduler CR into the policy ConfigMap (see the sketch after these steps).
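
A minimal sketch tying the steps above together, under the assumption that the defaults for this operator apply (the KubeDescheduler CR and the policy ConfigMap are both named cluster in the openshift-kube-descheduler-operator namespace; the exact CR layout below is an assumption for 4.6, not copied from this report). The priority class from step 1 is applied with "oc create -f priority.yaml", and the strategy params from step 2 live in the CR spec:

apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  deschedulingIntervalSeconds: 3600
  strategies:
  - name: LowNodeUtilization
    params:
    - name: cputhreshold
      value: "60"
    # ... remaining params as listed in step 2, including
    # thresholdPriorityClassName: priorityclass1

The check in step 3 then amounts to reading the rendered policy back:

oc get configmap cluster -n openshift-kube-descheduler-operator -o yaml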
 
Actual results:
3. The CR did not copy thresholdPriorityClassName to the ConfigMap correctly, and no update was triggered for the cluster pod.

Expected results:
3. The CR should copy thresholdPriorityClassName to the ConfigMap correctly.

Additional info:
The descheduler operator pod logs the following error:
E0923 05:32:54.844907       1 target_config_reconciler.go:467] key failed with : strconv.Atoi: parsing "priorityclass1": invalid syntax
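
The linked PR ("LowNodeUtilization: don't Atoi thresholdPriority params as other params") points at the root cause: the reconciler ran strconv.Atoi over every strategy param value, so the string value of thresholdPriorityClassName could not be parsed. A minimal, self-contained Go sketch of the shape of that fix follows; the types, field names, and function here are hypothetical illustrations, not the operator's actual code:

package main

import (
	"fmt"
	"strconv"
)

// Param mirrors a name/value pair from the CR's strategy params (hypothetical type).
type Param struct {
	Name  string
	Value string
}

// StrategyConfig is a stand-in for the rendered policy fields (hypothetical type).
type StrategyConfig struct {
	NumericParams              map[string]int
	ThresholdPriority          *int
	ThresholdPriorityClassName string
}

// applyParams treats thresholdPriorityClassName as a string and only runs
// strconv.Atoi on the genuinely numeric params, instead of Atoi-ing everything.
func applyParams(params []Param, cfg *StrategyConfig) error {
	for _, p := range params {
		switch p.Name {
		case "thresholdPriorityClassName":
			// String param: copy as-is, no integer conversion.
			cfg.ThresholdPriorityClassName = p.Value
		case "thresholdPriority":
			// Numeric priority value, parsed explicitly.
			v, err := strconv.Atoi(p.Value)
			if err != nil {
				return err
			}
			cfg.ThresholdPriority = &v
		default:
			// Remaining params (cputhreshold, memorythreshold, ...) are numeric.
			v, err := strconv.Atoi(p.Value)
			if err != nil {
				return err
			}
			cfg.NumericParams[p.Name] = v
		}
	}
	return nil
}

func main() {
	cfg := &StrategyConfig{NumericParams: map[string]int{}}
	params := []Param{
		{Name: "cputhreshold", Value: "60"},
		{Name: "thresholdPriorityClassName", Value: "priorityclass1"},
	}
	if err := applyParams(params, cfg); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.NumericParams["cputhreshold"], cfg.ThresholdPriorityClassName)
}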

Comment 2 RamaKasturi 2020-09-24 13:42:17 UTC
Verified with the payload below. I see that thresholdPriorityClassName gets copied to the ConfigMap for LowNodeUtilization, where the priority class can be either a user-created one or one that already exists in the system.

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc get csv
NAME                                                   DISPLAY                     VERSION                 REPLACES   PHASE
clusterkubedescheduleroperator.4.6.0-202009231847.p0   Kube Descheduler Operator   4.6.0-202009231847.p0              Succeeded
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ ./oc version
Client Version: 4.6.0-0.nightly-2020-09-24-015627
Server Version: 4.6.0-0.nightly-2020-09-24-015627
Kubernetes Version: v1.19.0+fff8183


User-created priority class:
===========================
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ oc get configmap cluster -o yaml
apiVersion: v1
data:
  policy.yaml: |
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          namespaces: null
          nodeResourceUtilizationThresholds:
            numberOfNodes: 3
            targetThresholds:
              cpu: 80
              memory: 80
              pods: 80
            thresholds:
              cpu: 60
              memory: 60
              pods: 60
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
kind: ConfigMap

Already existing in the system:
===================================
[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ oc get configmap cluster -o yaml
apiVersion: v1
data:
  policy.yaml: |
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          namespaces: null
          nodeResourceUtilizationThresholds:
            numberOfNodes: 3
            targetThresholds:
              cpu: 80
              memory: 80
              pods: 80
            thresholds:
              cpu: 60
              memory: 60
              pods: 60
          thresholdPriority: null
          thresholdPriorityClassName: system-cluster-critical

No errors in the descheduler operator logs or the cluster pod logs.
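
For reference, the kind of commands used to check this (the deployment names below are assumptions based on a default install of this operator, not taken from this report):

oc logs deployment/descheduler-operator -n openshift-kube-descheduler-operator
oc logs deployment/cluster -n openshift-kube-descheduler-operator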

Tested with all strategies and I see that thresholdPriorityClassName works for each of them:

[ramakasturinarra@dhcp35-60 openshift-client-linux-4.6.0-0.nightly-2020-09-24-015627]$ oc get configmap cluster -o yaml
apiVersion: v1
data:
  policy.yaml: |
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          namespaces: null
          nodeResourceUtilizationThresholds:
            numberOfNodes: 3
            targetThresholds:
              cpu: 80
              memory: 80
              pods: 80
            thresholds:
              cpu: 60
              memory: 60
              pods: 60
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      PodLifeTime:
        enabled: true
        params:
          maxPodLifeTimeSeconds: 3600
          namespaces:
            exclude: null
            include: null
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemoveDuplicates:
        enabled: true
        params:
          namespaces: null
          removeDuplicates: {}
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemovePodsHavingTooManyRestarts:
        enabled: true
        params:
          namespaces:
            exclude: null
            include: null
          podsHavingTooManyRestarts:
            podRestartThreshold: 4
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemovePodsViolatingInterPodAntiAffinity:
        enabled: true
        params:
          namespaces:
            exclude: null
            include: null
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemovePodsViolatingNodeAffinity:
        enabled: true
        params:
          namespaces:
            exclude: null
            include: null
          nodeAffinityType:
          - requiredDuringSchedulingIgnoredDuringExecution
          thresholdPriority: null
          thresholdPriorityClassName: priorityh
      RemovePodsViolatingNodeTaints:
        enabled: true
        params:
          namespaces:
            exclude: null
            include: null
          thresholdPriority: null
          thresholdPriorityClassName: priorityh

Based on the above, moving the bug to VERIFIED.

Comment 5 errata-xmlrpc 2020-10-27 16:44:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

