Bug 1989720 - Descheduler operator should allow configuration of PodLifetime seconds
Summary: Descheduler operator should allow configuration of PodLifetime seconds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Mike Dame
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-03 19:35 UTC by Mike Dame
Modified: 2021-10-18 17:44 UTC
CC: 5 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
PodLifetime is now a configurable value for the Descheduler Operator's LifecycleAndUtilization profile.
Clone Of:
: 1989722 (view as bug list)
Environment:
Last Closed: 2021-10-18 17:44:27 UTC
Target Upstream Version:
Embargoed:


Attachments: (none)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-descheduler-operator pull 201 0 None None None 2021-08-09 07:40:40 UTC
Github openshift cluster-kube-descheduler-operator pull 208 0 None None None 2021-08-09 17:32:38 UTC
Github openshift cluster-kube-descheduler-operator pull 211 0 None None None 2021-08-16 14:49:02 UTC
Github openshift cluster-kube-descheduler-operator pull 213 0 None None None 2021-08-20 19:21:45 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:44:37 UTC

Description Mike Dame 2021-08-03 19:35:41 UTC
The Descheduler operator should allow configuration of the PodLifetime for profiles that enable that strategy (i.e., LifecycleAndUtilization). The current default of 24h does not fit many production use cases.
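
For illustration, a minimal sketch of the intended configuration (field names match the KubeDescheduler objects shown in the comments below; the exact accepted value format was settled in the linked PRs):

apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  managementState: Managed
  deschedulingIntervalSeconds: 3600
  profiles:
  - LifecycleAndUtilization
  profileCustomizations:
    # overrides the 24h default used by the PodLifeTime strategy
    podLifetime: 5m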

Comment 1 Mike Dame 2021-08-09 17:32:24 UTC
Moving back to ASSIGNED for another PR to merge.

Comment 3 RamaKasturi 2021-08-14 11:51:44 UTC
Moving the bug back to ASSIGNED, because the pod does not get evicted within the seconds specified in podLifetime. Upon further checking, I see that the ConfigMap still has maxPodLifeTimeSeconds as 86400 even after setting the following in the KubeDescheduler cluster object:

profileCustomizations:
    podLifetime: 20
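
(For reference, a sketch of one way this customization can be applied; editing the KubeDescheduler object with oc edit works equally well.)

oc patch kubedescheduler cluster -n openshift-kube-descheduler-operator \
  --type=merge -p '{"spec":{"profileCustomizations":{"podLifetime":20}}}'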

[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2021-08-14-044521]$ ./oc get csv -n openshift-kube-descheduler-operator
NAME                                                DISPLAY                     VERSION              REPLACES   PHASE
clusterkubedescheduleroperator.4.9.0-202108130204   Kube Descheduler Operator   4.9.0-202108130204              Succeeded


output of `./oc get kubedescheduler cluster -o yaml -n openshift-kube-descheduler-operator`:
================================================================================================
[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2021-08-14-044521]$ ./oc get kubedescheduler cluster -o yaml -n openshift-kube-descheduler-operator
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  creationTimestamp: "2021-08-14T11:42:31Z"
  generation: 1
  name: cluster
  namespace: openshift-kube-descheduler-operator
  resourceVersion: "61432"
  uid: 36a8eada-233f-4fe9-871f-5f4980fae252
spec:
  deschedulingIntervalSeconds: 3600
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
  profileCustomizations:
    podLifetime: 20
  profiles:
  - LifecycleAndUtilization
status:
  conditions:
  - lastTransitionTime: "2021-08-14T11:42:31Z"
    status: "False"
    type: TargetConfigControllerDegraded
  generations:
  - group: apps
    hash: ""
    lastGeneration: 1
    name: cluster
    namespace: openshift-kube-descheduler-operator
    resource: deployments
  readyReplicas: 0

Output of `./oc get configmap cluster -n openshift-kube-descheduler-operator -o yaml`:
=============================================================================================
[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2021-08-14-044521]$ ./oc get configmap cluster -n openshift-kube-descheduler-operator -o yaml
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: descheduler/v1alpha1
    ignorePvcPods: true
    kind: DeschedulerPolicy
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces: null
          nodeResourceUtilizationThresholds:
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
          thresholdPriority: null
          thresholdPriorityClassName: ""
      PodLifeTime:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces:
            exclude:
            - kube-system
            - openshift-apiserver
            - openshift-apiserver-operator
            - openshift-authentication
            - openshift-authentication-operator
            - openshift-cloud-controller-manager
            - openshift-cloud-controller-manager-operator
            - openshift-cloud-credential-operator
            - openshift-cluster-csi-drivers
            - openshift-cluster-machine-approver
            - openshift-cluster-node-tuning-operator
            - openshift-cluster-samples-operator
            - openshift-cluster-storage-operator
            - openshift-cluster-version
            - openshift-config
            - openshift-config-managed
            - openshift-config-operator
            - openshift-console
            - openshift-console-operator
            - openshift-console-user-settings
            - openshift-controller-manager
            - openshift-controller-manager-operator
            - openshift-dns
            - openshift-dns-operator
            - openshift-etcd
            - openshift-etcd-operator
            - openshift-host-network
            - openshift-image-registry
            - openshift-infra
            - openshift-ingress
            - openshift-ingress-canary
            - openshift-ingress-operator
            - openshift-insights
            - openshift-kni-infra
            - openshift-kube-apiserver
            - openshift-kube-apiserver-operator
            - openshift-kube-controller-manager
            - openshift-kube-controller-manager-operator
            - openshift-kube-descheduler-operator
            - openshift-kube-scheduler
            - openshift-kube-scheduler-operator
            - openshift-kube-storage-version-migrator
            - openshift-kube-storage-version-migrator-operator
            - openshift-kubevirt-infra
            - openshift-machine-api
            - openshift-machine-config-operator
            - openshift-marketplace
            - openshift-monitoring
            - openshift-multus
            - openshift-network-diagnostics
            - openshift-network-operator
            - openshift-node
            - openshift-oauth-apiserver
            - openshift-openstack-infra
            - openshift-operator-lifecycle-manager
            - openshift-operators
            - openshift-ovirt-infra
            - openshift-sdn
            - openshift-service-ca
            - openshift-service-ca-operator
            - openshift-user-workload-monitoring
            - openshift-vsphere-infra
            include: null
          podLifeTime:
            maxPodLifeTimeSeconds: 86400
          thresholdPriority: null
          thresholdPriorityClassName: ""
      RemovePodsHavingTooManyRestarts:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces:
            exclude:
            - kube-system
            - openshift-apiserver
            - openshift-apiserver-operator
            - openshift-authentication
            - openshift-authentication-operator
            - openshift-cloud-controller-manager
            - openshift-cloud-controller-manager-operator
            - openshift-cloud-credential-operator
            - openshift-cluster-csi-drivers
            - openshift-cluster-machine-approver
            - openshift-cluster-node-tuning-operator
            - openshift-cluster-samples-operator
            - openshift-cluster-storage-operator
            - openshift-cluster-version
            - openshift-config
            - openshift-config-managed
            - openshift-config-operator
            - openshift-console
            - openshift-console-operator
            - openshift-console-user-settings
            - openshift-controller-manager
            - openshift-controller-manager-operator
            - openshift-dns
            - openshift-dns-operator
            - openshift-etcd
            - openshift-etcd-operator
            - openshift-host-network
            - openshift-image-registry
            - openshift-infra
            - openshift-ingress
            - openshift-ingress-canary
            - openshift-ingress-operator
            - openshift-insights
            - openshift-kni-infra
            - openshift-kube-apiserver
            - openshift-kube-apiserver-operator
            - openshift-kube-controller-manager
            - openshift-kube-controller-manager-operator
            - openshift-kube-descheduler-operator
            - openshift-kube-scheduler
            - openshift-kube-scheduler-operator
            - openshift-kube-storage-version-migrator
            - openshift-kube-storage-version-migrator-operator
            - openshift-kubevirt-infra
            - openshift-machine-api
            - openshift-machine-config-operator
            - openshift-marketplace
            - openshift-monitoring
            - openshift-multus
            - openshift-network-diagnostics
            - openshift-network-operator
            - openshift-node
            - openshift-oauth-apiserver
            - openshift-openstack-infra
            - openshift-operator-lifecycle-manager
            - openshift-operators
            - openshift-ovirt-infra
            - openshift-sdn
            - openshift-service-ca
            - openshift-service-ca-operator
            - openshift-user-workload-monitoring
            - openshift-vsphere-infra
            include: null
          podsHavingTooManyRestarts:
            includingInitContainers: true
            podRestartThreshold: 100
          thresholdPriority: null
          thresholdPriorityClassName: ""
kind: ConfigMap
metadata:
  creationTimestamp: "2021-08-14T11:42:31Z"
  name: cluster
  namespace: openshift-kube-descheduler-operator
  ownerReferences:
  - apiVersion: v1
    kind: KubeDescheduler
    name: cluster
    uid: 36a8eada-233f-4fe9-871f-5f4980fae252
  resourceVersion: "61415"
  uid: e0d10e61-8e7f-45ca-9450-4eb6461766e1


Based on the above, moving the bug to ASSIGNED state.

Comment 4 Mike Dame 2021-08-16 14:50:48 UTC
@RamaKasturi As mentioned in https://github.com/openshift/cluster-kube-descheduler-operator/pull/208#discussion_r688919199, this *should* take the duration format (i.e., "5m"), but there was a bug in the CRD that rejected it. I have opened a PR to update the CRD to take a string instead of an integer. When that merges and a new build is available, please re-test. Thank you.
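
For context, the CRD change amounts to switching the OpenAPI type of the field; an illustrative sketch (not the literal schema from the PR) looks roughly like:

# before: only integers were accepted, so duration strings such as "5m" were rejected
podLifetime:
  type: integer
# after: accepts duration strings
podLifetime:
  type: string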

Comment 6 zhou ying 2021-08-20 06:43:40 UTC
The pod does not get evicted within the specified podLifetime. Upon further checking, I see that the ConfigMap still has maxPodLifeTimeSeconds as 86400:

[root@localhost roottest]# oc get kubedescheduler cluster -o yaml 
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  creationTimestamp: "2021-08-20T03:12:43Z"
  generation: 4
  name: cluster
  namespace: openshift-kube-descheduler-operator
  resourceVersion: "142064"
  uid: a2010876-ddcd-482c-a128-a2c41669e304
spec:
  deschedulingIntervalSeconds: 60
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
  profileCustomizations:
    podLifetime: 5m
  profiles:
  - LifecycleAndUtilization



[root@localhost roottest]# oc get configmap cluster -o yaml 
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: descheduler/v1alpha1
    ignorePvcPods: true
    kind: DeschedulerPolicy
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces: null
          nodeResourceUtilizationThresholds:
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
          thresholdPriority: null
          thresholdPriorityClassName: ""
      PodLifeTime:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces:
            exclude:
            - kube-system
            - openshift-apiserver
            - openshift-apiserver-operator
            - openshift-authentication
            - openshift-authentication-operator
            - openshift-cloud-controller-manager
            - openshift-cloud-controller-manager-operator
            - openshift-cloud-credential-operator
            - openshift-cluster-csi-drivers
            - openshift-cluster-machine-approver
            - openshift-cluster-node-tuning-operator
            - openshift-cluster-samples-operator
            - openshift-cluster-storage-operator
            - openshift-cluster-version
            - openshift-config
            - openshift-config-managed
            - openshift-config-operator
            - openshift-console
            - openshift-console-operator
            - openshift-console-user-settings
            - openshift-controller-manager
            - openshift-controller-manager-operator
            - openshift-dns
            - openshift-dns-operator
            - openshift-etcd
            - openshift-etcd-operator
            - openshift-host-network
            - openshift-image-registry
            - openshift-infra
            - openshift-ingress
            - openshift-ingress-canary
            - openshift-ingress-operator
            - openshift-insights
            - openshift-kni-infra
            - openshift-kube-apiserver
            - openshift-kube-apiserver-operator
            - openshift-kube-controller-manager
            - openshift-kube-controller-manager-operator
            - openshift-kube-descheduler-operator
            - openshift-kube-scheduler
            - openshift-kube-scheduler-operator
            - openshift-kube-storage-version-migrator
            - openshift-kube-storage-version-migrator-operator
            - openshift-kubevirt-infra
            - openshift-logging
            - openshift-machine-api
            - openshift-machine-config-operator
            - openshift-marketplace
            - openshift-monitoring
            - openshift-multus
            - openshift-network-diagnostics
            - openshift-network-operator
            - openshift-node
            - openshift-oauth-apiserver
            - openshift-openstack-infra
            - openshift-operator-lifecycle-manager
            - openshift-operators
            - openshift-operators-redhat
            - openshift-ovirt-infra
            - openshift-sdn
            - openshift-service-ca
            - openshift-service-ca-operator
            - openshift-user-workload-monitoring
            - openshift-vsphere-infra
            include: null
          podLifeTime:
            maxPodLifeTimeSeconds: 86400
          thresholdPriority: null
          thresholdPriorityClassName: ""

[root@localhost roottest]# oc get csv -n openshift-kube-descheduler-operator
NAME                                                DISPLAY                            VERSION              REPLACES   PHASE
clusterkubedescheduleroperator.4.9.0-202108171159   Kube Descheduler Operator          4.9.0-202108171159              Succeeded

Comment 7 Paige Rubendall 2021-08-20 18:03:03 UTC
Are we able to give an example of the expected input format either in the description below the text box or as light text in the text box?

Comment 8 Mike Dame 2021-08-20 19:22:41 UTC
That format ("5m") is correct, it is the standard "duration" format as indicated by the type in the CRD. We still had a bug in the code that processed it

I have opened https://github.com/openshift/cluster-kube-descheduler-operator/pull/213 to fix this new bug and added a test to ensure it works now. When that merges, please test again
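
For reference, the intended end-to-end mapping once this fix merges (the 5m case corresponds to maxPodLifeTime=300 in the verification logs below; the policy.yaml layout is taken from the ConfigMap dumps above):

# KubeDescheduler CR
spec:
  profileCustomizations:
    podLifetime: 5m

# rendered policy.yaml, PodLifeTime strategy params (excerpt)
podLifeTime:
  maxPodLifeTimeSeconds: 300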

Comment 10 RamaKasturi 2021-08-23 10:28:37 UTC
Tested the bug with podLifetime set to 420s, 5m, and 1h with the build below, and I see that the correct value is copied over to the ConfigMap and the pod gets evicted accordingly.

[knarra@knarra cucushift]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                DISPLAY                     VERSION              REPLACES   PHASE
clusterkubedescheduleroperator.4.9.0-202108210926   Kube Descheduler Operator   4.9.0-202108210926              Succeeded

Logs when podLifetime set to 5m:
====================================
I0823 09:12:22.895313       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-154-172.us-east-2.compute.internal"
I0823 09:12:22.936389       1 evictions.go:130] "Evicted pod" pod="knarra/hello2-68c946777-klrbf" reason="PodLifeTime"
I0823 09:12:22.936601       1 pod_lifetime.go:98] "Evicted pod because it exceeded its lifetime" pod="knarra/hello2-68c946777-klrbf" maxPodLifeTime=300
I0823 09:12:22.936616       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-158-188.us-east-2.compute.internal"
I0823 09:12:22.987746       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-163-175.us-east-2.compute.internal"
I0823 09:12:23.015997       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-183-30.us-east-2.compute.internal"
I0823 09:12:23.343887       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-199-190.us-east-2.compute.internal"
I0823 09:12:23.548121       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-222-237.us-east-2.compute.internal"
I0823 09:12:23.743412       1 descheduler.go:151] "Number of evicted pods" totalEvicted=1

Logs when podLifetime set to 420s:
==================================
I0823 09:20:14.729656       1 evictions.go:130] "Evicted pod" pod="knarra/hello2-68c946777-dskdz" reason="PodLifeTime"
I0823 09:20:14.729798       1 pod_lifetime.go:98] "Evicted pod because it exceeded its lifetime" pod="knarra/hello2-68c946777-dskdz" maxPodLifeTime=420
I0823 09:20:14.729815       1 descheduler.go:151] "Number of evicted pods" totalEvicted=1

Logs when podLifetime set to 1h:
======================================
I0823 10:20:32.952833       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-154-172.us-east-2.compute.internal"
I0823 10:20:33.003504       1 evictions.go:130] "Evicted pod" pod="knarra/hello2-68c946777-qc288" reason="PodLifeTime"
I0823 10:20:33.005576       1 pod_lifetime.go:98] "Evicted pod because it exceeded its lifetime" pod="knarra/hello2-68c946777-qc288" maxPodLifeTime=3600
I0823 10:20:33.005737       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-158-188.us-east-2.compute.internal"
I0823 10:20:33.028240       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-163-175.us-east-2.compute.internal"
I0823 10:20:33.051591       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-183-30.us-east-2.compute.internal"
I0823 10:20:33.074338       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-199-190.us-east-2.compute.internal"
I0823 10:20:33.106819       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-222-237.us-east-2.compute.internal"
I0823 10:20:33.128137       1 duplicates.go:99] "Processing node" node="ip-10-0-154-172.us-east-2.compute.internal"
I0823 10:20:33.153901       1 duplicates.go:99] "Processing node" node="ip-10-0-158-188.us-east-2.compute.internal"
I0823 10:20:33.175834       1 duplicates.go:99] "Processing node" node="ip-10-0-163-175.us-east-2.compute.internal"
I0823 10:20:33.196243       1 duplicates.go:99] "Processing node" node="ip-10-0-183-30.us-east-2.compute.internal"
I0823 10:20:33.365329       1 duplicates.go:99] "Processing node" node="ip-10-0-199-190.us-east-2.compute.internal"
I0823 10:20:33.565394       1 duplicates.go:99] "Processing node" node="ip-10-0-222-237.us-east-2.compute.internal"
I0823 10:20:33.765639       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-154-172.us-east-2.compute.internal"
I0823 10:20:33.967391       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-158-188.us-east-2.compute.internal"
I0823 10:20:34.164954       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-163-175.us-east-2.compute.internal"
I0823 10:20:34.367402       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-183-30.us-east-2.compute.internal"
I0823 10:20:34.567434       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-199-190.us-east-2.compute.internal"
I0823 10:20:34.765857       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-222-237.us-east-2.compute.internal"
I0823 10:20:35.191698       1 topologyspreadconstraint.go:139] "Processing namespaces for topology spread constraints"
I0823 10:20:37.620460       1 nodeutilization.go:170] "Node is appropriately utilized" node="ip-10-0-154-172.us-east-2.compute.internal" usage=map[cpu:582m memory:2963Mi pods:27] usagePercentage=map[cpu:38.8 memory:45.297287936197975 pods:10.8]
I0823 10:20:37.620511       1 nodeutilization.go:167] "Node is overutilized" node="ip-10-0-158-188.us-east-2.compute.internal" usage=map[cpu:1759m memory:5597Mi pods:34] usagePercentage=map[cpu:50.25714285714286 memory:38.32301504027063 pods:13.6]
I0823 10:20:37.620534       1 nodeutilization.go:167] "Node is overutilized" node="ip-10-0-163-175.us-east-2.compute.internal" usage=map[cpu:2 memory:6787Mi pods:57] usagePercentage=map[cpu:57.142857142857146 memory:46.47102073938123 pods:22.8]
I0823 10:20:37.620555       1 nodeutilization.go:170] "Node is appropriately utilized" node="ip-10-0-183-30.us-east-2.compute.internal" usage=map[cpu:657m memory:2791Mi pods:19] usagePercentage=map[cpu:43.8 memory:42.126836389535974 pods:7.6]
I0823 10:20:37.620913       1 nodeutilization.go:167] "Node is overutilized" node="ip-10-0-199-190.us-east-2.compute.internal" usage=map[cpu:1795m memory:5472Mi pods:32] usagePercentage=map[cpu:51.285714285714285 memory:37.467112046519354 pods:12.8]
I0823 10:20:37.621511       1 nodeutilization.go:170] "Node is appropriately utilized" node="ip-10-0-222-237.us-east-2.compute.internal" usage=map[cpu:612m memory:2097Mi pods:26] usagePercentage=map[cpu:40.8 memory:31.651729096688264 pods:10.4]
I0823 10:20:37.621532       1 lownodeutilization.go:99] "Criteria for a node under utilization" CPU=20 Mem=20 Pods=20
I0823 10:20:37.621543       1 lownodeutilization.go:100] "Number of underutilized nodes" totalNumber=0
I0823 10:20:37.621573       1 lownodeutilization.go:113] "Criteria for a node above target utilization" CPU=50 Mem=50 Pods=50
I0823 10:20:37.621585       1 lownodeutilization.go:114] "Number of overutilized nodes" totalNumber=3
I0823 10:20:37.621596       1 lownodeutilization.go:117] "No node is underutilized, nothing to do here, you might tune your thresholds further"
I0823 10:20:37.621608       1 descheduler.go:151] "Number of evicted pods" totalEvicted=1

Based on the above, moving the bug to VERIFIED state.

Comment 13 errata-xmlrpc 2021-10-18 17:44:27 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

