The Descheduler operator should allow configuration of the PodLifetime for profiles that enable that strategy (i.e., LifecycleAndUtilization). The current default of 24h is not suitable for many production use cases.
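As a rough illustration (a sketch based on the KubeDescheduler CR shown later in this bug; the 5m value is only an example), the customization would look like:

apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  managementState: Managed
  deschedulingIntervalSeconds: 3600
  profiles:
  - LifecycleAndUtilization
  profileCustomizations:
    podLifetime: 5m   # example value; overrides the 24h default used by the PodLifeTime strategy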
Moving back to Assigned; another PR still needs to merge.
Moving the bug back to the assigned state: the pod does not get evicted within the number of seconds specified in podLifetime. Upon further checking, I see that the configmap still has maxPodLifeTimeSeconds set to 86400 even after setting the following in the kubedescheduler cluster object:

profileCustomizations:
  podLifetime: 20

[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2021-08-14-044521]$ ./oc get csv -n openshift-kube-descheduler-operator
NAME                                                DISPLAY                     VERSION              REPLACES   PHASE
clusterkubedescheduleroperator.4.9.0-202108130204   Kube Descheduler Operator   4.9.0-202108130204              Succeeded

output of `./oc get kubedescheduler cluster -o yaml -n openshift-kube-descheduler-operator`:
================================================================================================
[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2021-08-14-044521]$ ./oc get kubedescheduler cluster -o yaml -n openshift-kube-descheduler-operator
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  creationTimestamp: "2021-08-14T11:42:31Z"
  generation: 1
  name: cluster
  namespace: openshift-kube-descheduler-operator
  resourceVersion: "61432"
  uid: 36a8eada-233f-4fe9-871f-5f4980fae252
spec:
  deschedulingIntervalSeconds: 3600
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
  profileCustomizations:
    podLifetime: 20
  profiles:
  - LifecycleAndUtilization
status:
  conditions:
  - lastTransitionTime: "2021-08-14T11:42:31Z"
    status: "False"
    type: TargetConfigControllerDegraded
  generations:
  - group: apps
    hash: ""
    lastGeneration: 1
    name: cluster
    namespace: openshift-kube-descheduler-operator
    resource: deployments
  readyReplicas: 0

Output of `./oc get configmap cluster -n openshift-kube-descheduler-operator -o yaml`:
=============================================================================================
[knarra@knarra openshift-client-linux-4.9.0-0.nightly-2021-08-14-044521]$ ./oc get configmap cluster -n openshift-kube-descheduler-operator -o yaml
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: descheduler/v1alpha1
    ignorePvcPods: true
    kind: DeschedulerPolicy
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces: null
          nodeResourceUtilizationThresholds:
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
          thresholdPriority: null
          thresholdPriorityClassName: ""
      PodLifeTime:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces:
            exclude:
            - kube-system
            - openshift-apiserver
            - openshift-apiserver-operator
            - openshift-authentication
            - openshift-authentication-operator
            - openshift-cloud-controller-manager
            - openshift-cloud-controller-manager-operator
            - openshift-cloud-credential-operator
            - openshift-cluster-csi-drivers
            - openshift-cluster-machine-approver
            - openshift-cluster-node-tuning-operator
            - openshift-cluster-samples-operator
            - openshift-cluster-storage-operator
            - openshift-cluster-version
            - openshift-config
            - openshift-config-managed
            - openshift-config-operator
            - openshift-console
            - openshift-console-operator
            - openshift-console-user-settings
            - openshift-controller-manager
            - openshift-controller-manager-operator
            - openshift-dns
            - openshift-dns-operator
            - openshift-etcd
            - openshift-etcd-operator
            - openshift-host-network
            - openshift-image-registry
            - openshift-infra
            - openshift-ingress
            - openshift-ingress-canary
            - openshift-ingress-operator
            - openshift-insights
            - openshift-kni-infra
            - openshift-kube-apiserver
            - openshift-kube-apiserver-operator
            - openshift-kube-controller-manager
            - openshift-kube-controller-manager-operator
            - openshift-kube-descheduler-operator
            - openshift-kube-scheduler
            - openshift-kube-scheduler-operator
            - openshift-kube-storage-version-migrator
            - openshift-kube-storage-version-migrator-operator
            - openshift-kubevirt-infra
            - openshift-machine-api
            - openshift-machine-config-operator
            - openshift-marketplace
            - openshift-monitoring
            - openshift-multus
            - openshift-network-diagnostics
            - openshift-network-operator
            - openshift-node
            - openshift-oauth-apiserver
            - openshift-openstack-infra
            - openshift-operator-lifecycle-manager
            - openshift-operators
            - openshift-ovirt-infra
            - openshift-sdn
            - openshift-service-ca
            - openshift-service-ca-operator
            - openshift-user-workload-monitoring
            - openshift-vsphere-infra
            include: null
          podLifeTime:
            maxPodLifeTimeSeconds: 86400
          thresholdPriority: null
          thresholdPriorityClassName: ""
      RemovePodsHavingTooManyRestarts:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces:
            exclude:
            - kube-system
            - openshift-apiserver
            - openshift-apiserver-operator
            - openshift-authentication
            - openshift-authentication-operator
            - openshift-cloud-controller-manager
            - openshift-cloud-controller-manager-operator
            - openshift-cloud-credential-operator
            - openshift-cluster-csi-drivers
            - openshift-cluster-machine-approver
            - openshift-cluster-node-tuning-operator
            - openshift-cluster-samples-operator
            - openshift-cluster-storage-operator
            - openshift-cluster-version
            - openshift-config
            - openshift-config-managed
            - openshift-config-operator
            - openshift-console
            - openshift-console-operator
            - openshift-console-user-settings
            - openshift-controller-manager
            - openshift-controller-manager-operator
            - openshift-dns
            - openshift-dns-operator
            - openshift-etcd
            - openshift-etcd-operator
            - openshift-host-network
            - openshift-image-registry
            - openshift-infra
            - openshift-ingress
            - openshift-ingress-canary
            - openshift-ingress-operator
            - openshift-insights
            - openshift-kni-infra
            - openshift-kube-apiserver
            - openshift-kube-apiserver-operator
            - openshift-kube-controller-manager
            - openshift-kube-controller-manager-operator
            - openshift-kube-descheduler-operator
            - openshift-kube-scheduler
            - openshift-kube-scheduler-operator
            - openshift-kube-storage-version-migrator
            - openshift-kube-storage-version-migrator-operator
            - openshift-kubevirt-infra
            - openshift-machine-api
            - openshift-machine-config-operator
            - openshift-marketplace
            - openshift-monitoring
            - openshift-multus
            - openshift-network-diagnostics
            - openshift-network-operator
            - openshift-node
            - openshift-oauth-apiserver
            - openshift-openstack-infra
            - openshift-operator-lifecycle-manager
            - openshift-operators
            - openshift-ovirt-infra
            - openshift-sdn
            - openshift-service-ca
            - openshift-service-ca-operator
            - openshift-user-workload-monitoring
            - openshift-vsphere-infra
            include: null
          podsHavingTooManyRestarts:
            includingInitContainers: true
            podRestartThreshold: 100
          thresholdPriority: null
          thresholdPriorityClassName: ""
kind: ConfigMap
metadata:
  creationTimestamp: "2021-08-14T11:42:31Z"
  name: cluster
  namespace: openshift-kube-descheduler-operator
  ownerReferences:
  - apiVersion: v1
    kind: KubeDescheduler
    name: cluster
    uid: 36a8eada-233f-4fe9-871f-5f4980fae252
  resourceVersion: "61415"
  uid: e0d10e61-8e7f-45ca-9450-4eb6461766e1

Based on the above, moving the bug back to the assigned state.
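For reference, if the podLifetime of 20 were honored, the PodLifeTime section of the generated policy.yaml would be expected to read roughly as follows (a sketch of the expected rendering, not actual output from this cluster):

PodLifeTime:
  enabled: true
  params:
    podLifeTime:
      maxPodLifeTimeSeconds: 20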
@RamaKasturi, as mentioned in https://github.com/openshift/cluster-kube-descheduler-operator/pull/208#discussion_r688919199, this *should* accept the duration format (e.g., "5m"), but there was a bug in the CRD that rejected it. I have opened a PR to update the CRD to take a string instead of an integer. When that merges and a new build is available, please re-test. Thank you.
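To illustrate the kind of CRD change described above (the field layout and description text here are assumptions for illustration, not the exact contents of the PR):

# hypothetical excerpt of the KubeDescheduler CRD schema after the fix
profileCustomizations:
  type: object
  properties:
    podLifetime:
      type: string   # was an integer, which rejected duration strings such as "5m"
      description: Duration for the PodLifeTime strategy, for example "5m", "1h", or "86400s".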
The pod still does not get evicted within the duration specified in podLifetime. Upon further checking, I see that the configmap still has maxPodLifeTimeSeconds set to 86400.

[root@localhost roottest]# oc get kubedescheduler cluster -o yaml
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  creationTimestamp: "2021-08-20T03:12:43Z"
  generation: 4
  name: cluster
  namespace: openshift-kube-descheduler-operator
  resourceVersion: "142064"
  uid: a2010876-ddcd-482c-a128-a2c41669e304
spec:
  deschedulingIntervalSeconds: 60
  logLevel: Normal
  managementState: Managed
  operatorLogLevel: Normal
  profileCustomizations:
    podLifetime: 5m
  profiles:
  - LifecycleAndUtilization

[root@localhost roottest]# oc get configmap cluster -o yaml
apiVersion: v1
data:
  policy.yaml: |
    apiVersion: descheduler/v1alpha1
    ignorePvcPods: true
    kind: DeschedulerPolicy
    strategies:
      LowNodeUtilization:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces: null
          nodeResourceUtilizationThresholds:
            targetThresholds:
              cpu: 50
              memory: 50
              pods: 50
            thresholds:
              cpu: 20
              memory: 20
              pods: 20
          thresholdPriority: null
          thresholdPriorityClassName: ""
      PodLifeTime:
        enabled: true
        params:
          includeSoftConstraints: false
          namespaces:
            exclude:
            - kube-system
            - openshift-apiserver
            - openshift-apiserver-operator
            - openshift-authentication
            - openshift-authentication-operator
            - openshift-cloud-controller-manager
            - openshift-cloud-controller-manager-operator
            - openshift-cloud-credential-operator
            - openshift-cluster-csi-drivers
            - openshift-cluster-machine-approver
            - openshift-cluster-node-tuning-operator
            - openshift-cluster-samples-operator
            - openshift-cluster-storage-operator
            - openshift-cluster-version
            - openshift-config
            - openshift-config-managed
            - openshift-config-operator
            - openshift-console
            - openshift-console-operator
            - openshift-console-user-settings
            - openshift-controller-manager
            - openshift-controller-manager-operator
            - openshift-dns
            - openshift-dns-operator
            - openshift-etcd
            - openshift-etcd-operator
            - openshift-host-network
            - openshift-image-registry
            - openshift-infra
            - openshift-ingress
            - openshift-ingress-canary
            - openshift-ingress-operator
            - openshift-insights
            - openshift-kni-infra
            - openshift-kube-apiserver
            - openshift-kube-apiserver-operator
            - openshift-kube-controller-manager
            - openshift-kube-controller-manager-operator
            - openshift-kube-descheduler-operator
            - openshift-kube-scheduler
            - openshift-kube-scheduler-operator
            - openshift-kube-storage-version-migrator
            - openshift-kube-storage-version-migrator-operator
            - openshift-kubevirt-infra
            - openshift-logging
            - openshift-machine-api
            - openshift-machine-config-operator
            - openshift-marketplace
            - openshift-monitoring
            - openshift-multus
            - openshift-network-diagnostics
            - openshift-network-operator
            - openshift-node
            - openshift-oauth-apiserver
            - openshift-openstack-infra
            - openshift-operator-lifecycle-manager
            - openshift-operators
            - openshift-operators-redhat
            - openshift-ovirt-infra
            - openshift-sdn
            - openshift-service-ca
            - openshift-service-ca-operator
            - openshift-user-workload-monitoring
            - openshift-vsphere-infra
            include: null
          podLifeTime:
            maxPodLifeTimeSeconds: 86400
          thresholdPriority: null
          thresholdPriorityClassName: ""

[root@localhost roottest]# oc get csv -n openshift-kube-descheduler-operator
NAME                                                DISPLAY                     VERSION              REPLACES   PHASE
clusterkubedescheduleroperator.4.9.0-202108171159   Kube Descheduler Operator   4.9.0-202108171159              Succeeded
Are we able to give an example of the expected input format either in the description below the text box or as light text in the text box?
That format ("5m") is correct, it is the standard "duration" format as indicated by the type in the CRD. We still had a bug in the code that processed it I have opened https://github.com/openshift/cluster-kube-descheduler-operator/pull/213 to fix this new bug and added a test to ensure it works now. When that merges, please test again
Tested the bug with podLifetime set to 420s, 5m, and 1h with the build below, and I see that the right value gets copied over to the configmap and the pod gets evicted correctly.

[knarra@knarra cucushift]$ oc get csv -n openshift-kube-descheduler-operator
NAME                                                DISPLAY                     VERSION              REPLACES   PHASE
clusterkubedescheduleroperator.4.9.0-202108210926   Kube Descheduler Operator   4.9.0-202108210926              Succeeded

Logs when podLifetime set to 5m:
====================================
I0823 09:12:22.895313       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-154-172.us-east-2.compute.internal"
I0823 09:12:22.936389       1 evictions.go:130] "Evicted pod" pod="knarra/hello2-68c946777-klrbf" reason="PodLifeTime"
I0823 09:12:22.936601       1 pod_lifetime.go:98] "Evicted pod because it exceeded its lifetime" pod="knarra/hello2-68c946777-klrbf" maxPodLifeTime=300
I0823 09:12:22.936616       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-158-188.us-east-2.compute.internal"
I0823 09:12:22.987746       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-163-175.us-east-2.compute.internal"
I0823 09:12:23.015997       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-183-30.us-east-2.compute.internal"
I0823 09:12:23.343887       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-199-190.us-east-2.compute.internal"
I0823 09:12:23.548121       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-222-237.us-east-2.compute.internal"
I0823 09:12:23.743412       1 descheduler.go:151] "Number of evicted pods" totalEvicted=1

Logs when podLifetime set to 420s:
==================================
I0823 09:20:14.729656       1 evictions.go:130] "Evicted pod" pod="knarra/hello2-68c946777-dskdz" reason="PodLifeTime"
I0823 09:20:14.729798       1 pod_lifetime.go:98] "Evicted pod because it exceeded its lifetime" pod="knarra/hello2-68c946777-dskdz" maxPodLifeTime=420
I0823 09:20:14.729815       1 descheduler.go:151] "Number of evicted pods" totalEvicted=1

Logs when podLifetime set to 1h:
======================================
I0823 10:20:32.952833       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-154-172.us-east-2.compute.internal"
I0823 10:20:33.003504       1 evictions.go:130] "Evicted pod" pod="knarra/hello2-68c946777-qc288" reason="PodLifeTime"
I0823 10:20:33.005576       1 pod_lifetime.go:98] "Evicted pod because it exceeded its lifetime" pod="knarra/hello2-68c946777-qc288" maxPodLifeTime=3600
I0823 10:20:33.005737       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-158-188.us-east-2.compute.internal"
I0823 10:20:33.028240       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-163-175.us-east-2.compute.internal"
I0823 10:20:33.051591       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-183-30.us-east-2.compute.internal"
I0823 10:20:33.074338       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-199-190.us-east-2.compute.internal"
I0823 10:20:33.106819       1 pod_lifetime.go:92] "Processing node" node="ip-10-0-222-237.us-east-2.compute.internal"
I0823 10:20:33.128137       1 duplicates.go:99] "Processing node" node="ip-10-0-154-172.us-east-2.compute.internal"
I0823 10:20:33.153901       1 duplicates.go:99] "Processing node" node="ip-10-0-158-188.us-east-2.compute.internal"
I0823 10:20:33.175834       1 duplicates.go:99] "Processing node" node="ip-10-0-163-175.us-east-2.compute.internal"
I0823 10:20:33.196243       1 duplicates.go:99] "Processing node" node="ip-10-0-183-30.us-east-2.compute.internal"
I0823 10:20:33.365329       1 duplicates.go:99] "Processing node" node="ip-10-0-199-190.us-east-2.compute.internal"
I0823 10:20:33.565394       1 duplicates.go:99] "Processing node" node="ip-10-0-222-237.us-east-2.compute.internal"
I0823 10:20:33.765639       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-154-172.us-east-2.compute.internal"
I0823 10:20:33.967391       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-158-188.us-east-2.compute.internal"
I0823 10:20:34.164954       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-163-175.us-east-2.compute.internal"
I0823 10:20:34.367402       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-183-30.us-east-2.compute.internal"
I0823 10:20:34.567434       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-199-190.us-east-2.compute.internal"
I0823 10:20:34.765857       1 toomanyrestarts.go:78] "Processing node" node="ip-10-0-222-237.us-east-2.compute.internal"
I0823 10:20:35.191698       1 topologyspreadconstraint.go:139] "Processing namespaces for topology spread constraints"
I0823 10:20:37.620460       1 nodeutilization.go:170] "Node is appropriately utilized" node="ip-10-0-154-172.us-east-2.compute.internal" usage=map[cpu:582m memory:2963Mi pods:27] usagePercentage=map[cpu:38.8 memory:45.297287936197975 pods:10.8]
I0823 10:20:37.620511       1 nodeutilization.go:167] "Node is overutilized" node="ip-10-0-158-188.us-east-2.compute.internal" usage=map[cpu:1759m memory:5597Mi pods:34] usagePercentage=map[cpu:50.25714285714286 memory:38.32301504027063 pods:13.6]
I0823 10:20:37.620534       1 nodeutilization.go:167] "Node is overutilized" node="ip-10-0-163-175.us-east-2.compute.internal" usage=map[cpu:2 memory:6787Mi pods:57] usagePercentage=map[cpu:57.142857142857146 memory:46.47102073938123 pods:22.8]
I0823 10:20:37.620555       1 nodeutilization.go:170] "Node is appropriately utilized" node="ip-10-0-183-30.us-east-2.compute.internal" usage=map[cpu:657m memory:2791Mi pods:19] usagePercentage=map[cpu:43.8 memory:42.126836389535974 pods:7.6]
I0823 10:20:37.620913       1 nodeutilization.go:167] "Node is overutilized" node="ip-10-0-199-190.us-east-2.compute.internal" usage=map[cpu:1795m memory:5472Mi pods:32] usagePercentage=map[cpu:51.285714285714285 memory:37.467112046519354 pods:12.8]
I0823 10:20:37.621511       1 nodeutilization.go:170] "Node is appropriately utilized" node="ip-10-0-222-237.us-east-2.compute.internal" usage=map[cpu:612m memory:2097Mi pods:26] usagePercentage=map[cpu:40.8 memory:31.651729096688264 pods:10.4]
I0823 10:20:37.621532       1 lownodeutilization.go:99] "Criteria for a node under utilization" CPU=20 Mem=20 Pods=20
I0823 10:20:37.621543       1 lownodeutilization.go:100] "Number of underutilized nodes" totalNumber=0
I0823 10:20:37.621573       1 lownodeutilization.go:113] "Criteria for a node above target utilization" CPU=50 Mem=50 Pods=50
I0823 10:20:37.621585       1 lownodeutilization.go:114] "Number of overutilized nodes" totalNumber=3
I0823 10:20:37.621596       1 lownodeutilization.go:117] "No node is underutilized, nothing to do here, you might tune your thresholds further"
I0823 10:20:37.621608       1 descheduler.go:151] "Number of evicted pods" totalEvicted=1

Based on the above, moving the bug to the verified state.
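For anyone re-running this verification, a minimal test workload like the following could be used in a namespace that is not on the descheduler's exclude list (the names here are hypothetical; the logs above happened to use a deployment called hello2 in the knarra namespace). Create it, wait longer than the configured podLifetime, and check the descheduler logs for the eviction:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello2            # hypothetical test deployment
  namespace: knarra       # any namespace not excluded by the descheduler policy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello2
  template:
    metadata:
      labels:
        app: hello2
    spec:
      containers:
      - name: hello
        image: registry.access.redhat.com/ubi8/ubi-minimal   # any long-running image works
        command: ["sleep", "infinity"]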
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759