Bug 2035927
| Summary: | Cannot enable HighNodeUtilization scheduler profile | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | RamaKasturi <knarra> |
| Component: | kube-scheduler | Assignee: | Jan Chaloupka <jchaloup> |
| Status: | CLOSED ERRATA | QA Contact: | RamaKasturi <knarra> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.10 | CC: | aos-bugs, mfojtik |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:36:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
The NodeResourcesMostAllocated plugin was removed as part of https://github.com/kubernetes/kubernetes/pull/101822. For more detail, see the https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/2458-node-resource-score-strategy enhancement.

From https://github.com/kubernetes/kubernetes/blob/1727cea64c1d53f7badbc03b0ca77543283e6157/pkg/scheduler/apis/config/v1beta2/default_plugins.go:

```
Score: v1beta2.PluginSet{
	Enabled: []v1beta2.Plugin{
		{Name: names.NodeResourcesBalancedAllocation, Weight: pointer.Int32Ptr(1)},
		{Name: names.ImageLocality, Weight: pointer.Int32Ptr(1)},
		{Name: names.InterPodAffinity, Weight: pointer.Int32Ptr(1)},
		{Name: names.NodeResourcesFit, Weight: pointer.Int32Ptr(1)},
		{Name: names.NodeAffinity, Weight: pointer.Int32Ptr(1)},
		// Weight is doubled because:
		// - This is a score coming from user preference.
		// - It makes its signal comparable to NodeResourcesFit.LeastAllocated.
		{Name: names.PodTopologySpread, Weight: pointer.Int32Ptr(2)},
		{Name: names.TaintToleration, Weight: pointer.Int32Ptr(1)},
	},
},
```

From https://github.com/kubernetes/kubernetes/blob/release-1.22/pkg/scheduler/apis/config/v1beta1/default_plugins.go:

```
Score: &v1beta1.PluginSet{
	Enabled: []v1beta1.Plugin{
		{Name: names.NodeResourcesBalancedAllocation, Weight: pointer.Int32Ptr(1)},
		{Name: names.ImageLocality, Weight: pointer.Int32Ptr(1)},
		{Name: names.InterPodAffinity, Weight: pointer.Int32Ptr(1)},
		{Name: names.NodeResourcesLeastAllocated, Weight: pointer.Int32Ptr(1)},
		{Name: names.NodeAffinity, Weight: pointer.Int32Ptr(1)},
		{Name: names.NodePreferAvoidPods, Weight: pointer.Int32Ptr(10000)},
		// Weight is doubled because:
		// - This is a score coming from user preference.
		// - It makes its signal comparable to NodeResourcesLeastAllocated.
		{Name: names.PodTopologySpread, Weight: pointer.Int32Ptr(2)},
		{Name: names.TaintToleration, Weight: pointer.Int32Ptr(1)},
	},
},
```

In v1beta2, NodeResourcesLeastAllocated turns into NodeResourcesFit.
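Concretely, the scoring behavior that the HighNodeUtilization profile previously got by enabling NodeResourcesMostAllocated is expressed in v1beta2 through the NodeResourcesFit scoring strategy instead. A minimal pluginConfig sketch of the replacement, derived from the post-fix dump in this report:

```
profiles:
- pluginConfig:
  - name: NodeResourcesFit
    args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      kind: NodeResourcesFitArgs
      scoringStrategy:
        type: MostAllocated
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
```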
kubescheduler.config.k8s.io/v1beta2 after applying the fix:
```
...
profiles:
- pluginConfig:
  ...
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      kind: NodeResourcesFitArgs
      scoringStrategy:
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
        type: MostAllocated
    name: NodeResourcesFit
  ...
  plugins:
    ...
    score:
      enabled:
      - name: ImageLocality
        weight: 1
      - name: InterPodAffinity
        weight: 1
      - name: NodeAffinity
        weight: 1
      - name: PodTopologySpread
        weight: 2
      - name: TaintToleration
        weight: 1
      - name: NodeResourcesFit
        weight: 5
    ...
```
There is no sign of NodeResourcesBalancedAllocation, as in the previous case. NodeResourcesLeastAllocated is completely gone; only NodeResourcesFit is kept, with the `type: MostAllocated` configuration.
In the 4.9 case, with the HighNodeUtilization profile enabled:
```
profiles:
- pluginConfig:
  ...
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      kind: NodeResourcesFitArgs
      scoringStrategy:
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
        type: LeastAllocated
    name: NodeResourcesFit
  ...
  plugins:
    ...
    score:
      enabled:
      - name: NodeResourcesBalancedAllocation
        weight: 1
      - name: ImageLocality
        weight: 1
      - name: InterPodAffinity
        weight: 1
      - name: NodeAffinity
        weight: 1
      - name: NodePreferAvoidPods
        weight: 10000
      - name: PodTopologySpread
        weight: 2
      - name: TaintToleration
        weight: 1
      - name: NodeResourcesMostAllocated
        weight: 0
    ...
```
The NodeResourcesMostAllocated weight is 0, making it appear disabled. However, based on https://github.com/openshift/kubernetes/blob/release-4.9/pkg/scheduler/framework/runtime/framework.go#L293-L299:
```
for _, e := range profile.Plugins.Score.Enabled {
	// a weight of zero is not permitted, plugins can be disabled explicitly
	// when configured.
	f.scorePluginWeight[e.Name] = int(e.Weight)
	if f.scorePluginWeight[e.Name] == 0 {
		f.scorePluginWeight[e.Name] = 1
	}
}
```
the plugin is enabled as expected. Thus, there is no need to backport the change to 4.9. The NodeResourcesBalancedAllocation plugin is still enabled because https://github.com/openshift/cluster-kube-scheduler-operator/pull/379 has not merged yet.
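For illustration, the defaulting behavior of that loop can be reduced to a small standalone sketch (the helper name `scorePluginWeights` is hypothetical and not part of the scheduler code):

```go
package main

import "fmt"

// scorePluginWeights mirrors the framework behavior quoted above: a score
// plugin listed under score.enabled with weight 0 is NOT disabled; its
// effective weight is defaulted to 1. Plugins are disabled explicitly via
// the disabled list, never by a zero weight.
func scorePluginWeights(enabled map[string]int32) map[string]int {
	weights := make(map[string]int)
	for name, w := range enabled {
		weights[name] = int(w)
		if weights[name] == 0 {
			// a weight of zero is not permitted; default to 1
			weights[name] = 1
		}
	}
	return weights
}

func main() {
	// NodeResourcesMostAllocated appears with weight 0 in the 4.9 dump,
	// yet it still scores with an effective weight of 1.
	w := scorePluginWeights(map[string]int32{
		"NodeResourcesMostAllocated": 0,
		"PodTopologySpread":          2,
	})
	fmt.Println(w["NodeResourcesMostAllocated"]) // 1
	fmt.Println(w["PodTopologySpread"])          // 2
}
```

This is why the 4.9 HighNodeUtilization profile still works despite the zero weight, and why no backport is needed.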
Due to higher priority tasks, I have not been able to resolve this issue in time. Moving to the next sprint.

Tested with the latest nightly build, 4.10.0-0.nightly-2022-01-18-044014, and I still see NodeResourcesBalancedAllocation, which is not expected to be present after the bug fix here. As suggested by Jan and Maciej, I am going to wait until bug https://bugzilla.redhat.com/show_bug.cgi?id=2033751 moves to ON_QA to verify this bug.

Have tried verifying the bug with the build below, but I still see the NodeResourcesBalancedAllocation parameters, so moving the bug to the ASSIGNED state.
[knarra@knarra ~]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.nightly-2022-01-22-102609 True False 3h13m Cluster version is 4.10.0-0.nightly-2022-01-22-102609
-------------------------Configuration File Contents Start Here----------------------
```
profiles:
- pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      kind: DefaultPreemptionArgs
      minCandidateNodesAbsolute: 100
      minCandidateNodesPercentage: 10
    name: DefaultPreemption
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      hardPodAffinityWeight: 1
      kind: InterPodAffinityArgs
    name: InterPodAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      kind: NodeAffinityArgs
    name: NodeAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      kind: NodeResourcesBalancedAllocationArgs
      resources:
      - name: cpu
        weight: 1
      - name: memory
        weight: 1
    name: NodeResourcesBalancedAllocation
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      kind: NodeResourcesFitArgs
      scoringStrategy:
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
        type: LeastAllocated
    name: NodeResourcesFit
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      defaultingType: System
      kind: PodTopologySpreadArgs
    name: PodTopologySpread
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta2
      bindTimeoutSeconds: 600
      kind: VolumeBindingArgs
    name: VolumeBinding
  plugins:
    bind:
      enabled:
      - name: DefaultBinder
        weight: 0
    filter:
      enabled:
      - name: NodeUnschedulable
        weight: 0
      - name: NodeName
        weight: 0
      - name: TaintToleration
        weight: 0
      - name: NodeAffinity
        weight: 0
      - name: NodePorts
        weight: 0
      - name: NodeResourcesFit
        weight: 0
      - name: VolumeRestrictions
        weight: 0
      - name: EBSLimits
        weight: 0
      - name: GCEPDLimits
        weight: 0
      - name: NodeVolumeLimits
        weight: 0
      - name: AzureDiskLimits
        weight: 0
      - name: VolumeBinding
        weight: 0
      - name: VolumeZone
        weight: 0
      - name: PodTopologySpread
        weight: 0
      - name: InterPodAffinity
        weight: 0
    multiPoint: {}
    permit: {}
    postBind: {}
    postFilter:
      enabled:
      - name: DefaultPreemption
        weight: 0
    preBind:
      enabled:
      - name: VolumeBinding
        weight: 0
    preFilter:
      enabled:
      - name: NodeResourcesFit
        weight: 0
      - name: NodePorts
        weight: 0
      - name: VolumeRestrictions
        weight: 0
      - name: PodTopologySpread
        weight: 0
      - name: InterPodAffinity
        weight: 0
      - name: VolumeBinding
        weight: 0
      - name: NodeAffinity
        weight: 0
    preScore:
      enabled:
      - name: InterPodAffinity
        weight: 0
      - name: PodTopologySpread
        weight: 0
      - name: TaintToleration
        weight: 0
      - name: NodeAffinity
        weight: 0
    queueSort:
      enabled:
      - name: PrioritySort
        weight: 0
    reserve:
      enabled:
      - name: VolumeBinding
        weight: 0
    score:
      enabled:
      - name: NodeResourcesBalancedAllocation
        weight: 1
      - name: ImageLocality
        weight: 1
      - name: InterPodAffinity
        weight: 1
      - name: NodeResourcesFit
        weight: 1
      - name: NodeAffinity
        weight: 1
      - name: PodTopologySpread
        weight: 2
      - name: TaintToleration
        weight: 1
  schedulerName: default-scheduler
```
------------------------------------Configuration File Contents End Here---------------------------------
Verified with the build below; I could successfully enable the HighNodeUtilization profile and did not see any crash of the kube-scheduler while enabling this profile.
[knarra@knarra ~]$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.nightly-2022-01-24-070025 True False 5h24m Cluster version is 4.10.0-0.nightly-2022-01-24-070025
-------------------------Configuration File Contents Start Here----------------------
```
apiVersion: kubescheduler.config.k8s.io/v1beta3
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /etc/kubernetes/static-pod-resources/configmaps/scheduler-kubeconfig/kubeconfig
  qps: 50
enableContentionProfiling: true
enableProfiling: true
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
  leaseDuration: 2m17s
  renewDeadline: 1m47s
  resourceLock: configmaps
  resourceName: kube-scheduler
  resourceNamespace: openshift-kube-scheduler
  retryPeriod: 26s
parallelism: 16
percentageOfNodesToScore: 0
podInitialBackoffSeconds: 1
podMaxBackoffSeconds: 10
profiles:
- pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: DefaultPreemptionArgs
      minCandidateNodesAbsolute: 100
      minCandidateNodesPercentage: 10
    name: DefaultPreemption
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      hardPodAffinityWeight: 1
      kind: InterPodAffinityArgs
    name: InterPodAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeAffinityArgs
    name: NodeAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeResourcesBalancedAllocationArgs
      resources:
      - name: cpu
        weight: 1
      - name: memory
        weight: 1
    name: NodeResourcesBalancedAllocation
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeResourcesFitArgs
      scoringStrategy:
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
        type: MostAllocated
    name: NodeResourcesFit
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      defaultingType: System
      kind: PodTopologySpreadArgs
    name: PodTopologySpread
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      bindTimeoutSeconds: 600
      kind: VolumeBindingArgs
    name: VolumeBinding
  plugins:
    bind: {}
    filter: {}
    multiPoint:
      enabled:
      - name: PrioritySort
        weight: 0
      - name: NodeUnschedulable
        weight: 0
      - name: NodeName
        weight: 0
      - name: TaintToleration
        weight: 3
      - name: NodeAffinity
        weight: 2
      - name: NodePorts
        weight: 0
      - name: NodeResourcesFit
        weight: 1
      - name: VolumeRestrictions
        weight: 0
      - name: EBSLimits
        weight: 0
      - name: GCEPDLimits
        weight: 0
      - name: NodeVolumeLimits
        weight: 0
      - name: AzureDiskLimits
        weight: 0
      - name: VolumeBinding
        weight: 0
      - name: VolumeZone
        weight: 0
      - name: PodTopologySpread
        weight: 2
      - name: InterPodAffinity
        weight: 2
      - name: DefaultPreemption
        weight: 0
      - name: NodeResourcesBalancedAllocation
        weight: 1
      - name: ImageLocality
        weight: 1
      - name: DefaultBinder
        weight: 0
    permit: {}
    postBind: {}
    postFilter: {}
    preBind: {}
    preFilter: {}
    preScore: {}
    queueSort: {}
    reserve: {}
    score:
      disabled:
      - name: NodeResourcesBalancedAllocation
        weight: 0
      enabled:
      - name: NodeResourcesFit
        weight: 5
  schedulerName: default-scheduler
```
------------------------------------Configuration File Contents End Here---------------------------------
Enabled the LowNodeUtilization profile and saw no errors with it.
-------------------------Configuration File Contents Start Here----------------------
```
apiVersion: kubescheduler.config.k8s.io/v1beta3
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /etc/kubernetes/static-pod-resources/configmaps/scheduler-kubeconfig/kubeconfig
  qps: 50
enableContentionProfiling: true
enableProfiling: true
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
  leaseDuration: 2m17s
  renewDeadline: 1m47s
  resourceLock: configmaps
  resourceName: kube-scheduler
  resourceNamespace: openshift-kube-scheduler
  retryPeriod: 26s
parallelism: 16
percentageOfNodesToScore: 0
podInitialBackoffSeconds: 1
podMaxBackoffSeconds: 10
profiles:
- pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: DefaultPreemptionArgs
      minCandidateNodesAbsolute: 100
      minCandidateNodesPercentage: 10
    name: DefaultPreemption
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      hardPodAffinityWeight: 1
      kind: InterPodAffinityArgs
    name: InterPodAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeAffinityArgs
    name: NodeAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeResourcesBalancedAllocationArgs
      resources:
      - name: cpu
        weight: 1
      - name: memory
        weight: 1
    name: NodeResourcesBalancedAllocation
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeResourcesFitArgs
      scoringStrategy:
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
        type: LeastAllocated
    name: NodeResourcesFit
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      defaultingType: System
      kind: PodTopologySpreadArgs
    name: PodTopologySpread
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      bindTimeoutSeconds: 600
      kind: VolumeBindingArgs
    name: VolumeBinding
  plugins:
    bind: {}
    filter: {}
    multiPoint:
      enabled:
      - name: PrioritySort
        weight: 0
      - name: NodeUnschedulable
        weight: 0
      - name: NodeName
        weight: 0
      - name: TaintToleration
        weight: 3
      - name: NodeAffinity
        weight: 2
      - name: NodePorts
        weight: 0
      - name: NodeResourcesFit
        weight: 1
      - name: VolumeRestrictions
        weight: 0
      - name: EBSLimits
        weight: 0
      - name: GCEPDLimits
        weight: 0
      - name: NodeVolumeLimits
        weight: 0
      - name: AzureDiskLimits
        weight: 0
      - name: VolumeBinding
        weight: 0
      - name: VolumeZone
        weight: 0
      - name: PodTopologySpread
        weight: 2
      - name: InterPodAffinity
        weight: 2
      - name: DefaultPreemption
        weight: 0
      - name: NodeResourcesBalancedAllocation
        weight: 1
      - name: ImageLocality
        weight: 1
      - name: DefaultBinder
        weight: 0
    permit: {}
    postBind: {}
    postFilter: {}
    preBind: {}
    preFilter: {}
    preScore: {}
    queueSort: {}
    reserve: {}
    score: {}
  schedulerName: default-scheduler
```
------------------------------------Configuration File Contents End Here---------------------------------
Enabled the "NoScoring" profile and did not see any issues with it.
-------------------------Configuration File Contents Start Here----------------------
```
apiVersion: kubescheduler.config.k8s.io/v1beta3
clientConnection:
  acceptContentTypes: ""
  burst: 100
  contentType: application/vnd.kubernetes.protobuf
  kubeconfig: /etc/kubernetes/static-pod-resources/configmaps/scheduler-kubeconfig/kubeconfig
  qps: 50
enableContentionProfiling: true
enableProfiling: true
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true
  leaseDuration: 2m17s
  renewDeadline: 1m47s
  resourceLock: configmaps
  resourceName: kube-scheduler
  resourceNamespace: openshift-kube-scheduler
  retryPeriod: 26s
parallelism: 16
percentageOfNodesToScore: 0
podInitialBackoffSeconds: 1
podMaxBackoffSeconds: 10
profiles:
- pluginConfig:
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: DefaultPreemptionArgs
      minCandidateNodesAbsolute: 100
      minCandidateNodesPercentage: 10
    name: DefaultPreemption
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      hardPodAffinityWeight: 1
      kind: InterPodAffinityArgs
    name: InterPodAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeAffinityArgs
    name: NodeAffinity
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeResourcesBalancedAllocationArgs
      resources:
      - name: cpu
        weight: 1
      - name: memory
        weight: 1
    name: NodeResourcesBalancedAllocation
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      kind: NodeResourcesFitArgs
      scoringStrategy:
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
        type: LeastAllocated
    name: NodeResourcesFit
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      defaultingType: System
      kind: PodTopologySpreadArgs
    name: PodTopologySpread
  - args:
      apiVersion: kubescheduler.config.k8s.io/v1beta3
      bindTimeoutSeconds: 600
      kind: VolumeBindingArgs
    name: VolumeBinding
  plugins:
    bind: {}
    filter: {}
    multiPoint:
      enabled:
      - name: PrioritySort
        weight: 0
      - name: NodeUnschedulable
        weight: 0
      - name: NodeName
        weight: 0
      - name: TaintToleration
        weight: 3
      - name: NodeAffinity
        weight: 2
      - name: NodePorts
        weight: 0
      - name: NodeResourcesFit
        weight: 1
      - name: VolumeRestrictions
        weight: 0
      - name: EBSLimits
        weight: 0
      - name: GCEPDLimits
        weight: 0
      - name: NodeVolumeLimits
        weight: 0
      - name: AzureDiskLimits
        weight: 0
      - name: VolumeBinding
        weight: 0
      - name: VolumeZone
        weight: 0
      - name: PodTopologySpread
        weight: 2
      - name: InterPodAffinity
        weight: 2
      - name: DefaultPreemption
        weight: 0
      - name: NodeResourcesBalancedAllocation
        weight: 1
      - name: ImageLocality
        weight: 1
      - name: DefaultBinder
        weight: 0
    permit: {}
    postBind: {}
    postFilter: {}
    preBind: {}
    preFilter: {}
    preScore:
      disabled:
      - name: '*'
        weight: 0
    queueSort: {}
    reserve: {}
    score:
      disabled:
      - name: '*'
        weight: 0
  schedulerName: default-scheduler
```
------------------------------------Configuration File Contents End Here---------------------------------
Based on the above, moving the bug to the VERIFIED state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056
Description of problem:

When enabling the HighNodeUtilization scheduler profile, the scheduler pods go into a CrashLoopBackOff state with the error "profiles[0].plugins.score.enabled[6]: Invalid value: "NodeResourcesMostAllocated": was removed in version "kubescheduler.config.k8s.io/v1beta2" (KubeSchedulerConfiguration is version "kubescheduler.config.k8s.io/v1beta2")".

Version-Release number of selected component (if applicable):

4.10.0-0.nightly-2021-12-20-231053

How reproducible:

Always

Steps to Reproduce:
1. Install the latest 4.10 cluster.
2. Run the command below to enable the HighNodeUtilization scheduler profile.
3. oc patch Scheduler cluster --type='json' -p='[{"op": "add", "path": "/spec/profile", "value":"HighNodeUtilization"}]'

Actual results:

The scheduler pod goes into a CrashLoopBackOff state with the error "profiles[0].plugins.score.enabled[6]: Invalid value: "NodeResourcesMostAllocated": was removed in version "kubescheduler.config.k8s.io/v1beta2" (KubeSchedulerConfiguration is version "kubescheduler.config.k8s.io/v1beta2")".

openshift-kube-scheduler-master-00.knarra2712.qe.devcluster.openshift.com 2/3 CrashLoopBackOff 8 (2m51s ago) 19m
openshift-kube-scheduler-master-01.knarra2712.qe.devcluster.openshift.com 3/3 Running 0 80m
openshift-kube-scheduler-master-02.knarra2712.qe.devcluster.openshift.com 3/3 Running 0 81m

Expected results:

The scheduler pod should not go into a CrashLoopBackOff state, or the right way to enable HighNodeUtilization from 4.10 should be defined.

Additional info:
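For reference, the JSON patch in step 3 is equivalent to setting `spec.profile` directly on the cluster `Scheduler` resource. A sketch of the resulting object, assuming the standard `config.openshift.io/v1` Scheduler API:

```
apiVersion: config.openshift.io/v1
kind: Scheduler
metadata:
  name: cluster
spec:
  profile: HighNodeUtilization
```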