Bug 1532967 - cpumanager should exit gracefully rather than panic if 'kube-reserved' and 'system-reserved' are not set
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Seth Jennings
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-10 06:31 UTC by DeShuai Ma
Modified: 2018-03-28 14:18 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-28 14:18:26 UTC
Target Upstream Version:
Embargoed:

Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:0489 | 0 | None | None | None | 2018-03-28 14:18:45 UTC

Description DeShuai Ma 2018-01-10 06:31:42 UTC
Description of problem:
This is a clone of the upstream bug for OCP 3.9: https://github.com/kubernetes/kubernetes/issues/57239
When cpumanager is enabled with --cpu-manager-policy=static but --kube-reserved and --system-reserved are not set, the kubelet panics on startup (see the node log below).

Version-Release number of selected component (if applicable):
openshift v3.9.0-0.16.0
kubernetes v1.9.0-beta1
etcd 3.2.8

How reproducible:
Always

Steps to Reproduce:
1. Configure cpu-manager without '--kube-reserved' and '--system-reserved', then restart the node service

# cat /etc/origin/node/node-config.yaml
...
kubeletArguments:
...
  feature-gates:
  - CPUManager=true
  cpu-manager-policy:
  - static
  cpu-manager-reconcile-period:
  - 5s

# systemctl restart atomic-openshift-node

2. Check the node log
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: I0110 05:40:51.560559  115328 container_manager_linux.go:242] container manager verified user specified cgroup-root exists: /
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: I0110 05:40:51.560579  115328 container_manager_linux.go:247] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/origin/openshift.local.volumes ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[] ExperimentalCPUManagerPolicy:static ExperimentalCPUManagerReconcilePeriod:5s}
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: I0110 05:40:51.560711  115328 container_manager_linux.go:266] Creating device plugin manager: false
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: I0110 05:40:51.560740  115328 cpu_manager.go:120] [cpumanager] detected CPU topology: &{4 2 1 map[0:{0 0} 2:{0 0} 1:{0 1} 3:{0 1}]}
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: panic: [cpumanager] unable to determine reserved CPU resources for static policy
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: goroutine 208 [running]:
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/cm/cpumanager.NewManager(0xc420cf6870, 0x6, 0x12a05f200, 0xc4212c88f8, 0xc4212676b0, 0xc420d06240, 0x27, 0xffffffffffffffff, 0x0, 0x0, ...)
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/cm/cpumanager/cpu_manager.go:124 +0x65b
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/cm.NewContainerManager(0xf0198c0, 0xc42014c830, 0xf017dc0, 0xc420a95740, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/cm/container_manager_linux.go:278 +0xb0e
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kubelet/app.run(0xc420a35800, 0xc4201c62c0, 0xc42003bee8, 0x1)
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kubelet/app/server.go:454 +0xc22
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kubelet/app.Run(0xc420a35800, 0xc4201c62c0, 0x0, 0x0)
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kubelet/app/server.go:183 +0xfa
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/pkg/cmd/server/kubernetes/node.(*NodeConfig).RunKubelet.func1(0xc4211ca0a0)
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/node/node.go:258 +0x3c
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: created by github.com/openshift/origin/pkg/cmd/server/kubernetes/node.(*NodeConfig).RunKubelet
Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/node/node.go:257 +0x11f
Jan 10 05:40:51 qe-dma-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=2/INVALIDARGUMENT


Actual results:


Expected results:


Additional info:

Comment 1 Seth Jennings 2018-01-10 17:03:49 UTC
This is fixed in kube in this PR:
https://github.com/kubernetes/kubernetes/pull/57247

Origin PR:
https://github.com/openshift/origin/pull/18051
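
For readers who want the shape of the behavior requested in this bug, here is a minimal Go sketch of returning an error instead of panicking when no CPU is reserved. It is not the code from either PR above. The names `newStaticPolicy`, `reservedMilliCPU`, and `errNoReservedCPU` are hypothetical, the reservations are modeled as plain maps of millicores, and rounding the reservation up to whole CPUs is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"os"
)

// errNoReservedCPU is a hypothetical error value used in this sketch only.
var errNoReservedCPU = errors.New("[cpumanager] static policy requires kube-reserved + system-reserved CPU to be greater than zero")

// reservedMilliCPU sums the CPU reservations (in millicores) from the two
// maps. A missing "cpu" key, or a nil map, counts as zero.
func reservedMilliCPU(kubeReserved, systemReserved map[string]int64) int64 {
	return kubeReserved["cpu"] + systemReserved["cpu"]
}

// newStaticPolicy returns the number of whole CPUs to reserve, or an error
// when nothing is reserved. The caller decides how to exit; nothing panics.
// Assumption: the reservation is rounded up to whole CPUs.
func newStaticPolicy(kubeReserved, systemReserved map[string]int64) (int, error) {
	milli := reservedMilliCPU(kubeReserved, systemReserved)
	if milli <= 0 {
		return 0, errNoReservedCPU
	}
	return int(math.Ceil(float64(milli) / 1000)), nil
}

func main() {
	// Mirrors the setup in this report: KubeReserved and SystemReserved are empty.
	if _, err := newStaticPolicy(map[string]int64{}, map[string]int64{}); err != nil {
		fmt.Fprintln(os.Stderr, "failed to start cpumanager:", err)
	}

	// With a reservation in place, initialization succeeds.
	n, err := newStaticPolicy(map[string]int64{"cpu": 500}, map[string]int64{"cpu": 500})
	fmt.Println("reserved CPUs:", n, "error:", err)
}
```

The point of the design is that the failure surfaces as an ordinary error message in the node log, with no goroutine dump, and the operator is told which settings are missing.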

Comment 3 weiwei jiang 2018-01-23 07:50:22 UTC
Checked with v3.9.0-0.22.0; the panic is gone. Note that the static policy still needs a resource reservation configured for atomic-openshift-node in order to work.

# openshift version 
openshift v3.9.0-0.22.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

Comment 6 errata-xmlrpc 2018-03-28 14:18:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489

