Description of problem:
Clone of the upstream bug for ocp-3.9:
https://github.com/kubernetes/kubernetes/issues/57239

When cpumanager is enabled with --cpu-manager-policy=static but --kube-reserved and --system-reserved are not set, the kubelet panics on startup (see the node log below).

Version-Release number of selected component (if applicable):
openshift v3.9.0-0.16.0
kubernetes v1.9.0-beta1
etcd 3.2.8

How reproducible:
Always

Steps to Reproduce:
1. Configure cpu-manager without --kube-reserved and --system-reserved, then restart the node service

    # cat /etc/origin/node/node-config.yaml
    ...
    kubeletArguments:
    ...
      feature-gates:
      - CPUManager=true
      cpu-manager-policy:
      - static
      cpu-manager-reconcile-period:
      - 5s

    # systemctl restart atomic-openshift-node

2. Check the node log

    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: I0110 05:40:51.560559 115328 container_manager_linux.go:242] container manager verified user specified cgroup-root exists: /
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: I0110 05:40:51.560579 115328 container_manager_linux.go:247] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/origin/openshift.local.volumes ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>} {Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[] ExperimentalCPUManagerPolicy:static ExperimentalCPUManagerReconcilePeriod:5s}
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: I0110 05:40:51.560711 115328 container_manager_linux.go:266] Creating device plugin manager: false
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: I0110 05:40:51.560740 115328 cpu_manager.go:120] [cpumanager] detected CPU topology: &{4 2 1 map[0:{0 0} 2:{0 0} 1:{0 1} 3:{0 1}]}
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: panic: [cpumanager] unable to determine reserved CPU resources for static policy
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: goroutine 208 [running]:
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/cm/cpumanager.NewManager(0xc420cf6870, 0x6, 0x12a05f200, 0xc4212c88f8, 0xc4212676b0, 0xc420d06240, 0x27, 0xffffffffffffffff, 0x0, 0x0, ...)
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/cm/cpumanager/cpu_manager.go:124 +0x65b
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/cm.NewContainerManager(0xf0198c0, 0xc42014c830, 0xf017dc0, 0xc420a95740, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/cm/container_manager_linux.go:278 +0xb0e
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kubelet/app.run(0xc420a35800, 0xc4201c62c0, 0xc42003bee8, 0x1)
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kubelet/app/server.go:454 +0xc22
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kubelet/app.Run(0xc420a35800, 0xc4201c62c0, 0x0, 0x0)
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/cmd/kubelet/app/server.go:183 +0xfa
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: github.com/openshift/origin/pkg/cmd/server/kubernetes/node.(*NodeConfig).RunKubelet.func1(0xc4211ca0a0)
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/node/node.go:258 +0x3c
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: created by github.com/openshift/origin/pkg/cmd/server/kubernetes/node.(*NodeConfig).RunKubelet
    Jan 10 05:40:51 qe-dma-master-etcd-1 atomic-openshift-node[115301]: /builddir/build/BUILD/atomic-openshift-git-0.e0f1109/_output/local/go/src/github.com/openshift/origin/pkg/cmd/server/kubernetes/node/node.go:257 +0x11f
    Jan 10 05:40:51 qe-dma-master-etcd-1 systemd[1]: atomic-openshift-node.service: main process exited, code=exited, status=2/INVALIDARGUMENT

Actual results:

Expected results:

Additional info:
This is fixed upstream in Kubernetes by this PR:
https://github.com/kubernetes/kubernetes/pull/57247

Origin PR:
https://github.com/openshift/origin/pull/18051
Checked with v3.9.0-0.22.0; the panic is gone. Note that the static policy still has to be used together with a resource reservation for atomic-openshift-node.

# openshift version
openshift v3.9.0-0.22.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8
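For reference, a node configuration that pairs the static policy with a CPU reservation might look like the sketch below. It reuses the kubeletArguments from the reproduction steps and adds kube-reserved and system-reserved entries. The cpu=500m values are illustrative assumptions, not values taken from this bug, and should be sized for the actual node; the static policy needs the combined CPU reservation to be greater than zero.

```yaml
# /etc/origin/node/node-config.yaml (fragment; other settings omitted)
kubeletArguments:
  feature-gates:
  - CPUManager=true
  cpu-manager-policy:
  - static
  cpu-manager-reconcile-period:
  - 5s
  # Assumed example values: reserve some CPU for the kubelet and for
  # system daemons so the static policy has a non-zero reservation.
  kube-reserved:
  - cpu=500m
  system-reserved:
  - cpu=500m
```

After editing the file, restart the node service with systemctl restart atomic-openshift-node, as in step 1 of the description.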
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0489