Description of problem:
Run an OpenShift cluster in a containerized environment, then enable experimental-cgroups-per-qos on the node. The node becomes NotReady, and the following warning event is reported:

  Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : mkdir /sys/fs/cgroup/cpuacct,cpu: read-only file system

Version-Release number of selected component (if applicable):
openshift v3.5.0.4+86a6117
kubernetes v1.5.0-beta.2+225eecc
etcd 3.1.0-rc.0
docker-1.12.5-14.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Enable 'experimental-cgroups-per-qos' in /etc/origin/node/node-config.yaml:
   kubeletArguments:
     experimental-cgroups-per-qos:
     - 'true'
     cgroup-driver:
     - 'systemd'
     cgroup-root:
     - '/'
2. Restart the atomic-openshift-node service:
   $ systemctl restart atomic-openshift-node
3. Check the node status:
   $ oc get node

Actual results:
3. The node becomes NotReady.

[root@openshift-105 ~]# oc get node
NAME                               STATUS     AGE
openshift-105.lab.sjc.redhat.com   NotReady   29m
openshift-135.lab.sjc.redhat.com   NotReady   4h

[root@openshift-105 ~]# oc describe node openshift-135.lab.sjc.redhat.com
Name:                   openshift-135.lab.sjc.redhat.com
Role:
Labels:                 beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/hostname=openshift-135.lab.sjc.redhat.com
                        registry=enabled
                        role=node
                        router=enabled
Taints:                 <none>
CreationTimestamp:      Mon, 16 Jan 2017 21:30:28 -0500
Phase:
Conditions:
  Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----            ------  -----------------                 ------------------                ------                      -------
  OutOfDisk       False   Tue, 17 Jan 2017 00:49:58 -0500   Mon, 16 Jan 2017 21:30:28 -0500   KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Tue, 17 Jan 2017 00:49:58 -0500   Mon, 16 Jan 2017 21:30:28 -0500   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Tue, 17 Jan 2017 00:49:58 -0500   Mon, 16 Jan 2017 21:30:28 -0500   KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           False   Tue, 17 Jan 2017 00:49:58 -0500   Tue, 17 Jan 2017 00:47:15 -0500   KubeletNotReady             Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : mkdir /sys/fs/cgroup/cpuacct,cpu: read-only file system
Addresses:              10.14.6.135,10.14.6.135,openshift-135.lab.sjc.redhat.com
Capacity:
 alpha.kubernetes.io/nvidia-gpu:        0
 cpu:                                   2
 memory:                                3881932Ki
 pods:                                  250
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:        0
 cpu:                                   2
 memory:                                3881932Ki
 pods:                                  250
System Info:
 Machine ID:                    154605e6253a40e68b2a688089a97270
 System UUID:                   3BB5D0CD-EFFD-4C84-BB97-F8055FAC17E5
 Boot ID:                       6590e472-7ca5-4fc8-803c-66f727911680
 Kernel Version:                3.10.0-514.2.2.el7.x86_64
 OS Image:                      Red Hat Enterprise Linux Server 7.3 (Maipo)
 Operating System:              linux
 Architecture:                  amd64
 Container Runtime Version:     docker://1.12.5
 Kubelet Version:               v1.5.0-beta.2+225eecc
 Kube-Proxy Version:            v1.5.0-beta.2+225eecc
ExternalID:             openshift-135.lab.sjc.redhat.com
Non-terminated Pods:    (5 in total)
  Namespace     Name                           CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------     ----                           ------------  ----------  ---------------  -------------
  default       docker-registry-2-rfwpj        100m (5%)     0 (0%)      256Mi (6%)       0 (0%)
  default       registry-console-1-o3nv5       0 (0%)        0 (0%)      0 (0%)           0 (0%)
  default       router-1-8dpla                 100m (5%)     0 (0%)      256Mi (6%)       0 (0%)
  install-test  cakephp-mysql-example-1-0k36z  0 (0%)        0 (0%)      512Mi (13%)      512Mi (13%)
  install-test  mysql-1-l07k5                  0 (0%)        0 (0%)      512Mi (13%)      512Mi (13%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  200m (10%)    0 (0%)      1536Mi (40%)     1Gi (27%)
Events:
  FirstSeen  LastSeen  Count  From                                        SubObjectPath  Type      Reason                   Message
  ---------  --------  -----  ----                                        -------------  --------  ------                   -------
  2m         2m        1      {kubelet openshift-135.lab.sjc.redhat.com}                 Normal    Starting                 Starting kubelet.
  2m         2m        1      {kubelet openshift-135.lab.sjc.redhat.com}                 Warning   ImageGCFailed            unable to find data for container /
  2m         2m        1      {kubelet openshift-135.lab.sjc.redhat.com}                 Warning   KubeletSetupFailed       Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : mkdir /sys/fs/cgroup/cpuacct,cpu: read-only file system
  2m         2m        1      {kubelet openshift-135.lab.sjc.redhat.com}                 Normal    NodeHasSufficientDisk    Node openshift-135.lab.sjc.redhat.com status is now: NodeHasSufficientDisk
  2m         2m        1      {kubelet openshift-135.lab.sjc.redhat.com}                 Normal    NodeHasSufficientMemory  Node openshift-135.lab.sjc.redhat.com status is now: NodeHasSufficientMemory
  2m         2m        1      {kubelet openshift-135.lab.sjc.redhat.com}                 Normal    NodeHasNoDiskPressure    Node openshift-135.lab.sjc.redhat.com status is now: NodeHasNoDiskPressure
  2m         2m        1      {kubelet openshift-135.lab.sjc.redhat.com}                 Normal    NodeNotReady             Node openshift-135.lab.sjc.redhat.com status is now: NodeNotReady

Expected results:

Additional info:
I will try to reproduce.
We need to make sure that the container has read-write (RW) access to /sys/fs/cgroup in a containerized install.
Recording a note to my future self that pod cgroups appeared to function correctly when not containerized, so the issue is just localized to the containerized use case. Possible problems could be not having RW on /sys/fs/cgroup, not mounting in /sys/fs/cgroup from the host so the mount point looks different in the container than on the host, etc. Needs some more investigation.
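One way to check the first hypothesis (no RW on /sys/fs/cgroup) from inside the node container is to look at the mount options in /proc/mounts. The following is only a rough sketch, not something taken from the fix: the function name and the sample mount lines are made up for illustration, and it assumes the standard /proc/mounts layout (device, mount point, filesystem type, options).

```python
CGROUP_ROOT = "/sys/fs/cgroup"


def read_only_cgroup_mounts(mounts_text, root=CGROUP_ROOT):
    """Return mount points at or below `root` that carry the 'ro' option.

    `mounts_text` is the content of /proc/mounts (hypothetical helper,
    not part of OpenShift or the kubelet).
    """
    read_only = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        under_root = mount_point == root or mount_point.startswith(root + "/")
        if under_root and "ro" in options:
            read_only.append(mount_point)
    return read_only


def report(path="/proc/mounts"):
    """Print every read-only mount found under /sys/fs/cgroup."""
    with open(path) as f:
        for mount_point in read_only_cgroup_mounts(f.read()):
            print("read-only:", mount_point)
```

Calling report() inside the node container and again on the host would show whether the container sees /sys/fs/cgroup (or anything under it) as read-only when the host does not. It does not cover the second hypothesis, where the mount point simply looks different inside the container.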
Origin PR: https://github.com/openshift/origin/pull/12545
openshift-ansible PR: https://github.com/openshift/openshift-ansible/pull/3112
Doc PR: https://github.com/openshift/openshift-docs/pull/3527
The origin and openshift-ansible PRs have merged.
This has been merged into OCP and is available in OCP v3.5.0.10 and newer.
Verified on openshift v3.5.0.14+20b49d0.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0884