Description of problem:
This is the same issue as https://github.com/kubernetes/kubernetes/issues/43856. The upgrade failed because the node could not be started; the following message was reported:
"Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists."

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.126.4
openshift-ansible-3.6.126.3

How reproducible:
3 times

Steps to Reproduce:
1. Install OCP v3.6.116 with logging enabled
2. Upgrade to OCP v3.6.126.4

Actual results:
TASK [openshift_node_upgrade : Wait for node to be ready] **********************
FAILED - RETRYING: TASK: openshift_node_upgrade : Wait for node to be ready (24 retries left).
<--snip-->
<--snip-->
FAILED - RETRYING: TASK: openshift_node_upgrade : Wait for node to be ready (1 retries left).
fatal: [openshift-182.lab.eng.nay.redhat.com -> openshift-181.lab.eng.nay.redhat.com]: FAILED! => {
    "attempts": 24,
    "changed": false,
    "failed": true,
    "results": {
        "cmd": "/bin/oc get node openshift-182.lab.eng.nay.redhat.com -o json -n default",
        "results": [
            {
                "apiVersion": "v1",
                "kind": "Node",
                "metadata": {
                    "annotations": {
                        "volumes.kubernetes.io/controller-managed-attach-detach": "true"
                    },
                    "creationTimestamp": "2017-06-23T06:02:03Z",
                    "labels": {
                        "beta.kubernetes.io/arch": "amd64",
                        "beta.kubernetes.io/os": "linux",
                        "kubernetes.io/hostname": "openshift-182.lab.eng.nay.redhat.com",
                        "logging-infra-fluentd": "true"
                    },
                    "name": "openshift-182.lab.eng.nay.redhat.com",
                    "resourceVersion": "346399",
                    "selfLink": "/api/v1/nodes/openshift-182.lab.eng.nay.redhat.com",
                    "uid": "7a7d1d6f-57d9-11e7-ba44-fa163ee4c573"
                },
                "spec": {
                    "externalID": "openshift-182.lab.eng.nay.redhat.com",
                    "unschedulable": true
                },
                "status": {
                    "addresses": [
                        { "address": "10.66.147.182", "type": "LegacyHostIP" },
                        { "address": "10.66.147.182", "type": "InternalIP" },
                        { "address": "openshift-182.lab.eng.nay.redhat.com", "type": "Hostname" }
                    ],
                    "allocatable": {
                        "alpha.kubernetes.io/nvidia-gpu": "0",
                        "cpu": "2",
                        "memory": "3779516Ki",
                        "pods": "20"
                    },
                    "capacity": {
                        "alpha.kubernetes.io/nvidia-gpu": "0",
                        "cpu": "2",
                        "memory": "3881916Ki",
                        "pods": "20"
                    },
                    "conditions": [
                        {
                            "lastHeartbeatTime": "2017-06-29T07:47:46Z",
                            "lastTransitionTime": "2017-06-29T07:45:16Z",
                            "message": "kubelet has sufficient disk space available",
                            "reason": "KubeletHasSufficientDisk",
                            "status": "False",
                            "type": "OutOfDisk"
                        },
                        {
                            "lastHeartbeatTime": "2017-06-29T07:47:46Z",
                            "lastTransitionTime": "2017-06-29T07:45:16Z",
                            "message": "kubelet has sufficient memory available",
                            "reason": "KubeletHasSufficientMemory",
                            "status": "False",
                            "type": "MemoryPressure"
                        },
                        {
                            "lastHeartbeatTime": "2017-06-29T07:47:46Z",
                            "lastTransitionTime": "2017-06-29T07:45:16Z",
                            "message": "kubelet has no disk pressure",
                            "reason": "KubeletHasNoDiskPressure",
                            "status": "False",
                            "type": "DiskPressure"
                        },
                        {
                            "lastHeartbeatTime": "2017-06-29T07:47:46Z",
                            "lastTransitionTime": "2017-06-29T07:45:16Z",
                            "message": "Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists.",
                            "reason": "KubeletNotReady",
                            "status": "False",
                            "type": "Ready"
                        }
                    ],
                    "daemonEndpoints": {
                        "kubeletEndpoint": { "Port": 10250 }
                    },
                    "images": [ { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch@sha256:aa06f901ae766a9df7c2821c7ee054502fe2a425f78ad58bcea49f550af894a8",
"virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch:v3.6.126.3" ], "sizeBytes": 1175709817 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node@sha256:b0daa784be0eb66d1d4b22909eeaa3da22852904720c58a6217da6e84ed53174", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node:v3.6.126.3" ], "sizeBytes": 1173977766 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch@sha256:209a4ab50740ac1eee035a25b69ddde091dbef3c4c27a1cfad71e1ecbaadd178", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch:v3.6.116" ], "sizeBytes": 1173945782 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node@sha256:7b4cf43b0aee640a48295d3db82bf5420db7e0e9bd849063c55faca67635526f", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node:v3.6.116" ], "sizeBytes": 1172213733 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose@sha256:ecb48f755f82e65deda2cfd486b3e93aaa4f67b0e45f886383cc1930ca2fac17", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose:v3.6.126.3" ], "sizeBytes": 994087173 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose@sha256:3e80f55faa8554ed005f86dfc57a5632fba668260e8bcb7435c5416fe855b58c", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose:v3.6.116" ], "sizeBytes": 992454082 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node@sha256:9ad91b0541ea511529e7ed1515bf7b0ba99e17ed50fa06823ab362cc237c84d4", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node:v3.5.5.24" ], "sizeBytes": 924830800 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose@sha256:36eb9f81d9343a5e08c8d5ee3d54d542a5e87feaf6cb63ecc934f6f3bdfb457e", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose:v3.5.5.24" ], "sizeBytes": 755009078 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch@sha256:ef280801c8b1dffd16f98ea9958b82a414665af9a7593f57113a6475bb3dc1e8", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch:v3.5.5.24" ], "sizeBytes": 424667128 }, { "names": [ "registry.access.redhat.com/rhel7/etcd@sha256:5330fa97f3369b8dbdc323cdf1b88694831b1196b8ecd986d0e6e3e00716ae83", "registry.access.redhat.com/rhel7/etcd:latest" ], "sizeBytes": 233332973 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/logging-fluentd@sha256:c14b969335f5360aeb6ca25674df4ba7702380936d85e2a52d70432818093000", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/logging-fluentd:3.5.0" ], "sizeBytes": 232792538 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose-pod@sha256:bb8264430fea618d1db3b9193910672d084278f5b00e8a765b95d2aa4161337e", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose-pod:v3.6.116" ], "sizeBytes": 205768309 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose-pod@sha256:417e81e65efe42ad7ac22b8e4cb4b785e2c89fe2bd920583384a0ffab407d802", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose-pod:v3.5.5.24" ], "sizeBytes": 205250210 } ], "nodeInfo": { "architecture": "amd64", "bootID": "2ce9f7bc-b4e9-46b8-8dfc-ab8da383c1af", "containerRuntimeVersion": "docker://1.12.6", "kernelVersion": "3.10.0-514.21.2.el7.x86_64", "kubeProxyVersion": "v1.6.1+5115d708d7", "kubeletVersion": "v1.6.1+5115d708d7", "machineID": "42233649eedf45449cef4991305fd32c", "operatingSystem": "linux", "osImage": "Red Hat Enterprise Linux Server 7.3 (Maipo)", 
"systemUUID": "3EB4E523-8209-4B56-9A4B-ED4D8C644626" } } } ], "returncode": 0 }, "state": "list" } NO MORE HOSTS LEFT ************************************************************* NO MORE HOSTS LEFT ************************************************************* to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_6/upgrade.retry PLAY RECAP ********************************************************************* localhost : ok=26 changed=0 unreachable=0 failed=0 openshift-181.lab.eng.nay.redhat.com : ok=383 changed=20 unreachable=0 failed=0 openshift-182.lab.eng.nay.redhat.com : ok=392 changed=28 unreachable=0 failed=1 openshift-217.lab.eng.nay.redhat.com : ok=90 changed=2 unreachable=0 failed=0 openshift-220.lab.eng.nay.redhat.com : ok=56 changed=2 unreachable=0 failed=0 openshift-221.lab.eng.nay.redhat.com : ok=266 changed=16 unreachable=0 failed=0 [root@openshift-182 ~]# rpm -qa|grep systemd systemd-219-30.el7_3.9.x86_64 systemd-sysv-219-30.el7_3.9.x86_64 systemd-libs-219-30.el7_3.9.x86_64 oci-systemd-hook-0.1.7-2.git2788078.el7.x86_64 #for i in $(systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kubepod);do echo systemctl status $i;systemctl status $i;done systemctl status kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice ● kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice - libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice Loaded: loaded (/run/systemd/system/kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice; static; vendor preset: disabled) Drop-In: /run/systemd/system/kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice.d └─50-BlockIOAccounting.conf, 50-CPUAccounting.conf, 50-CPUShares.conf, 50-DefaultDependencies.conf, 50-Delegate.conf, 50-Description.conf, 50-MemoryAccounting.conf, 50-MemoryLimit.conf, 50-Wants-kubepods\x2eslice.conf Active: inactive (dead) since Thu 2017-06-29 04:11:08 EDT; 3min 38s ago CGroup: /kubepods.slice/kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice ├─docker-d6b52a917122e840d9f94f2f64d3ce4957503c5a3303e1a13bd814c4d68b03ec.scope │ ├─3927 /usr/bin/ruby /usr/bin/fluentd │ └─4129 /usr/bin/ruby /usr/bin/fluentd └─docker-d14e27c65ff4a74499903914e4905547293bd28658766a760903d6298438ed2c.scope └─3681 /usr/bin/pod Jun 29 03:33:02 container-ha-3.novalocal systemd[1]: Created slice libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice. Jun 29 03:33:02 container-ha-3.novalocal systemd[1]: Starting libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice. Jun 29 04:11:08 openshift-182.lab.eng.nay.redhat.com systemd[1]: Removed slice libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice. Jun 29 04:11:08 openshift-182.lab.eng.nay.redhat.com systemd[1]: Stopping libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice. 
systemctl status kubepods.slice
● kubepods.slice - libcontainer container kubepods.slice
   Loaded: loaded (/run/systemd/system/kubepods.slice; static; vendor preset: disabled)
  Drop-In: /run/systemd/system/kubepods.slice.d
           └─50-BlockIOAccounting.conf, 50-CPUAccounting.conf, 50-CPUShares.conf, 50-DefaultDependencies.conf, 50-Delegate.conf, 50-Description.conf, 50-MemoryAccounting.conf, 50-MemoryLimit.conf, 50-Wants--\x2eslice.conf
   Active: inactive (dead) since Thu 2017-06-29 04:11:08 EDT; 3min 38s ago
   Memory: 97.8M (limit: 3.7G)
   CGroup: /kubepods.slice
           └─kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice
             ├─docker-d6b52a917122e840d9f94f2f64d3ce4957503c5a3303e1a13bd814c4d68b03ec.scope
             │ ├─3927 /usr/bin/ruby /usr/bin/fluentd
             │ └─4129 /usr/bin/ruby /usr/bin/fluentd
             └─docker-d14e27c65ff4a74499903914e4905547293bd28658766a760903d6298438ed2c.scope
               └─3681 /usr/bin/pod

Jun 29 03:32:55 container-ha-3.novalocal systemd[1]: Created slice libcontainer container kubepods.slice.
Jun 29 03:32:55 container-ha-3.novalocal systemd[1]: Starting libcontainer container kubepods.slice.
Jun 29 04:11:08 openshift-182.lab.eng.nay.redhat.com systemd[1]: Removed slice libcontainer container kubepods.slice.
Jun 29 04:11:08 openshift-182.lab.eng.nay.redhat.com systemd[1]: Stopping libcontainer container kubepods.slice.

Expected results:

Additional info:
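
For anyone hitting the same symptom, the NotReady reason can be pulled straight from the node object and from the node service journal. This is only a rough sketch: it assumes an RPM-based install where the kubelet runs inside the atomic-openshift-node service, and the node name below is just the one from this report.

# Show only the Ready condition message for the affected node.
oc get node openshift-182.lab.eng.nay.redhat.com -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'

# Assumption: on an RPM install the kubelet logs to the atomic-openshift-node unit.
journalctl -u atomic-openshift-node --since "1 hour ago" | grep -i "Burstable QOS"

# List the leftover kubepods slices the kubelet is complaining about.
systemctl list-units --all 'kubepods*.slice'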
As mentioned in https://github.com/kubernetes/kubernetes/issues/43856, the issue can be worked around by stopping the kubepods slice units:

for i in $(systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kubepod);do systemctl status $i; systemctl stop $i;done
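
For readability, the same workaround expanded a bit. This is a sketch, not a supported procedure: the final service restart is my assumption (the kubelet has to come back up and recreate kubepods.slice / kubepods-burstable.slice itself), and it assumes the RPM-based atomic-openshift-node unit.

# Stop every kubepods*.slice unit so the kubelet can recreate them cleanly.
for unit in $(systemctl list-unit-files --no-legend --no-pager -l | awk '{print $1}' | grep '^kubepods.*\.slice$'); do
    echo "Stopping ${unit}"
    systemctl stop "${unit}"
done

# Assumption: restart the node service so the kubelet re-registers and rebuilds the top-level QoS cgroups.
systemctl restart atomic-openshift-node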
Derek, can I get your assessment on how we should fix this if it needs to be addressed by the installer?
We need to bump runc to pick up https://github.com/opencontainers/runc/pull/1124.
runc PR: https://github.com/opencontainers/runc/pull/1124
kube issue: https://github.com/kubernetes/kubernetes/issues/43856
kube master fix (bump runc): https://github.com/kubernetes/kubernetes/pull/44940
kube 1.6 fix (bump runc): https://github.com/kubernetes/kubernetes/pull/48117

I'll prepare a PR to bump runc for origin.
Origin PR: https://github.com/openshift/origin/pull/14980
No such issue when upgrading to atomic-openshift-3.6.132.
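
For reference, a quick way to check that a node is on a build carrying the runc bump (3.6.132 or later, per the comment above). The package name assumes an RPM-based install, and the node name is just the one from this report.

# Installed node build (should be 3.6.132 or later).
rpm -q atomic-openshift-node

# The node should report Ready again once the service comes back.
oc get node openshift-182.lab.eng.nay.redhat.com -o wide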
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188
Similar symptoms are back in 4.8. I've opened bug 1965545 to track (mentioning here to route folks who happen to stumble across this bug while searching for the already-exists log message).