Description of problem:
This is the same issue as https://github.com/kubernetes/kubernetes/issues/43856. The upgrade failed because the node could not be started; the following message was reported:
"Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists."

Version-Release number of selected component (if applicable):
openshift-ansible-3.6.126.4
openshift-ansible-3.6.126.3

How reproducible:
3 times

Steps to Reproduce:
1. Install OCP v3.6.116 with logging enabled
2. Upgrade to OCP v3.6.126.4

Actual results:
TASK [openshift_node_upgrade : Wait for node to be ready] **********************
FAILED - RETRYING: TASK: openshift_node_upgrade : Wait for node to be ready (24 retries left).
<--snip-->
<--snip-->
FAILED - RETRYING: TASK: openshift_node_upgrade : Wait for node to be ready (1 retries left).
fatal: [openshift-182.lab.eng.nay.redhat.com -> openshift-181.lab.eng.nay.redhat.com]: FAILED! => {
    "attempts": 24,
    "changed": false,
    "failed": true,
    "results": {
        "cmd": "/bin/oc get node openshift-182.lab.eng.nay.redhat.com -o json -n default",
        "results": [
            {
                "apiVersion": "v1",
                "kind": "Node",
                "metadata": {
                    "annotations": {
                        "volumes.kubernetes.io/controller-managed-attach-detach": "true"
                    },
                    "creationTimestamp": "2017-06-23T06:02:03Z",
                    "labels": {
                        "beta.kubernetes.io/arch": "amd64",
                        "beta.kubernetes.io/os": "linux",
                        "kubernetes.io/hostname": "openshift-182.lab.eng.nay.redhat.com",
                        "logging-infra-fluentd": "true"
                    },
                    "name": "openshift-182.lab.eng.nay.redhat.com",
                    "resourceVersion": "346399",
                    "selfLink": "/api/v1/nodes/openshift-182.lab.eng.nay.redhat.com",
                    "uid": "7a7d1d6f-57d9-11e7-ba44-fa163ee4c573"
                },
                "spec": {
                    "externalID": "openshift-182.lab.eng.nay.redhat.com",
                    "unschedulable": true
                },
                "status": {
                    "addresses": [
                        { "address": "10.66.147.182", "type": "LegacyHostIP" },
                        { "address": "10.66.147.182", "type": "InternalIP" },
                        { "address": "openshift-182.lab.eng.nay.redhat.com", "type": "Hostname" }
                    ],
                    "allocatable": {
                        "alpha.kubernetes.io/nvidia-gpu": "0",
                        "cpu": "2",
                        "memory": "3779516Ki",
                        "pods": "20"
                    },
                    "capacity": {
                        "alpha.kubernetes.io/nvidia-gpu": "0",
                        "cpu": "2",
                        "memory": "3881916Ki",
                        "pods": "20"
                    },
                    "conditions": [
                        {
                            "lastHeartbeatTime": "2017-06-29T07:47:46Z",
                            "lastTransitionTime": "2017-06-29T07:45:16Z",
                            "message": "kubelet has sufficient disk space available",
                            "reason": "KubeletHasSufficientDisk",
                            "status": "False",
                            "type": "OutOfDisk"
                        },
                        {
                            "lastHeartbeatTime": "2017-06-29T07:47:46Z",
                            "lastTransitionTime": "2017-06-29T07:45:16Z",
                            "message": "kubelet has sufficient memory available",
                            "reason": "KubeletHasSufficientMemory",
                            "status": "False",
                            "type": "MemoryPressure"
                        },
                        {
                            "lastHeartbeatTime": "2017-06-29T07:47:46Z",
                            "lastTransitionTime": "2017-06-29T07:45:16Z",
                            "message": "kubelet has no disk pressure",
                            "reason": "KubeletHasNoDiskPressure",
                            "status": "False",
                            "type": "DiskPressure"
                        },
                        {
                            "lastHeartbeatTime": "2017-06-29T07:47:46Z",
                            "lastTransitionTime": "2017-06-29T07:45:16Z",
                            "message": "Failed to start ContainerManager failed to initialise top level QOS containers: failed to create top level Burstable QOS cgroup : Unit kubepods-burstable.slice already exists.",
                            "reason": "KubeletNotReady",
                            "status": "False",
                            "type": "Ready"
                        }
                    ],
                    "daemonEndpoints": {
                        "kubeletEndpoint": { "Port": 10250 }
                    },
                    "images": [ { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch@sha256:aa06f901ae766a9df7c2821c7ee054502fe2a425f78ad58bcea49f550af894a8",
"virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch:v3.6.126.3" ], "sizeBytes": 1175709817 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node@sha256:b0daa784be0eb66d1d4b22909eeaa3da22852904720c58a6217da6e84ed53174", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node:v3.6.126.3" ], "sizeBytes": 1173977766 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch@sha256:209a4ab50740ac1eee035a25b69ddde091dbef3c4c27a1cfad71e1ecbaadd178", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch:v3.6.116" ], "sizeBytes": 1173945782 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node@sha256:7b4cf43b0aee640a48295d3db82bf5420db7e0e9bd849063c55faca67635526f", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node:v3.6.116" ], "sizeBytes": 1172213733 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose@sha256:ecb48f755f82e65deda2cfd486b3e93aaa4f67b0e45f886383cc1930ca2fac17", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose:v3.6.126.3" ], "sizeBytes": 994087173 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose@sha256:3e80f55faa8554ed005f86dfc57a5632fba668260e8bcb7435c5416fe855b58c", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose:v3.6.116" ], "sizeBytes": 992454082 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node@sha256:9ad91b0541ea511529e7ed1515bf7b0ba99e17ed50fa06823ab362cc237c84d4", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/node:v3.5.5.24" ], "sizeBytes": 924830800 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose@sha256:36eb9f81d9343a5e08c8d5ee3d54d542a5e87feaf6cb63ecc934f6f3bdfb457e", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose:v3.5.5.24" ], "sizeBytes": 755009078 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch@sha256:ef280801c8b1dffd16f98ea9958b82a414665af9a7593f57113a6475bb3dc1e8", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/openvswitch:v3.5.5.24" ], "sizeBytes": 424667128 }, { "names": [ "registry.access.redhat.com/rhel7/etcd@sha256:5330fa97f3369b8dbdc323cdf1b88694831b1196b8ecd986d0e6e3e00716ae83", "registry.access.redhat.com/rhel7/etcd:latest" ], "sizeBytes": 233332973 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/logging-fluentd@sha256:c14b969335f5360aeb6ca25674df4ba7702380936d85e2a52d70432818093000", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/logging-fluentd:3.5.0" ], "sizeBytes": 232792538 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose-pod@sha256:bb8264430fea618d1db3b9193910672d084278f5b00e8a765b95d2aa4161337e", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose-pod:v3.6.116" ], "sizeBytes": 205768309 }, { "names": [ "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose-pod@sha256:417e81e65efe42ad7ac22b8e4cb4b785e2c89fe2bd920583384a0ffab407d802", "virt-openshift-05.lab.eng.nay.redhat.com:5000/openshift3/ose-pod:v3.5.5.24" ], "sizeBytes": 205250210 } ], "nodeInfo": { "architecture": "amd64", "bootID": "2ce9f7bc-b4e9-46b8-8dfc-ab8da383c1af", "containerRuntimeVersion": "docker://1.12.6", "kernelVersion": "3.10.0-514.21.2.el7.x86_64", "kubeProxyVersion": "v1.6.1+5115d708d7", "kubeletVersion": "v1.6.1+5115d708d7", "machineID": "42233649eedf45449cef4991305fd32c", "operatingSystem": "linux", "osImage": "Red Hat Enterprise Linux Server 7.3 (Maipo)", 
"systemUUID": "3EB4E523-8209-4B56-9A4B-ED4D8C644626" } } } ], "returncode": 0 }, "state": "list" } NO MORE HOSTS LEFT ************************************************************* NO MORE HOSTS LEFT ************************************************************* to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_6/upgrade.retry PLAY RECAP ********************************************************************* localhost : ok=26 changed=0 unreachable=0 failed=0 openshift-181.lab.eng.nay.redhat.com : ok=383 changed=20 unreachable=0 failed=0 openshift-182.lab.eng.nay.redhat.com : ok=392 changed=28 unreachable=0 failed=1 openshift-217.lab.eng.nay.redhat.com : ok=90 changed=2 unreachable=0 failed=0 openshift-220.lab.eng.nay.redhat.com : ok=56 changed=2 unreachable=0 failed=0 openshift-221.lab.eng.nay.redhat.com : ok=266 changed=16 unreachable=0 failed=0 [root@openshift-182 ~]# rpm -qa|grep systemd systemd-219-30.el7_3.9.x86_64 systemd-sysv-219-30.el7_3.9.x86_64 systemd-libs-219-30.el7_3.9.x86_64 oci-systemd-hook-0.1.7-2.git2788078.el7.x86_64 #for i in $(systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kubepod);do echo systemctl status $i;systemctl status $i;done systemctl status kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice ● kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice - libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice Loaded: loaded (/run/systemd/system/kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice; static; vendor preset: disabled) Drop-In: /run/systemd/system/kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice.d └─50-BlockIOAccounting.conf, 50-CPUAccounting.conf, 50-CPUShares.conf, 50-DefaultDependencies.conf, 50-Delegate.conf, 50-Description.conf, 50-MemoryAccounting.conf, 50-MemoryLimit.conf, 50-Wants-kubepods\x2eslice.conf Active: inactive (dead) since Thu 2017-06-29 04:11:08 EDT; 3min 38s ago CGroup: /kubepods.slice/kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice ├─docker-d6b52a917122e840d9f94f2f64d3ce4957503c5a3303e1a13bd814c4d68b03ec.scope │ ├─3927 /usr/bin/ruby /usr/bin/fluentd │ └─4129 /usr/bin/ruby /usr/bin/fluentd └─docker-d14e27c65ff4a74499903914e4905547293bd28658766a760903d6298438ed2c.scope └─3681 /usr/bin/pod Jun 29 03:33:02 container-ha-3.novalocal systemd[1]: Created slice libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice. Jun 29 03:33:02 container-ha-3.novalocal systemd[1]: Starting libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice. Jun 29 04:11:08 openshift-182.lab.eng.nay.redhat.com systemd[1]: Removed slice libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice. Jun 29 04:11:08 openshift-182.lab.eng.nay.redhat.com systemd[1]: Stopping libcontainer container kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice. 
systemctl status kubepods.slice
● kubepods.slice - libcontainer container kubepods.slice
   Loaded: loaded (/run/systemd/system/kubepods.slice; static; vendor preset: disabled)
  Drop-In: /run/systemd/system/kubepods.slice.d
           └─50-BlockIOAccounting.conf, 50-CPUAccounting.conf, 50-CPUShares.conf, 50-DefaultDependencies.conf, 50-Delegate.conf, 50-Description.conf, 50-MemoryAccounting.conf, 50-MemoryLimit.conf, 50-Wants--\x2eslice.conf
   Active: inactive (dead) since Thu 2017-06-29 04:11:08 EDT; 3min 38s ago
   Memory: 97.8M (limit: 3.7G)
   CGroup: /kubepods.slice
           └─kubepods-pod53578e22_57df_11e7_9b59_fa163ee0059e.slice
             ├─docker-d6b52a917122e840d9f94f2f64d3ce4957503c5a3303e1a13bd814c4d68b03ec.scope
             │ ├─3927 /usr/bin/ruby /usr/bin/fluentd
             │ └─4129 /usr/bin/ruby /usr/bin/fluentd
             └─docker-d14e27c65ff4a74499903914e4905547293bd28658766a760903d6298438ed2c.scope
               └─3681 /usr/bin/pod

Jun 29 03:32:55 container-ha-3.novalocal systemd[1]: Created slice libcontainer container kubepods.slice.
Jun 29 03:32:55 container-ha-3.novalocal systemd[1]: Starting libcontainer container kubepods.slice.
Jun 29 04:11:08 openshift-182.lab.eng.nay.redhat.com systemd[1]: Removed slice libcontainer container kubepods.slice.
Jun 29 04:11:08 openshift-182.lab.eng.nay.redhat.com systemd[1]: Stopping libcontainer container kubepods.slice.

Expected results:

Additional info:
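
For anyone hitting the same symptom, the NotReady reason can be pulled straight from the node object and from the node service journal. This is only a rough sketch: it assumes an RPM-based install where the kubelet runs inside the atomic-openshift-node service, and the node name below is just the one from this report.

# Show only the Ready condition message for the affected node.
oc get node openshift-182.lab.eng.nay.redhat.com -o jsonpath='{.status.conditions[?(@.type=="Ready")].message}'

# Assumption: on an RPM install the kubelet logs to the atomic-openshift-node unit.
journalctl -u atomic-openshift-node --since "1 hour ago" | grep -i "Burstable QOS"

# List the leftover kubepods slices the kubelet is complaining about.
systemctl list-units --all 'kubepods*.slice'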
As mentioned in https://github.com/kubernetes/kubernetes/issues/43856, the issue can be worked around by stopping the kubepods slice units:

for i in $(systemctl list-unit-files --no-legend --no-pager -l | grep --color=never -o .*.slice | grep kubepod);do systemctl status $i; systemctl stop $i;done
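
For readability, the same workaround expanded a bit. This is a sketch, not a supported procedure: the final service restart is my assumption (the kubelet has to come back up and recreate kubepods.slice / kubepods-burstable.slice itself), and it assumes the RPM-based atomic-openshift-node unit.

# Stop every kubepods*.slice unit so the kubelet can recreate them cleanly.
for unit in $(systemctl list-unit-files --no-legend --no-pager -l | awk '{print $1}' | grep '^kubepods.*\.slice$'); do
    echo "Stopping ${unit}"
    systemctl stop "${unit}"
done

# Assumption: restart the node service so the kubelet re-registers and rebuilds the top-level QoS cgroups.
systemctl restart atomic-openshift-node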
Derek, can I get your assessment on how we should fix this if it needs to be addressed by the installer?
We need to bump runc to pick up https://github.com/opencontainers/runc/pull/1124.
runc PR: https://github.com/opencontainers/runc/pull/1124
kube issue: https://github.com/kubernetes/kubernetes/issues/43856
kube master fix (bump runc): https://github.com/kubernetes/kubernetes/pull/44940
kube 1.6 fix (bump runc): https://github.com/kubernetes/kubernetes/pull/48117

I'll prepare a PR to bump runc for origin.
Origin PR: https://github.com/openshift/origin/pull/14980
No such issue when upgrading to atomic-openshift-3.6.132.
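
For reference, a quick way to check that a node is on a build carrying the runc bump (3.6.132 or later, per the comment above). The package name assumes an RPM-based install, and the node name is just the one from this report.

# Installed node build (should be 3.6.132 or later).
rpm -q atomic-openshift-node

# The node should report Ready again once the service comes back.
oc get node openshift-182.lab.eng.nay.redhat.com -o wide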
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188
Similar symptoms are back in 4.8. I've opened bug 1965545 to track (mentioning here to route folks who happen to stumble across this bug while searching for the already-exists log message).