Bug 1634092 - [RCA] unable to ensure pod container exists: failed to create container for /kubepods/burstable/podd6089668-b695-11e8-9873-02c892353296 : dbus: connection closed by user
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.9.0
Hardware: Unspecified
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: 3.9.z
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-28 18:11 UTC by Bruno Andrade
Modified: 2019-11-20 18:48 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-20 18:48:58 UTC
Target Upstream Version:
Embargoed:



Description Bruno Andrade 2018-09-28 18:11:36 UTC
Description of problem:

The error we get is the following:
unable to ensure pod container exists: failed to create container for /kubepods/burstable/podd6089668-b695-11e8-9873-02c892353296 : dbus: connection closed by user

Example:
oc describe pod eaa-pa-cmdb-sit-1-hd86c
.
.
.
Events:
Type     Reason                    Age                From                                                      Message
----     ------                    ----               ----                                                      -------
Normal   Scheduled                 28s                default-scheduler                                         Successfully assigned eaa-pa-cmdb-sit-1-hd86c to ip-172-28-248-145.eu-central-1.compute.internal
Normal   SuccessfulMountVolume     28s                kubelet, ip-172-28-248-145.eu-central-1.compute.internal  MountVolume.SetUp succeeded for volume "default-token-twxfj"
Normal   SuccessfulMountVolume     28s                kubelet, ip-172-28-248-145.eu-central-1.compute.internal  MountVolume.SetUp succeeded for volume "preprod-eaa-sit-volume"
Warning  FailedCreatePodContainer  3s (x3 over 28s)   kubelet, ip-172-28-248-145.eu-central-1.compute.internal  unable to ensure pod container exists: failed to create container for /kubepods/burstable/podd6089668-b695-11e8-9873-02c892353296 : dbus: connection closed by user

The problem was resolved by restarting the following services on the node:
systemctl restart systemd-logind
systemctl restart dbus
systemctl restart dnsmasq NetworkManager
systemctl restart atomic-openshift-node.service
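
For repeated use, the restart sequence above could be wrapped in a small helper. This is only a sketch: the function name restart_node_services and the --apply switch are made up for illustration and are not part of any OpenShift tooling. It restarts dnsmasq and NetworkManager one at a time instead of in a single command, and by default it only prints what it would run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replays the restart sequence from this report.
# Without --apply it only prints the commands, so it can be reviewed first.
restart_node_services() {
    local mode="${1:-}"
    local unit
    # Same units and order as the workaround in this report; dnsmasq and
    # NetworkManager are restarted separately here rather than together.
    for unit in systemd-logind dbus dnsmasq NetworkManager atomic-openshift-node.service; do
        if [ "$mode" = "--apply" ]; then
            # Stop at the first unit that fails to restart.
            systemctl restart "$unit" || return 1
        else
            echo "systemctl restart $unit"
        fi
    done
}
```

Calling restart_node_services with no argument lists the commands; restart_node_services --apply, run as root on the affected node, executes them.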

We need assistance finding the root cause.

Attached the sosreport of the affected node on fubar:

fubar.gsslab.rdu2.redhat.com:/fubar/02185872

Found some interesting entries in journalctl_--no-pager_--unit_atomic-openshift-node that may help with the RCA:

./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:42:57 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:42:57.738883   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:43:27 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:43:27.744564   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:43:57 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:43:57.750316   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:44:27 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:44:27.750543   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:44:57 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:44:57.750769   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:45:27 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:45:27.751035   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:45:57 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:45:57.751272   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
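
To see how long a node has been in this state, the journal can be filtered for the dbus failure string. The helpers below are a minimal sketch; the function names are invented for this example, and first_dbus_error_time assumes raw journalctl output in the default format (timestamp in the first three fields), not the grep-prefixed lines shown above.

```shell
#!/usr/bin/env bash
# Count journal lines that carry the dbus failure string from this report.
# Reads log text on stdin and prints the number of matching lines.
count_dbus_errors() {
    # grep -c exits non-zero when nothing matches, but still prints 0.
    grep -c -F 'dbus: connection closed by user' || true
}

# Print the timestamp (first three fields) of the first matching line, if any.
first_dbus_error_time() {
    grep -F -m 1 'dbus: connection closed by user' | awk '{print $1, $2, $3}'
}
```

On a live node the input would come from the same command the sosreport file is named after, for example: journalctl --no-pager --unit atomic-openshift-node | count_dbus_errors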

Version-Release number of selected component (if applicable):
OCP 3.9

Comment 4 Dan Mace 2019-07-18 15:07:03 UTC
I'm not sure why a Kubelet/container runtime issue is assigned to Routing — I'm reassigning this to Node. Please let me know if I've made a mistake.

Comment 5 Dan Mace 2019-07-18 15:08:46 UTC
Maybe it was sent here because the DNS subsystem was collateral damage of a dbus issue.

Comment 10 Stephen Cuppett 2019-11-20 18:48:58 UTC
OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift

