Bug 1634092

Summary: [RCA] unable to ensure pod container exists: failed to create container for /kubepods/burstable/podd6089668-b695-11e8-9873-02c892353296 : dbus: connection closed by user
Product: OpenShift Container Platform
Reporter: Bruno Andrade <bandrade>
Component: Node
Assignee: Ryan Phillips <rphillips>
Status: CLOSED DEFERRED
QA Contact: Sunil Choudhary <schoudha>
Severity: urgent
Docs Contact:
Priority: high
Version: 3.9.0
CC: aos-bugs, dahernan, ggore, jokerman, jrosenta, maupadhy, mmccomas, mtaru, pamoedom, rhowe, rphillips
Target Milestone: ---
Target Release: 3.9.z
Hardware: Unspecified
OS: Linux
Last Closed: 2019-11-20 18:48:58 UTC
Type: Bug

Description Bruno Andrade 2018-09-28 18:11:36 UTC
Description of problem:

The error we get is the following:
unable to ensure pod container exists: failed to create container for /kubepods/burstable/podd6089668-b695-11e8-9873-02c892353296 : dbus: connection closed by user

Example:
oc describe pod eaa-pa-cmdb-sit-1-hd86c
.
.
.
Events:
Type     Reason                    Age               From                                                      Message
----     ------                    ----              ----                                                      -------
Normal   Scheduled                 28s               default-scheduler                                         Successfully assigned eaa-pa-cmdb-sit-1-hd86c to ip-172-28-248-145.eu-central-1.compute.internal
Normal   SuccessfulMountVolume     28s               kubelet, ip-172-28-248-145.eu-central-1.compute.internal  MountVolume.SetUp succeeded for volume "default-token-twxfj"
Normal   SuccessfulMountVolume     28s               kubelet, ip-172-28-248-145.eu-central-1.compute.internal  MountVolume.SetUp succeeded for volume "preprod-eaa-sit-volume"
Warning  FailedCreatePodContainer  3s (x3 over 28s)  kubelet, ip-172-28-248-145.eu-central-1.compute.internal  unable to ensure pod container exists: failed to create container for /kubepods/burstable/podd6089668-b695-11e8-9873-02c892353296 : dbus: connection closed by user

The problem was resolved by restarting the following services on the affected node:
systemctl restart systemd-logind
systemctl restart dbus
systemctl restart dnsmasq NetworkManager
systemctl restart atomic-openshift-node.service
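
For repeatability, the restart sequence above could be wrapped in a small shell function. This is only a sketch: the function name restart_units and the DRY_RUN variable are hypothetical, not part of any OpenShift tooling, and the units are restarted one at a time in the order listed above (the original restarted dnsmasq and NetworkManager in a single command).

```shell
# Units in the order the reporter restarted them.
UNITS="systemd-logind dbus dnsmasq NetworkManager atomic-openshift-node.service"

# Restart each unit in order and stop at the first failure.
# With DRY_RUN=1 (hypothetical switch) the commands are only printed.
restart_units() {
    for unit in $UNITS; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl restart $unit"
        else
            systemctl restart "$unit" || return 1
        fi
    done
}
```

Running it once with DRY_RUN=1 prints the commands for review before anything is restarted; calling restart_units without it, as root, performs the restarts.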

We need assistance finding the root cause.

The sosreport of the affected node is available on fubar:

fubar.gsslab.rdu2.redhat.com:/fubar/02185872

The following entries in journalctl_--no-pager_--unit_atomic-openshift-node may help with the RCA:

./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:42:57 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:42:57.738883   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:43:27 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:43:27.744564   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:43:57 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:43:57.750316   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:44:27 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:44:27.750543   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:44:57 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:44:57.750769   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:45:27 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:45:27.751035   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
./journalctl_--no-pager_--unit_atomic-openshift-node:Sep 05 15:45:57 ip-172-28-248-64.eu-central-1.compute.internal atomic-openshift-node[23289]: E0905 15:45:57.751272   23289 dnsmasq.go:105] unable to periodically refresh dnsmasq status: dbus: connection closed by user
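
To narrow down when the node process lost its dbus connection, the saved journal file can be searched for the first occurrence of the error and the total number of occurrences. A minimal sketch, assuming the journal has been saved to a plain-text file as in the sosreport; the function names are hypothetical:

```shell
# Error string taken from the log lines above.
DBUS_ERR='dbus: connection closed by user'

# Print how many lines of file $1 contain the error string.
count_dbus_errors() {
    grep -cF -- "$DBUS_ERR" "$1" || true
}

# Print the first line of file $1 that contains the error string, if any.
first_dbus_error() {
    grep -m1 -F -- "$DBUS_ERR" "$1" || true
}
```

For example, first_dbus_error journalctl_--no-pager_--unit_atomic-openshift-node prints the earliest matching entry, whose timestamp can then be compared with dbus or systemd-logind restarts recorded in the system journal.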

Version-Release number of selected component (if applicable):
OCP 3.9

Comment 4 Dan Mace 2019-07-18 15:07:03 UTC
I'm not sure why a Kubelet/container runtime issue is assigned to Routing — I'm reassigning this to Node. Please let me know if I've made a mistake.

Comment 5 Dan Mace 2019-07-18 15:08:46 UTC
Maybe it was sent here because the DNS subsystem was collateral damage of a dbus issue.

Comment 10 Stephen Cuppett 2019-11-20 18:48:58 UTC
OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift