Yesterday one node of my Kubernetes cluster became NotReady. ps -ef showed that some docker-runc processes had been running for many days:
root 26579 1303 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true events --stats c29996ea9566f16616505e7118315635582714308564ba0d9a70f8fb8cf73f0a
root 27841 2913 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true kill --all 8561b78c9cb19c0d883e30eafc8ff41ddf3007043985271386ffdbafc24d4376 SIGKILL
root 28293 1303 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true delete 25660e4c1f66593ec33ae57823def641a4c4a9ae1a7c6840afd081961b66e66e
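To spot such stuck processes without reading the full ps -ef output, a small filter along these lines might help. This is only a sketch: list_stale_runc is a hypothetical helper name, the one-day threshold is arbitrary, and it assumes a procps-ng ps that supports the etimes (elapsed seconds) column.

```shell
# Hypothetical helper, not part of docker or runc.
# Reads "PID ELAPSED_SECONDS COMMAND..." lines on stdin and prints those
# whose command is docker-runc and whose age is at least $1 seconds.
list_stale_runc() {
    awk -v min="$1" '$2 + 0 >= min && $3 ~ /docker-runc$/ { print }'
}

# Example usage (assumes ps supports the etimes column):
#   ps -eo pid=,etimes=,args= | list_stale_runc 86400
```

Any docker-runc process that survives for more than a day is suspicious, since events, kill and delete invocations normally finish in well under a second.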
After some investigation, I found that docker-runc hung when calling systemd.UseSystemd. Below is the stack.
In fact, no D-Bus method call sent to org.freedesktop.systemd1 received a response. For example, the command below would wait forever:
dbus-send --system --dest=org.freedesktop.systemd1 --type=method_call --print-reply /org/freedesktop/systemd1 org.freedesktop.DBus.Introspectable.Introspect
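To check for this condition without blocking a terminal or a monitoring script, the same call can be wrapped in a time limit. The following is a minimal sketch, assuming the coreutils timeout command is available; check_responds is a hypothetical helper name and the 5-second limit in the usage comment is arbitrary.

```shell
# Hypothetical helper: run a command with a time limit and report the outcome.
# Usage: check_responds SECONDS COMMAND [ARGS...]
# Prints "responsive", "timed out", or "failed"; returns the command's status
# (124 when timeout had to stop the command).
check_responds() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "responsive"
    elif [ "$rc" -eq 124 ]; then
        echo "timed out"
    else
        echo "failed"
    fi
    return "$rc"
}

# Example usage against systemd's D-Bus interface:
#   check_responds 5 dbus-send --system --dest=org.freedesktop.systemd1 \
#       --type=method_call --print-reply /org/freedesktop/systemd1 \
#       org.freedesktop.DBus.Introspectable.Introspect
```

On an affected node this would be expected to print "timed out" until systemd is re-executed.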
Also there were many systemd errors in /var/log/messages:
Jan 4 11:56:31 host-k8s-node001 systemd: Failed to propagate agent release message: Operation not supported
busctl tree reported: Failed to introspect object / of service org.freedesktop.systemd1: Connection timed out
Resolved by restarting systemd: systemctl daemon-reexec
For more stack information, see https://github.com/opencontainers/runc/issues/1959
This issue is fixed in upstream systemd by https://github.com/systemd/systemd/pull/11818.
Will the systemd shipped with RHEL cherry-pick this fix? If so, which version will include it?
Fix merged to staging branch -> https://github.com/lnykryn/systemd-rhel/pull/322 -> POST
@Lukáš Nykrýn When will this fix be available in a systemd release?
We don't give any release dates. The fix is currently scheduled to be released in 7.7.
*** Bug 1719004 has been marked as a duplicate of this bug. ***
(In reply to Jan Synacek from comment #7)
> We don't give any release dates. The fix is currently scheduled to be
> released in 7.7.
Is there any possibility that this fix will be shipped as an update for RHEL 7.6?
Yep, 7.6 Z-Stream request is tracked by public Bug 1720699, currently ON_QA...
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
*** Bug 1716581 has been marked as a duplicate of this bug. ***