Bug 1693559

Summary: sd-bus: deal with cookie overruns
Product: Red Hat Enterprise Linux 7
Reporter: wbs9399
Component: systemd
Assignee: Jan Synacek <jsynacek>
Status: CLOSED ERRATA
QA Contact: Frantisek Sumsal <fsumsal>
Severity: high
Priority: urgent
Version: 7.4
CC: dornelas, fkrska, jerry.cottrell, jrosenta, jsynacek, maupadhy, mrobson, msekleta, rmanes, rmullett, systemd-maint-list, tkimura, uobergfe
Target Milestone: rc
Keywords: ZStream
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: systemd-219-65.el7
Clones: 1694999 1720699
Last Closed: 2019-08-06 12:43:57 UTC
Type: Bug
Bug Blocks: 1186913, 1688348, 1720699    

Description wbs9399 2019-03-28 07:51:27 UTC
Yesterday one node of my Kubernetes cluster became NotReady. ps -ef showed that some docker-runc processes had been running for many days:

```
root 26579 1303 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true events --stats c29996ea9566f16616505e7118315635582714308564ba0d9a70f8fb8cf73f0a
root 27841 2913 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true kill --all 8561b78c9cb19c0d883e30eafc8ff41ddf3007043985271386ffdbafc24d4376 SIGKILL
root 28293 1303 0 2018 ? 00:00:00 docker-runc --systemd-cgroup=true delete 25660e4c1f66593ec33ae57823def641a4c4a9ae1a7c6840afd081961b66e66e
```

After some investigation, I found that docker-runc hangs when calling systemd.UseSystemd; the stack is in the runc issue linked below.

In fact, no D-Bus method call sent to org.freedesktop.systemd1 got a response. For example, the following command never returned:

```
dbus-send --system --dest=org.freedesktop.systemd1 --type=method_call --print-reply /org/freedesktop/systemd1 org.freedesktop.DBus.Introspectable.Introspect
```

There were also many systemd errors in /var/log/messages:

```
Jan 4 11:56:31 host-k8s-node001 systemd: Failed to propagate agent release message: Operation not supported
```

busctl tree reported:

```
Failed to introspect object / of service org.freedesktop.systemd1: Connection timed out
```

The problem was resolved by re-executing systemd: systemctl daemon-reexec
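D-Bus message serials ("cookies") are 32-bit values and must be nonzero. The following is a minimal sketch of how a cookie overrun could produce the symptoms above. It assumes, without this report confirming it, that the connection keeps a 64-bit counter and refuses to send once the counter leaves the 32-bit range. Under that assumption every send after the overrun fails with EOPNOTSUPP ("Operation not supported", as in the log line), so no method call is ever answered until the counter is reset, for example by re-executing the process. toy_bus and toy_seal are hypothetical names, not sd-bus API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified connection: only the per-connection
 * message counter matters for this illustration. */
struct toy_bus {
        uint64_t cookie; /* last serial handed out; 64-bit on purpose */
};

/* Hypothetical "seal" step: classic D-Bus serials are 32-bit, so a
 * counter value past UINT32_MAX is refused instead of being sent. */
static int toy_seal(struct toy_bus *b, uint32_t *serial) {
        assert(b && serial);

        uint64_t next = ++b->cookie; /* naive: never wraps back */
        if (next > UINT32_MAX)
                return -EOPNOTSUPP; /* and so does every later send */

        *serial = (uint32_t) next;
        return 0;
}
```

Because the counter only grows, the failure is permanent for the lifetime of the connection, which would fit a daemon that stays broken until it is re-executed.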

For the full stack trace, see: https://github.com/opencontainers/runc/issues/1959

Comment 2 wbs9399 2019-03-28 08:03:02 UTC
This issue was fixed upstream in systemd by https://github.com/systemd/systemd/pull/11818.
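As a rough illustration of what "dealing with cookie overruns" can mean, here is a simplified sketch in C, assuming an approach along the lines of the upstream fix: keep cookies within 32 bits, on overrun continue in the upper half of the range instead of restarting at 0 (which D-Bus forbids as a serial), and from then on skip any cookie that an outstanding call is still waiting on. toy_bus, cookie_pending, and the linear pending table are hypothetical stand-ins for sd-bus internals; this is not the actual systemd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Highest bit of the 32-bit serial space, used as a "may be reused" marker. */
#define COOKIE_CYCLED UINT32_C(0x80000000)

/* Hypothetical connection state: the last cookie handed out plus the
 * cookies that still await a reply (real code would use a hash map). */
struct toy_bus {
        uint64_t cookie;
        const uint64_t *pending;
        size_t n_pending;
};

static bool cookie_pending(const struct toy_bus *b, uint64_t c) {
        for (size_t i = 0; i < b->n_pending; i++)
                if (b->pending[i] == c)
                        return true;
        return false;
}

/* Stay inside the 32-bit range; on overrun jump to the high-bit region
 * rather than back to 0, which is not a valid D-Bus serial. */
static uint64_t cookie_inc(uint64_t cookie) {
        if (cookie >= UINT32_MAX)
                return COOKIE_CYCLED;
        return cookie + 1;
}

static int next_cookie(struct toy_bus *b) {
        assert(b);

        uint64_t c = cookie_inc(b->cookie);

        /* While the high bit is clear the cookie comes from a range that has
         * never been reused. Once it is set (always the case after a wrap),
         * skip cookies that outstanding calls are still waiting on. */
        if (c & COOKIE_CYCLED) {
                uint32_t i;
                for (i = 0; i < COOKIE_CYCLED && cookie_pending(b, c); i++)
                        c = cookie_inc(c);
                if (i == COOKIE_CYCLED)
                        return -EBUSY; /* every candidate cookie is in use */
        }

        b->cookie = c;
        return 0;
}
```

The linear scan keeps the sketch short; a real implementation would look pending replies up in a hash table so the extra check stays cheap even with many calls in flight.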

Will the systemd shipped in RHEL cherry-pick this fix, and which version will include it?

Comment 3 Jan Synacek 2019-04-02 09:03:19 UTC
https://github.com/lnykryn/systemd-rhel/pull/322

Comment 5 Lukáš Nykrýn 2019-04-02 10:30:26 UTC
Fix merged to the staging branch (https://github.com/lnykryn/systemd-rhel/pull/322); moving to POST.

Comment 6 wbs9399 2019-04-15 08:09:39 UTC
@Lukáš Nykrýn When will this fix be included in a systemd release?

Comment 7 Jan Synacek 2019-04-15 08:28:33 UTC
We don't give any release dates. The fix is currently scheduled to be released in 7.7.

Comment 9 Michal Sekletar 2019-06-14 10:58:52 UTC
*** Bug 1719004 has been marked as a duplicate of this bug. ***

Comment 16 Jerry 2019-07-10 12:24:53 UTC
(In reply to Jan Synacek from comment #7)
> We don't give any release dates. The fix is currently scheduled to be
> released in 7.7.

Is there any possibility that this fix can/will be deployed as a patch for RHEL 7.6?

Comment 17 Filip Krska 2019-07-10 14:40:31 UTC
Yes, the 7.6 z-stream request is tracked by public Bug 1720699, which is currently ON_QA.

Comment 20 errata-xmlrpc 2019-08-06 12:43:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2091

Comment 21 Rose Colombo 2019-08-08 20:29:07 UTC
*** Bug 1716581 has been marked as a duplicate of this bug. ***

Comment 22 Robb Manes 2019-11-20 21:29:45 UTC
*** Bug 1772365 has been marked as a duplicate of this bug. ***