| Summary: | systemd stops to respond | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Enrico Scholz <rh-bugzilla> | ||||
| Component: | systemd | Assignee: | systemd-maint | ||||
| Status: | CLOSED CANTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 16 | CC: | johannbg, lpoetter, metherid, mschmidt, notting, plautrba, systemd-maint | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-12-14 11:55:17 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
systemd-stdout-syslog-bridge stucks < 2 minutes after boot:
# last
reboot system boot 3.1.2-1.fc16.x86 Fri Dec 2 10:18 - 11:41 (01:22)
# strace -p 505
Process 505 attached - interrupt to quit
sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Dec 2 10:19:46 ", 16}, {"ladvd", 5}, {"[1858]: ", 8}, {"ladvd 0.9.2 running", 19}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1858, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL
# lsof -p 505
systemd-s 505 root 0u CHR 1,3 0t0 1028 /dev/null
systemd-s 505 root 1u CHR 1,3 0t0 1028 /dev/null
systemd-s 505 root 2u CHR 1,3 0t0 1028 /dev/null
systemd-s 505 root 3w unix 0xffff8803257fc9c0 0t0 11522 /run/systemd/stdout-syslog-bridge
systemd-s 505 root 4u unix 0xffff8803257fcd00 0t0 15552 socket
systemd-s 505 root 5u 0000 0,9 0 5311 anon_inode
systemd-s 505 root 6r unix 0xffff8803257ff400 0t0 15553 socket
systemd-s 505 root 7u CHR 1,11 0t0 1034 /dev/kmsg
systemd-s 505 root 8u CHR 5,1 0t0 1037 /dev/console
systemd-s 505 root 9u CHR 5,1 0t0 1037 /dev/console
systemd-s 505 root 10u CHR 5,1 0t0 1037 /dev/console
...
--> FD 6 is marked as read-only, but sendmsg() is used on it?
After a 'kill 505', the newly respawned 'systemd-stdout-syslog-bridge'
seems to work as expected.
Created attachment 539718 [details]
systemctl dump --full
I can reproduce it on an embedded system with systemd git
f6cebb3bd5a00d79c8131637c0f6796a75e6af99:
# sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Dec 2 17:41:58 ", 16}, {"connmand", 8}, {"[73]: ", 6}, {"connmand[73]: eth0 {newlink} ind"..., 71}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=73, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL
Stuck systemd-stdout-syslog-bridge seems to be triggered by activating
the 'syslogd' daemon. The 'systemctl dump --full' output from the embedded
system is attached.
Seems to be a problem with socket based activation: 1. systemd opens /dev/log for listening 2. an application (systemd-stdout-syslog-bridge) connects to /dev/log 3. systemd starts 'syslogd', keeps /dev/log fd open which is passed by LISTEN_FDS 4. 'syslogd' is not systemd aware, removes old /dev/log and opens a new one --> The old fd (from LISTEN_FDS) stays alive but nobody reads it 5. application from step 2 logs something but nobody consumes this output --> application stalls This means, it is dangerous to use socket based activation with non systemd aware programs. Unfortunately, I do not know a way to do step 3 (starting syslogd) in such an environment :( Systemd does not support any syslog implementation that isn't systemd aware: http://www.freedesktop.org/wiki/Software/systemd/syslog Syslog on systemd systems can not properly work without socket activation. You need to patch your syslog daemon or use one that integrates with systemd. What Kay said. Notice that the sysklogd package will be removed from Fedora (see bug 748495). I recommend rsyslog. syslog-ng can work too, but be careful about its configuration (see bug 742624). |
Description of problem: systemd stops to respond when machine runs for a long time (1-2 days). E.g. 'systemctl stop foo.service' will hang until it timeouts. I see e.g. 'systemd-stdout-syslog-bridge' which tries | # strace -f -p 517 | Process 517 attached - interrupt to quit | sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Nov 30 21:12:55 ", 16}, {"dbus-daemon", 11}, {"[1190]: ", 8}, {"dbus[1190]: avc: netlink poll: "..., 39}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1190, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL Or hundreds of cronjobs which are stuck in | sendto(5, "<78>Dec 2 10:01:01 /USR/SBIN/CROND[9650]: (root) CMD (run-parts /etc/cron.hourly)", 82, MSG_NOSIGNAL, NULL, 0^C <unfinished ...> Both FDs above are sockets but I am unable to determine which is the other end. I see these hangups on two machines. Version-Release number of selected component (if applicable): systemd-37-3.fc16.x86_64 kernel-3.1.2-1.fc16.x86_64 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info: