Hide Forgot
Description of problem: systemd stops to respond when machine runs for a long time (1-2 days). E.g. 'systemctl stop foo.service' will hang until it timeouts. I see e.g. 'systemd-stdout-syslog-bridge' which tries | # strace -f -p 517 | Process 517 attached - interrupt to quit | sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Nov 30 21:12:55 ", 16}, {"dbus-daemon", 11}, {"[1190]: ", 8}, {"dbus[1190]: avc: netlink poll: "..., 39}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1190, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL Or hundreds of cronjobs which are stuck in | sendto(5, "<78>Dec 2 10:01:01 /USR/SBIN/CROND[9650]: (root) CMD (run-parts /etc/cron.hourly)", 82, MSG_NOSIGNAL, NULL, 0^C <unfinished ...> Both FDs above are sockets but I am unable to determine which is the other end. I see these hangups on two machines. Version-Release number of selected component (if applicable): systemd-37-3.fc16.x86_64 kernel-3.1.2-1.fc16.x86_64 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
systemd-stdout-syslog-bridge stucks < 2 minutes after boot: # last reboot system boot 3.1.2-1.fc16.x86 Fri Dec 2 10:18 - 11:41 (01:22) # strace -p 505 Process 505 attached - interrupt to quit sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Dec 2 10:19:46 ", 16}, {"ladvd", 5}, {"[1858]: ", 8}, {"ladvd 0.9.2 running", 19}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1858, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL # lsof -p 505 systemd-s 505 root 0u CHR 1,3 0t0 1028 /dev/null systemd-s 505 root 1u CHR 1,3 0t0 1028 /dev/null systemd-s 505 root 2u CHR 1,3 0t0 1028 /dev/null systemd-s 505 root 3w unix 0xffff8803257fc9c0 0t0 11522 /run/systemd/stdout-syslog-bridge systemd-s 505 root 4u unix 0xffff8803257fcd00 0t0 15552 socket systemd-s 505 root 5u 0000 0,9 0 5311 anon_inode systemd-s 505 root 6r unix 0xffff8803257ff400 0t0 15553 socket systemd-s 505 root 7u CHR 1,11 0t0 1034 /dev/kmsg systemd-s 505 root 8u CHR 5,1 0t0 1037 /dev/console systemd-s 505 root 9u CHR 5,1 0t0 1037 /dev/console systemd-s 505 root 10u CHR 5,1 0t0 1037 /dev/console ... --> FD 6 is marked as read-only, but sendmsg() is used on it? After a 'kill 505', the newly respawned 'systemd-stdout-syslog-bridge' seems to work as expected.
Created attachment 539718 [details] systemctl dump --full I can reproduce it on an embedded system with systemd git f6cebb3bd5a00d79c8131637c0f6796a75e6af99: # sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Dec 2 17:41:58 ", 16}, {"connmand", 8}, {"[73]: ", 6}, {"connmand[73]: eth0 {newlink} ind"..., 71}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=73, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL Stuck systemd-stdout-syslog-bridge seems to be triggered by activating the 'syslogd' daemon. The 'systemctl dump --full' output from the embedded system is attached.
Seems to be a problem with socket based activation: 1. systemd opens /dev/log for listening 2. an application (systemd-stdout-syslog-bridge) connects to /dev/log 3. systemd starts 'syslogd', keeps /dev/log fd open which is passed by LISTEN_FDS 4. 'syslogd' is not systemd aware, removes old /dev/log and opens a new one --> The old fd (from LISTEN_FDS) stays alive but nobody reads it 5. application from step 2 logs something but nobody consumes this output --> application stalls This means, it is dangerous to use socket based activation with non systemd aware programs. Unfortunately, I do not know a way to do step 3 (starting syslogd) in such an environment :(
Systemd does not support any syslog implementation that isn't systemd aware: http://www.freedesktop.org/wiki/Software/systemd/syslog Syslog on systemd systems can not properly work without socket activation. You need to patch your syslog daemon or use one that integrates with systemd.
What Kay said. Notice that the sysklogd package will be removed from Fedora (see bug 748495). I recommend rsyslog. syslog-ng can work too, but be careful about its configuration (see bug 742624).