Bug 759383 - systemd stops to respond
Summary: systemd stops to respond
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 16
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-12-02 09:16 UTC by Enrico Scholz
Modified: 2011-12-14 11:55 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-14 11:55:17 UTC
Type: ---


Attachments (Terms of Use)
systemctl dump --full (159.67 KB, text/plain)
2011-12-02 18:05 UTC, Enrico Scholz
no flags Details

Description Enrico Scholz 2011-12-02 09:16:00 UTC
Description of problem:

systemd stops to respond when machine runs for a long time (1-2
days). E.g. 'systemctl stop foo.service' will hang until it timeouts.

I see e.g. 'systemd-stdout-syslog-bridge' which tries

| # strace -f -p 517
| Process 517 attached - interrupt to quit
| sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Nov 30 21:12:55 ", 16}, {"dbus-daemon", 11}, {"[1190]: ", 8}, {"dbus[1190]: avc:  netlink poll: "..., 39}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1190, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL

Or hundreds of cronjobs which are stuck in

| sendto(5, "<78>Dec  2 10:01:01 /USR/SBIN/CROND[9650]: (root) CMD (run-parts /etc/cron.hourly)", 82, MSG_NOSIGNAL, NULL, 0^C <unfinished ...>


Both FDs above are sockets but I am unable to determine which is the
other end.

I see these hangups on two machines.


Version-Release number of selected component (if applicable):

systemd-37-3.fc16.x86_64
kernel-3.1.2-1.fc16.x86_64


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Enrico Scholz 2011-12-02 10:56:33 UTC
systemd-stdout-syslog-bridge stucks < 2 minutes after boot:

# last
reboot   system boot  3.1.2-1.fc16.x86 Fri Dec  2 10:18 - 11:41  (01:22)    

# strace -p 505
Process 505 attached - interrupt to quit
sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Dec  2 10:19:46 ", 16}, {"ladvd", 5}, {"[1858]: ", 8}, {"ladvd 0.9.2 running", 19}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=1858, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL

# lsof -p 505
systemd-s 505 root    0u   CHR                1,3      0t0  1028 /dev/null
systemd-s 505 root    1u   CHR                1,3      0t0  1028 /dev/null
systemd-s 505 root    2u   CHR                1,3      0t0  1028 /dev/null
systemd-s 505 root    3w  unix 0xffff8803257fc9c0      0t0 11522 /run/systemd/stdout-syslog-bridge
systemd-s 505 root    4u  unix 0xffff8803257fcd00      0t0 15552 socket
systemd-s 505 root    5u  0000                0,9        0  5311 anon_inode
systemd-s 505 root    6r  unix 0xffff8803257ff400      0t0 15553 socket
systemd-s 505 root    7u   CHR               1,11      0t0  1034 /dev/kmsg
systemd-s 505 root    8u   CHR                5,1      0t0  1037 /dev/console
systemd-s 505 root    9u   CHR                5,1      0t0  1037 /dev/console
systemd-s 505 root   10u   CHR                5,1      0t0  1037 /dev/console
...

  --> FD 6 is marked as read-only, but sendmsg() is used on it?


After a 'kill 505', the newly respawned 'systemd-stdout-syslog-bridge'
seems to work as expected.

Comment 2 Enrico Scholz 2011-12-02 18:05:01 UTC
Created attachment 539718 [details]
systemctl dump --full

I can reproduce it on an embedded system with systemd git
f6cebb3bd5a00d79c8131637c0f6796a75e6af99:

# sendmsg(6, {msg_name(0)=NULL, msg_iov(5)=[{"<30>", 4}, {"Dec  2 17:41:58 ", 16}, {"connmand", 8}, {"[73]: ", 6}, {"connmand[73]: eth0 {newlink} ind"..., 71}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS{pid=73, uid=0, gid=0}}, msg_flags=0}, MSG_NOSIGNAL

Stuck systemd-stdout-syslog-bridge seems to be triggered by activating
the 'syslogd' daemon.  The 'systemctl dump --full' output from the embedded
system is attached.

Comment 3 Enrico Scholz 2011-12-02 18:51:45 UTC
Seems to be a problem with socket based activation:

1. systemd opens /dev/log for listening

2. an application (systemd-stdout-syslog-bridge) connects to /dev/log

3. systemd starts 'syslogd', keeps /dev/log fd open which is passed by LISTEN_FDS

4. 'syslogd' is not systemd aware, removes old /dev/log and opens a new one
   --> The old fd (from LISTEN_FDS) stays alive but nobody reads it

5. application from step 2 logs something but nobody consumes this output
   --> application stalls


This means, it is dangerous to use socket based activation with non systemd aware programs. Unfortunately, I do not know a way to do step 3 (starting syslogd) in such an environment :(

Comment 4 Kay Sievers 2011-12-02 18:59:59 UTC
Systemd does not support any syslog implementation that isn't systemd aware:
  http://www.freedesktop.org/wiki/Software/systemd/syslog

Syslog on systemd systems can not properly work without socket activation.
You need to patch your syslog daemon or use one that integrates with systemd.

Comment 5 Michal Schmidt 2011-12-14 11:55:17 UTC
What Kay said. Notice that the sysklogd package will be removed from Fedora (see bug 748495). I recommend rsyslog. syslog-ng can work too, but be careful about its configuration (see bug 742624).


Note You need to log in before you can comment on or make changes to this bug.