When booting Fedora-16-TC-x86_64-Live-Desktop, systemd complains about missing sm-client.pid Oct 22 13:31:37 localhost systemd[1]: Failed to read PID file /run/sm-client.pid after start. The service might be broken
Created attachment 529629 [details] /var/log/messages Added the full /var/log/messages file
There's also Oct 22 13:31:38 localhost systemd[1]: Failed to read PID file /var/run/iscsid.pid after start. The service might be broken. Oct 22 13:31:40 localhost systemd[1]: Failed to read PID file /run/sendmail.pid after start. The service might be broken.
sendmail daemonizes itself in a racy way. It exits the original process before the child writes the PID file. When systemd attempts to read the service's PID file, it is not there yet. The proper steps to daemonize a process are described in: http://0pointer.de/public/systemd-man/daemon.html Reassigning to sendmail.
Is it standardised somewhere or is it systemd requirement? I think there is no chance to get such change into sendmail upstream without really strong argument. Could we workaround this in systemd?
I don't know where else it is documented. The argument for fixing sendmail is that the current way is inherently racy. When I start sendmail and I want to know the PID sendmail is running as, I expect to be able to read the information from the PID file. Why would I have to have a sleep before sendmail writes its PID file? And more importantly, how do I know how long I should wait? The successful exit of the original process is the only synchronization event that can tell me that the daemon is fully ready, so the PID file should be present by then. That's why the recommended order of the daemonization steps makes perfect sense. systemd falls back to guessing the main PID of the service when it prints this warning (see "GuessMainPID=" in "man systemd.service"). Most of the time the guessing gets the answer right, but it is not a good idea to depend on it.
You could avoid the racy daemonization in sendmail by working towards adding socket activation to it and switching away from Type=forking, i.e. allow sendmail to be started as a "New-Style Daemon" as described in daemon.html.
(In reply to comment #5) > I don't know where else it is documented. > > The argument for fixing sendmail is that the current way is inherently racy. > When I start sendmail and I want to know the PID sendmail is running as, I > expect to be able to read the information from the PID file. Why would I have > to have a sleep before sendmail writes its PID file? And more importantly, how > do I know how long I should wait? The successful exit of the original process > is the only synchronization event that can tell me that the daemon is fully > ready, so the PID file should be present by then. > If there is no standard, it is more about systemd upstream opinion. I agree that the proposed steps makes sense, but please look on it from the sendmail perspective: Why to rewrite something that worked good for ages? Sendmail upstream is very conservative and mostly calls long-term (maybe non-optimal) behaviour features in order to be backward compatible and not to introduce new bugs. I will try to prepare patch, but personally I am very sceptic about the chance to get accepted.
*** Bug 752293 has been marked as a duplicate of this bug. ***
(In reply to comment #7) > I will try to prepare patch, but personally I am very sceptic about the chance > to get accepted. That would be yet another reason to move on to some other more modern and more secure MTA as default ... or to stop installing a MTA by default at all.
I don't want to start another flame (I am also the postfix maintainer). But it depends what is your definition for "modern". And about the less security - it seems like urban legend today.
(In reply to comment #7) > If there is no standard, it is more about systemd upstream opinion. Just a bit of the reason to ask for all that: It's not so much about 'opinion' it's about basic process supervision, which should have been offered in a sane way by init many years ago. All basic services should be watched and the the admin notified or automatically restarted when something goes wrong. It's not a strict requirement with systemd, but we want all services to able to be supervised. With old-school double forking, the PID file is what we need to be created in a reliable way when the service signifies by the double-fork exit, that it is ready.
I'm surprised priority is Low. Right now sendmail is not running on any f16 installation. Is sendmail not critical to smooth operation? Many OS and non-OS programs and processes send mail to root mailbox. How will users and administrators get the info from root mailbox from all the processes? I always thought sendmail was integral to fedora. By the way - I am running 32 bit - I must have missed that part of the initial info.
(In reply to comment #12) > Right now sendmail is not running on any f16 installation. > Any installation? Really? For me it seems to happen only on some machines. Personally, I cannot reproduce it on any of my machines. But to be fair, I am resetting the severity (which was probably set by the reporter) to medium.
Similar problem has been faced by httpd maintainers. Interesting solution was proposed by Jan Kaluža in bug 728465 comment 14 (but currently it is not 100% reliable). I will try at first to patch the forking and in case of rejection I will probably fall-back to Jan's solution.
(In reply to comment #12) > I'm surprised priority is Low. Right now sendmail is not running on any f16 > installation. For me sendmail runs, even though I'm seeing the warning message. Your problem may or may not be related to this bug. Please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" and attach the output of the "dmesg" command here. And what does "systemctl status sendmail.service" say?
(In reply to comment #12) > Right now sendmail is not running on any f16 installation. Also please test if adding "GuessMainPID=no" to /lib/systemd/system/sendmail.service makes it work.
a) where do I add "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" b) in what section do I add "GuessMainPID=no" [Unit], [Service], or [Install]
Here is the output: [root@sendmail system]# systemctl status sendmail.service sendmail.service - Sendmail Mail Transport Agent Loaded: loaded (/lib/systemd/system/sendmail.service; enabled) Active: failed since Thu, 10 Nov 2011 15:48:30 -0600; 18min ago Process: 1432 ExecStart=/usr/sbin/sendmail -bd $SENDMAIL_OPTS $SENDMAIL_OPTARG (code=exited, status=0/SUCCESS) Process: 1427 ExecStartPre=/etc/mail/make aliases (code=exited, status=0/SUCCESS) Process: 1425 ExecStartPre=/etc/mail/make (code=exited, status=0/SUCCESS) Main PID: 1433 (code=exited, status=71) CGroup: name=systemd:/system/sendmail.service
I guessed and added "GuessMainPID=no" to the [Service] section. Now I get this: Nov 10 21:29:04 sendmail systemd[1]: PID 1433 read from file /run/sendmail.pid does not exist. Your service or init script might be broken. Nov 10 21:29:54 sendmail systemd[1]: PID 2395 read from file /run/sendmail.pid does not exist. Your service or init script might be broken. Nov 10 21:29:54 sendmail systemd[1]: sendmail.service: main process exited, code=exited, status=71 Nov 10 21:29:54 sendmail systemd[1]: Unit sendmail.service entered failed state. Nov 10 21:30:55 sendmail systemd[1]: Reloading. Nov 10 21:30:55 sendmail systemd[1]: [/etc/rc.d/init.d/racoon:11] Failed to add LSB Provides name ipsec.service, ignoring: File exists
(In reply to comment #17) > a) where do I add "log_buf_len=1M systemd.log_level=debug > systemd.log_target=kmsg" > I guess these are kernel boot args, i.e. you could add them to grub config or override in grub boot menu. Also is there anything interesting in your /var/log/maillog?
(In reply to comment #20) > (In reply to comment #17) > > a) where do I add "log_buf_len=1M systemd.log_level=debug > > systemd.log_target=kmsg" > > > I guess these are kernel boot args, i.e. you could add them to grub config or > override in grub boot menu. > > Also is there anything interesting in your /var/log/maillog? Actually, there is something interesting in maillog. Sendmail appears to be logging even though the systemctl status sendmail.service reports failed. See log example below: Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9JfO7d001984: to=root, delay=2+09:04:47, xdelay=00:00:00, mailer=relay, pri=3181501, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1] Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9FTJEQ001502: to=root, ctladdr=root (0/0), delay=2+13:16:52, xdelay=00:00:00, mailer=relay, pri=3630214, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1] Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9JfO7e001984: to=root, delay=2+09:04:47, xdelay=00:00:00, mailer=relay, pri=3662825, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1] Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9FT5b6001428: to=root@localhost, ctladdr=root (0/0), delay=2+13:17:06, xdelay=00:00:00, mailer=relay, pri=4111533, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]
(In reply to comment #7) > If there is no standard, it is more about systemd upstream opinion. I agree > that the proposed steps makes sense, but please look on it from the sendmail > perspective: Why to rewrite something that worked good for ages? Sendmail > upstream is very conservative and mostly calls long-term (maybe non-optimal) > behaviour features in order to be backward compatible and not to introduce new > bugs. I have added a workaround to systemd upstream to handle daemons like this gracefully. See the commit message for details: http://cgit.freedesktop.org/systemd/commit/?id=3a11183858af30bc9b4e9dac430dd7541deec19b In short, it uses inotify to wait for the PID file to be ready before marking the service as running. This should make the starting of sendmail reliable. I still recommend that services get fixed to follow the steps described in daemon(7), but the availability of the workaround makes it a lower priority. Relying on the workaround implies a bit higher overhead, but the problem is not an issue of correctness anymore.
Another impact of this bug. Installation of update got stuck for some minutes: ├─packagekitd,1631 │ ├─yumBackend.py,11948 /usr/share/PackageKit/helpers/yum/yumBackend.py update-packages yes... │ │ └─sh,14190 /var/tmp/rpm-tmp.xJvWoR 1 │ │ └─systemctl,14194 try-restart sendmail.service │ ├─{packagekitd},1633 │ └─{packagekitd},1637 Dec 29 20:43:31 dworkin systemd[1]: PID 1105 read from file /run/sm-client.pid does not exist. Your service or init script might be broken. Dec 29 20:43:31 dworkin systemd[1]: Reloading. Dec 29 20:43:32 dworkin systemd[1]: Reloading. Dec 29 20:43:32 dworkin systemd[1]: PID 1105 read from file /run/sm-client.pid does not exist. Your service or init script might be broken. Dec 29 20:45:02 dworkin systemd[1]: sendmail.service operation timed out. Terminating. Dec 29 20:45:02 dworkin systemd[1]: PID 14204 read from file /run/sm-client.pid does not exist. Your service or init script might be broken. Dec 29 20:46:32 dworkin systemd[1]: sendmail.service stopping timed out (2). Killing. Dec 29 20:48:02 dworkin systemd[1]: sendmail.service still around after SIGKILL (2). Entering failed mode. Dec 29 20:48:02 dworkin systemd[1]: Unit sendmail.service entered failed state. Dec 29 20:48:02 dworkin systemd[1]: Reloading.
(In reply to comment #21) > (In reply to comment #20) > > (In reply to comment #17) > > > a) where do I add "log_buf_len=1M systemd.log_level=debug > > > systemd.log_target=kmsg" > > > > > I guess these are kernel boot args, i.e. you could add them to grub config or > > override in grub boot menu. > > > > Also is there anything interesting in your /var/log/maillog? > > Actually, there is something interesting in maillog. Sendmail appears to be > logging even though the systemctl status sendmail.service reports failed. See > log example below: > > Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9JfO7d001984: to=root, > delay=2+09:04:47, xdelay=00:00:00, mailer=relay, pri=3181501, > relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1] > Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9FTJEQ001502: to=root, > ctladdr=root (0/0), delay=2+13:16:52, xdelay=00:00:00, mailer=relay, > pri=3630214, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by > [127.0.0.1] > <snip> I'm getting the same error. Verified via telnet. "netstat -lpn | grep sendmail" and "netstat -lpn | grep :25 return" null. The audit log shows no selinux problems. $ systemctl status sendmail.service sendmail.service - Sendmail Mail Transport Agent Loaded: loaded (/lib/systemd/system/sendmail.service; enabled) Active: active (running) since Fri, 24 Feb 2012 21:07:45 -0700; 9min ago Process: 3965 ExecStart=/usr/sbin/sendmail $SENDMAIL_OPTS $SENDMAIL_OPTARG (code=exited, status=0/SUCCESS) Process: 3960 ExecStartPre=/etc/mail/make aliases (code=exited, status=0/SUCCESS) Process: 3958 ExecStartPre=/etc/mail/make (code=exited, status=0/SUCCESS) Main PID: 3966 (sendmail) CGroup: name=systemd:/system/sendmail.service └ 3966 sendmail: Queue runner@01:00:00 for /var/spool/mqueue Though the service is running, it isn't apparent to the system. It's machine specific, BTW. I've two other recent installs of F16 where sendmail is working fine. Indicating not a sendmail bug, IMHO... Porting over one of those working configuration scripts doesn't solve the problem. getenforce returns 0, BTW.
F16 with systemd-37-15.fc16.x86_64. Using postfix-2.8.7-1.fc16.x86_64: ls -l /usr/sbin/sendmail lrwxrwxrwx 1 root root 21 Aug 31 2009 /usr/sbin/sendmail -> /etc/alternatives/mta [root@asterisk ~]# ls -l /etc/alternatives/mta lrwxrwxrwx 1 root root 26 Sep 1 2009 /etc/alternatives/mta -> /usr/sbin/sendmail.postfix but can't get postfix to start: PID file /run/sendmail.pid not readable (yet?) after start. Mar 17 14:43:50 systemd[1]: sendmail.service operation timed out. Terminating. Mar 17 14:43:50 systemd[1]: Job sm-client.service/start failed with result 'dependency'. Mar 17 14:43:50 systemd[1]: Unit sendmail.service entered failed state. So it's not just sendmail. What I can do is disable sendmail, and use chkconfig to enable postfix. Sure smells like a systemd issue!
(In reply to comment #25) > F16 with systemd-37-15.fc16.x86_64. Using postfix-2.8.7-1.fc16.x86_64: > > ls -l /usr/sbin/sendmail > lrwxrwxrwx 1 root root 21 Aug 31 2009 /usr/sbin/sendmail -> > /etc/alternatives/mta > [root@asterisk ~]# ls -l /etc/alternatives/mta > lrwxrwxrwx 1 root root 26 Sep 1 2009 /etc/alternatives/mta -> > /usr/sbin/sendmail.postfix > > but can't get postfix to start: > > PID file /run/sendmail.pid not readable (yet?) after start. > Mar 17 14:43:50 systemd[1]: sendmail.service operation timed out. Terminating. > Mar 17 14:43:50 systemd[1]: Job sm-client.service/start failed with result > 'dependency'. > Mar 17 14:43:50 systemd[1]: Unit sendmail.service entered failed state. I certainly wouldn't expect postfix to be started using sendmail's unit files. postfix has its own unit file. > What I can do is disable sendmail, and use chkconfig to enable postfix. This looks like a good idea when you want to use postfix. > Sure smells like a systemd issue! To me it doesn't.
Created attachment 577053 [details] Output of grep sendmail /var/log/messages, interleaved with the output of grep "starting daemon" /var/log/maillog|grep sendmail I'm not sure what's going on. Just looking at the log, you'd get the impression that sendmail hadn't put out a pid file after 24 hours and systemd finally gave up. But that's not what's going on. I interleaved the systemd error messages from /var/log/messages and the "sendmail[...]: starting daemon" messages from maillog to try to get a better idea of what's going on. Not sure that's helpful because it looks like every child gets a log entry too, but I'll attach it anyway just in case it's helpful. It's been a while since I first started sendmail, and systemd is now complaining about the pid file again. I think I see the problem. The /run/sendmail.pid file seems to be overwritten by child processes which then exit when they're done, leaving the PID file stale. Eventually (apparently) it gets rewritten by the main sendmail process and it's valid again. That looks buggy. Here's my evidence. When I first checked a little while back, I'm fairly sure that the pid in /run/sendmail.pid matched a running instance of sendmail. But a later check of the pid file resulted in this: # cat /run/sendmail.pid 29252 /usr/sbin/sendmail -bd -q1h # ps ax|grep 29252 29635 pts/3 S+ 0:00 grep --color=auto 29252 # systemctl status sendmail.service sendmail.service - Sendmail Mail Transport Agent Loaded: loaded (/lib/systemd/system/sendmail.service; enabled) Active: active (running) since Thu, 12 Apr 2012 08:05:24 -0400; 1min 16s ago Process: 29567 ExecStart=/usr/sbin/sendmail -bd $SENDMAIL_OPTS $SENDMAIL_OPTARG (code=exited, status=0/SUCCESS) Process: 29562 ExecStartPre=/etc/mail/make aliases (code=exited, status=0/SUCCESS) Process: 29560 ExecStartPre=/etc/mail/make (code=exited, status=0/SUCCESS) Main PID: 29568 (sendmail) CGroup: name=systemd:/system/sendmail.service └ 29568 sendmail: accepting connections I looked again later, and the pid file changed again: # cat /run/sendmail.pid 29568 /usr/sbin/sendmail -bd -q1h # ps ax|grep 29568 29568 ? Ss 0:00 sendmail: accepting connections 29665 pts/3 S+ 0:00 grep --color=auto 29568 I've checked several times since and haven't seen the pid file change, but this probably depends on when I get email. Hope this helps resolve the issue. I'd rather not have to restart sendmail every day just to keep getting my email.
See my comment: https://bugzilla.redhat.com/show_bug.cgi?id=848172#c18
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.