748171 – sm-client.pid not found

Bug 748171 - sm-client.pid not found

Summary: sm-client.pid not found

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	sendmail
Sub Component:
Version:	16
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Jaroslav Škarvada
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	752293 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2011-10-22 18:43 UTC by Yann Droneaud
Modified:	2013-02-14 01:23 UTC (History)
CC List:	16 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-02-14 01:23:28 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
/var/log/messages (22.54 KB, application/x-gzip) 2011-10-22 18:44 UTC, Yann Droneaud	no flags	Details
Output of grep sendmail /var/log/messages, interleaved with the output of grep "starting daemon" /var/log/maillog\|grep sendmail (48.41 KB, text/plain) 2012-04-12 12:18 UTC, Scott Schmit	no flags	Details
View All

Description Yann Droneaud 2011-10-22 18:43:28 UTC

When booting Fedora-16-TC-x86_64-Live-Desktop, systemd complains about missing sm-client.pid

Oct 22 13:31:37 localhost systemd[1]: Failed to read PID file /run/sm-client.pid after start. The service might be broken

Comment 1 Yann Droneaud 2011-10-22 18:44:23 UTC

Created attachment 529629 [details]
/var/log/messages

Added the full /var/log/messages file

Comment 2 Yann Droneaud 2011-10-22 18:58:40 UTC

There's also

Oct 22 13:31:38 localhost systemd[1]: Failed to read PID file /var/run/iscsid.pid after start. The service might be broken.
Oct 22 13:31:40 localhost systemd[1]: Failed to read PID file /run/sendmail.pid after start. The service might be broken.

Comment 3 Michal Schmidt 2011-10-23 16:25:34 UTC

sendmail daemonizes itself in a racy way. It exits the original process before the child writes the PID file. When systemd attempts to read the service's PID file, it is not there yet.

The proper steps to daemonize a process are described in:
http://0pointer.de/public/systemd-man/daemon.html

Reassigning to sendmail.

Comment 4 Jaroslav Škarvada 2011-10-24 07:29:50 UTC

Is it standardised somewhere or is it systemd requirement? I think there is no chance to get such change into sendmail upstream without really strong argument. Could we workaround this in systemd?

Comment 5 Michal Schmidt 2011-10-24 11:31:07 UTC

I don't know where else it is documented.

The argument for fixing sendmail is that the current way is inherently racy. When I start sendmail and I want to know the PID sendmail is running as, I expect to be able to read the information from the PID file. Why would I have to have a sleep before sendmail writes its PID file? And more importantly, how do I know how long I should wait? The successful exit of the original process is the only synchronization event that can tell me that the daemon is fully ready, so the PID file should be present by then.

That's why the recommended order of the daemonization steps makes perfect sense.

systemd falls back to guessing the main PID of the service when it prints this warning (see "GuessMainPID=" in "man systemd.service"). Most of the time the guessing gets the answer right, but it is not a good idea to depend on it.

Comment 6 Michal Schmidt 2011-10-24 13:46:48 UTC

You could avoid the racy daemonization in sendmail by working towards adding socket activation to it and switching away from Type=forking, i.e. allow sendmail to be started as a "New-Style Daemon" as described in daemon.html.

Comment 7 Jaroslav Škarvada 2011-11-09 11:01:32 UTC

(In reply to comment #5)
> I don't know where else it is documented.
>
> The argument for fixing sendmail is that the current way is inherently racy.
> When I start sendmail and I want to know the PID sendmail is running as, I
> expect to be able to read the information from the PID file. Why would I have
> to have a sleep before sendmail writes its PID file? And more importantly, how
> do I know how long I should wait? The successful exit of the original process
> is the only synchronization event that can tell me that the daemon is fully
> ready, so the PID file should be present by then.
> 
If there is no standard, it is more about systemd upstream opinion. I agree that the proposed steps makes sense, but please look on it from the sendmail perspective: Why to rewrite something that worked good for ages? Sendmail upstream is very conservative and mostly calls long-term (maybe non-optimal) behaviour features in order to be backward compatible and not to introduce new bugs.

I will try to prepare patch, but personally I am very sceptic about the chance to get accepted.

Comment 8 Jaroslav Škarvada 2011-11-09 11:03:00 UTC

*** Bug 752293 has been marked as a duplicate of this bug. ***

Comment 9 Mads Kiilerich 2011-11-09 11:09:55 UTC

(In reply to comment #7)
> I will try to prepare patch, but personally I am very sceptic about the chance
> to get accepted.

That would be yet another reason to move on to some other more modern and more secure MTA as default ... or to stop installing a MTA by default at all.

Comment 10 Jaroslav Škarvada 2011-11-09 11:23:29 UTC

I don't want to start another flame (I am also the postfix maintainer). But it depends what is your definition for "modern". And about the less security - it seems like urban legend today.

Comment 11 Kay Sievers 2011-11-09 11:35:54 UTC

(In reply to comment #7)
> If there is no standard, it is more about systemd upstream opinion.

Just a bit of the reason to ask for all that:

It's not so much about 'opinion' it's about basic process supervision, which
should have been offered in a sane way by init many years ago. All basic
services should be watched and the the admin notified or automatically
restarted when something goes wrong.

It's not a strict requirement with systemd, but we want all services to able
to be supervised.

With old-school double forking, the PID file is what we need to be created
in a reliable way when the service signifies by the double-fork exit, that
it is ready.

Comment 12 skeezikx 2011-11-09 19:25:18 UTC

I'm surprised priority is Low.  Right now sendmail is not running on any f16 installation.  Is sendmail not critical to smooth operation?  Many OS and non-OS programs and processes send mail to root mailbox.  How will users and administrators get the info from root mailbox from all the processes?  I always thought sendmail was integral to fedora.
By the way - I am running 32 bit - I must have missed that part of the initial info.

Comment 13 Jaroslav Škarvada 2011-11-09 21:15:24 UTC

(In reply to comment #12)
> Right now sendmail is not running on any f16 installation.
>
Any installation? Really?

For me it seems to happen only on some machines. Personally, I cannot reproduce it on any of my machines. But to be fair, I am resetting the severity (which was probably set by the reporter) to medium.

Comment 14 Jaroslav Škarvada 2011-11-09 21:22:43 UTC

Similar problem has been faced by httpd maintainers. Interesting solution was proposed by Jan Kaluža in bug 728465 comment 14 (but currently it is not 100% reliable). I will try at first to patch the forking and in case of rejection I will probably fall-back to Jan's solution.

Comment 15 Michal Schmidt 2011-11-10 14:22:22 UTC

(In reply to comment #12)
> I'm surprised priority is Low.  Right now sendmail is not running on any f16
> installation.

For me sendmail runs, even though I'm seeing the warning message. Your problem may or may not be related to this bug. Please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" and attach the output of the "dmesg" command here. And what does "systemctl status sendmail.service" say?

Comment 16 Michal Schmidt 2011-11-10 14:25:36 UTC

(In reply to comment #12)
> Right now sendmail is not running on any f16 installation.

Also please test if adding "GuessMainPID=no" to /lib/systemd/system/sendmail.service makes it work.

Comment 17 skeezikx 2011-11-10 21:51:03 UTC

a) where do I add "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" 

b) in what section do I add "GuessMainPID=no"
[Unit], [Service], or [Install]

Comment 18 skeezikx 2011-11-10 22:07:09 UTC

Here is the output:

[root@sendmail system]# systemctl status sendmail.service
sendmail.service - Sendmail Mail Transport Agent
          Loaded: loaded (/lib/systemd/system/sendmail.service; enabled)
          Active: failed since Thu, 10 Nov 2011 15:48:30 -0600; 18min ago
         Process: 1432 ExecStart=/usr/sbin/sendmail -bd $SENDMAIL_OPTS $SENDMAIL_OPTARG (code=exited, status=0/SUCCESS)
         Process: 1427 ExecStartPre=/etc/mail/make aliases (code=exited, status=0/SUCCESS)
         Process: 1425 ExecStartPre=/etc/mail/make (code=exited, status=0/SUCCESS)
        Main PID: 1433 (code=exited, status=71)
          CGroup: name=systemd:/system/sendmail.service

Comment 19 skeezikx 2011-11-11 03:32:41 UTC

I guessed and added "GuessMainPID=no" to the [Service] section.
Now I get this:

Nov 10 21:29:04 sendmail systemd[1]: PID 1433 read from file /run/sendmail.pid does not exist. Your service or init script might be broken.
Nov 10 21:29:54 sendmail systemd[1]: PID 2395 read from file /run/sendmail.pid does not exist. Your service or init script might be broken.
Nov 10 21:29:54 sendmail systemd[1]: sendmail.service: main process exited, code=exited, status=71
Nov 10 21:29:54 sendmail systemd[1]: Unit sendmail.service entered failed state.
Nov 10 21:30:55 sendmail systemd[1]: Reloading.
Nov 10 21:30:55 sendmail systemd[1]: [/etc/rc.d/init.d/racoon:11] Failed to add LSB Provides name ipsec.service, ignoring: File exists

Comment 20 Jaroslav Škarvada 2011-11-11 08:49:57 UTC

(In reply to comment #17)
> a) where do I add "log_buf_len=1M systemd.log_level=debug
> systemd.log_target=kmsg" 
> 
I guess these are kernel boot args, i.e. you could add them to grub config or override in grub boot menu.

Also is there anything interesting in your /var/log/maillog?

Comment 21 skeezikx 2011-11-12 05:37:21 UTC

(In reply to comment #20)
> (In reply to comment #17)
> > a) where do I add "log_buf_len=1M systemd.log_level=debug
> > systemd.log_target=kmsg" 
> > 
> I guess these are kernel boot args, i.e. you could add them to grub config or
> override in grub boot menu.
> 
> Also is there anything interesting in your /var/log/maillog?

Actually, there is something interesting in maillog.  Sendmail appears to be logging even though the systemctl status sendmail.service reports failed.  See log example below:

Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9JfO7d001984: to=root, delay=2+09:04:47, xdelay=00:00:00, mailer=relay, pri=3181501, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]
Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9FTJEQ001502: to=root, ctladdr=root (0/0), delay=2+13:16:52, xdelay=00:00:00, mailer=relay, pri=3630214, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]
Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9JfO7e001984: to=root, delay=2+09:04:47, xdelay=00:00:00, mailer=relay, pri=3662825, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]
Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9FT5b6001428: to=root@localhost, ctladdr=root (0/0), delay=2+13:17:06, xdelay=00:00:00, mailer=relay, pri=4111533, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]

Comment 22 Michal Schmidt 2011-12-05 12:53:56 UTC

(In reply to comment #7)
> If there is no standard, it is more about systemd upstream opinion. I agree
> that the proposed steps makes sense, but please look on it from the sendmail
> perspective: Why to rewrite something that worked good for ages? Sendmail
> upstream is very conservative and mostly calls long-term (maybe non-optimal)
> behaviour features in order to be backward compatible and not to introduce new
> bugs.

I have added a workaround to systemd upstream to handle daemons like this gracefully. See the commit message for details:
http://cgit.freedesktop.org/systemd/commit/?id=3a11183858af30bc9b4e9dac430dd7541deec19b

In short, it uses inotify to wait for the PID file to be ready before marking the service as running. This should make the starting of sendmail reliable.

I still recommend that services get fixed to follow the steps described in daemon(7), but the availability of the workaround makes it a lower priority. Relying on the workaround implies a bit higher overhead, but the problem is not an issue of correctness anymore.

Comment 23 Yann Droneaud 2011-12-29 19:57:06 UTC

Another impact of this bug.

Installation of update got stuck for some minutes:

  ├─packagekitd,1631
  │   ├─yumBackend.py,11948 /usr/share/PackageKit/helpers/yum/yumBackend.py update-packages yes...
  │   │   └─sh,14190 /var/tmp/rpm-tmp.xJvWoR 1
  │   │       └─systemctl,14194 try-restart sendmail.service
  │   ├─{packagekitd},1633
  │   └─{packagekitd},1637


Dec 29 20:43:31 dworkin systemd[1]: PID 1105 read from file /run/sm-client.pid does not exist. Your service or init script might be broken.
Dec 29 20:43:31 dworkin systemd[1]: Reloading.
Dec 29 20:43:32 dworkin systemd[1]: Reloading.
Dec 29 20:43:32 dworkin systemd[1]: PID 1105 read from file /run/sm-client.pid does not exist. Your service or init script might be broken.
Dec 29 20:45:02 dworkin systemd[1]: sendmail.service operation timed out. Terminating.
Dec 29 20:45:02 dworkin systemd[1]: PID 14204 read from file /run/sm-client.pid does not exist. Your service or init script might be broken.
Dec 29 20:46:32 dworkin systemd[1]: sendmail.service stopping timed out (2). Killing.
Dec 29 20:48:02 dworkin systemd[1]: sendmail.service still around after SIGKILL (2). Entering failed mode.
Dec 29 20:48:02 dworkin systemd[1]: Unit sendmail.service entered failed state.
Dec 29 20:48:02 dworkin systemd[1]: Reloading.

Comment 24 Alfred Reaud 2012-02-25 04:46:37 UTC

(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #17)
> > > a) where do I add "log_buf_len=1M systemd.log_level=debug
> > > systemd.log_target=kmsg" 
> > > 
> > I guess these are kernel boot args, i.e. you could add them to grub config or
> > override in grub boot menu.
> > 
> > Also is there anything interesting in your /var/log/maillog?
> 
> Actually, there is something interesting in maillog.  Sendmail appears to be
> logging even though the systemctl status sendmail.service reports failed.  See
> log example below:
> 
> Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9JfO7d001984: to=root,
> delay=2+09:04:47, xdelay=00:00:00, mailer=relay, pri=3181501,
> relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by [127.0.0.1]
> Nov 11 22:46:11 sendmail sm-msp-queue[6049]: pA9FTJEQ001502: to=root,
> ctladdr=root (0/0), delay=2+13:16:52, xdelay=00:00:00, mailer=relay,
> pri=3630214, relay=[127.0.0.1], dsn=4.0.0, stat=Deferred: Connection refused by
> [127.0.0.1]
> <snip>

I'm getting the same error. Verified via telnet.

"netstat -lpn | grep sendmail" and "netstat -lpn | grep :25 return" null. The audit log shows no selinux problems.

$ systemctl status sendmail.service
sendmail.service - Sendmail Mail Transport Agent
	  Loaded: loaded (/lib/systemd/system/sendmail.service; enabled)
	  Active: active (running) since Fri, 24 Feb 2012 21:07:45 -0700; 9min ago
	 Process: 3965 ExecStart=/usr/sbin/sendmail $SENDMAIL_OPTS $SENDMAIL_OPTARG (code=exited, status=0/SUCCESS)
	 Process: 3960 ExecStartPre=/etc/mail/make aliases (code=exited, status=0/SUCCESS)
	 Process: 3958 ExecStartPre=/etc/mail/make (code=exited, status=0/SUCCESS)
	Main PID: 3966 (sendmail)
	  CGroup: name=systemd:/system/sendmail.service
		  └ 3966 sendmail: Queue runner@01:00:00 for /var/spool/mqueue

Though the service is running, it isn't apparent to the system. It's machine specific, BTW. I've two other recent installs of F16 where sendmail is working fine. Indicating not a sendmail bug, IMHO... Porting over one of those working configuration scripts doesn't solve the problem. getenforce returns 0, BTW.

Comment 25 sean 2012-03-17 19:39:30 UTC

F16 with systemd-37-15.fc16.x86_64. Using postfix-2.8.7-1.fc16.x86_64:

ls -l /usr/sbin/sendmail
lrwxrwxrwx 1 root root 21 Aug 31  2009 /usr/sbin/sendmail -> /etc/alternatives/mta
[root@asterisk ~]# ls -l /etc/alternatives/mta
lrwxrwxrwx 1 root root 26 Sep  1  2009 /etc/alternatives/mta -> /usr/sbin/sendmail.postfix

but can't get postfix to start:

PID file /run/sendmail.pid not readable (yet?) after start.
Mar 17 14:43:50 systemd[1]: sendmail.service operation timed out. Terminating.
Mar 17 14:43:50 systemd[1]: Job sm-client.service/start failed with result 'dependency'.
Mar 17 14:43:50 systemd[1]: Unit sendmail.service entered failed state.

So it's not just sendmail.

What I can do is disable sendmail, and use chkconfig to enable postfix.

Sure smells like a systemd issue!

Comment 26 Michal Schmidt 2012-03-19 13:25:00 UTC

(In reply to comment #25)
> F16 with systemd-37-15.fc16.x86_64. Using postfix-2.8.7-1.fc16.x86_64:
> 
> ls -l /usr/sbin/sendmail
> lrwxrwxrwx 1 root root 21 Aug 31  2009 /usr/sbin/sendmail ->
> /etc/alternatives/mta
> [root@asterisk ~]# ls -l /etc/alternatives/mta
> lrwxrwxrwx 1 root root 26 Sep  1  2009 /etc/alternatives/mta ->
> /usr/sbin/sendmail.postfix
> 
> but can't get postfix to start:
> 
> PID file /run/sendmail.pid not readable (yet?) after start.
> Mar 17 14:43:50 systemd[1]: sendmail.service operation timed out. Terminating.
> Mar 17 14:43:50 systemd[1]: Job sm-client.service/start failed with result
> 'dependency'.
> Mar 17 14:43:50 systemd[1]: Unit sendmail.service entered failed state.

I certainly wouldn't expect postfix to be started using sendmail's unit files.
postfix has its own unit file.

> What I can do is disable sendmail, and use chkconfig to enable postfix.

This looks like a good idea when you want to use postfix.

> Sure smells like a systemd issue!

To me it doesn't.

Comment 27 Scott Schmit 2012-04-12 12:18:26 UTC

Created attachment 577053 [details]
Output of grep sendmail /var/log/messages, interleaved with the output of grep "starting daemon" /var/log/maillog|grep sendmail

I'm not sure what's going on. Just looking at the log, you'd get the impression that sendmail hadn't put out a pid file after 24 hours and systemd finally gave up. But that's not what's going on.

I interleaved the systemd error messages from /var/log/messages and the "sendmail[...]: starting daemon" messages from maillog to try to get a better idea of what's going on. Not sure that's helpful because it looks like every child gets a log entry too, but I'll attach it anyway just in case it's helpful.

It's been a while since I first started sendmail, and systemd is now complaining about the pid file again.

I think I see the problem. The /run/sendmail.pid file seems to be overwritten by child processes which then exit when they're done, leaving the PID file stale. Eventually (apparently) it gets rewritten by the main sendmail process and it's valid again. That looks buggy.

Here's my evidence. When I first checked a little while back, I'm fairly sure that the pid in /run/sendmail.pid matched a running instance of sendmail. But a later check of the pid file resulted in this:

# cat /run/sendmail.pid
29252
/usr/sbin/sendmail -bd -q1h
# ps ax|grep 29252
29635 pts/3    S+     0:00 grep --color=auto 29252
# systemctl status sendmail.service 
sendmail.service - Sendmail Mail Transport Agent
          Loaded: loaded (/lib/systemd/system/sendmail.service; enabled)
          Active: active (running) since Thu, 12 Apr 2012 08:05:24 -0400; 1min 16s ago
         Process: 29567 ExecStart=/usr/sbin/sendmail -bd $SENDMAIL_OPTS $SENDMAIL_OPTARG (code=exited, status=0/SUCCESS)
         Process: 29562 ExecStartPre=/etc/mail/make aliases (code=exited, status=0/SUCCESS)
         Process: 29560 ExecStartPre=/etc/mail/make (code=exited, status=0/SUCCESS)
        Main PID: 29568 (sendmail)
          CGroup: name=systemd:/system/sendmail.service
                  └ 29568 sendmail: accepting connections

I looked again later, and the pid file changed again:

# cat /run/sendmail.pid 
29568
/usr/sbin/sendmail -bd -q1h
# ps ax|grep 29568
29568 ?        Ss     0:00 sendmail: accepting connections
29665 pts/3    S+     0:00 grep --color=auto 29568

I've checked several times since and haven't seen the pid file change, but this probably depends on when I get email.

Hope this helps resolve the issue. I'd rather not have to restart sendmail every day just to keep getting my email.

Comment 28 yourrealname 2012-10-24 13:16:21 UTC

See my comment: https://bugzilla.redhat.com/show_bug.cgi?id=848172#c18

Comment 29 Fedora End Of Life 2013-01-16 22:52:23 UTC

This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 30 Fedora End Of Life 2013-02-14 01:23:33 UTC

Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.