Bug 748416

Summary: nm-dispatcher uses restart instead of try-restart
Product: [Fedora] Fedora Reporter: Jaroslav Škarvada <jskarvad>
Component: sendmailAssignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 15CC: bugs.michael, bulk, harald, johannbg, jskarvad, lpoetter, metherid, mlichvar, mschmidt, notting, paul, pcfe, plautrba
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Unspecified   
Whiteboard:
Fixed In Version: sendmail-8.14.5-2.fc15.2 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 719931 Environment:
Last Closed: 2011-11-13 05:35:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jaroslav Škarvada 2011-10-24 12:10:07 UTC
+++ This bug was initially created as a clone of Bug #719931 +++

+++ This bug was initially created as a clone of Bug #719884 +++

Created attachment 511883 [details]
log messages for bug

Description of problem: sendmail fails to start after installing systemd test update:

$ rpm -qa systemd\*
systemd-units-26-7.fc15.x86_64
systemd-sysv-26-7.fc15.x86_64
systemd-26-7.fc15.x86_64

Not reproducible with previous stable release: 26-5.fc15

--- Additional comment from mschwendt on 2011-07-08 05:56:23 EDT ---

Created attachment 511884 [details]
log messages for working systemd version

--- Additional comment from mschmidt on 2011-07-08 07:43:45 EDT ---

It seems this is not exactly a new bug, but the fixes in systemd-26-7.fc15 uncovered a previously latent bug. The reason I believe so is that the log from the working version (26-5.fc15) shows some unexpected behaviour too:

1. sendmail is started (OK)
2. sendmail is reloaded (OK)
3. Then however sendmail exits unexpectedly:
 [   35.800771] systemd[1]: Received SIGCHLD from PID 1275 (sendmail).
 [   35.801981] systemd[1]: Got SIGCHLD for process 1275 (sendmail)
 [   35.802026] systemd[1]: Child 1275 died (code=exited, status=0/SUCCESS)
 [   35.802031] systemd[1]: Child 1275 belongs to sendmail.service
 [   35.802039] systemd[1]: sendmail.service: main process exited, code=exited, status=0
 [   35.802145] systemd[1]: sendmail.service changed running -> exited

What does "systemctl status sendmail.service" show after booting with the working version (26-5.fc15)?
And what does "cat /var/run/sendmail.pid" show?

--- Additional comment from mschmidt on 2011-07-08 07:49:25 EDT ---

And here's another thing to test: Remove the line "# pidfile: /var/run/sendmail.pid" from /etc/init.d/sendmail and see what happens after boot with the new systemd version (26-7.fc15).

--- Additional comment from mschwendt on 2011-07-08 07:59:29 EDT ---

Re: comment 2

# systemctl status sendmail.service
sendmail.service - LSB: start and stop sendmail
	  Loaded: loaded (/etc/rc.d/init.d/sendmail)
	  Active: active (exited) since Fri, 08 Jul 2011 13:56:45 +0200; 1min 8s ago
	 Process: 1502 ExecReload=/etc/rc.d/init.d/sendmail reload (code=exited, status=0/SUCCESS)
	 Process: 1130 ExecStart=/etc/rc.d/init.d/sendmail start (code=exited, status=0/SUCCESS)
	Main PID: 1258 (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/sendmail.service
		  ├ 1516 sendmail: accepting connections
		  └ 1518 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue


# cat /var/run/sendmail.pid
1516
/usr/sbin/sendmail -bd -q1h


Reply to comment 3 will follow...

--- Additional comment from mschwendt on 2011-07-08 08:02:06 EDT ---

Re: comment 3

With the removed pid line, it started:

# systemctl status sendmail.service
sendmail.service - LSB: start and stop sendmail
	  Loaded: loaded (/etc/rc.d/init.d/sendmail)
	  Active: active (running) since Fri, 08 Jul 2011 14:00:33 +0200; 25s ago
	 Process: 1487 ExecReload=/etc/rc.d/init.d/sendmail reload (code=exited, status=0/SUCCESS)
	 Process: 1131 ExecStart=/etc/rc.d/init.d/sendmail start (code=exited, status=0/SUCCESS)
	  CGroup: name=systemd:/system/sendmail.service
		  ├ 1505 sendmail: accepting connections
		  └ 1507 sendmail: Queue runner@01:00:00 for /var/spool/cli...

# cat /var/run/sendmail.pid
1505
/usr/sbin/sendmail -bd -q1h

--- Additional comment from mschmidt on 2011-07-08 08:55:45 EDT ---

I see. sendmail does not merely reload its configuration on SIGHUP (the reload action). It does a full restart itself and it then runs with a new PID.
(http://www.c3.hu/docs/oreilly/tcpip/sendmail/ch26_03.htm)

This is incorrect behaviour. LSB specifies the "reload" action as:
  cause the configuration of the service to be reloaded without actually
  stopping and restarting the service

If a service does not support a real reload, its initscript should not provide the action.


Could systemd workaround this somehow? It could re-read the pidfile after '/etc/init.d/sendmail reload' finishes. But the reload action merely sends SIGHUP to the sendmail process and nothing guarantees that sendmail has finished its restarting by the time the initscript exits.

For now I'll remove the patch from systemd-26-7.fc15 that uncovered this bug (0001-service-pidfile-in-SysV-chkconfig-header-implies-a-r.patch). However the patch fixes some other bugs, so it would be nice to be able to put it back eventually.


Michael,
see if using "restart" instead of "reload" in /NetworkManager/dispatcher.d/10-sendmail fixes it.

--- Additional comment from mschwendt on 2011-07-08 17:09:14 CEST ---

Yes, success:

# rpm -V sendmail
5S.T.....    /etc/NetworkManager/dispatcher.d/10-sendmail
# systemctl status sendmail.service
sendmail.service - LSB: start and stop sendmail
	  Loaded: loaded (/etc/rc.d/init.d/sendmail)
	  Active: active (running) since Fri, 08 Jul 2011 17:07:00 +0200; 35s ago
	 Process: 1396 ExecStop=/etc/rc.d/init.d/sendmail stop (code=exited, status=0/SUCCESS)
	 Process: 1422 ExecStart=/etc/rc.d/init.d/sendmail start (code=exited, status=0/SUCCESS)
	Main PID: 1435 (sendmail)
	  CGroup: name=systemd:/system/sendmail.service
		  ├ 1435 sendmail: accepting connections
		  └ 1444 sendmail: Queue runner@01:00:00 for /var/spool/cli...
# rpm -qa systemd\*
systemd-26-7.fc15.x86_64
systemd-units-26-7.fc15.x86_64
systemd-sysv-26-7.fc15.x86_64

--- Additional comment from jskarvad on 2011-07-10 21:13:21 CEST ---

Created attachment 512109 [details]
Experimental patch that prevents the PID to change on HUP

Michal thanks for the analysis. Experimental patch that prevents the PID to change is attached. Scratch build with this patch applied is also available:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3189837

I will consult this issue with the sendmail upstream. In case the patch will not be accepted we will have to workaround this in systemd or the reload function will have to be removed.

--- Additional comment from jskarvad on 2011-07-10 21:24:27 CEST ---

F15 and rawhide experimental scratch builds:
http://koji.fedoraproject.org/koji/taskinfo?taskID=3189842
http://koji.fedoraproject.org/koji/taskinfo?taskID=3189839

Let's wait for the upstream opinion.

--- Additional comment from mschmidt on 2011-07-12 16:04:28 CEST ---

(In reply to comment #2)
> Created attachment 512109 [details]
> Experimental patch that prevents the PID to change on HUP

Nice trick.

(In reply to comment #3)
> F15 and rawhide experimental scratch builds:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=3189842

Works for me as expected.

--- Additional comment from jskarvad on 2011-07-20 19:10:50 CEST ---

It was rejected by upstream. Copy of the communication follows:

>> it seems there is unnecessary fork on HUP if running as
>> daemon. The result is that the PID changes on HUP.
>> Maybe this is a feature, but it could cause a problem for
>> distros that uses this functionality in initscripts for
>> reload (e.g. http://bugzilla.redhat.com/show_bug.cgi?id=719931).
>
> "What's the problem you are trying to solve?"
> It seems that is "just" a semantics problem with the way your
> init scripts work: "reload" vs. "restart" -- kill -HUP is
> a restart, not a reload (AFAICT).
>
>> Would it be possible to fix this? I attached simple
>> patch, but I am not sure about the possible side-effects,
>
> Me neither. Moreover, you introduce a new call that isn't used in
> sendmail (getsid()). Last, but not least, this is a change in
> behaviour and hence would have to wait for 8.15.
>
> Summary: I don't see much gain from making such a change, but I see
> a lot of work to avoid breaking things.

In summary: Sendmail uses HUP for restart not for reload and changing this behaviour is too risky with low benefit.

Michal, is it possible to fix this in systemd? Otherwise I will switch all reloads to restarts.

--- Additional comment from mschmidt on 2011-07-21 15:15:11 CEST ---

OK, upstream says the sendmail daemon does offer an actual reload. Fair enough.
Please replace all uses of 'service sendmail reload' with 'service sendmail restart' and remove the reload action from the initscript in F15.

As long as an initscript offers a reload action that:
 - makes the daemon change PID
   and
 - is asynchronous (exits before the daemon's PID change is complete),
it is not possible to fix/workaround it in systemd.


I had a look at sendmail.service in Rawhide. It has some ExecReload= lines:
  ExecReload=-/etc/mail/make
  ExecReload=-/etc/mail/make aliases
I am not very familiar with sendmail, but I don't think this is sufficient to make the daemon take the new configuration into account.
If it's indeed buggy, perhaps you should remove these two lines too.


(As a sidenote, I find the option to override DAEMON="-bd" in sysconfig dubious. What benefit does it bring? I'd just hardcode it in the unit file. Since the unit type is defined as "forking", using anything else than "-bd" in sysconfig would break it. For the future I suggest looking into the possibility of using "-bD" and changing the type to "simple".)

--- Additional comment from jskarvad on 2011-07-21 15:36:00 CEST ---

(In reply to comment #6)
> I had a look at sendmail.service in Rawhide. It has some ExecReload= lines:
>   ExecReload=-/etc/mail/make
>   ExecReload=-/etc/mail/make aliases
> I am not very familiar with sendmail, but I don't think this is sufficient to
> make the daemon take the new configuration into account.
> If it's indeed buggy, perhaps you should remove these two lines too.
>
Sorry, I am not familiar with systemd, do we need explicit:
ExecReload=/bin/kill -HUP $MAINPID
?

The 'make' command synchronously regenerates the config files (only if sendmail-cf package is installed, otherwise it fails).

> (As a sidenote, I find the option to override DAEMON="-bd" in sysconfig
> dubious. What benefit does it bring? I'd just hardcode it in the unit file.
> Since the unit type is defined as "forking", using anything else than "-bd" in
> sysconfig would break it. For the future I suggest looking into the possibility
> of using "-bD" and changing the type to "simple".)
>
Interesting, as I wrote I am not familiar with systemd. When the sendmail runs in foreground (with -bD) it doesn't change its PID on HUP. Could we use foreground mode and simple? It would also resolve the reload problem. Or is there any side effect?

--- Additional comment from paul on 2011-07-21 15:54:11 CEST ---

(In reply to comment #7)
> Interesting, as I wrote I am not familiar with systemd. When the sendmail runs
> in foreground (with -bD) it doesn't change its PID on HUP. Could we use
> foreground mode and simple? It would also resolve the reload problem. Or is
> there any side effect?

The manpage doesn't mention any side-effects, so I'd give it a go. Some daemons do logging to stdout when run in the foreground like this for debugging, but I don't think sendmail does.

--- Additional comment from paul on 2011-07-21 16:00:40 CEST ---

I don't see any way of making sm-client run in the foreground and still do regular queue runs though.

--- Additional comment from mschmidt on 2011-07-21 16:03:52 CEST ---

(In reply to comment #7)
> Sorry, I am not familiar with systemd, do we need explicit:
> ExecReload=/bin/kill -HUP $MAINPID
> ?

I did not test it, but I believe so, yes. Either that or remove the ExecReload items altogether.

> Interesting, as I wrote I am not familiar with systemd. When the sendmail runs
> in foreground (with -bD) it doesn't change its PID on HUP. Could we use
> foreground mode and simple? It would also resolve the reload problem.

Correct.

> Or is there any side effect?

Hmm, yes, there would be a side-effect. The ordering requirement in sm-client.service would not be effective anymore:
  After= ... sendmail.service
That's because for "Type=simple" service there is no way to let systemd know when the service is ready to accept external connections, so systemd just fires off the service and then goes on starting subsequent services. This problem goes away if the service is patched to use socket activation, because then the socket can always accept connections.

--- Additional comment from jskarvad on 2011-07-21 16:23:33 CEST ---

> > Or is there any side effect?
> 
> Hmm, yes, there would be a side-effect. The ordering requirement in
> sm-client.service would not be effective anymore:
>   After= ... sendmail.service
> That's because for "Type=simple" service there is no way to let systemd know
> when the service is ready to accept external connections, so systemd just fires
> off the service and then goes on starting subsequent services. This problem
> goes away if the service is patched to use socket activation, because then the
> socket can always accept connections.

Is the ordering for exec preserved? So, does it fire sendmail.service and then immediately sm-client service or is it possible that it e.g. fires sm-client service and then after some time sendmail.service (random ordering)?

--- Additional comment from jskarvad on 2011-07-21 16:26:55 CEST ---

Created attachment 514215 [details]
enable foreground for sm-client

> I don't see any way of making sm-client run in the foreground and still do
regular queue runs though.

If we would go in foreground, I think we could patch it.

--- Additional comment from mschmidt on 2011-07-21 16:37:52 CEST ---

(In reply to comment #11)
> Is the ordering for exec preserved? So, does it fire sendmail.service and then
> immediately sm-client service

Yes, but this does not buy you much, does it? Scheduling of the resulting processes is not deterministic...

> or is it possible that it e.g. fires sm-client
> service and then after some time sendmail.service (random ordering)?

No, it won't do that.

--- Additional comment from jskarvad on 2011-07-21 16:47:21 CEST ---

(In reply to comment #13)
> (In reply to comment #11)
> > Is the ordering for exec preserved? So, does it fire sendmail.service and then
> > immediately sm-client service
> 
> Yes, but this does not buy you much, does it? Scheduling of the resulting
> processes is not deterministic...
> 
It is still better than truly random. IMHO the slight delay during the startup shouldn't be problem. But we will have to split the make files.

--- Additional comment from jskarvad on 2011-07-21 16:51:54 CEST ---

Or implement locking.

Or drop the foreground idea and go without reload.

Michal, with forking is it guaranteed that the ExecStartPre of service B is not run before (or simultaneously) ExecStartPre of service A, if service B has "after" service A?

--- Additional comment from mschmidt on 2011-07-21 17:28:23 CEST ---

(In reply to comment #15)
> Or implement locking.
> 
> Or drop the foreground idea and go without reload.

Yes, this option is easy and safe. We can revisit Type=simple in the future when sendmail gets patched for socket activation.

> Michal, with forking is it guaranteed that the ExecStartPre of service B is not
> run before (or simultaneously) ExecStartPre of service A, if service B has
> "after" service A?

Yes, this is guaranteed. And it is guaranteed even all types, even for Type=simple.

Let's suppose that a start of A.service and B.service is requested. B.service has "After=A.service". The ordering will be as follows:
 1. all ExecStartPre of A.service will be run sequentially.
 2. ExecStart of A.service will be run.
    What happens next depends on the type of A.service:
    a) if it's "simple" -  A.service is now considered running and systemd
       will proceed immediately with step 3.
    b) if it's "forking", systemd will wait for the ExecStart process to exit,
       because that is the event that signifies the readiness of the daemon.
 3. all ExecStartPre of B.service will be run sequentially.
 4. ExecStart of B.service will be run.

Notice that whatever type the services are, step 3 will always be ordered after step 1.

--- Additional comment from jskarvad on 2011-07-22 11:12:58 CEST ---

Created attachment 514643 [details]
Proposed rawhide changes

--- Additional comment from updates on 2011-07-22 12:27:28 CEST ---

sendmail-8.14.5-2.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/sendmail-8.14.5-2.fc15

--- Additional comment from paul on 2011-07-22 14:04:36 CEST ---

Changing the variable names in /etc/sysconfig/sendmail presents a problem at upgrade when sendmail is restarted, as SENDMAIL_OPTS will be empty if the user has edited /etc/sysconfig/sendmail and so the new one gets installed as /etc/sysconfig/sendmail.rpmnew. This results in the restarted sendmail daemon not doing queue runs and the client failing altogether.

This could be prevented by putting default values into the service files using "Environment = SENDMAIL_OPTS=-q1h SENDMAIL_OPTARG="
before the EnvironmentFile lines.

--- Additional comment from jskarvad on 2011-07-22 16:28:17 CEST ---

Thanks, we could simplify it to:

Environment = SENDMAIL_OPTS=-q1h

because there is no change in SENDMAIL_OPTARG and it will be initially blank if not overridden in /etc/sysconfig/sendmail.

Or probably we could simplify it even more and use only SENDMAIL_OPTARG in /etc/sysconfig/sendmail, e.g. SENDMAIL_OPTARG=-q1h - The backward compatibility of /etc/sysconfig/sendmail was already broken during update from 8.14.5-2 to 8.14.5-3, so users with non default settings had to update by hand.

--- Additional comment from updates on 2011-07-23 00:09:22 CEST ---

sendmail-8.14.5-3.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/sendmail-8.14.5-3.fc15

--- Additional comment from updates on 2011-07-23 04:06:04 CEST ---

Package sendmail-8.14.5-3.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sendmail-8.14.5-3.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/sendmail-8.14.5-3.fc15
then log in and leave karma (feedback).

--- Additional comment from paul on 2011-07-25 09:48:03 CEST ---

(In reply to comment #20)
> Thanks, we could simplify it to:
> 
> Environment = SENDMAIL_OPTS=-q1h
> 
> because there is no change in SENDMAIL_OPTARG and it will be initially blank if
> not overridden in /etc/sysconfig/sendmail.

True.

> Or probably we could simplify it even more and use only SENDMAIL_OPTARG in
> /etc/sysconfig/sendmail, e.g. SENDMAIL_OPTARG=-q1h - The backward compatibility
> of /etc/sysconfig/sendmail was already broken during update from 8.14.5-2 to
> 8.14.5-3, so users with non default settings had to update by hand.

The largest number of users upgrading to Rawhide sendmail will come when F16 is released; some of those people may have customized SENDMAIL_OPTARG in /etc/sysconfig/sendmail. It would benefit those people not to use SENDMAIL_OPTARG for the queue option, migrating the old DAEMON and QUEUE options into SENDMAIL_OPTS as it is at the moment, so when they update and sendmail is restarted, it'll still start properly, albeit with the default queue period rather than anything they'd changed QUEUE to.

--- Additional comment from jskarvad on 2011-07-25 16:17:12 CEST ---

Thanks

--- Additional comment from updates on 2011-09-15 10:13:29 CEST ---

sendmail-8.14.5-2.fc15.1 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/sendmail-8.14.5-2.fc15.1

--- Additional comment from mschmidt on 2011-10-24 13:39:42 CEST ---

A tester's remark in http://www.abclinuxu.cz/poradna/linux/show/346943 (in Czech) has just made me realize that "restart" will start the service even if it is disabled. The dispatcher script should use "try-restart" instead.

Comment 1 Jaroslav Škarvada 2011-10-24 12:10:52 UTC
Thanks, cloned to track this separately.

Comment 2 Fedora Update System 2011-10-24 12:35:04 UTC
sendmail-8.14.5-10.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/sendmail-8.14.5-10.fc16

Comment 3 Fedora Update System 2011-10-24 12:36:05 UTC
sendmail-8.14.5-2.fc15.2 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/sendmail-8.14.5-2.fc15.2

Comment 4 Fedora Update System 2011-10-24 22:56:41 UTC
Package sendmail-8.14.5-2.fc15.2:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing sendmail-8.14.5-2.fc15.2'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-14816
then log in and leave karma (feedback).

Comment 5 Fedora Update System 2011-11-13 05:35:30 UTC
sendmail-8.14.5-10.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 6 Fedora Update System 2011-11-25 02:14:43 UTC
sendmail-8.14.5-2.fc15.2 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.