Bug 721375

Summary: sshd moves to a new PID during reload
Product: [Fedora] Fedora Reporter: Michal Schmidt <mschmidt>
Component: opensshAssignee: Petr Lautrbach <plautrba>
Status: CLOSED WONTFIX QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 15CC: mattias.ellert, mgrepl, rvokal, tmraz
Target Milestone: ---Keywords: FutureFeature
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-08-07 18:04:37 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 702621    

Description Michal Schmidt 2011-07-14 12:53:51 UTC
Description of problem:
'service sshd reload' causes the sshd daemon to switch to a new PID. Ideally it would reload its configuration without moving to a new process. A couple of reasons why the current behaviour is wrong:

 1) LSB defines the reload action of an initscript as:
     "cause the configuration of the service to be reloaded without actually
      stopping and restarting the service"
     ( http://refspecs.freestandards.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html )
    In my opinion starting a new process and exiting the original one does
    constitute a restart.
 2) It causes problems when sshd is supervised by systemd. systemd knows
    the main PID of the service and if that PID exits, systemd considers the
    service exited too. As a result, wrong status is shown for the service
    and/or the service is killed (depending on the exact systemd version).
    systemd could be fixed to reload the pidfile after completing the reload
    action, but it won't help in this case as the reload action is completely
    asynchronous (it sends SIGHUP to sshd and that's it).

Version-Release number of selected component (if applicable):
openssh-5.6p1-31.fc15.1.x86_64

How reproducible:
always

Steps to Reproduce:
1. pidof sshd
2. service sshd reload  (or just: killall -HUP sshd)
3. pidof sshd
  
Actual results:
You'll get a different PID in step 3 than in step 1.

Expected results:
sshd should keep the same PID.

Additional info:
See https://bugzilla.redhat.com/show_bug.cgi?id=719931 for a discussion about a similar problem in sendmail.

Comment 1 Tomas Mraz 2011-07-14 13:09:11 UTC
Well the one fix possible would be to just drop the reload action altogether. Would it be fine for you?

Other kludges will be most probably not accepted by OpenSSH upstream. Maybe systemd should stop trying to instigate its view of what is a daemon and what not on all of the world.

SSHD is getting a systemd unit in rawhide. Is there a way how to let the systemd know that the main PID changes on reload?

Comment 2 Michal Schmidt 2011-07-14 13:30:50 UTC
(In reply to comment #1)
> Well the one fix possible would be to just drop the reload action altogether.
> Would it be fine for you?

Fine with me. LSB says the reload action is optional anyway.

> Other kludges will be most probably not accepted by OpenSSH upstream.

Well, may be worth a try.

> SSHD is getting a systemd unit in rawhide. Is there a way how to let the
> systemd know that the main PID changes on reload?

There is a bug currently in systemd that prevent the detection of the PID change on reload. It will be fixed. However, it will help only if the ExecReload action is synchronous, i.e. if it does not finish before the reload is complete. It currently is not written that way - "kill -HUP $MAINPID" does not wait for anything.

Comment 3 Tomas Mraz 2011-07-14 14:01:51 UTC
As other init implementations worked fine for years with the way the reload was done in sshd I'd say that we should keep the init script as is.

As we will have native unit in F16 and the sshd.service is of type simple - that is not daemonizing and kill -HUP in this case also does not change PID I'd say CLOSED-WONTIFX or NOTABUG.

Comment 4 Michal Schmidt 2011-07-14 14:36:44 UTC
(In reply to comment #3)
> As other init implementations worked fine for years with the way the reload was
> done in sshd I'd say that we should keep the init script as is.

In F15 we already have systemd though and if you run 'service sshd status' after a reload, you'll get an incorrect status of the service: "active (exited)". Admittedly this looks like a mere cosmetic issue, because the service still works.
However, I would like to push a systemd update to F15 to fix bug 702621. And even though the patch for that bug is beneficial in a lot of cases, it would also make this sshd reload behaviour worse - sshd would be killed after reload.

Would you consider removing the reload action in F15 then?

> As we will have native unit in F16 and the sshd.service is of type simple -
> that is not daemonizing and kill -HUP in this case also does not change PID I'd
> say CLOSED-WONTIFX or NOTABUG.

That's great. You are right that this will work fine in F16. No fix is needed there.

Comment 5 Michal Schmidt 2011-07-21 13:49:55 UTC
Perhaps I confused the issue by bringing systemd into the discussion. Let me rephrase the problem in more general terms.

The sshd initscript is buggy as long as all these statements are true:
 - It uses the "pidfile:" declaration in its header.
 - The reload action makes the daemon change its PID.
 - The reload action is asynchronous, i.e. it exits before the daemon's
   PID change is complete.
Because then the reload action causes a breach of contract given by the "pidfile:" declaration. The caller of the initscript may be mislead about the current PID of the service if it re-reads the PID file quickly enough.

And since the reload action as currently implemented in sshd is pretty much equivalent to a restart anyway, the easiest way to fix it is to remove the reload action from the initscript.

Comment 6 Tomas Mraz 2011-07-21 13:55:57 UTC
Hmm... what about removing the pidfile declaration in its header? :)

And you cannot prevent admin from sending SIGHUP to the process anyway.

Comment 7 Jan F. Chadima 2011-07-21 14:03:38 UTC
(In reply to comment #5)
> Perhaps I confused the issue by bringing systemd into the discussion. Let me
> rephrase the problem in more general terms.
> 
> The sshd initscript is buggy as long as all these statements are true:
>  - It uses the "pidfile:" declaration in its header.
may be repaired

>  - The reload action makes the daemon change its PID.
may not be repaired

>  - The reload action is asynchronous, i.e. it exits before the daemon's
>    PID change is complete.
may not be repaired

> Because then the reload action causes a breach of contract given by the
> "pidfile:" declaration. The caller of the initscript may be mislead about the
> current PID of the service if it re-reads the PID file quickly enough.
> 
> And since the reload action as currently implemented in sshd is pretty much
> equivalent to a restart anyway, the easiest way to fix it is to remove the
> reload action from the initscript.

there are some "minor" differences.

do "kill -1 $pid_of_sshd" crash sytemd?

Comment 8 Michal Schmidt 2011-07-21 14:31:44 UTC
(In reply to comment #6)
> Hmm... what about removing the pidfile declaration in its header? :)

It would be somewhat better than status quo, yes. At least then the initscript will be correct from the point-of-view of the contract with the caller, because it won't promise what it cannot uphold.

However, declaring the "pidfile:" for daemons is a nice and useful thing, as it allows a supervisor process (such as systemd) to recognize when the service has crashed and it clearly distinguishes daemon initscripts from one-shot initscripts (see a related bug 702621).

(In reply to comment #7)
> there are some "minor" differences.

Curious. Are you suggesting subtly that the differences are more than minor? What are they?

> do "kill -1 $pid_of_sshd" crash sytemd?

With systemd-26-8.fc15 (currently in F15) it will result in sshd.service going into the state "active (exited)", but it will continue operating normally.
However, if I backport a patch from systemd-30 to fix bug 702621 (which I'd like to, otherwise I wouldn't have filed this BZ), it would result in going to the state "inactive (dead)" and killing all the processes in the cgroup (i.e. killing the sshd in the new PID).

Comment 9 Jan F. Chadima 2011-07-21 14:41:15 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > Hmm... what about removing the pidfile declaration in its header? :)
> 
> It would be somewhat better than status quo, yes. At least then the initscript
> will be correct from the point-of-view of the contract with the caller, > 
> > do "kill -1 $pid_of_sshd" crash sytemd?
> 
> With systemd-26-8.fc15 (currently in F15) it will result in sshd.service going
> into the state "active (exited)", but it will continue operating normally.
> However, if I backport a patch from systemd-30 to fix bug 702621 (which I'd
> like to, otherwise I wouldn't have filed this BZ), it would result in going to
> the state "inactive (dead)" and killing all the processes in the cgroup (i.e.
> killing the sshd in the new PID).

So please throw away that patch. And make some better solution.

Comment 10 Jan F. Chadima 2011-07-21 15:45:30 UTC
Maybe in this case of testing for the same pid would be better to test if the port 22 is bound.

Comment 11 Michal Schmidt 2011-09-21 01:20:54 UTC
Port checking would not work in general. Not all services are socket servers. And systemd does not have the information about which ports to check for the given service.

Upstream systemd now has a way to deal with daemons that move to new PIDs. It works even for spontaneous events (from systemd point-of-view) such as
"kill -HUP $PID". systemd detects this movement and it updates its idea of the main PID of the service.

There is one important requirement though: When systemd receives SIGCHLD for the death of the old PID, the new PID number must be already written in the PID file.

Unfortunately, sshd does not do that. sshd forks, exits the original process, and then sometime later the child gets to write the PID file.

Would it be possible to change sshd so that the original process waits before exiting for the child to write the PID file? A pipe can be used for that. For a perfect daemonization you can see the steps in "SysV Daemons" in:
http://0pointer.de/public/systemd-man/daemon.html

Comment 12 Fedora Admin XMLRPC Client 2011-11-30 12:26:15 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 13 Fedora End Of Life 2012-08-07 18:04:41 UTC
This message is a notice that Fedora 15 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 15. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.

(Please note: Our normal process is to give advanced warning of this
occurring, but we forgot to do that. A thousand apologies.)

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen
this bug and simply change the 'version' to a later Fedora version.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we were unable to fix it before Fedora 15 reached end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" (top right of this page) and open it against that
version of Fedora.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping