Bug 244655 - Trying to restart a hung/frozen sshd daemon doesn't show correct status
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: openssh
Version: 4.5
Hardware: All
OS: Linux
Priority: low
Severity: low
Assigned To: Tomas Mraz
QA Contact: Brian Brock
Keywords: OtherQA
Depends On:
Blocks:
Reported: 2007-06-18 08:10 EDT by Jose Plans
Modified: 2009-06-19 18:56 EDT
Fixed In Version: RHSA-2007-0703
Doc Type: Bug Fix
Last Closed: 2007-11-15 09:58:13 EST

Attachments
initscripts-sshd_stop.patch (289 bytes, patch)
2007-06-18 08:10 EDT, Jose Plans

Description Jose Plans 2007-06-18 08:10:49 EDT
Description of problem:
-------------------------

openssh version: openssh-3.9p1-8.RHEL4.20


The sshd daemon froze. When "service sshd restart" is run, it reports that the
stop and start were successful, but in fact the old sshd was not terminated.


# service sshd restart
Stopping sshd:                                             [  OK  ]
Starting sshd:                                             [  OK  ]


# service sshd start
Starting sshd:                                             [  OK  ]


Since the old sshd has not been killed, the new one fails when it tries to
listen on port 22. This is because the kill command used in killproc() returns
0 as soon as the signal has been sent, whether or not the process exits.

With that mechanism we cannot detect whether a completely wedged process has
really been killed.
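
To illustrate the point about kill returning 0, here is a minimal sketch. It is
not taken from killproc(); the sleep child that ignores SIGTERM is only an
assumed stand-in for a wedged sshd, and the variable names are made up for the
example.

```shell
#!/bin/bash
# Stand-in for a wedged daemon: a child process that ignores SIGTERM.
( trap '' TERM; exec sleep 30 ) &
pid=$!
sleep 1                      # give the child time to set up its signal handling

kill -TERM "$pid"
kill_rc=$?                   # exit status only says the signal was sent

sleep 1
if kill -0 "$pid" 2>/dev/null; then
    still_running=yes        # the PID still exists despite the "successful" kill
else
    still_running=no
fi
echo "kill exit status: $kill_rc, still running: $still_running"

# Clean up the stand-in process.
kill -KILL "$pid"
wait "$pid" 2>/dev/null || true
```

The exit status of kill is therefore not enough to print [  OK  ]; the PID has
to be checked again afterwards.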

This is not obvious to the user, since the script displays OK.

The failure only shows up in the log (/var/log/secure):

Sep  5 20:29:33 n27 sshd[30475]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Sep  5 20:29:33 n27 sshd[30475]: fatal: Cannot bind any address.
Sep  5 20:29:33 n27 sshd[2841]: Received signal 15; terminating.


The restart(), start() and stop() functions in the sshd init script should
handle this scenario. They (or the killproc function) should check for the sshd
PID once more after sending the terminate signal and before trying to start a
new sshd.
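
The check described above could look roughly like the sketch below. This is
neither the attached patch nor code from initscripts: stop_and_verify is a
hypothetical helper, the timeout value is arbitrary, and a sleep child that
ignores SIGTERM again stands in for the hung sshd.

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration only).
# Sends TERM, waits up to $2 seconds for the PID to go away, then sends KILL.
# Returns 0 only once the process is confirmed gone.
# Assumes the caller may signal the PID (init scripts run as root).
stop_and_verify() {
    pid=$1
    timeout=${2:-3}
    waited=0

    kill -TERM "$pid" 2>/dev/null || return 0     # nothing left to stop
    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0    # exited after TERM
        sleep 1
        waited=$((waited + 1))
    done

    kill -KILL "$pid" 2>/dev/null                 # TERM was not enough
    sleep 1
    ! kill -0 "$pid" 2>/dev/null                  # succeed only if really gone
}

# Demonstration with a fake wedged daemon.
( trap '' TERM; exec sleep 30 ) &
wedged=$!
sleep 1

if stop_and_verify "$wedged" 2; then
    result="stopped"
else
    result="FAILED"
fi
echo "Stopping fake daemon: $result"
```

With a check like this, the stop step can report a failure instead of
[  OK  ] when the old process is still alive, and the start step is not
attempted against a port that is still bound.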

Additional info:
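This has been corrected in RHEL5's initscripts. Instead of calling killproc
$SSHD -TERM, the init script calls killproc $SSHD with no signal argument, so
the default behaviour applies and KILL is sent if the process survives TERM.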

Can this also be applied in RHEL4?
Comment 1 Jose Plans 2007-06-18 08:10:49 EDT
Created attachment 157274
initscripts-sshd_stop.patch
Comment 2 Jose Plans 2007-06-18 08:12:28 EDT
Please let me know if you need more details about this.
Comment 3 Tomas Mraz 2007-06-18 11:47:19 EDT
Let's fix this in rhel-4.6.
Comment 4 RHEL Product and Program Management 2007-06-18 11:55:31 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 11 errata-xmlrpc 2007-11-15 09:58:13 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0703.html
