438449 – /etc/rc.d/init.d/killall is racing with other "stops"

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	openssh
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Tomas Mraz
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	F9Blocker

Reported:	2008-03-20 23:15 UTC by Michal Jaegermann
Modified:	2008-04-08 07:13 UTC
CC List:	2 users
Fixed In Version:	openssh-5.0p1-1.fc9
Clone Of:
Environment:
Last Closed:	2008-04-08 07:13:19 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
Fixes this bug. /etc/init.d/sshd kills itself via killall sshd. (425 bytes, patch) 2008-04-07 18:17 UTC, archimerged Ark submedes

Description Michal Jaegermann 2008-03-20 23:15:33 UTC

Description of problem:

It looks as though, when a system shuts down, the script
/etc/rc.d/init.d/killall is supposed to run last, after everything
else has been stopped, and do a final sweep for strays.  With the
upstart scheme, however, it appears not to wait; it races with other
scripts, producing messages on the screen such as

/etc/rc6.d/S00killall: line 16: 2743 Terminated /etc/init.d/$subsys stop

If that is a purely cosmetic issue, then the above can be prevented
by redirecting all stdout and stderr output of the 'for ... ; done' loop
in that script to /dev/null.  OTOH, if this really is a race, as those
error messages indicate, then what guarantees that something
will not be stopped too early?  'killall' does not do any checks
(except that it will not touch 'network').
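
The redirection suggested above could look roughly like the sketch below.
This is only an illustration, not the actual initscripts code: stop_subsys
and the subsystem names alpha and beta are hypothetical stand-ins for
"/etc/init.d/$subsys stop" and the real subsystems.

```shell
#!/bin/bash
# Hypothetical stand-in for "/etc/init.d/$subsys stop": writes one line
# to stdout and one to stderr.
stop_subsys() {
    echo "Stopping $1: [ OK ]"
    echo "warning from $1" >&2
}

# Redirect stdout and stderr of the whole loop to /dev/null.
# The outer "{ ...; } 2>&1" only exists so this demo can verify that
# nothing escapes from the loop on either stream.
captured=$(
    {
        for subsys in alpha beta; do
            stop_subsys "$subsys"
        done >/dev/null 2>&1
    } 2>&1
)

echo "output that escaped the loop: '${captured}'"
```

This silences the loop but does nothing about the ordering question raised
above; it would only hide the symptom.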

Another possibility would be to remove the script, but it may turn out
to be really handy when one wants to get something close
to a real single-user mode.

Version-Release number of selected component (if applicable):
upstart-0.3.9-9.fc9

How reproducible:
on every shutdown

Comment 1 archimerged Ark submedes 2008-04-07 18:17:30 UTC

Created attachment 301544 [details]
Fixes this bug.  /etc/init.d/sshd kills itself via killall sshd.

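Add trap '' TERM before the killall call and trap TERM after it, so that
the init script ignores the SIGTERM that killall sshd also sends to the
script itself.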
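
A minimal sketch of the idea, not the attached patch itself.  The child
shell sends SIGTERM to its own PID as a stand-in for "killall sshd"
matching the init script's process name; that substitution is an
assumption made so the example is safe to run anywhere.

```shell
#!/bin/bash
# Run the demonstration in a child bash so that the signal is only ever
# delivered to that child.  Inside the single quotes, $$ is the child's PID.
result=$(bash -c '
    # Ignore SIGTERM while the name-based kill is in progress.
    trap "" TERM

    # Stand-in for "killall sshd" also hitting the script: signal ourselves.
    kill -TERM $$

    # Restore the default SIGTERM handling afterwards.
    trap - TERM

    echo "still running"
')

echo "$result"
```

Here trap - TERM is used to reset the handler; without the two trap
lines the child would be terminated before reaching the echo.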

Comment 2 archimerged Ark submedes 2008-04-07 18:23:25 UTC

component should be changed to openssh

Comment 3 Michal Jaegermann 2008-04-07 18:54:55 UTC

> component should be changed to openssh

Is this truly the whole problem, or did you find (thanks!) one instance
where this is causing hiccups right now?  'killall' really used to
run as a final check, and this no longer seems to be the case.

Comment 4 archimerged Ark submedes 2008-04-07 19:14:11 UTC

I put set -xv at the front of /etc/init.d/killall and found that this is the
script that causes all of the "stopping <subsys> ... [ OK ]" messages on the
console.

/usr/bin/killall is a completely different thing but it was related --
/etc/init.d/sshd wanted to kill /usr/sbin/sshd, but /usr/bin/killall kills the
script as well as any extra sshd's.  Seems to me I remember a bug just like this
circa Solaris 2.5 ...

The error message is somewhat misleading.  The failure wasn't on line 16
itself; line 16 is the beginning of the compound statement where the
'terminated' status code came back, and /etc/init.d/$subsys stop was the text
of the line which got the bad status.  When I changed that to
bash -xv /etc/init.d/$subsys, the problem confusingly went away (since the
process was then named bash, not sshd).
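
The process-name effect described above can be observed with a small
Linux-only sketch.  The script name demo-sshd and the temporary directory
are hypothetical; the example assumes /proc is mounted, /bin/bash exists,
and the temporary directory allows executing files.

```shell
#!/bin/bash
# Create a throwaway script that reports the kernel's name for its own
# process (the name that name-based tools such as killall match on).
demo=$(mktemp -d)
cat >"$demo/demo-sshd" <<'EOF'
#!/bin/bash
read -r name <"/proc/$$/comm"
echo "$name"
EOF
chmod +x "$demo/demo-sshd"

# Executed directly: the process is named after the script file.
direct_name=$("$demo/demo-sshd")

# Executed as an argument to bash: the process is named after the interpreter.
via_bash_name=$(bash "$demo/demo-sshd")

echo "run directly: $direct_name"
echo "run via bash: $via_bash_name"

rm -rf "$demo"
```

This is why an init script called sshd, when executed directly, can be hit
by a kill that selects processes named sshd.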

Comment 5 Michal Jaegermann 2008-04-07 19:30:33 UTC

> /usr/bin/killall is a completely different thing ...

Yes, I know.  Maybe I should have been more careful, but I thought that
from the context it was clear that we are talking about killall in
/etc/init.d/. See also the title of this bug report.

Comment 6 archimerged Ark submedes 2008-04-07 19:33:25 UTC

The comment about "there shouldn't be any" at the top of /etc/init.d/killall is
no longer accurate and should be revised.  That script is part of the
initscripts rpm.  (Or else if things were supposed to have been stopped by the
Knn links in /etc/init.d/rc6.d/ then there is a bug in upstart or something like
that.  /etc/init.d/rc6.d/S00killall used to run after the Knn scripts IIRC.)

Actually, it does sound like there is a bug, because killall kills subsystems in
alphabetical order, not in the order specified by the Knn symbolic links.

Probably need a new bug for that...
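
The ordering difference mentioned above can be illustrated with a sketch.
It assumes the script walks a directory of per-subsystem files with a shell
glob, which expands in sorted order; the directory layout and the names
alpha and zulu are hypothetical, not taken from a real system.

```shell
#!/bin/bash
# Build a throwaway layout: one marker file per running subsystem, and
# Knn entries that say zulu should stop before alpha.
demo=$(mktemp -d)
mkdir "$demo/subsys" "$demo/rc6.d"
touch "$demo/subsys/alpha" "$demo/subsys/zulu"
touch "$demo/rc6.d/K10zulu" "$demo/rc6.d/K90alpha"

# Order obtained by globbing the marker files: sorted by subsystem name.
glob_order=""
for f in "$demo"/subsys/*; do
    glob_order="$glob_order ${f##*/}"
done

# Order obtained by globbing the Knn entries: sorted by priority number.
knn_order=""
for f in "$demo"/rc6.d/K*; do
    name=${f##*/}              # strip the directory part
    name=${name#K[0-9][0-9]}   # strip the Knn prefix
    knn_order="$knn_order $name"
done

echo "marker-file order:$glob_order"
echo "Knn order:        $knn_order"

rm -rf "$demo"
```

With these names the two loops visit the subsystems in opposite orders,
which is the kind of mismatch the comment is pointing at.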

Comment 7 Tomas Mraz 2008-04-08 07:13:19 UTC

I have patched the sshd init script in rawhide.
