Bug 199014 - kernel updates 2.6.17-1.2396.fc6 and 2.6.17-1.2401.fc6 make various services fail
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Roland McGrath
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-07-15 17:32 UTC by Michal Jaegermann
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version: 2.6.17-1.2431.fc6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-07-21 05:56:25 UTC
Type: ---
Embargoed:


Description Michal Jaegermann 2006-07-15 17:32:48 UTC
Description of problem:

When booting recent kernels 2.6.17-1.2396.fc6 and 2.6.17-1.2401.fc6,
things seem normal enough while the startup scripts are running, but
checking the status of various services from /etc/init.d/ shows
results like these:

sendmail dead but subsys locked
acpid dead but subsys locked
automount is stopped
portmap dead but subsys locked
rpc.idmapd is stopped
smartd dead but subsys locked
dbus-daemon is stopped

An attempt to run 'service sendmail restart' produces the following output:

Shutting down sendmail:                                    [FAILED]
Shutting down sm-client:                                   [  OK  ]
Starting sendmail:                                         [  OK  ]
Starting sm-client:                                        [  OK  ]

with "sendmail dead but subsys locked" status and 

NOQUEUE: SYSERR(root): opendaemonsocket: daemon MTA: cannot bind: Address
already in use
daemon MTA: problem creating SMTP socket

complaints in logs.

This is not an SELinux problem because SELinux is turned off on that system.
Booting 2.6.17-1.2366.fc6 makes all of the above, and more, work again.

Version-Release number of selected component (if applicable):
2.6.17-1.2396.fc6 and 2.6.17-1.2401.fc6

How reproducible:
all the time (but tried only on x86_64)

Comment 1 Dave Jones 2006-07-16 03:29:43 UTC
Roland, could this be your utrace stuff ?

Comment 2 Jay Cliburn 2006-07-16 15:55:17 UTC
I see the same behavior for acpid under the 2405 kernel.

Under kernel-2.6.17-1.2356.fc6, the command 'service acpid status' properly
reports "acpid (pid xxxx) is running...".  Under kernel-2.6.17-1.2405.fc6, the
same command reports that acpid is dead, even though it's running.  This happens
on both an i386 and an x86_64 system.

[root@gadwall ~]# ps -ef | grep acpid
root        11     7  0 10:50 ?        00:00:00 [kacpid]
root      1636     1  0 10:52 ?        00:00:00 /usr/sbin/acpid
root      1976  1926  0 10:57 pts/0    00:00:00 grep acpid
[root@gadwall ~]# service acpid status
acpid dead but subsys locked
[root@gadwall ~]# service acpid stop
Stopping acpi daemon:                                      [FAILED]
[root@gadwall ~]# ps -ef | grep acpid
root        11     7  0 10:50 ?        00:00:00 [kacpid]
root      1636     1  0 10:52 ?        00:00:00 /usr/sbin/acpid
root      2002  1926  0 10:58 pts/0    00:00:00 grep acpid
[root@gadwall ~]# service acpid status
acpid is stopped

Comment 3 Jim Cornette 2006-07-16 18:35:12 UTC
When shutting down the computer, I see several services which fail to shut down.
Joining the bug to track progress.

Comment 4 Michal Jaegermann 2006-07-16 19:11:23 UTC
> ... I see several services which fail to shut down.

Failures on shutdown are a side effect of various services being
"dead", or at least being reported that way.

After I dropped the '-c' option to pidof in /etc/init.d/functions, all
services listed in the original report, and more, are now seen as
"running" with 2.6.17-1.2405.fc6.  Consequently a shutdown also
works without trouble.  Well, with the exception of sm-client, which
appears to be shut down before /etc/init.d/sendmail tries to do that
explicitly, hence the reported failure.

Why switching the kernel version has such an effect on pidof, and whose
bug this really is, I have no idea.
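
For readers wondering why a pidof lookup that finds nothing shows up as "dead but subsys locked", here is a minimal sketch of how a status helper of this kind might choose its message. It is an illustration only, not the actual /etc/init.d/functions code; the function name status_sketch and its arguments are hypothetical, and the "dead but pid file exists" message is recalled from initscripts rather than taken from this report.

```shell
# Hypothetical simplification of an init-script status helper.
# $1 = service name
# $2 = PIDs found by the pidof lookup (empty when nothing was found)
# $3 = directory holding pid files   (assumed layout: <dir>/<name>.pid)
# $4 = directory holding subsys locks (assumed layout: <dir>/<name>)
status_sketch() {
    name=$1
    pids=$2
    piddir=$3
    lockdir=$4
    if [ -n "$pids" ]; then
        # The lookup found the daemon.
        echo "$name (pid $pids) is running..."
    elif [ -f "$piddir/$name.pid" ]; then
        # No process found, but a pid file was left behind.
        echo "$name dead but pid file exists"
    elif [ -f "$lockdir/$name" ]; then
        # No process and no pid file, but the start script's lock remains.
        echo "$name dead but subsys locked"
    else
        echo "$name is stopped"
    fi
}
```

Under this reading, a daemon that is running but invisible to the lookup falls through to the lock-file branch, and once a failed "stop" has removed the lock the same daemon is reported as "is stopped", which matches the acpid transcript in comment 2.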

Comment 5 Roland McGrath 2006-07-17 09:09:35 UTC
Unless there is some info in dmesg to go on, I have no speculation about these
problems.  I can believe that random instability was caused by the utrace
changes, just on principle, but I don't have anything to go on.

Comment 6 Jay Cliburn 2006-07-17 14:45:54 UTC
(In reply to comment #5)
> Unless there is some info in dmesg to go on, I have no speculation about these
> problems.  I can believe that random instability was caused by the utrace
> changes, just on principle, but I don't have anything to go on.

Your recommendation then is to file bugs against each individual service
exhibiting the behavior?

Comment 7 Michal Jaegermann 2006-07-17 15:34:49 UTC
> I can believe that random instability

There is not much "random" about it.  The behaviour is consistent and
depends on the kernel version in use.

If you look closer, the programs in question actually do run
but, with recent kernel versions, are reported as dead.  In other
words, kernels and user space got out of sync.  Should this issue
be reassigned to SysVinit and/or initscripts?  It is not clear what
else could be affected by what happened in the kernel.

Comment 8 Bill Nottingham 2006-07-20 00:15:59 UTC
This is definitely a kernel issue... /proc/*/root is now only readable for the
current task. Is this really intentional?

Comment 9 Roland McGrath 2006-07-20 06:35:19 UTC
Ah, the specific diagnosis of the kernel behavior makes all the difference.
Now that I know what the issue is, that sounds like it's probably my bug.


Comment 10 Bill Nottingham 2006-07-20 15:25:12 UTC
FWIW, the use case we have for it is pidof's '-c' option; we use it
in the init scripts to make sure we only find/kill processes that are running in
the same root as the script, so we don't kill daemons in chroots, or similar.
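
A rough sketch of the kind of same-root check described above, assuming it amounts to comparing the /proc/PID/root symlink of a candidate process with the caller's own. This is not the pidof source; the function name same_root and the PROC_DIR override (there only so the logic can be exercised against a fake tree) are hypothetical.

```shell
# same_root PID
# Returns 0 if PID appears to run in the same root directory as the caller,
# 1 if it runs in a different root, and 2 if the root link cannot be read.
same_root() {
    proc_dir=${PROC_DIR:-/proc}
    # Reading another task's root link is the step that the affected
    # kernels refused with "Permission denied".
    target=$(readlink "$proc_dir/$1/root" 2>/dev/null) || return 2
    mine=$(readlink "$proc_dir/self/root" 2>/dev/null) || return 2
    [ "$target" = "$mine" ]
}
```

With a check of this shape, an unreadable /proc/PID/root gives the caller no way to tell whether the process shares its root. A caller that treats "cannot tell" as "not a match" would then skip every daemon, which is consistent with the services being reported as dead while still running.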

Comment 11 Roland McGrath 2006-07-20 23:58:50 UTC
Handy reproducer:

-bash-3.1$ ls -ld /proc/$$/root /proc/self/root
ls: cannot read symbolic link /proc/2106/root: Permission denied
lrwxrwxrwx 1 roland roland 0 Jul 19 23:42 /proc/2106/root
lrwxrwxrwx 1 roland roland 0 Jul 20 17:05 /proc/self/root -> /


I should have a fix shortly.

Comment 12 Roland McGrath 2006-07-21 05:56:25 UTC
This is fixed as of kernel-2.6.17-1.2431.fc6

Comment 13 Jim Cornette 2006-07-22 02:42:34 UTC
services shut down properly, without the errors seen with previous kernels.
Fixed in this regard.

