Bug 444410

Summary: bootsequence haldaemon failure
Product: [Fedora] Fedora
Reporter: Roger Depreeuw <rogdepre>
Component: chkconfig
Assignee: Bill Nottingham <notting>
Status: CLOSED WORKSFORME
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 9
CC: gc, james, jthurtell, pertusus, redhat, rvokal, tgutwin, torel
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-12-04 20:57:27 UTC
Attachments:
 - init.d Head * (flags: none)
 - services status/priorities (flags: none)
 - Difference in priorities after resetpriorites for al daemons (flags: none)

Description Roger Depreeuw 2008-04-28 09:34:55 UTC
Description of problem: haldaemon fails during the boot sequence


Version-Release number of selected component (if applicable): FC9-beta


How reproducible:


Steps to Reproduce:
1. Apply yesterday's yum update, containing "2.6.25-8.fc9.i686"
2. Reboot the system
  
Actual results:
The boot sequence waits 2 minutes, then continues successfully while displaying
a "haldaemon failed" message. On the running system, haldaemon starts and restarts
successfully with "/etc/init.d/haldaemon restart" at the prompt.

Expected results:
Successful and fast haldaemon startup during the boot sequence

Additional info:
From /var/log/messages:
Apr 28 07:36:58 rok12net gnome-keyring-daemon[2407]: failed to initialize a HAL context: (null)
Apr 28 07:36:58 rok12net gnome-keyring-daemon[2407]: Scheduling hal init retry
Apr 28 07:37:00 rok12net pulseaudio[2480]: pid.c: Stale PID file, overwriting.
Apr 28 07:37:01 rok12net pulseaudio[2480]: module-hal-detect.c: Couldn't connect to hald: (null): (null)
Apr 28 07:37:28 rok12net gnome-keyring-daemon[2407]: failed to initialize a HAL context: (null)
Apr 28 07:37:28 rok12net kernel: gnome-keyring-d[2407] general protection ip:d8461f sp:bff2aa50 error:0 in libdbus-1.so.3.4.0[d79000+3f000]

Comment 1 David Zeuthen 2008-04-28 16:45:46 UTC
Are you running SELinux in enforcing mode? If so, does it work in permissive mode?

Comment 2 Tore H. Larsen 2008-04-28 19:40:43 UTC
I see the same, with SELinux disabled. It also seems to affect NetworkManager, which
dies during boot. Workaround: restart haldaemon and NetworkManager after login.

Comment 3 Matthias Clasen 2008-05-02 16:21:23 UTC
        --verbose=yes|no      Print out debug (overrides HALD_VERBOSE)
        --use-syslog          Print out debug messages to syslog instead of
                              stderr. Use this option to get debug messages
                              if hald runs as a daemon.

Can you get any output from hal using these options?

Comment 4 Tore H. Larsen 2008-05-02 21:39:52 UTC
May  2 23:18:27 bgo-s-101 hald[2539]: 23:18:27.419 [I] hald.c:669: hal 0.5.11rc2
May  2 23:18:27 bgo-s-101 hald[2539]: 23:18:27.420 [I] hald.c:678: Will daemonize
May  2 23:18:27 bgo-s-101 hald[2539]: 23:18:27.420 [I] hald.c:679: Becoming a daemon
May  2 23:18:27 bgo-s-101 hald[2540]: 23:18:27.422 [I] hald_dbus.c:5381: local server is listening at unix:abstract=/var/run/hald/dbus-MGoEELB39j,guid=b352a454e6639b29ca70902c481b8523
May  2 23:18:27 bgo-s-101 hald[2540]: 23:18:27.423 [E] hald_dbus.c:5747: dbus_bus_get(): Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory
May  2 23:19:17 bgo-s-101 pulseaudio[4168]: module-hal-detect.c: Couldn't connect to hald: (null): (null)
May  2 23:19:43 bgo-s-101 pulseaudio[4410]: module-hal-detect.c: Couldn't connect to hald: (null): (null)


Comment 5 Tore H. Larsen 2008-05-02 22:36:21 UTC
Workaround: 'cd /etc/rc5.d ; mv S26haldaemon S28haldaemon', and hald does not die.

Comment 6 Tom Gutwin 2008-05-04 02:37:06 UTC
(In reply to comment #2)
> I see the same. SELinux disabled. Also seems to influence NetworkManager which
> dies during boot. WAR: restart haldaemon and NetworkManger after login.

For what it's worth, I see all the same symptoms.
Restarting hal and network after login also brings things to a working state.
My SELinux is disabled.

The delay at my boot is longer than 2 minutes. It's more like 5.

Comment 7 Matthias Clasen 2008-05-05 04:31:06 UTC
Looks like the real issue is with dbus?
Is dbus getting started before hal in your boot sequence? (It shows up as
"messagebus" in boot messages.)

Comment 8 Giuseppe Castagna 2008-05-06 15:25:40 UTC
A simple suggestion: check whether the problem is the same as what happened in
bug #444859. For some unknown reason, a number of services changed their priority
to S99. Among these is messagebus, which therefore starts AFTER haldaemon.
Why this happened is a mystery to me, as is the fact that trying
to reset priorities with chkconfig resetpriotities has no effect.

So it is a chkconfig problem rather than a haldaemon problem.
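
A quick way to check for the S99 symptom described above is to list which start links in a runlevel directory sit at priority 99. This is only a sketch: `list_s99` is a hypothetical helper name, the default directory is an assumption, and a service can legitimately be at 99, so the output is a starting point rather than a verdict.

```shell
#!/bin/bash
# List services whose start link in a runlevel directory is at priority 99.
# The default directory is an assumption; pass another one as the first argument.
list_s99() {
    local dir="${1:-/etc/rc.d/rc5.d}" link
    for link in "$dir"/S99*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        link=${link##*/}
        echo "${link#S99}"
    done
}
```

Calling `list_s99 /etc/rc.d/rc3.d` would print one service name per line; seeing messagebus in that list would match the situation described in this comment.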

Comment 9 Tore H. Larsen 2008-05-06 15:57:37 UTC
I don't see any errors here, but I have moved haldaemon to S28. Do you?

[root@bgo-s-101 rc5.d]# ls -latr *mess* *hald* *udev* *acpi* *Consol* *ntpd* *NetworkM* *syslog* *yum* *blue*
lrwxrwxrwx 1 root root 20 2008-02-09 12:43 S90ConsoleKit -> ../init.d/ConsoleKit
lrwxrwxrwx 1 root root 22 2008-02-09 12:59 S97yum-updatesd -> ../init.d/yum-updatesd
lrwxrwxrwx 1 root root 20 2008-04-09 08:31 S27messagebus -> ../init.d/messagebus
lrwxrwxrwx 1 root root 17 2008-04-09 09:10 S12rsyslog -> ../init.d/rsyslog
lrwxrwxrwx 1 root root 24 2008-04-19 13:13 S99NetworkManager -> ../init.d/NetworkManager
lrwxrwxrwx 1 root root 17 2008-04-19 13:13 K75ntpdate -> ../init.d/ntpdate
lrwxrwxrwx 1 root root 15 2008-04-19 13:13 S26acpid -> ../init.d/acpid
lrwxrwxrwx 1 root root 19 2008-04-24 17:18 S26udev-post -> ../init.d/udev-post
lrwxrwxrwx 1 root root 19 2008-05-01 22:52 S28haldaemon -> ../init.d/haldaemon
lrwxrwxrwx 1 root root 14 2008-05-06 17:32 S58ntpd -> ../init.d/ntpd
lrwxrwxrwx 1 root root 19 2008-05-06 17:32 S50bluetooth -> ../init.d/bluetooth
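
The ordering question raised earlier in the thread (does messagebus start before haldaemon?) can also be checked mechanically instead of reading the listing by eye. The sketch below compares the two start priorities in a runlevel directory. `start_prio` and `check_dbus_before_hal` are hypothetical helper names, and the default directory is an assumption. A tie is reported as a problem because, for equal priorities, haldaemon sorts ahead of messagebus by name.

```shell
#!/bin/bash
# Print the two-digit start priority of a service in a runlevel directory,
# e.g. "27" for S27messagebus. Returns 1 if the service has no S-link there.
start_prio() {
    local dir="$1" svc="$2" link name
    for link in "$dir"/S[0-9][0-9]"$svc"; do
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}        # e.g. S27messagebus
        name=${name#S}          # e.g. 27messagebus
        echo "${name%"$svc"}"   # e.g. 27
        return 0
    done
    return 1
}

# Report whether messagebus (D-Bus) starts before haldaemon.
check_dbus_before_hal() {
    local dir="${1:-/etc/rc.d/rc5.d}" bus hal
    bus=$(start_prio "$dir" messagebus) || { echo "no start link for messagebus in $dir"; return 2; }
    hal=$(start_prio "$dir" haldaemon) || { echo "no start link for haldaemon in $dir"; return 2; }
    if [ "$bus" -lt "$hal" ]; then
        echo "OK: messagebus (S$bus) starts before haldaemon (S$hal)"
    else
        echo "PROBLEM: haldaemon (S$hal) starts before messagebus (S$bus)"
        return 1
    fi
}
```

With the links shown in the listing above (S27messagebus and S28haldaemon), `check_dbus_before_hal /etc/rc5.d` would report the OK case.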


Comment 10 Roger Depreeuw 2008-05-06 21:25:46 UTC
I enabled haldaemon again to start during the boot sequence at runlevels 2, 3, 4 and 5.
I then disabled messagebus at runlevel 2 and left it on at runlevels 3, 4 and 5.
Both a cold boot and a warm reboot resulted in a clean boot sequence, and haldaemon
started nicely. Then I switched messagebus back on at runlevel 2; the
boot sequence again ran without unexpected results and haldaemon was running.
Does this make sense to you?
Regards

Comment 11 Tom Gutwin 2008-05-10 04:51:55 UTC
(In reply to comment #8)
> A simple suggestion. Check whether the problem is the same as what happened for
> bug #444859: for some unknow reasons a bunch of services changed their priority
> to S99. Among this there is also messagebus, which thus starts AFTER haldaemon.
> The reason why this happened is a mystery for me, as well as the fact as trying
> to reset priorities by chkconfig resetpriotities has no effect.
> 
> So it is a chkconfig problem rather than a haldaemon problem

Yes, that's it. messagebus is at S99. I will reset them. Thanks for the help.


Comment 12 Tom Gutwin 2008-05-12 04:22:38 UTC
Because resetting priorities with chkconfig resetpriotities is not working,
I had to manually mv the links in the rc.d/* subdirs to the correct priorities.
This at least got me booting without the haldaemon delay.

BUT things are still not perfect. I had to do a
   /etc/init.d/haldaemon restart
   /etc/init.d/network restart
to get my network working (NetworkManager still won't hook up).

For example, in rc3.d:
#!/bin/bash
mv S99acpid S44acpid
mv S99haldaemon S26haldaemon
mv S99messagebus S22messagebus
mv S99cups S98cups
mv S99network S10network
mv S99NetworkManager S27NetworkManager
mv S99nscd S30nscd
mv S99ntpd S58ntpd
mv S99ntpdate S57ntpdate
mv S99sendmail S80sendmail
mv S99xinetd S56xinetd
mv S99yum-updatesd S97yum-updatesd


The chkconfig package is at:
  chkconfig-1.3.37-2

Comment 13 Bill Nottingham 2008-05-12 15:55:19 UTC
Please don't manually move the links. You need to reset the priorities in order,
unfortunately.

What services do you have installed now? (names and versions)

Comment 14 Bug Zapper 2008-05-14 10:17:37 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Tom Gutwin 2008-05-17 02:42:05 UTC
(In reply to comment #13)
> Please don't manually move the links. You need to reset the priorities in order,
> unfortunately.

Do you mean 
chkconfig network resetpriotities 
chkconfig messagebus resetpriotities 
chkconfig haldaemon resetpriotities 
chkconfig NetworkManager resetpriotities 
....

in that order, because that is the order they should start in? --> 10, 22, 26, 27

chkconfig throws an error:
[root@warp2 init.d]# /sbin/chkconfig network resetpriotities 
chkconfig version 1.3.37 - Copyright (C) 1997-2000 Red Hat, Inc.
This may be freely redistributed under the terms of the GNU Public License.

usage:   chkconfig [--list] [name]
         chkconfig --add <name>
         chkconfig --del <name>
         chkconfig --override <name>
         chkconfig [--level <levels>] <name> <on|off|reset|resetpriorities>

I confirmed ... No changes were made to the links
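
One detail worth checking here: the usage text above lists the keyword as `resetpriorities`, while the command that produced the usage message was typed as `resetpriotities`. That spelling difference may be why chkconfig printed its usage and changed nothing. Below is a minimal sketch of resetting services in a chosen order using the spelling from the usage text. `reset_in_order` and the `RUN` switch are hypothetical names; by default the function only prints the commands so they can be reviewed first.

```shell
#!/bin/bash
# Print (dry run) or execute chkconfig resets for the given services,
# in the order they are passed. Set RUN=1 to actually execute them.
reset_in_order() {
    local svc
    for svc in "$@"; do
        if [ "${RUN:-0}" = 1 ]; then
            /sbin/chkconfig "$svc" resetpriorities
        else
            echo "/sbin/chkconfig $svc resetpriorities"
        fi
    done
}
```

For the order asked about in this comment, `reset_in_order network messagebus haldaemon NetworkManager` would print the four commands; whether that is the right order is still the open question put to the maintainer.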

rpm -q chkconfig -i renders:

Name        : chkconfig                    Relocations: (not relocatable)
Version     : 1.3.37                            Vendor: Fedora Project
Release     : 2                             Build Date: Wed 20 Feb 2008 04:36:39 AM PST
Install Date: Sat 05 Apr 2008 10:51:35 PM PDT      Build Host: xenbuilder4.fedora.phx.redhat.com
Group       : System Environment/Base       Source RPM: chkconfig-1.3.37-2.src.rpm
Size        : 599326                           License: GPLv2
Signature   : DSA/SHA1, Tue 04 Mar 2008 07:38:40 AM PST, Key ID da84cbd430c9ecf8
Packager    : Fedora Project
snip...

> 
> What services do you have installed now? (names and versions)

See attached files for chkconfig --list and a head * in the init.d dir


Comment 16 Tom Gutwin 2008-05-17 02:43:27 UTC
Created attachment 305780 [details]
init.d Head *

A head list of the init.d files to show the priorities

Comment 17 Tom Gutwin 2008-05-17 02:44:26 UTC
Created attachment 305781 [details]
services status/priorities

chkconfig --list

Comment 18 Tom Gutwin 2008-05-19 05:26:25 UTC
I went through each service and reset its priority, one at a time.
I found some interesting things happening...

Some calls to chkconfig <servicename> resetpriotities
worked, resetting the priority properly:
 - NetworkManager
 - haldaemon
 - bluetooth
 - messagebus
 - rsyslog
All were at S99 before the reset, and then went to the correct priority.

However, when I did the same for others, it took about 1 1/2 seconds to
execute, which made me curious. When I then checked whether the priorities had
been reset... NO, everything went back to S99 (even the ones I had eyeballed as
correct just before).
So I went through them one by one, doing a chkconfig <servicename> resetpriotities
and then an 'ls -latr /etc/rc.d/rc5.d' to see what changed.
These are the services that 'broke' the reset:
   ntpdate
   acpid
   cups
   nscd
They were, and still are, at S99 even though they should not be.
There might be others I did not try.
Is there something common among these that is messing with chkconfig?

Comment 19 Tom Gutwin 2008-05-20 04:39:25 UTC
Bill,  I saw in https://bugzilla.redhat.com/show_bug.cgi?id=444859#c7 that you
said to NOT have NetworkManager have a PROVIDES for $network in the init section.

Mine does. What should it be?

Here is the NetworkManager head...
#!/bin/sh
#
# NetworkManager:   NetworkManager daemon
#
# chkconfig: - 27 73
# description:  This is a daemon for automatically switching network \
#               connections to the best available connection.
#
# processname: NetworkManager
# pidfile: /var/run/NetworkManager/NetworkManager.pid
#
### BEGIN INIT INFO
# Provides: network_manager $network
# Required-Start: messagebus haldaemon
# Required-Stop: messagebus haldaemon
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: start and stop NetworkManager
# Description: NetworkManager is a tool for easily managing network connections
### END INIT INFO


Comment 20 Bill Nottingham 2008-05-20 14:48:00 UTC
For F8 and previous, it shouldn't. For F9, it should be OK.

Comment 21 Rolf Fokkens 2008-05-24 11:22:47 UTC
Created attachment 306575 [details]
Difference in priorities after resetpriorites for al daemons

Comment 22 Rolf Fokkens 2008-05-24 11:25:36 UTC
The submitted diff file shows the difference in priorities in /etc/rc.d/rc3.d
after doing a chkconfig <service> resetpriorities for all services. This was
triggered by the fact that my haldaemon didn't start during boot either.

Comment 23 Tuc 2008-05-30 15:53:52 UTC
Hate to be a "me too", but I'm seeing some weird stuff from chkconfig.

Mysql has :

# chkconfig: 2345 64 36
# description: A very fast and reliable SQL database engine.

I run chkconfig mysql resetpriorities

and it shows up as S99mysql ... ?????? It also seems to reset my boinc which I
didn't even tell it to, which has :

# chkconfig: 345 98 03
# description: This script starts the local BOINC client as a daemon
#         For more information about BOINC (the Berkeley Open Infrastructure
#         for Network Computing) see http://boinc.berkeley.edu
# processname: boinc
# config: /etc/sysconfig/boinc

but still shows as S99boinc



Comment 24 Bill Nottingham 2008-05-30 15:55:33 UTC
Tuc - are those the *full* headers, or are there any LSB INIT INFO blocks? Those
take precedence...
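
Since the LSB block takes precedence, it can help to look at both sources of ordering information side by side before concluding that chkconfig ignored the `# chkconfig:` line. The following is a rough sketch that prints the chkconfig header and the dependency-related LSB lines from one init script. `show_init_headers` is a hypothetical helper name, and the choice of which LSB keywords to print is an assumption about what is relevant.

```shell
#!/bin/bash
# Print the "# chkconfig:" line and the dependency-related LSB lines of an
# init script, so the two sources of ordering information can be compared.
show_init_headers() {
    local script="$1"
    grep -E '^# chkconfig:' "$script"
    # Only look inside the block delimited by BEGIN/END INIT INFO.
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$script" |
        grep -E '^# (Provides|Required-Start|Should-Start):'
    # A missing header is not treated as an error.
    return 0
}
```

Running `show_init_headers /etc/init.d/mysql` on a script that has both kinds of header would show the numeric priorities next to the `Required-Start` facilities that can override them.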

Comment 25 Tuc 2008-05-30 16:07:37 UTC
<BLUSH> Sorry. Did not know that. Yes, there are LSB INIT INFO blocks :

MYSQL:

### BEGIN INIT INFO
# Provides: mysql
# Required-Start: $local_fs $network $remote_fs
# Should-Start: ypbind nscd ldap ntpd xntpd
# Required-Stop: $local_fs $network $remote_fs
# Default-Start:  2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: start and stop MySQL
# Description: MySQL is a very fast and reliable SQL database engine.
### END INIT INFO


boinc:

### BEGIN INIT INFO
# Provides: boinc
# Required-Start: $network
# Required-Stop:  $network
# Default-Start: 3 4 5
# Default-Stop: 0 1 2 6
# Description: This script starts the local BOINC client as a daemon
#         For more information about BOINC (the Berkeley Open Infrastructure
#         for Network Computing) see http://boinc.berkeley.edu
### END INIT INFO


My original intent is to start mysql before radius. Radius only has the
chkconfig header, so it was started as S88. I'm not sure how chkconfig and
LSB INIT INFO interact in deciding what to call the entry (and why it mucks with
boinc when they are completely different).

I could just rename the link to S64mysql, but if someone else comes along later
it could be reset, or a package update could do the same.

So with a quandary like this, what's the best way to go about it?

Comment 26 Bill Nottingham 2008-05-30 16:51:28 UTC
Presumably, you have something providing $network starting at S99... figure out
why that is happening, and you'll fix the problem. chkconfig NetworkManager
resetpriorities may help.
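
To find which script is providing $network, one option is to search the `Provides:` lines of the init scripts. This is a sketch under assumptions: `providers_of` is a hypothetical helper name, and /etc/init.d as the default location is assumed rather than taken from this report.

```shell
#!/bin/bash
# List init scripts whose LSB "Provides:" line names a given facility.
providers_of() {
    local facility="$1" initdir="${2:-/etc/init.d}" script
    for script in "$initdir"/*; do
        [ -f "$script" ] || continue
        # -F makes grep treat the facility (e.g. $network) as a literal string.
        if grep -E '^# Provides:' "$script" | grep -qF -- "$facility"; then
            echo "${script##*/}"
        fi
    done
}
```

Calling `providers_of '$network'` (single-quoted so the shell does not expand it) would list the candidates; the start priority of each can then be checked in the runlevel directory.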

Comment 27 Tuc 2008-05-30 20:19:40 UTC
As requested, since this is not on a Fedora release, I have opened 449164.

Comment 28 Ian Collier 2008-06-11 16:24:22 UTC
This probably doesn't add anything, but I've just had to diagnose the same
problem on a laptop after upgrading from F7 to F9 (by booting the F9 DVD and
choosing upgrade, and then later running "yum update" to update all packages to
current).  After the upgrade, in runlevel 5, S26haldaemon was starting before
S26messagebus.  On bootup, hal was pausing for several minutes and then saying
"FAILED" with no further info.  Resetting the priority of messagebus to 22 fixed
this.  Possibly unrelated: the network service seemed to be turned off.
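
The tie described above (S26haldaemon starting before S26messagebus) can be made visible by listing the start links in sorted order. This sketch assumes the rc script starts services in the shell's glob order of the S* links, which is consistent with the behaviour reported in this comment; `start_order` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the start links of a runlevel directory in glob (sorted) order.
# With equal priorities, the name decides: S26haldaemon sorts before S26messagebus.
start_order() {
    local dir="${1:-/etc/rc.d/rc5.d}" link
    for link in "$dir"/S[0-9][0-9]*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        echo "${link##*/}"
    done
}
```

Piping the output through `grep -n -E 'messagebus|haldaemon'` would show the relative positions of the two services at a glance.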

Comment 29 John Thurtell 2008-11-30 00:29:25 UTC
Problem still apparent during upgrade to F10 (from F8 in my case)

Symptoms: Kbd and mouse not working after upgrade, so unable to do graphical login (gdm). Remote login confirmed system working ok, with exception of haldaemon. /var/log/Xorg.0.log reported errors on default kbd and pointer, possibly related to hald.

Resetting priorities of haldaemon and messagebus using chkconfig cures problem

Comment 30 Bill Nottingham 2008-12-04 20:57:27 UTC
Looking over this, I'm not seeing an actual bug with chkconfig itself - it's honoring the priorities and dependencies correctly. If you're seeing this issue, you may need to file a bug against the individual packages, as they may need to reset their priorities on upgrade.