Bug 161747

Summary: hald immediately exits
Product: [Fedora] Fedora Reporter: Michal Jaegermann <michal>
Component: dbusAssignee: John (J5) Palmieri <johnp>
Status: CLOSED WONTFIX QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 4CC: davidz, jkeck
Target Milestone: ---   
Target Release: ---   
Hardware: athlon   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2005-06-28 17:59:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
A full strace from an attempt to start hald
none
strace -o hal_trace.txt /usr/sbin/hald --daemon=no --verbose=yes none

Description Michal Jaegermann 2005-06-26 23:07:32 UTC
Description of problem:

On an i386 system just updated to FC4 everything seems to be working fine
with an exception of hald which plainly refuses to start.  There are no
crashes or anything dramatic.  Simply every attempt ends up like that
in 'strace':

clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xb7ef2a28) = 21223
rt_sigaction(SIGCHLD, {0x80533c8, [CHLD], SA_RESTART}, {SIG_DFL}, 8) = 0
select(4, [3], NULL, [3], {25, 0})      = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
exit_group(1)                           = ?

In case you wonder if a corresponing account exists 'id haldaemon' writes:

uid=68(haldaemon) gid=68(haldaemon) groups=68(haldaemon)

and 'rpm -V hal' returns without any complaints.

A full trace from 'hald --retain-privileges', as this is how 'service hald'
starts, is attached but dropping that option does not make any difference.
Adding '--verbose=yes' is not much more informative.  It prints:

16:59:23.094 [I] hald.c:495: hal 0.5.2
16:59:23.095 [I] hald.c:497: Will daemonize
16:59:23.095 [I] hald.c:507: Becoming a daemon

and it does not run after that.

Version-Release number of selected component (if applicable):
hal-0.5.2-2

How reproducible:
always

Comment 1 Michal Jaegermann 2005-06-26 23:07:32 UTC
Created attachment 115996 [details]
A full strace from an attempt to start hald

Comment 2 David Zeuthen 2005-06-26 23:22:57 UTC
Hi, what is the output of 'hald --daemon=no --verbose=yes --retain-privileges'?

Comment 3 Michal Jaegermann 2005-06-27 00:53:16 UTC
'hald --daemon=no --verbose=yes --retain-privileges' prints 315K of
stuff which ends with
......
18:45:26.243 [I] hald.c:638: Device probing completed
18:45:26.244 [I] hald_dbus.c:2519: entering
18:45:26.258 [E] hald_dbus.c:2526: dbus_bus_get(): Failed to connect to socket
/var/run/dbus/system_bus_socket: Connection refused
18:45:26.258 [I] util.c:1157: Killing helper with pid 29551
18:45:26.258 [I] util.c:1157: Killing helper with pid 29529
18:45:26.259 [I] util.c:1157: Killing helper with pid 29517

Hm, /var/run/dbus/system_bus_socket does exist, owned by 'root.root' and shows
"srwxrwxrwx" permissions.

Do you want the whole output?



Comment 4 David Zeuthen 2005-06-27 01:10:26 UTC
Hmm... is the messagebus running? (service messagebus status)

I assume this is a default install; how does it work with booting with
enforcing=0 or upgrading the relevant selinux packages?

> Do you want the whole output?

Nah, no need, looks like a D-BUS issue.


Comment 5 Michal Jaegermann 2005-06-27 01:31:45 UTC
Bummer!

# service messagebus status
dbus-daemon is stopped
# service messagebus restart
Stopping system message bus:                               [FAILED]
Starting system message bus:                               [  OK  ]
# service messagebus status 
dbus-daemon is stopped
# service messagebus start 
Starting system message bus: Failed to start message bus: The pid file
"/var/run/messagebus.pid" exists, if the message bus is not running, 
remove this file

Indeed.  I do not know at this moment why this is a problem.

Thanks.  I would think for a long time before looking at messagebus.
Would be nice to have some message in a startup pointing in the right direction.


Comment 6 David Zeuthen 2005-06-27 04:43:00 UTC
You should be able to remove "/var/run/messagebus.pid" by hand - also, it should
be cleaned up in rc.sysinit on reboots (along with other pid-files). Personally,
I think it's a stupid that messagebus refuses to start if there are stale
pid-files because there are other and better ways to detect if another instance
is running. I should write a D-BUS patch for fixing that.

So, if messagebus is started OK, does hald start OK?

Comment 7 Michal Jaegermann 2005-06-27 05:39:49 UTC
Sigh!  You are not really reading what I write.  Let's try again:
....
Starting system message bus:                               [  OK  ]
# service messagebus status 
dbus-daemon is stopped
....

> So, if messagebus is started OK, does hald start OK?

I have no idea because so far I failed to start messagebus even if
it says "[  OK  ]".

'service messagebus start' indeed refuses to start with a stale pid
file but 'service messagebus restart' tries to start it; only without
much success although it is convinced that it did fine.


Comment 8 Michal Jaegermann 2005-06-27 07:37:12 UTC
Mystery solved.  For unknown reasons when 'dbus-0.33-3' was installed during
an upgrade a %preinst script failed to add 'dbus' user (81.81) to a password
file.  This caused dbus-daemon to quietly exit with a reported status 0
(which to me looks like a nasty bug).

After I added this missing account them messagebus service _really_ started
and after that hald starts and is running.  Thanks for pointing me in the
right direction.

Why an upgrade failed to add some accounts I have no idea. Bugs in anaconda?
The other account which gave me troubles was 'smmsp'.

Comment 9 Michal Jaegermann 2005-06-27 07:42:12 UTC
BTW, a tail of a trace for dbus-daemon when user 81.81 is missing looks
like this:
....
chmod("/var/run/dbus/system_bus_socket", 0777) = 0
fcntl64(3, F_GETFD)                     = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
gettimeofday({1119856649, 866006}, NULL) = 0
open("/dev/urandom", O_RDONLY)          = 4
read(4, "\265\2767\215\314\0356\353\275\312\27\315", 12) = 12
close(4)                                = 0
gettimeofday({1119856649, 866400}, NULL) = 0
gettimeofday({1119856649, 866454}, NULL) = 0
chdir("/")                              = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0xb7fa3708) = 13714
--- SIGCHLD (Child exited) @ 0 (0) ---
open("/var/run/messagebus.pid", O_WRONLY|O_CREAT|O_EXCL, 0644) = 4
fcntl64(4, F_GETFL)                     = 0x1 (flags O_WRONLY)
fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0xb7fba000
_llseek(4, 0, [0], SEEK_CUR)            = 0
write(4, "13714\n", 6)                  = 6
close(4)                                = 0
munmap(0xb7fba000, 4096)                = 0
exit_group(0)                           = ?

Everyting nice and clean; only it does not run anymore.

Comment 10 David Zeuthen 2005-06-27 13:17:44 UTC
OK, so I'm reassigning this to D-BUS; we may want to reassign to anaconda at
some point because of comment 8.

Comment 11 John (J5) Palmieri 2005-06-28 13:23:46 UTC
What version did you upgrade from?  The dbus account (81) has been there since fc2.

Comment 12 Michal Jaegermann 2005-06-28 14:17:50 UTC
> The dbus account (81) has been there since fc2.
So?  How does it matter?  The important thing is the account was not created
even if package scriplets say that it should be.

FYI - this particular upgrade was on a system basically RHL7.3 (with many
updates in the meantime).

Comment 13 John (J5) Palmieri 2005-06-28 17:59:07 UTC
Upgrading from RHL7.3 to FC4 with many updates in between is an important piece
of information because it tells me something went wrong there.  As updates are
largly untested when jumping more than one version I am not suprised you have
seen this error. What most likely happened was the version of rpm you were
upgrading from did not support the %pre script in the rpm spec file which
creates the user:

%pre
# Add the "dbus" user
/usr/sbin/useradd -c 'System message bus' -u %{dbus_user_uid} \
        -s /sbin/nologin -r -d '/' dbus 2> /dev/null || :

or the useradd command was not in the same location or accept that perticular
sequence of flags.

Doing huge version jumps you will see problems like these pop up though they are
usually fixable once tracked down.  There is nothing much more that can be done
about it.  Closing as won't fix.

Comment 14 Michal Jaegermann 2005-06-28 19:08:09 UTC
The explanation about rpm version is totally implausible.

First of all that version can at least _print_ those scripts (I can still
check that elsewhere).  In particular, on another RHL7.3 system

$ rpm -qp --scripts dbus-0.22-10.FC3.2.i386.rpm
preinstall scriptlet (through /bin/sh):
# Add the "dbus" user
/usr/sbin/useradd -c 'System message bus' -u 81 \
	-s /sbin/nologin -r -d '/' dbus 2> /dev/null || :
....

so it is not likely that it would be not supported.  I do not have
now 'dbus' package from FC4 handy but it really does not matter.

Another issue that while performing an upgrade I would imagine
that a copy of 'rpm' on an installation image was used, even if only
one needs different libraries than those on a system to upgrade,
and not the one you are replacing.  The later would not work anyway
as you would get "executable busy" error at least in some states
of the whole operation.

Yes, 'useradd' command from RHL7.3 does not have any problems with
that sequences of flags but this is hardly an issue either.  Anyway,
this is a tail of /etc/passwd on another RHL7.3 installation after
running from shell the scriplet in question:

dbus:x:81:81:System message bus:/:/sbin/nologin

See?  So comment #13 is all wet.

Comment 15 Mykola Ulianytskyi 2006-05-07 11:56:37 UTC
Created attachment 128705 [details]
strace -o hal_trace.txt /usr/sbin/hald --daemon=no --verbose=yes

Comment 16 Mykola Ulianytskyi 2006-05-07 12:39:27 UTC
Bug repeat in:
dbus-0.33-3
hal-0.5.2-2
&
dbus-0.33-3.fc4.1
hal-0.5.2-2.fc4.1

selinux disabled
/var/run/messagebus.pid doesn't exists

$ cat /etc/passwd | grep dbus
dbus:x:81:81:System message bus:/:/sbin/nologin

# /usr/sbin/hald --daemon=no --verbose=yes
...
14:50:35.738 [I] hald.c:653: Device probing completed
14:50:35.738 [I] hald_dbus.c:3063: entering
14:50:35.738 [E] hald_dbus.c:3070: dbus_bus_get(): Failed to connect to socket
/var/run/dbus/system_bus_socket: No such file or directory
14:50:35.738 [I] util.c:1262: Killing helper with pid 25713
14:50:35.738 [I] util.c:1262: Killing helper with pid 25700

strace in Comment #15

Comment 17 John (J5) Palmieri 2006-05-08 15:00:17 UTC
Nikolay.  D-Bus isn't started.  Please run dbus-daemon --system and tell me what
it says.