Bug 161747
Summary: | hald immediately exits | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Michal Jaegermann <michal> | ||||||
Component: | dbus | Assignee: | John (J5) Palmieri <johnp> | ||||||
Status: | CLOSED WONTFIX | QA Contact: | |||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | medium | ||||||||
Version: | 4 | CC: | davidz, jkeck | ||||||
Target Milestone: | --- | ||||||||
Target Release: | --- | ||||||||
Hardware: | athlon | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2005-06-28 17:59:07 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Michal Jaegermann
2005-06-26 23:07:32 UTC
Created attachment 115996 [details]
A full strace from an attempt to start hald
Hi, what is the output of 'hald --daemon=no --verbose=yes --retain-privileges'? 'hald --daemon=no --verbose=yes --retain-privileges' prints 315K of stuff which ends with ...... 18:45:26.243 [I] hald.c:638: Device probing completed 18:45:26.244 [I] hald_dbus.c:2519: entering 18:45:26.258 [E] hald_dbus.c:2526: dbus_bus_get(): Failed to connect to socket /var/run/dbus/system_bus_socket: Connection refused 18:45:26.258 [I] util.c:1157: Killing helper with pid 29551 18:45:26.258 [I] util.c:1157: Killing helper with pid 29529 18:45:26.259 [I] util.c:1157: Killing helper with pid 29517 Hm, /var/run/dbus/system_bus_socket does exist, owned by 'root.root' and shows "srwxrwxrwx" permissions. Do you want the whole output? Hmm... is the messagebus running? (service messagebus status)
I assume this is a default install; how does it work with booting with
enforcing=0 or upgrading the relevant selinux packages?
> Do you want the whole output?
Nah, no need, looks like a D-BUS issue.
Bummer! # service messagebus status dbus-daemon is stopped # service messagebus restart Stopping system message bus: [FAILED] Starting system message bus: [ OK ] # service messagebus status dbus-daemon is stopped # service messagebus start Starting system message bus: Failed to start message bus: The pid file "/var/run/messagebus.pid" exists, if the message bus is not running, remove this file Indeed. I do not know at this moment why this is a problem. Thanks. I would think for a long time before looking at messagebus. Would be nice to have some message in a startup pointing in the right direction. You should be able to remove "/var/run/messagebus.pid" by hand - also, it should be cleaned up in rc.sysinit on reboots (along with other pid-files). Personally, I think it's a stupid that messagebus refuses to start if there are stale pid-files because there are other and better ways to detect if another instance is running. I should write a D-BUS patch for fixing that. So, if messagebus is started OK, does hald start OK? Sigh! You are not really reading what I write. Let's try again:
....
Starting system message bus: [ OK ]
# service messagebus status
dbus-daemon is stopped
....
> So, if messagebus is started OK, does hald start OK?
I have no idea because so far I failed to start messagebus even if
it says "[ OK ]".
'service messagebus start' indeed refuses to start with a stale pid
file but 'service messagebus restart' tries to start it; only without
much success although it is convinced that it did fine.
Mystery solved. For unknown reasons when 'dbus-0.33-3' was installed during an upgrade a %preinst script failed to add 'dbus' user (81.81) to a password file. This caused dbus-daemon to quietly exit with a reported status 0 (which to me looks like a nasty bug). After I added this missing account them messagebus service _really_ started and after that hald starts and is running. Thanks for pointing me in the right direction. Why an upgrade failed to add some accounts I have no idea. Bugs in anaconda? The other account which gave me troubles was 'smmsp'. BTW, a tail of a trace for dbus-daemon when user 81.81 is missing looks like this: .... chmod("/var/run/dbus/system_bus_socket", 0777) = 0 fcntl64(3, F_GETFD) = 0 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 gettimeofday({1119856649, 866006}, NULL) = 0 open("/dev/urandom", O_RDONLY) = 4 read(4, "\265\2767\215\314\0356\353\275\312\27\315", 12) = 12 close(4) = 0 gettimeofday({1119856649, 866400}, NULL) = 0 gettimeofday({1119856649, 866454}, NULL) = 0 chdir("/") = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7fa3708) = 13714 --- SIGCHLD (Child exited) @ 0 (0) --- open("/var/run/messagebus.pid", O_WRONLY|O_CREAT|O_EXCL, 0644) = 4 fcntl64(4, F_GETFL) = 0x1 (flags O_WRONLY) fstat64(4, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fba000 _llseek(4, 0, [0], SEEK_CUR) = 0 write(4, "13714\n", 6) = 6 close(4) = 0 munmap(0xb7fba000, 4096) = 0 exit_group(0) = ? Everyting nice and clean; only it does not run anymore. OK, so I'm reassigning this to D-BUS; we may want to reassign to anaconda at some point because of comment 8. What version did you upgrade from? The dbus account (81) has been there since fc2. > The dbus account (81) has been there since fc2.
So? How does it matter? The important thing is the account was not created
even if package scriplets say that it should be.
FYI - this particular upgrade was on a system basically RHL7.3 (with many
updates in the meantime).
Upgrading from RHL7.3 to FC4 with many updates in between is an important piece of information because it tells me something went wrong there. As updates are largly untested when jumping more than one version I am not suprised you have seen this error. What most likely happened was the version of rpm you were upgrading from did not support the %pre script in the rpm spec file which creates the user: %pre # Add the "dbus" user /usr/sbin/useradd -c 'System message bus' -u %{dbus_user_uid} \ -s /sbin/nologin -r -d '/' dbus 2> /dev/null || : or the useradd command was not in the same location or accept that perticular sequence of flags. Doing huge version jumps you will see problems like these pop up though they are usually fixable once tracked down. There is nothing much more that can be done about it. Closing as won't fix. The explanation about rpm version is totally implausible. First of all that version can at least _print_ those scripts (I can still check that elsewhere). In particular, on another RHL7.3 system $ rpm -qp --scripts dbus-0.22-10.FC3.2.i386.rpm preinstall scriptlet (through /bin/sh): # Add the "dbus" user /usr/sbin/useradd -c 'System message bus' -u 81 \ -s /sbin/nologin -r -d '/' dbus 2> /dev/null || : .... so it is not likely that it would be not supported. I do not have now 'dbus' package from FC4 handy but it really does not matter. Another issue that while performing an upgrade I would imagine that a copy of 'rpm' on an installation image was used, even if only one needs different libraries than those on a system to upgrade, and not the one you are replacing. The later would not work anyway as you would get "executable busy" error at least in some states of the whole operation. Yes, 'useradd' command from RHL7.3 does not have any problems with that sequences of flags but this is hardly an issue either. Anyway, this is a tail of /etc/passwd on another RHL7.3 installation after running from shell the scriplet in question: dbus:x:81:81:System message bus:/:/sbin/nologin See? So comment #13 is all wet. Created attachment 128705 [details]
strace -o hal_trace.txt /usr/sbin/hald --daemon=no --verbose=yes
Bug repeat in: dbus-0.33-3 hal-0.5.2-2 & dbus-0.33-3.fc4.1 hal-0.5.2-2.fc4.1 selinux disabled /var/run/messagebus.pid doesn't exists $ cat /etc/passwd | grep dbus dbus:x:81:81:System message bus:/:/sbin/nologin # /usr/sbin/hald --daemon=no --verbose=yes ... 14:50:35.738 [I] hald.c:653: Device probing completed 14:50:35.738 [I] hald_dbus.c:3063: entering 14:50:35.738 [E] hald_dbus.c:3070: dbus_bus_get(): Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory 14:50:35.738 [I] util.c:1262: Killing helper with pid 25713 14:50:35.738 [I] util.c:1262: Killing helper with pid 25700 strace in Comment #15 Nikolay. D-Bus isn't started. Please run dbus-daemon --system and tell me what it says. |