Description of problem: Today I've notice that during hald-addon-input suddenly started to eat 100%CPU I've just grabbed some strace of process to show what the loop was doing: poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 7, -1) = 1 poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 7, -1) = 1 ...poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN, revents=POLLIN}], 7, -1) = 2 read(4, "\217\243\332G\0\0\0\0=l\10\0\0\0\0\0\4\0\4\0\34\0\0\0\217\243\332G\0\0\0\0"..., 1024) = 72 read(4, 0x60e4f0, 1024) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 7, -1) = 1 ..... Version-Release number of selected component (if applicable): 1344688 hal-0.5.11-0.git20080304.4.fc9.x86_64 How reproducible: don't know :( Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
Happened to me again - now I've attached with gdb (unfortunately withoug debuginfo this time :( #0 0x00000035336dc30f in poll () from /lib64/libc.so.6 #1 0x0000003b9c83aaf8 in ?? () from /lib64/libglib-2.0.so.0 #2 0x0000003b9c83b18d in g_main_loop_run () from /lib64/libglib-2.0.so.0 #3 0x0000000000401d75 in g_str_equal () #4 0x000000353361e47a in __libc_start_main () from /lib64/libc.so.6 #5 0x0000000000401b29 in g_str_equal () #6 0x00007fffdd6b5368 in ?? () #7 0x000000000000001c in ?? () #8 0x0000000000000001 in ?? () #9 0x00007fffdd6b5e55 in ?? () #10 0x0000000000000000 in ?? () Hopefully I'll be better prepared... (When I've detached gdb - the HAL daemon stopped its looping) So I could not check again with debugs Will try next time...
Ok - got the place with debuginfo - (gdb) bt #0 0x00000035336dc30f in poll () from /lib64/libc.so.6 #1 0x0000003b9c83aaf8 in g_main_context_iterate (context=<value optimized out>, block=<value optimized out>, dispatch=<value optimized out>, self=<value optimized out>) at gmain.c:2945 #2 0x0000003b9c83b18d in IA__g_main_loop_run (loop=<value optimized out>) at gmain.c:2844 #3 0x0000000000401d75 in main (argc=<value optimized out>, argv=<value optimized out>) at addon-input.c:487 Thought the places was probably already obvious - so just some variables must be set in weird way. I guess it should be easy to see when this loop appears - most probably returned descriptor is probably not being read for some reason ? strace: poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=5, events=POLLIN}, {fd=-1}, {fd=7, events=POLLIN}], 8, -1) = 2 poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=5, events=POLLIN}, {fd=-1}, {fd=7, events=POLLIN}], 8, -1) = 2 poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=5, events=POLLIN}, {fd=-1}, {fd=7, events=POLLIN}], 8, -1) = 2 poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=5, events=POLLIN}, {fd=-1}, {fd=7, events=POLLIN, revents=POLLIN}], 8, -1) = 3 read(7, "i*\344G\0\0\0\0\245\337\3\0\0\0\0\0\4\0\4\0\35\0\0\0i*\344G\0\0\0\0"..., 1024) = 72 read(7, 0x60dfb0, 1024) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=5, events=POLLIN}, {fd=-1}, {fd=7, events=POLLIN}], 8, -1) = 2 poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=5, events=POLLIN}, {fd=-1}, {fd=7, events=POLLIN}], 8, -1) = 2 poll([{fd=4, events=POLLIN}, {fd=6, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=9, events=POLLIN, revents=POLLNVAL}, {fd=5, events=POLLIN}, {fd=-1}, {fd=7, events=POLLIN}], 8, -1) = 2 ltrace:g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 1 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 1 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 3 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 1 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 1 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 1 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 3 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 1 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 1 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 1 g_io_channel_read_chars(0x60d720, 0x60d690, 24, 0x7fff703dfde0, 0x7fff703dfde8) = 3 Are you able to get something from this - or do you need some more help ?
Problem still persists with the version hal-0.5.11-0.2.rc2.fc9.x86_64 in case it would be interesting...
I'm seeing this too, in hal-0.5.11-0.5.rc2.fc9.i386. The strace is a continious series of: poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=11, events=POLLIN, revents=POLLNVAL}, {fd=3, events=POLLIN}, {fd=10, events=POLLIN}, {fd=9, events=POLLIN}], 9, -1) = 1 Note that fd=11 is getting POLLNVAL; its not an open fd for the process (it doesn't appear in /proc/<pid>/fd for hald-addon-input).
I wonder if its caused by thinkfinger, the thinkpad fingerprint reader? I'm using thinkfinger configured to allow me to login with gdm. When it does so, it creates a virtual keyboard device, which is then removed once the login is complete. As an experiment, I ran "su" while hald-addon-input was spinning. The new input device appeared, and hald-addon-input opened it as fd 11, which stopped the poll spinning. When I swiped my finger and the thinkfinger device was removed, hald-addon-input dealt with the device removal properly and is no longer polling on fd 11 and is not spinning. I wonder, however, if there's a race or something which makes it fail to deal devices being removed, and it ends up spinning on their now-closed fds?
Hmmm - I'm also using thinkpad fingerprint reader - and I also often usually notice this problem after login via gdm after suspend (and usually in this case for some reason gdb doens't even show me availability of reader - thus only password is displayd - not sure if this combines together. But I'm sure that in at least one case the hal started to loop without gdm interaction, but maybe I've been logging on console - thus reader might be involved (usually there is slight delay when I notice that 1 cpu is permanently busy). I'll keep an eye on this and instead of restarting haldaemon I'll try to play with thinkfinger device.
I can confirm that as soon as the next thinfinger reading appears via the login the spinnig of hald-addon-input stops.
Bastien, you have been playing with thinkfinger earlier, any insights ?
If by earlier you mean last year :) It's probably caused by thinkfinger's uinput crap (they're using uinput to feed newlines into the auth when you've swiped a finger, so it's like the user pressing "Enter" after having typed in their password). This is the code: http://thinkfinger.svn.sourceforge.net/viewvc/thinkfinger/pam/pam_thinkfinger-uinput.c?view=markup
This should fix it: http://thread.gmane.org/gmane.comp.freedesktop.hal/11326
The link in that thread is wrong. http://gitweb.freedesktop.org/?p=hal.git;a=commitdiff;h=38d16647bf66f2e26a882ccd9bb5a0273278338d is the diff that fixes it. David, can you please add this to the F9 package?
Fixed in 0.5.11-0.7.rc2.fc9 and 0.5.11-0.7.rc2.fc10.