hal-0.4.2-4.EL4

#0  0x0000003f8fe2e21d in raise () at ../string/bits/string2.h:1000
#1  0x0000003f8fe2fa1e in abort () at ../string/bits/string2.h:1000
#2  0x0000003f93142445 in _dbus_abort () at dbus-sysdeps.c:86
#3  0x0000003f9312c9b6 in _dbus_real_assert (condition=Variable "condition" is not available.
) at dbus-internals.c:455
#4  0x0000003f93131f5e in dbus_free (memory=0xa8d620) at dbus-memory.c:629
#5  0x000000000040aa50 in match_device_async_timeout (user_data=0xaed600) at device_store.c:435
#6  0x0000003f9292956b in g_timeout_dispatch (source=0xaf3c50, callback=0x11b1, user_data=0x6) at gmain.c:3301
#7  0x0000003f929266bd in g_main_context_dispatch (context=0x6403d0) at gmain.c:1942
#8  0x0000003f92928397 in g_main_context_iterate (context=0x6403d0, block=-1880368832, dispatch=1, self=0xffffffffffffffff) at gmain.c:2573
#9  0x0000003f92928735 in g_main_loop_run (loop=0x6414e0) at gmain.c:2777
#10 0x000000000040b444 in main (argc=1, argv=0x7fbffff808) at hald.c:513
#11 0x0000003f8fe1c3fb in __libc_start_main (main=0x40aef0 <main>, argc=1, ubp_av=0x7fbffff808, init=0x4287e0 <__libc_csu_init>, fini=Variable "fini" is not available.
) at ../sysdeps/generic/libc-start.c:209
#12 0x000000000040586a in _start ()

We tried using the attached patch, which corresponds to the following upstream commits:

http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=98b280dc3561597b3526446be2c8aa803ec54a16
http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=d22a988389649a02613b93cb2b593d2d3de6e9a3
http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=7530cf018fecef349bc158c49eec799fde912117
http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=869367455153923dd68312ee6681c8d795f1e7fe

and especially:

http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=88de5835f14360ab0b5c6c56b4c7f28ca3c95f2e

The crash still occurs with the following assertion:

18:35:08.591 [W] linux/osspec.c:1745: Got SEQNUM=673256, but last_hotplug_seqnum=673258
18:35:08.591 [I] linux/osspec.c:1345: action=remove seqnum=673256 subsystem=vc sysfs_path=/sys/class/vc/vcsa8
18:35:08.592 [W] linux/osspec.c:1117: Removal of class device at sysfs path /sys/class/vc/vcsa8 is not yet implemented
18:35:08.592 [I] linux/osspec.c:1686: SEQNUM=673258, TIMESTAMP=1146674108
18:35:08.592 [I] linux/osspec.c:1755: Queing up seqnum=673258, sysfspath=/class/vc/vcsa8, subsys=vc
18:35:08.592 [I] linux/osspec.c:1404: action=add, seqnum=673258 subsystem=vc devpath=/class/vc/vcsa8 devname=/dev/vcsa8
18:35:16.274 [W] linux/osspec.c:1203: No HAL device corresponding to device file /dev/vcsa7
18:35:16.361 [W] linux/osspec.c:1203: No HAL device corresponding to device file /dev/vcsa8
23699: assertion failed "n_blocks_outstanding >= 0" file "dbus-memory.c" line 629

I have no reproducer for this bug, and it doesn't seem to be known upstream.
I suspect this is a D-Bus problem. I don't have any evidence for that, other than that the version on RHEL4 is quite old.
Additional backtrace info:
============================
#0  0x000000367bb2e21d in raise () at ../string/bits/string2.h:1000
1000      ++__result;
(gdb) up
#1  0x000000367bb2fa1e in abort () at ../string/bits/string2.h:1000
1000      ++__result;
(gdb) up
#2  0x0000003680e42445 in dbus_shutdown () from /usr/lib64/libdbus-1.so.0
(gdb) up
#3  0x0000003680e2c9b6 in dbus_watch_handle () from /usr/lib64/libdbus-1.so.0
(gdb) up
#4  0x0000003680e31f5e in dbus_free () from /usr/lib64/libdbus-1.so.0
(gdb) up
#5  0x000000000040aa50 in match_device_async_timeout (user_data=0xa9e710) at device_store.c:435
435       info->callback (info->store, NULL, info->user_data);
(gdb) p user_data
$1 = 0xa9e710
(gdb) p * user_data
Attempt to dereference a generic pointer.
(gdb) p *user_data
Attempt to dereference a generic pointer.
(gdb) p (AsyncMatchInfo *) user_data
$2 = (struct {...} *) 0xa9e710
=======================================
And the code it references (device_store.c)
=======================================
match_device_async_timeout (gpointer user_data)
{
        AsyncMatchInfo *info = (AsyncMatchInfo *) user_data;

        info->callback (info->store, NULL, info->user_data);

        destroy_async_match_info (info);

        return FALSE;
}
======================================
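For context on the assertion itself: it fires inside dbus_free when a counter of outstanding blocks drops below zero, which suggests that more frees than allocations went through the D-Bus allocator (for example a double free, or freeing memory that was allocated some other way). The sketch below is a simplified model of that kind of bookkeeping, not the D-Bus source. The names counted_malloc, counted_free, plain_copy and demo_mismatched_free are hypothetical, and whether hald actually reaches the assertion through a mismatched free is an assumption, not something the backtrace proves.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the counter that the failed assertion checks. */
static int blocks_outstanding = 0;

/* Allocate through the counting wrapper: one more block is outstanding. */
static void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        blocks_outstanding++;
    return p;
}

/* Free through the counting wrapper.
 * Returns 0 if the books still balance, -1 if the counter went negative
 * (the point where an assertion-enabled build would abort). */
static int counted_free(void *p)
{
    if (p == NULL)
        return 0;               /* freeing NULL is a no-op */
    blocks_outstanding--;
    free(p);
    return blocks_outstanding >= 0 ? 0 : -1;
}

/* Copy a string with plain malloc, bypassing the counter entirely. */
static char *plain_copy(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}

/* One balanced alloc/free pair, then a free of a block the wrapper
 * never counted. Returns the result of the second free. */
static int demo_mismatched_free(void)
{
    blocks_outstanding = 0;

    char *counted = counted_malloc(16);
    char *uncounted = plain_copy("/class/vc/vcsa8");

    int rc_first = counted_free(counted);   /* counter 1 -> 0 */
    assert(rc_first == 0);
    (void) rc_first;                        /* silence unused warning under NDEBUG */

    return counted_free(uncounted);         /* counter 0 -> -1 */
}
```

Calling demo_mismatched_free() from a main function returns -1 and leaves the counter at -1. The free itself is harmless in this model; only the accounting is off, which is why a crash of this kind can show up far from the code that made the original mistake.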
According to GM from VW, this problem occurs when they run their "traditional" benchmarks, i.e.

1) rlogin to machine and start kill-gdm-endlessloop
2) rlogin to machine and start sreport.schleife
3) rlogin to machine and start ipc.gm

This apparently triggers the problem within about 24 hours. Will upload the three scripts they start in parallel.
Created attachment 183201 [details] gdm
Created attachment 183221 [details] ipc.gm
Created attachment 183241 [details] sreport.schleife
I've put some of the attachments that were linked to this ticket onto the BZ.

Internal Status set to 'Waiting on Engineering'

This event sent from IssueTracker by alanm
 issue 88828
ipc.gm is a completely ludicrous "benchmark". Deleting IPC resources that don't belong to it is very likely to cause problems. ipcrm is not a test tool, it's a way to kill your system if you don't know what you're doing. The test script doesn't know what it's doing. That said, it's very likely the problem is with sysreport poking bits of hardware. sysreport isn't supposed to be run on a live production system. Is there any other way to reproduce the problem?