Bug 211129 - Crashes on x86-64
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: hal
Hardware: All  OS: Linux
Priority: high  Severity: high
Assigned To: David Zeuthen
Depends On: 234251
Reported: 2006-10-17 11:35 EDT by Bastien Nocera
Modified: 2013-03-05 22:47 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-10-02 11:56:30 EDT

Attachments
gdm (132 bytes, text/plain)
2007-08-31 08:59 EDT, Alan Matsuoka
ipc.gm (265 bytes, text/plain)
2007-08-31 08:59 EDT, Alan Matsuoka
sreport.schleife (80 bytes, text/plain)
2007-08-31 09:00 EDT, Alan Matsuoka

Description Bastien Nocera 2006-10-17 11:35:58 EDT

#0  0x0000003f8fe2e21d in raise () at ../string/bits/string2.h:1000
#1  0x0000003f8fe2fa1e in abort () at ../string/bits/string2.h:1000
#2  0x0000003f93142445 in _dbus_abort () at dbus-sysdeps.c:86
#3  0x0000003f9312c9b6 in _dbus_real_assert (condition=Variable "condition" is
not available.
) at dbus-internals.c:455
#4  0x0000003f93131f5e in dbus_free (memory=0xa8d620) at dbus-memory.c:629
#5  0x000000000040aa50 in match_device_async_timeout (user_data=0xaed600) at
#6  0x0000003f9292956b in g_timeout_dispatch (source=0xaf3c50, callback=0x11b1,
user_data=0x6) at gmain.c:3301
#7  0x0000003f929266bd in g_main_context_dispatch (context=0x6403d0) at gmain.c:1942
#8  0x0000003f92928397 in g_main_context_iterate (context=0x6403d0,
block=-1880368832, dispatch=1, self=0xffffffffffffffff) at gmain.c:2573
#9  0x0000003f92928735 in g_main_loop_run (loop=0x6414e0) at gmain.c:2777
#10 0x000000000040b444 in main (argc=1, argv=0x7fbffff808) at hald.c:513
#11 0x0000003f8fe1c3fb in __libc_start_main (main=0x40aef0 <main>, argc=1,
ubp_av=0x7fbffff808, init=0x4287e0 <__libc_csu_init>, fini=Variable "fini" is
not available.
) at ../sysdeps/generic/libc-start.c:209
#12 0x000000000040586a in _start ()

We tried using the attached patch, which corresponds to the following upstream


and especially:

The crash still occurs with the following assertion:
18:35:08.591 [W] linux/osspec.c:1745: Got SEQNUM=673256, but
18:35:08.591 [I] linux/osspec.c:1345: action=remove seqnum=673256 subsystem=vc
18:35:08.592 [W] linux/osspec.c:1117: Removal of class device at sysfs path
/sys/class/vc/vcsa8 is not yet implemented
18:35:08.592 [I] linux/osspec.c:1686: SEQNUM=673258, TIMESTAMP=1146674108
18:35:08.592 [I] linux/osspec.c:1755: Queing up seqnum=673258,
sysfspath=/class/vc/vcsa8, subsys=vc
18:35:08.592 [I] linux/osspec.c:1404: action=add, seqnum=673258  subsystem=vc
devpath=/class/vc/vcsa8 devname=/dev/vcsa8
18:35:16.274 [W] linux/osspec.c:1203: No HAL device corresponding to device file
18:35:16.361 [W] linux/osspec.c:1203: No HAL device corresponding to device file
23699: assertion failed "n_blocks_outstanding >= 0" file "dbus-memory.c" line 629

I have no reproducer for this bug, and it doesn't seem to be known upstream.
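
For readers unfamiliar with this kind of assertion: "n_blocks_outstanding >= 0" reads like a debug counter of live allocations that has gone negative, meaning more blocks were released through dbus_free than were handed out by the matching allocator. The sketch below is only an illustration of that bookkeeping idea, not D-Bus's actual code; counted_malloc, counted_free, n_outstanding and demo_unbalanced_free are hypothetical names. It reports the imbalance through a return value instead of aborting.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocator's debug counter (not D-Bus code). */
static long n_outstanding = 0;

/* Allocate a block and record one more live allocation. */
static void *counted_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        n_outstanding++;
    return p;
}

/* Free a block and record one fewer live allocation.
 * Returns 1 while the books balance, 0 once more blocks have been
 * freed than were ever allocated through counted_malloc(). */
static int counted_free(void *p)
{
    if (p == NULL)
        return 1;
    n_outstanding--;
    free(p);
    return n_outstanding >= 0;
}

/* Releases a block that never went through counted_malloc().
 * The memory itself is freed correctly, but the counter drops below zero. */
static int demo_unbalanced_free(void)
{
    void *foreign = malloc(16);   /* bypasses the counter */
    assert(foreign != NULL);
    return counted_free(foreign); /* 0 when called with no counted blocks live */
}
```

Under this model, two situations would drive the counter negative: a block freed twice, or a block freed through the counting wrapper that was allocated some other way. Which of these (if either) applies to hald is not established in this report.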
Comment 1 Bastien Nocera 2006-10-17 11:37:47 EDT
I believe this might be a D-Bus problem. I don't have any hints on that,
but the version on RHEL4 is quite old...
Comment 6 Jeremy West 2007-02-27 16:44:58 EST
Additional backtrace info:

#0  0x000000367bb2e21d in raise () at ../string/bits/string2.h:1000
1000        ++__result;
(gdb) up
#1  0x000000367bb2fa1e in abort () at ../string/bits/string2.h:1000
1000        ++__result;
(gdb) up
#2  0x0000003680e42445 in dbus_shutdown () from /usr/lib64/libdbus-1.so.0
(gdb) up
#3  0x0000003680e2c9b6 in dbus_watch_handle () from /usr/lib64/libdbus-1.so.0
(gdb) up
#4  0x0000003680e31f5e in dbus_free () from /usr/lib64/libdbus-1.so.0
(gdb) up
#5  0x000000000040aa50 in match_device_async_timeout (user_data=0xa9e710)
    at device_store.c:435
435             info->callback (info->store, NULL, info->user_data);
(gdb) p user_data
$1 = 0xa9e710
(gdb) p * user_data
Attempt to dereference a generic pointer.
(gdb) p *user_data
Attempt to dereference a generic pointer.
(gdb) p (AsyncMatchInfo *) user_data
$2 = (struct {...} *) 0xa9e710
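And the code it references (device_store.c):
gboolean
match_device_async_timeout (gpointer user_data)
{
        /* user_data is the AsyncMatchInfo registered with the timeout */
        AsyncMatchInfo *info = (AsyncMatchInfo *) user_data;

        /* Report "no device matched" (NULL) to the caller's callback */
        info->callback (info->store, NULL, info->user_data);

        destroy_async_match_info (info);

        /* FALSE: do not reschedule this timeout */
        return FALSE;
}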
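
The snippet above follows a one-shot timeout pattern: the handler notifies the caller, destroys its own context exactly once, and asks not to be rescheduled. The following is a minimal GLib-free sketch of that ownership rule, under the assumption that the handler is the sole owner of the context. MatchInfo, match_info_new, match_info_destroy, on_match_timeout and record_no_match are hypothetical names, not HAL's API.

```c
#include <stdlib.h>

/* Hypothetical callback type: device == NULL means "nothing matched". */
typedef void (*MatchCallback)(void *store, void *device, void *user_data);

typedef struct {
    void *store;
    MatchCallback callback;
    void *user_data;
} MatchInfo;

/* Counts destructions so a test can check the context dies exactly once. */
static int destroy_count = 0;

static MatchInfo *match_info_new(void *store, MatchCallback cb, void *user_data)
{
    MatchInfo *info = malloc(sizeof *info);
    if (info == NULL)
        return NULL;
    info->store = store;
    info->callback = cb;
    info->user_data = user_data;
    return info;
}

static void match_info_destroy(MatchInfo *info)
{
    destroy_count++;
    free(info);
}

/* One-shot timeout handler: notify, release the context, return 0
 * so the caller's event loop does not invoke it again. */
static int on_match_timeout(void *data)
{
    MatchInfo *info = data;
    info->callback(info->store, NULL, info->user_data);
    match_info_destroy(info); /* the handler owns info; the callback must not free it */
    return 0;
}

/* Example callback: records whether it was told that no device matched. */
static void record_no_match(void *store, void *device, void *user_data)
{
    (void) store;
    *(int *) user_data = (device == NULL);
}
```

The point of the sketch is that anything the callback frees must not be freed again by the handler, and the handler must never run a second time against the same context. Whether hald violates either rule is not shown by the backtrace alone.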
Comment 16 Alan Matsuoka 2007-08-31 08:58:07 EDT
According to GM from VW this problem occurs if they are running their
"traditional" benchmarks, i.e.

1) rlogin to machine and start kill-gdm-endlessloop
2) rlogin to machine and start sreport.schleife
3) rlogin to machine and start ipc.gm

This apparently triggers the problem within about 24 hours.

Will upload the three scripts they start in parallel.
Comment 17 Alan Matsuoka 2007-08-31 08:59:03 EDT
Created attachment 183201
Comment 18 Alan Matsuoka 2007-08-31 08:59:43 EDT
Created attachment 183221
Comment 19 Alan Matsuoka 2007-08-31 09:00:11 EDT
Created attachment 183241
Comment 20 Issue Tracker 2007-08-31 09:01:22 EDT
I've put some of the attachments that were linked to this ticket on to the

Internal Status set to 'Waiting on Engineering'

This event sent from IssueTracker by alanm 
 issue 88828
Comment 21 Bastien Nocera 2007-08-31 09:33:46 EDT
ipc.gm is a completely ludicrous "benchmark". Deleting IPC resources that don't
belong to it is very likely to cause problems. ipcrm is not a test tool, it's a
way to kill your system if you don't know what you're doing. The test script
doesn't know what it's doing.

That said, it's very likely the problem is with sysreport poking bits of
hardware. sysreport isn't supposed to be run on a live production system.

Is there any other way to reproduce the problem?
