Bug 211129 - Crashes on x86-64
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: hal
Version: 4.0
Hardware: All Linux
Priority: high Severity: high
Assigned To: David Zeuthen
Depends On: 234251
Reported: 2006-10-17 11:35 EDT by Bastien Nocera
Modified: 2013-03-05 22:47 EST (History)
Doc Type: Bug Fix
Last Closed: 2007-10-02 11:56:30 EDT


Attachments
gdm (132 bytes, text/plain), 2007-08-31 08:59 EDT, Alan Matsuoka
ipc.gm (265 bytes, text/plain), 2007-08-31 08:59 EDT, Alan Matsuoka
sreport.schleife (80 bytes, text/plain), 2007-08-31 09:00 EDT, Alan Matsuoka

Description Bastien Nocera 2006-10-17 11:35:58 EDT
hal-0.4.2-4.EL4

#0  0x0000003f8fe2e21d in raise () at ../string/bits/string2.h:1000
#1  0x0000003f8fe2fa1e in abort () at ../string/bits/string2.h:1000
#2  0x0000003f93142445 in _dbus_abort () at dbus-sysdeps.c:86
#3  0x0000003f9312c9b6 in _dbus_real_assert (condition=Variable "condition" is not available.) at dbus-internals.c:455
#4  0x0000003f93131f5e in dbus_free (memory=0xa8d620) at dbus-memory.c:629
#5  0x000000000040aa50 in match_device_async_timeout (user_data=0xaed600) at device_store.c:435
#6  0x0000003f9292956b in g_timeout_dispatch (source=0xaf3c50, callback=0x11b1, user_data=0x6) at gmain.c:3301
#7  0x0000003f929266bd in g_main_context_dispatch (context=0x6403d0) at gmain.c:1942
#8  0x0000003f92928397 in g_main_context_iterate (context=0x6403d0, block=-1880368832, dispatch=1, self=0xffffffffffffffff) at gmain.c:2573
#9  0x0000003f92928735 in g_main_loop_run (loop=0x6414e0) at gmain.c:2777
#10 0x000000000040b444 in main (argc=1, argv=0x7fbffff808) at hald.c:513
#11 0x0000003f8fe1c3fb in __libc_start_main (main=0x40aef0 <main>, argc=1, ubp_av=0x7fbffff808, init=0x4287e0 <__libc_csu_init>, fini=Variable "fini" is not available.) at ../sysdeps/generic/libc-start.c:209
#12 0x000000000040586a in _start ()

We tried using the attached patch, which corresponds to the following upstream
commits:

http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=98b280dc3561597b3526446be2c8aa803ec54a16
http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=d22a988389649a02613b93cb2b593d2d3de6e9a3
http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=7530cf018fecef349bc158c49eec799fde912117
http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=869367455153923dd68312ee6681c8d795f1e7fe

and especially:
http://gitweb.freedesktop.org/?p=hal.git;a=commit;h=88de5835f14360ab0b5c6c56b4c7f28ca3c95f2e

The crash still occurs with the following assertion:
18:35:08.591 [W] linux/osspec.c:1745: Got SEQNUM=673256, but last_hotplug_seqnum=673258
18:35:08.591 [I] linux/osspec.c:1345: action=remove seqnum=673256 subsystem=vc sysfs_path=/sys/class/vc/vcsa8
18:35:08.592 [W] linux/osspec.c:1117: Removal of class device at sysfs path /sys/class/vc/vcsa8 is not yet implemented
18:35:08.592 [I] linux/osspec.c:1686: SEQNUM=673258, TIMESTAMP=1146674108
18:35:08.592 [I] linux/osspec.c:1755: Queing up seqnum=673258, sysfspath=/class/vc/vcsa8, subsys=vc
18:35:08.592 [I] linux/osspec.c:1404: action=add, seqnum=673258  subsystem=vc devpath=/class/vc/vcsa8 devname=/dev/vcsa8
18:35:16.274 [W] linux/osspec.c:1203: No HAL device corresponding to device file /dev/vcsa7
18:35:16.361 [W] linux/osspec.c:1203: No HAL device corresponding to device file /dev/vcsa8
23699: assertion failed "n_blocks_outstanding >= 0" file "dbus-memory.c" line 629

I have no reproducer for this bug, and it doesn't seem to be known upstream.
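
For readers unfamiliar with this kind of assertion: the message suggests that an allocation counter went negative, i.e. more blocks were released through the counting free routine than were handed out by the matching allocator. The following is only a minimal sketch of that failure mode, not the D-Bus implementation. The names tracked_malloc, tracked_free and demo_mismatched_free are hypothetical, and only the counter name is borrowed from the assertion text. Instead of aborting, the sketch reports the point where such an assertion would fire.

```c
#include <stdio.h>
#include <stdlib.h>

/* Counter named after the one in the assertion message; everything else
 * in this file is a hypothetical illustration, not D-Bus code. */
static int n_blocks_outstanding = 0;

/* Allocate a block and count it as outstanding. */
static void *tracked_malloc(size_t bytes)
{
    void *p = malloc(bytes);
    if (p != NULL)
        n_blocks_outstanding += 1;
    return p;
}

/* Release a counted block. Returns 0 on success, or -1 (without freeing)
 * when the counter would drop below zero, which is the situation an
 * assertion such as "n_blocks_outstanding >= 0" guards against. */
static int tracked_free(void *p)
{
    if (p == NULL)
        return 0;
    if (n_blocks_outstanding - 1 < 0) {
        fprintf(stderr, "counter would go negative: unbalanced free\n");
        return -1;
    }
    n_blocks_outstanding -= 1;
    free(p);
    return 0;
}

/* Frees one counted block and one block that was never counted.
 * Returns -1 when the imbalance is detected, 1 on allocation failure. */
static int demo_mismatched_free(void)
{
    void *counted = tracked_malloc(16);
    void *uncounted = malloc(16);      /* bypasses the counter */
    if (counted == NULL || uncounted == NULL) {
        tracked_free(counted);
        free(uncounted);
        return 1;
    }
    int first = tracked_free(counted);    /* counter: 1 -> 0 */
    int second = tracked_free(uncounted); /* counter would become -1 */
    free(uncounted);                      /* release it the matching way */
    return (first == 0) ? second : 1;
}
```

A double free of a counted block, or a counter updated from several threads without synchronization, would unbalance the count in the same way. Which of these, if any, applies to hald is not established in this report.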
Comment 1 Bastien Nocera 2006-10-17 11:37:47 EDT
I suspect this is a D-Bus problem, although I have no evidence for that beyond
the fact that the D-Bus version in RHEL4 is quite old.
Comment 6 Jeremy West 2007-02-27 16:44:58 EST
Additional backtrace info:

============================
#0  0x000000367bb2e21d in raise () at ../string/bits/string2.h:1000
1000        ++__result;
(gdb) up
#1  0x000000367bb2fa1e in abort () at ../string/bits/string2.h:1000
1000        ++__result;
(gdb) up
#2  0x0000003680e42445 in dbus_shutdown () from /usr/lib64/libdbus-1.so.0
(gdb) up
#3  0x0000003680e2c9b6 in dbus_watch_handle () from /usr/lib64/libdbus-1.so.0
(gdb) up
#4  0x0000003680e31f5e in dbus_free () from /usr/lib64/libdbus-1.so.0
(gdb) up
#5  0x000000000040aa50 in match_device_async_timeout (user_data=0xa9e710)
    at device_store.c:435
435             info->callback (info->store, NULL, info->user_data);
(gdb) p user_data
$1 = 0xa9e710
(gdb) p * user_data
Attempt to dereference a generic pointer.
(gdb) p *user_data
Attempt to dereference a generic pointer.
(gdb) p (AsyncMatchInfo *) user_data
$2 = (struct {...} *) 0xa9e710
=======================================
And the code it references (device_store.c)
=======================================
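gboolean
match_device_async_timeout (gpointer user_data)
{
        AsyncMatchInfo *info = (AsyncMatchInfo *) user_data;

        /* Line 435 in the backtrace: invoke the caller's callback with a
         * NULL device. */
        info->callback (info->store, NULL, info->user_data);

        destroy_async_match_info (info);

        /* Returning FALSE removes this GLib timeout source, so the handler
         * runs only once. */
        return FALSE;
}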
======================================
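
The handler above follows a one-shot timeout pattern in which memory ownership is split: the handler releases the info record, while whatever the callback does with its own user_data is up to the callback. Below is a simplified, GLib-free sketch of that pattern, under the assumption that the callback frees its user_data. The types Store, MatchCallback and MatchInfo and the functions on_match, match_info_new and match_timeout are hypothetical stand-ins, not the hal definitions.

```c
#include <stdlib.h>

typedef struct Store Store;   /* hypothetical stand-in for the device store */

typedef void (*MatchCallback)(Store *store, void *device, void *user_data);

/* Hypothetical, simplified counterpart of AsyncMatchInfo. */
typedef struct {
    Store *store;
    MatchCallback callback;
    void *user_data;
} MatchInfo;

static int timeouts_seen = 0;

/* Example callback: a NULL device means the match timed out.
 * In this sketch the callback owns user_data and frees it exactly once. */
static void on_match(Store *store, void *device, void *user_data)
{
    (void) store;
    if (device == NULL)
        timeouts_seen += 1;
    free(user_data);
}

static MatchInfo *match_info_new(Store *store, MatchCallback cb, void *user_data)
{
    MatchInfo *info = malloc(sizeof *info);
    if (info == NULL)
        return NULL;
    info->store = store;
    info->callback = cb;
    info->user_data = user_data;
    return info;
}

/* One-shot timeout handler: notify the callback, release the record,
 * and return 0 so an event loop would not schedule it again. */
static int match_timeout(void *data)
{
    MatchInfo *info = data;
    info->callback(info->store, NULL, info->user_data);
    free(info);   /* the handler owns info and frees it exactly once */
    return 0;
}
```

If both the callback and the cleanup step released the same pointer, or a pointer from a different allocator, a counting allocator would see an unbalanced free. Whether that happens in hald is not shown by the backtrace alone.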
Comment 16 Alan Matsuoka 2007-08-31 08:58:07 EDT
According to GM from VW, this problem occurs when they run their
"traditional" benchmarks, i.e.:

1) rlogin to machine and start kill-gdm-endlessloop
2) rlogin to machine and start sreport.schleife
3) rlogin to machine and start ipc.gm

This apparently triggers the problem within about 24 hours.

I will upload the three scripts, which they start in parallel.
Comment 17 Alan Matsuoka 2007-08-31 08:59:03 EDT
Created attachment 183201
gdm
Comment 18 Alan Matsuoka 2007-08-31 08:59:43 EDT
Created attachment 183221
ipc.gm
Comment 19 Alan Matsuoka 2007-08-31 09:00:11 EDT
Created attachment 183241
sreport.schleife
Comment 20 Issue Tracker 2007-08-31 09:01:22 EDT
I've put some of the attachments that were linked to this ticket onto the
BZ.

Internal Status set to 'Waiting on Engineering'

This event sent from IssueTracker by alanm 
 issue 88828
Comment 21 Bastien Nocera 2007-08-31 09:33:46 EDT
ipc.gm is a completely ludicrous "benchmark". Deleting IPC resources that don't
belong to it is very likely to cause problems. ipcrm is not a test tool, it's a
way to kill your system if you don't know what you're doing. The test script
doesn't know what it's doing.
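
To illustrate what "belonging" means here, a tool can check who owns a System V shared memory segment before removing it. This is a hedged sketch using the standard shmctl() interface with IPC_STAT and IPC_RMID. The helper name remove_shm_if_owned is hypothetical, and nothing here is taken from the attached script.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Remove a System V shared memory segment only if the calling user is
 * its owner or creator.
 * Returns 1 if removed, 0 if skipped (not ours), -1 on error. */
static int remove_shm_if_owned(int shmid)
{
    struct shmid_ds ds;
    uid_t me = geteuid();

    if (shmctl(shmid, IPC_STAT, &ds) == -1)
        return -1;

    /* Leave segments that belong to other users alone, even when
     * running with enough privilege to remove them. */
    if (ds.shm_perm.uid != me && ds.shm_perm.cuid != me)
        return 0;

    if (shmctl(shmid, IPC_RMID, NULL) == -1)
        return -1;
    return 1;
}
```

A caller could, for example, create a private segment with shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600) and pass the returned id to this helper. Semaphores and message queues have analogous semctl() and msgctl() calls.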

That said, it's very likely the problem is with sysreport poking bits of
hardware. sysreport isn't supposed to be run on a live production system.

Is there any other way to reproduce the problem?