Bug 203385 - frysk testsuite produces kernel messages indicating lockups
Summary: frysk testsuite produces kernel messages indicating lockups
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Chris Moller
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 173278 203387
Reported: 2006-08-21 16:59 UTC by Mark Wielaard
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-10-03 18:04:30 UTC
Type: ---
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Sourceware 3101 0 None None None Never

Description Mark Wielaard 2006-08-21 16:59:13 UTC
Description of problem:

frysk testsuite produces kernel messages indicating lockups.

Version-Release number of selected component (if applicable):

Linux hermans.wildebeest.org 2.6.17-1.2573.fc6 #1 SMP Fri Aug 18 13:26:49 EDT 2006 i686 i686 i386 GNU/Linux

How reproducible:

Intermittent; it does not occur on every run.

Steps to Reproduce:
1. Run make check inside a build of frysk CVS.
  
Actual results:

Messages like the following appear during the make check run:
Message from syslogd@hermans at Mon Aug 21 16:43:44 2006 ...
hermans kernel: BUG: spinlock lockup on CPU#1, TestRunner/883, 6b6b6b7b (Not tainted)

Expected results:

Full run of make check without any kernel messages.
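
A quick way to check a run against this expectation is to filter the kernel log for the kinds of messages quoted in this report. The following is only a minimal sketch: the function name and the marker list are made up for illustration, and the markers are simply the ones that show up in this bug, not a complete list of kernel error prefixes.

```python
import re

# Markers copied from the messages quoted in this report.
# This list is an assumption and is not exhaustive.
SUSPICIOUS = re.compile(
    r"BUG: (spinlock lockup|soft lockup|unable to handle kernel)|Oops:"
)

def suspicious_lines(log_text):
    """Return the lines of log_text that match one of the markers above."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]
```

Feeding it the saved output of dmesg taken after make check should return an empty list on a clean run.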

Additional info:

The following output was found in dmesg:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000014
 printing eip:
c04ecce0
*pde = 00000000
Oops: 0000 [#1]
SMP
last sysfs file: /devices/system/cpu/cpu1/cpufreq/scaling_cur_freq
Modules linked in: i915 drm autofs4 sunrpc ip_conntrack_netbios_ns ipt_REJECT
xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables
cpufreq_ondemand video sbs ibm_acpi i2c_ec dock button battery asus_acpi ac ipv6
parport_pc lp parport snd_hda_intel snd_hda_codec joydev snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss
intel_rng snd_pcm i2c_i801 sg snd_timer serio_raw i2c_core snd e1000 soundcore
pcspkr snd_page_alloc ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd
ahci libata sd_mod scsi_mod ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<c04ecce0>]    Not tainted VLI
EFLAGS: 00210282   (2.6.17-1.2573.fc6 #1)
EIP is at _raw_spin_lock+0x8/0xd9
eax: 00000010   ebx: 00000010   ecx: 00000000   edx: 00000000
esi: 00000010   edi: d0cbc000   ebp: c8ebdf00   esp: c8ebdef4
ds: 007b   es: 007b   ss: 0068
Process funit-fib-fork (pid: 30265, ti=c8ebd000 task=d0cbc000 task.ti=c8ebd000)
Stack: 00000010 00000000 d0cbc000 c8ebdf18 c0613aa2 00000000 00000002 c0454834
       e94a9674 c8ebdf3c c0454834 00000020 c8ebdf5c d0cbc000 c042b982 e94a9674
       00000020 d0cbc000 c8ebdf6c c045584a e94a9674 0003e768 c042741d e94a966c
Call Trace:
 [<c0613aa2>] _spin_lock+0x20/0x28
 [<c0454834>] remove_detached+0x20/0x8d
 [<c045584a>] utrace_report_death+0x1c6/0x1dd
 [<c0427435>] do_exit+0x70b/0x78c
 [<c042752f>] sys_exit_group+0x0/0x11
 [<00000005>] 0x5
 [<c0405379>] show_stack_log_lvl+0x8a/0x95
 [<c04054b1>] show_registers+0x12d/0x19a
 [<c04056ae>] die+0x190/0x293
 [<c0615005>] do_page_fault+0x3dc/0x4a4
 [<c0404be1>] error_code+0x39/0x40
 [<c0613aa2>] _spin_lock+0x20/0x28
 [<c0454834>] remove_detached+0x20/0x8d
 [<c045584a>] utrace_report_death+0x1c6/0x1dd
 [<c0427435>] do_exit+0x70b/0x78c
 [<c042752f>] sys_exit_group+0x0/0x11
 [<c042753e>] sys_exit_group+0xf/0x11
 [<c0403faf>] syscall_call+0x7/0xb
Code: 00 00 00 8b 45 f0 c7 43 04 ad 4e ad de c7 43 0c ff ff ff ff c7 43 08 ff ff
ff ff 89 03 5a 5b 5e 5f 5d c3 55 89 e5 57 56 89 c6 53 <81> 78 04 ad 4e ad de 74
0a ba 88 3f 64 c0 e8 0f fe ff ff 89 e0
EIP: [<c04ecce0>] _raw_spin_lock+0x8/0xd9 SS:ESP 0068:c8ebdef4
 <1>Fixing recursive fault but reboot is needed!
audit(1156171305.014:4): avc:  denied  { signal } for  pid=874 comm="TestRunner"
scontext=system_u:object_r:unlabeled_t:s0
tcontext=user_u:system_r:unconfined_t:s0 tclass=process
BUG: soft lockup detected on CPU#1!
 [<c04051ee>] show_trace_log_lvl+0x58/0x159
 [<c04057ea>] show_trace+0xd/0x10
 [<c0405903>] dump_stack+0x19/0x1b
 [<c0450c1f>] softlockup_tick+0xa5/0xb9
 [<c042d908>] run_local_timers+0x12/0x14
 [<c042dc87>] update_process_times+0x3c/0x61
 [<c0417cb6>] smp_apic_timer_interrupt+0x74/0x7e
 [<c0404b0a>] apic_timer_interrupt+0x2a/0x30
DWARF2 unwinder stuck at apic_timer_interrupt+0x2a/0x30
Leftover inexact backtrace:
 [<c04057ea>] show_trace+0xd/0x10
 [<c0405903>] dump_stack+0x19/0x1b
 [<c0450c1f>] softlockup_tick+0xa5/0xb9
 [<c042d908>] run_local_timers+0x12/0x14
 [<c042dc87>] update_process_times+0x3c/0x61
 [<c0417cb6>] smp_apic_timer_interrupt+0x74/0x7e
 [<c0404b0a>] apic_timer_interrupt+0x2a/0x30
 [<c0613aa2>] _spin_lock+0x20/0x28
 [<c0454e21>] utrace_detach+0x2c/0x78
 [<c042c525>] ptrace_exit+0x34/0xa6
 [<c0426e0f>] do_exit+0xe5/0x78c
 [<c042752f>] sys_exit_group+0x0/0x11
 [<c0430121>] get_signal_to_deliver+0x376/0x39e
 [<c0403584>] do_notify_resume+0x81/0x6f9
 [<c040408d>] work_notifysig+0x13/0x1a
BUG: spinlock lockup on CPU#1, TestRunner/883, 6b6b6b7b (Not tainted)
 [<c04051ee>] show_trace_log_lvl+0x58/0x159
 [<c04057ea>] show_trace+0xd/0x10
 [<c0405903>] dump_stack+0x19/0x1b
 [<c04ecd92>] _raw_spin_lock+0xba/0xd9
 [<c0613aa2>] _spin_lock+0x20/0x28
 [<c0454e21>] utrace_detach+0x2c/0x78
 [<c042c525>] ptrace_exit+0x34/0xa6
 [<c0426e0f>] do_exit+0xe5/0x78c
 [<c042752f>] sys_exit_group+0x0/0x11
 [<ef64c5fc>] 0xef64c5fc
DWARF2 unwinder stuck at 0xef64c5fc
Leftover inexact backtrace:
 [<c04057ea>] show_trace+0xd/0x10
 [<c0405903>] dump_stack+0x19/0x1b
 [<c04ecd92>] _raw_spin_lock+0xba/0xd9
 [<c0613aa2>] _spin_lock+0x20/0x28
 [<c0454e21>] utrace_detach+0x2c/0x78
 [<c042c525>] ptrace_exit+0x34/0xa6
 [<c0426e0f>] do_exit+0xe5/0x78c
 [<c042752f>] sys_exit_group+0x0/0x11
 [<c0430121>] get_signal_to_deliver+0x376/0x39e
 [<c0403584>] do_notify_resume+0x81/0x6f9
 [<c040408d>] work_notifysig+0x13/0x1a
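
One possible reading of the numbers in the output above, offered as a sketch rather than a diagnosis. It assumes the kernel was built with spinlock debugging and slab poisoning, that 0xdead4ead is the debug-spinlock magic, that 0x6b is the poison byte written over freed slab objects, and that the hex value on the "spinlock lockup" line is the lock address. The helper name is hypothetical.

```python
SPINLOCK_MAGIC = 0xDEAD4EAD   # assumed debug-spinlock magic value
POISON_FREE = 0x6B            # assumed slab poison byte for freed objects

def decode_le32(hex_bytes):
    """Decode four space-separated hex bytes as a little-endian 32-bit value."""
    return int.from_bytes(bytes.fromhex(hex_bytes.replace(" ", "")), "little")

# Immediate operand of the faulting instruction: <81> 78 04 ad 4e ad de
imm = decode_le32("ad 4e ad de")

# Faulting address: eax (00000010) plus the displacement 04 in that instruction.
fault_addr = 0x10 + 0x04

# Lock address from the "spinlock lockup" line minus one word of poison bytes.
poison_word = int.from_bytes(bytes([POISON_FREE] * 4), "little")
lock_offset = 0x6B6B6B7B - poison_word

print(hex(imm), hex(fault_addr), hex(lock_offset))
```

Under those assumptions, the oops would be a magic check on a lock at offset 0x10 from a NULL pointer, and the lockup would be on a lock at the same offset from a pointer read out of freed memory. Both would be consistent with a use-after-free of the structure holding the lock, but that is only an inference from the log.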

Comment 1 Roland McGrath 2006-08-21 20:38:51 UTC
This is clearly my bug (utrace).  Chris should do the work on getting a reliable
reproducer I can use, and ideally figuring out the kernel-level sequence of
events that is triggering the bug.  Then reassign to me to fix it.

Comment 2 Andrew Cagney 2006-08-22 14:46:36 UTC
Upstream generic frysk bug:
http://sourceware.org/bugzilla/show_bug.cgi?id=3101

Comment 3 Mark Wielaard 2006-09-29 10:36:54 UTC
Tried to reproduce with latest FC6 development packages and current frysk CVS:

$ uname -a
Linux hermans.wildebeest.org 2.6.18-1.2699.fc6 #1 SMP Tue Sep 26 23:49:34 EDT 2006 i686 i686 i386 GNU/Linux

$ cat common/version.in
0.0.1.2006.09.29

Could not reproduce it with the above setup. No suspicious kernel messages in
dmesg output either.

The testsuite (make check -k) seems to hang at:
Running testManyExistingThreadDetached(frysk.proc.TestProcTasksObserver) ...
But that is a different issue.

Comment 4 Chris Moller 2006-10-03 18:04:30 UTC
This bug can't be reproduced even by the original reporter and seems to have
been transient.

