Bug 55511 - kernel panic when using 3com 3c996-T gigabit (bcm5700 module) under full load
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.1
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brock Organ
Reported: 2001-11-01 15:32 UTC by Axel Kohlmeyer
Modified: 2007-04-18 16:37 UTC (History)

Doc Type: Bug Fix
Last Closed: 2001-11-01 15:32:56 UTC

Description Axel Kohlmeyer 2001-11-01 15:32:51 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.77 [en] (X11; U; Linux 2.4.9-6 i686)

Description of problem:
We have a cluster of 12 machines (Athlon TB 1.33GHz, Asus A7V133, 768MB
PC-133 RAM (Infineon), 3c996-T, 3Com 3c17700 Gigabit Switch) for scientific
calculations with self-compiled scientific software using LAM-MPI.

When running parallel jobs, some of the nodes frequently crash
with a kernel panic, even when the switch is configured to run
only at 100 Mbit/s full duplex. On the other hand, heavy network
usage without high CPU load produced no problems.


Version-Release number of selected component (if applicable):
kernel-2.4.9-6

How reproducible:
Sometimes

Steps to Reproduce:
Unfortunately, we cannot give access to our program,
and we have not yet found an alternative way to reproduce
the kernel panic.


Additional info:

Ksymoops output from a serial console:

cat ~/parker.oops
ksymoops 2.4.0 on i686 2.4.9-6.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.9-6/ (default)
     -m /boot/System.map-2.4.9-6 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base
says c01b61e0, System.map says c0157120.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol nlmsvc_grace_period  , lockd
says f093fa94, /lib/modules/2.4.9-6/kernel/fs/lockd/lockd.o says f093eefc. 
Ignoring /lib/modules/2.4.9-6/kernel/fs/lockd/lockd.o entry
Warning (compare_maps): mismatch on symbol nlmsvc_ops  , lockd says
f093fa90, /lib/modules/2.4.9-6/kernel/fs/lockd/lockd.o says f093eef8. 
Ignoring /lib/modules/2.4.9-6/kernel/fs/lockd/lockd.o entry
Warning (compare_maps): mismatch on symbol nlmsvc_timeout  , lockd says
f093fa98, /lib/modules/2.4.9-6/kernel/fs/lockd/lockd.o says f093ef00. 
Ignoring /lib/modules/2.4.9-6/kernel/fs/lockd/lockd.o entry
Warning (compare_maps): mismatch on symbol nfs_debug  , sunrpc says
f0931b00, /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317e0. 
Ignoring /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol nfsd_debug  , sunrpc says
f0931b04, /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317e4. 
Ignoring /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol nlm_debug  , sunrpc says
f0931b08, /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317e8. 
Ignoring /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol rpc_debug  , sunrpc says
f0931afc, /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317dc. 
Ignoring /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol rpc_garbage_args  , sunrpc says
f0931adc, /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317bc. 
Ignoring /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol rpc_success  , sunrpc says
f0931acc, /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317ac. 
Ignoring /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol rpc_system_err  , sunrpc says
f0931ae0, /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317c0. 
Ignoring /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol xdr_one  , sunrpc says f0931ac4,
/lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317a4.  Ignoring
/lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol xdr_two  , sunrpc says f0931ac8,
/lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317a8.  Ignoring
/lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol xdr_zero  , sunrpc says
f0931ac0, /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o says f09317a0. 
Ignoring /lib/modules/2.4.9-6/kernel/net/sunrpc/sunrpc.o entry
Unable to handle kernel paging request at virtual address 2faadb65
f09044c0
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<f09044c0>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: ef160140   ecx: 2faadb65   edx: ef1498c0
esi: 00000182   edi: ef120000   ebp: c02e9fa8   esp: c02e9f44
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c02e9000)
Stack: ef160140 00000000 ef160000 f08fd550 ef160140 efa45b60 04000001 0000000c
       c010825a 0000000c ef160000 c02e9fa8 c02e9fa8 0000000c c0322a80 efa45b60
       c01083d8 0000000c c02e9fa8 efa45b60 c0105390 c02e8000 c02e8000 c0105390
Call Trace: [<f08fd550>] bcm5700_probe [bcm5700] 0xfb0 
[<c010825a>] handle_IRQ_event [kernel] 0x3a 
[<c01083d8>] do_IRQ [kernel] 0x68 
[<c0105390>] default_idle [kernel] 0x0 
[<c0105390>] default_idle [kernel] 0x0 
[<c020eccc>] call_do_IRQ [kernel] 0x5 
[<c0105390>] default_idle [kernel] 0x0 
[<c0105390>] default_idle [kernel] 0x0 
[<c01053b3>] default_idle [kernel] 0x23 
[<c0105432>] cpu_idle [kernel] 0x52 
[<c0105000>] stext [kernel] 0x0 
Code: c7 01 00 00 00 00 0f b7 42 08 0f b7 c0 83 e8 04 89 41 04 0f 
>>EIP; f09044c0 <[bcm5700]LM_ServiceInterrupts+160/2e0>   <=====
Trace; f08fd550 <[bcm5700]bcm5700_interrupt+f0/1d0>
Trace; c010825a <handle_IRQ_event+3a/70>
Trace; c01083d8 <do_IRQ+68/b0>
Trace; c0105390 <default_idle+0/30>
Trace; c0105390 <default_idle+0/30>
Trace; c020eccc <call_do_IRQ+5/d>
Trace; c0105390 <default_idle+0/30>
Trace; c0105390 <default_idle+0/30>
Trace; c01053b3 <default_idle+23/30>
Trace; c0105432 <cpu_idle+52/70>
Trace; c0105000 <_stext+0/0>
Code;  f09044c0 <[bcm5700]LM_ServiceInterrupts+160/2e0>
00000000 <_EIP>:
Code;  f09044c0 <[bcm5700]LM_ServiceInterrupts+160/2e0>   <=====
   0:   c7 01 00 00 00 00         movl   $0x0,(%ecx)   <=====
Code;  f09044c6 <[bcm5700]LM_ServiceInterrupts+166/2e0>
   6:   0f b7 42 08               movzwl 0x8(%edx),%eax
Code;  f09044ca <[bcm5700]LM_ServiceInterrupts+16a/2e0>
   a:   0f b7 c0                  movzwl %ax,%eax
Code;  f09044cd <[bcm5700]LM_ServiceInterrupts+16d/2e0>
   d:   83 e8 04                  sub    $0x4,%eax
Code;  f09044d0 <[bcm5700]LM_ServiceInterrupts+170/2e0>
  10:   89 41 04                  mov    %eax,0x4(%ecx)
Code;  f09044d3 <[bcm5700]LM_ServiceInterrupts+173/2e0>
  13:   0f 00 00                  sldt   (%eax)

 <0>Kernel panic: Aiee, killing interrupt handler!

15 warnings issued.  Results may not be reliable.
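
For anyone reading the oops above: the "Oops: 0002" value is the i386
page-fault error code, and the faulting instruction is a store through
%ecx. The following is a minimal sketch (not part of ksymoops; the helper
name decode_oops_error_code is made up for illustration) that decodes the
low three bits of that code and checks that the faulting address equals
the ecx value from the register dump.

```python
def decode_oops_error_code(code: int) -> dict:
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault raised in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
    }


# Values copied by hand from the oops text above.
fault_address = 0x2FAADB65  # "kernel paging request at virtual address"
ecx = 0x2FAADB65            # ecx from the register dump
error_code = 0x0002         # "Oops: 0002"

info = decode_oops_error_code(error_code)
print(info)
print("faulting store targets ecx:", fault_address == ecx)
```

Read this way, the dump describes a kernel-mode write to an unmapped
address held in ecx, inside LM_ServiceInterrupts. That suggests the driver
was handed a bad pointer, but it does not by itself show where the bad
value came from.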

Comment 1 Axel Kohlmeyer 2002-02-19 12:33:57 UTC
Tweaking the BIOS settings made the problem go away reliably.

I changed: 
- Spread Spectrum          -> disabled
- Byte Merge               -> enabled
- PCI Master Read Caching  -> enabled
- Delayed Transaction      -> enabled
- PCI to DRAM Prefetch     -> enabled

We have now been running completely stably for weeks, and for the
past two weeks even with the PCI bus overclocked to 37 MHz.


