Bug 557380

Summary: Kernel panic due to recursive lock in 3c59x driver.
Product: Red Hat Enterprise Linux 4
Reporter: Vitaly Mayatskikh <vmayatsk>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Network QE <network-qe>
Severity: medium
Priority: low
Version: 4.8
CC: cww, dhoward, dtian, jburke, kzhang, nhorman, yugzhang
Target Milestone: rc
Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-02-16 16:05:44 UTC
Bug Blocks: 648407
Attachments:
Description                      Flags
Console log                      none
patch to prevent tx recursion    none
Panic on January 12              none
panic on January 13th            none

Description Vitaly Mayatskikh 2010-01-21 09:06:50 UTC
Created attachment 385880 [details]
Console log

During regular testing, the machine failed in the network driver for the 3Com NIC:

eth0: Too much work in interrupt, status 8401.
<0>Kernel panic - not syncing: drivers/net/3c59x.c:2265: spin_lock(drivers/net/3c59x.c:dfe2b1f4) already locked by drivers/net/3c59x.c/2419

See attachment for full console log.
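
The panic message says the lock was taken at 3c59x.c line 2419 and requested again at line 2265 while still held. Here is a minimal user-space sketch of how that kind of recursion can arise and how a guard flag avoids it. It assumes, based on the netconsole hint in Comment 30 and the attachment title "patch to prevent tx recursion", that the interrupt handler logs a message while holding the lock and that the logging path re-enters the transmit routine of the same device. All names (fake_nic, in_irq, use_guard, fake_start_xmit, fake_interrupt) are hypothetical; this is not the attached patch or the upstream commit.

```c
#include <assert.h>
#include <stddef.h>

enum tx_result { TX_OK = 0, TX_BUSY = 1, TX_DEADLOCK = 2 };

/* Hypothetical stand-in for the driver's private state. */
struct fake_nic {
    int locked;              /* models the driver spinlock */
    int in_irq;              /* set while the interrupt handler holds the lock */
    int use_guard;           /* 1 = check in_irq before taking the lock */
    enum tx_result last_tx;  /* result of the nested transmit attempt */
};

/* Transmit path: needs the same lock as the interrupt handler. */
static enum tx_result fake_start_xmit(struct fake_nic *nic)
{
    assert(nic != NULL);
    if (nic->use_guard && nic->in_irq)
        return TX_BUSY;      /* recursed from the handler: back off, retry later */
    if (nic->locked)
        return TX_DEADLOCK;  /* a real spinlock would spin (or panic) here */
    nic->locked = 1;
    /* ... queue the packet ... */
    nic->locked = 0;
    return TX_OK;
}

/* Interrupt handler: logs while holding the lock. With a network console,
 * logging turns into a transmit on the same device. */
static void fake_interrupt(struct fake_nic *nic)
{
    assert(nic != NULL);
    nic->locked = 1;
    nic->in_irq = 1;
    /* models: log message -> network console -> start_xmit on this NIC */
    nic->last_tx = fake_start_xmit(nic);
    nic->in_irq = 0;
    nic->locked = 0;
}
```

With use_guard set to 0 the nested transmit reports TX_DEADLOCK, which corresponds to the "already locked" panic above; with use_guard set to 1 it reports TX_BUSY and the handler completes normally.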

Comment 1 Neil Horman 2010-01-26 14:12:45 UTC
Created attachment 386835 [details]
patch to prevent tx recursion

Please give this patch a try, and let me know the results. Thanks!

Comment 2 Vitaly Mayatskikh 2010-02-09 12:40:46 UTC
I have no idea how to test it :( Any thoughts?

Comment 7 Neil Horman 2010-08-09 14:42:57 UTC
Sigh, Vitaly is no longer with us. I'll test this myself.

Comment 8 Neil Horman 2010-08-09 17:29:49 UTC
I've sent this upstream for review.

Comment 9 Neil Horman 2010-08-11 13:12:02 UTC
Davem didn't want this patch for upstream, so it's back to the drawing board here.

Comment 10 Neil Horman 2010-08-11 15:07:33 UTC
Sent a new patch attempt upstream.

Comment 11 Don Howard 2010-09-15 17:59:17 UTC
Another instance of this bug in testing:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=16977668

Neil: 
Has your latest patch been accepted upstream?

Comment 12 Neil Horman 2010-09-15 19:44:16 UTC
Sure has:
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=aa25ab7d943a5e1e6bcc2a65ff6669144f5b5d60

I've also posted it internally for this bug.

Comment 14 Vivek Goyal 2010-10-05 15:49:11 UTC
Committed in 89.39.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 24 Dayong Tian 2011-01-12 10:13:42 UTC
Created attachment 472983 [details]
Panic on January 12

Comment 26 Neil Horman 2011-01-12 12:25:21 UTC
No, that's a completely different crash. If it's reproducible, I'd open up a new bug.

Comment 27 Dayong Tian 2011-01-13 03:02:58 UTC
Created attachment 473187 [details]
panic on January 13th

Comment 28 Dayong Tian 2011-01-13 03:06:51 UTC
(In reply to comment #26)
> No, that's a completely different crash. If it's reproducible, I'd open up a new
> bug.

Hi Neil, it's reproducible on machine cpq-dl380-01.rhts.eng.bos.redhat.com with kernel 2.6.9-89.ELsmp:
--------------------
Code:  Bad EIP value.
Unable to handle kernel paging request at virtual address e09074e2
 printing eip:
e09074e2
*pde = 00000000
Recursive die() failure, output suppressed
 <0>Fatal exception: panic in 5 seconds

Kernel panic - not syncing: Fatal exception
------------[ cut here ]------------
kernel BUG at kernel/panic.c:77!
invalid operand: 0000 [#3]
SMP 
Modules linked in: netconsole netdump md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc cpufreq_powersave loop button battery ac e100 3c59x mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod cpqarray sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c0122d0a>]    Not tainted VLI
EFLAGS: 00010286   (2.6.9-89.ELsmp) 
EIP is at panic+0x47/0x166
eax: 0000002f   ebx: d87df000   ecx: d87dfb40   edx: c02ef568
esi: c02e7c13   edi: c02e7bc7   ebp: c02ef1e7   esp: d87dfb48
ds: 007b   es: 007b   ss: 0068
Process bash (pid: 5305, threadinfo=d87df000 task=deb6c790)
Stack: d87df000 c01060d0 c02e7c3b 00004890 c0123745 c0425cf3 00000006 00000013 
       c012365d c02ef17e 00000000 c02ef17e 00000000 c0007820 00007820 c011bad9 
       c02ef1d6 00000000 c02f396f e09074e2 c02ef1c3 c02ef1a8 e09074e2 00000000 
Call Trace:
 [<c01060d0>] die+0x164/0x16b
 [<c0123745>] release_console_sem+0x75/0xa9
 [<c012365d>] vprintk+0x136/0x14a
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<c01d4859>] vgacon_scroll+0x182/0x199
 [<c020e0d5>] scrup+0x63/0xce
 [<c020e6e3>] complement_pos+0x12/0x144
 [<c020eb8a>] set_cursor+0x62/0x6e
 [<c0211bc4>] vt_console_print+0x286/0x2a5
 [<c021193e>] vt_console_print+0x0/0x2a5
 [<c0123307>] __call_console_drivers+0x36/0x40
 [<c012341f>] call_console_drivers+0xb6/0xd8
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02de3db>] error_code+0x2f/0x38
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<c0135a0a>] try_crashdump+0x31/0x33
 [<c010604e>] die+0xe2/0x16b
 [<c012365d>] vprintk+0x136/0x14a
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<c01d4830>] vgacon_scroll+0x159/0x199
 [<c01d4859>] vgacon_scroll+0x182/0x199
 [<c020e0d5>] scrup+0x63/0xce
 [<c020e6e3>] complement_pos+0x12/0x144
 [<c020eb8a>] set_cursor+0x62/0x6e
 [<c0211bc4>] vt_console_print+0x286/0x2a5
 [<c021193e>] vt_console_print+0x0/0x2a5
 [<c0123307>] __call_console_drivers+0x36/0x40
 [<c012341f>] call_console_drivers+0xb6/0xd8
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02de3db>] error_code+0x2f/0x38
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<c0135a0a>] try_crashdump+0x31/0x33
 [<c010604e>] die+0xe2/0x16b
 [<c012365d>] vprintk+0x136/0x14a
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<c0213238>] sysrq_handle_crash+0x0/0x8
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c012ac01>] __mod_timer+0x101/0x10b
 [<c021290b>] poke_blanked_console+0xa1/0xac
 [<c0211bd2>] vt_console_print+0x294/0x2a5
 [<c021193e>] vt_console_print+0x0/0x2a5
 [<c0123307>] __call_console_drivers+0x36/0x40
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02de3db>] error_code+0x2f/0x38
 [<c0213238>] sysrq_handle_crash+0x0/0x8
 [<c02133d0>] __handle_sysrq+0x62/0xd9
 [<c018f50c>] write_sysrq_trigger+0x23/0x29
 [<c015d5a7>] vfs_write+0xb6/0xe2
 [<c015d671>] sys_write+0x3c/0x62
 [<c02dd8e3>] syscall_call+0x7/0xb
 [<c02d007b>] xfrm_sk_policy_lookup+0xc1/0x3ca
--------------------
Please refer to attachment 473187 [details] for detailed info.

Comment 29 Dayong Tian 2011-01-13 09:12:28 UTC
Hi Neil, I reproduced the panic with kernel 2.6.9-96.ELsmp and filed bug 669302. Whenever I tried to reproduce the panic described in this bug, I got the panic described in bug 669302 instead. How should I verify this bug? Thanks!

Comment 30 Neil Horman 2011-01-13 12:13:32 UTC
This bug has nothing to do with netdump (which, it appears, is how you are trying to verify it). If you want to verify it, set up netconsole, stream messages across it by generating printk events in the kernel, and observe that the system doesn't lock up.
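
One possible way to generate a stream of kernel log messages for such a test is to write lines to /dev/kmsg from user space. The sketch below is only an illustration: the helper names (format_line, emit_lines) and the "netconsole-stress" tag are made up, and it is an assumption that the kernel under test accepts writes to /dev/kmsg and forwards them to netconsole at the configured console log level. Messages injected this way originate from process context, so they may not exercise the interrupt-handler path that triggered the original panic.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Format one log line at level 6 (info) with a sequence number.
 * Returns the line length, or -1 if it does not fit in buf. */
static int format_line(char *buf, size_t size, unsigned seq)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "<6>netconsole-stress: message %u\n", seq);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Write `count` lines to fd, one write() per line.
 * Returns the number of lines written successfully. */
static unsigned emit_lines(int fd, unsigned count)
{
    char buf[128];
    unsigned i;

    for (i = 0; i < count; i++) {
        int n = format_line(buf, sizeof(buf), i);
        if (n < 0 || write(fd, buf, (size_t)n) != n)
            break;
    }
    return i;
}
```

To use it, open /dev/kmsg for writing as root (for example with open("/dev/kmsg", O_WRONLY) from <fcntl.h>), pass the descriptor to emit_lines() with a large count, and watch the netconsole receiver while confirming that the sending machine stays responsive.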

Comment 32 errata-xmlrpc 2011-02-16 16:05:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html