Bug 557380 - Kernel panic due to recursive lock in 3c59x driver.
Summary: Kernel panic due to recursive lock in 3c59x driver.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Neil Horman
QA Contact: Network QE
URL:
Whiteboard:
Depends On:
Blocks: 648407
Reported: 2010-01-21 09:06 UTC by Vitaly Mayatskikh
Modified: 2011-02-16 16:05 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-02-16 16:05:44 UTC
Target Upstream Version:
Embargoed:


Attachments
Console log (150.88 KB, text/plain)
2010-01-21 09:06 UTC, Vitaly Mayatskikh
patch to prevent tx recursion (922 bytes, patch)
2010-01-26 14:12 UTC, Neil Horman
Panic on January 12 (3.29 KB, text/plain)
2011-01-12 10:13 UTC, Dayong Tian
panic on January 13th (7.17 KB, text/plain)
2011-01-13 03:02 UTC, Dayong Tian


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0263 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update 2011-02-16 15:14:55 UTC

Description Vitaly Mayatskikh 2010-01-21 09:06:50 UTC
Created attachment 385880 [details]
Console log

During regular testing, the machine panicked in the network driver for the 3Com NIC:

eth0: Too much work in interrupt, status 8401.
<0>Kernel panic - not syncing: drivers/net/3c59x.c:2265: spin_lock(drivers/net/3c59x.c:dfe2b1f4) already locked by drivers/net/3c59x.c/2419

See attachment for full console log.
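
The panic line above appears to say that the lock attempt at line 2265 of drivers/net/3c59x.c found the lock at address dfe2b1f4 already held, and that the recorded holder was line 2419 of the same file. The following is a minimal user-space sketch in C of how a debugging lock can produce a report of that shape. It is not the RHEL 4 kernel's implementation: `struct debug_lock`, `debug_lock_acquire`, and `debug_lock_release` are hypothetical names used only for illustration, and the message format is simplified.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical debugging lock: remembers where it was last taken. */
struct debug_lock {
    int locked;              /* 1 while the lock is held */
    const char *owner_file;  /* file that took the lock */
    int owner_line;          /* line that took the lock */
};

/*
 * Try to take the lock on behalf of file:line.
 * Returns 0 on success. If the lock is already held, writes a report
 * modeled on the console message into msg and returns -1. A real kernel
 * with lock debugging would panic here instead of returning.
 */
static int debug_lock_acquire(struct debug_lock *lk, const char *file,
                              int line, char *msg, size_t msg_len)
{
    if (lk->locked) {
        snprintf(msg, msg_len,
                 "%s:%d: spin_lock(%p) already locked by %s/%d",
                 file, line, (void *)lk, lk->owner_file, lk->owner_line);
        return -1;
    }
    lk->locked = 1;
    lk->owner_file = file;
    lk->owner_line = line;
    return 0;
}

/* Release the lock so that a later acquire succeeds again. */
static void debug_lock_release(struct debug_lock *lk)
{
    lk->locked = 0;
}
```

In this sketch, taking the lock from line 2419 and then again from line 2265 without a release in between yields a report naming 2265 as the failing attempt and 2419 as the current holder, which matches the order of the two line numbers in the console message.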

Comment 1 Neil Horman 2010-01-26 14:12:45 UTC
Created attachment 386835 [details]
patch to prevent tx recursion

Please give this patch a try and let me know the results. Thanks!

Comment 2 Vitaly Mayatskikh 2010-02-09 12:40:46 UTC
I have no idea how to test it :( Any thoughts?

Comment 7 Neil Horman 2010-08-09 14:42:57 UTC
Sigh, Vitaly is no longer with us. I'll test this myself.

Comment 8 Neil Horman 2010-08-09 17:29:49 UTC
I've sent this upstream for review.

Comment 9 Neil Horman 2010-08-11 13:12:02 UTC
Davem didn't want this patch for upstream, so it's back to the drawing board here.

Comment 10 Neil Horman 2010-08-11 15:07:33 UTC
Sent a new patch attempt upstream.

Comment 11 Don Howard 2010-09-15 17:59:17 UTC
Another instance of this bug in testing:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=16977668

Neil: 
Has your latest patch been accepted upstream?

Comment 12 Neil Horman 2010-09-15 19:44:16 UTC
sure has:
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=aa25ab7d943a5e1e6bcc2a65ff6669144f5b5d60

I've also posted it internally for this bug.

Comment 14 Vivek Goyal 2010-10-05 15:49:11 UTC
Committed in 89.39.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 24 Dayong Tian 2011-01-12 10:13:42 UTC
Created attachment 472983 [details]
Panic on January 12

Comment 26 Neil Horman 2011-01-12 12:25:21 UTC
No, that's a completely different crash. If it's reproducible, I'd open a new bug.

Comment 27 Dayong Tian 2011-01-13 03:02:58 UTC
Created attachment 473187 [details]
panic on January 13th

Comment 28 Dayong Tian 2011-01-13 03:06:51 UTC
(In reply to comment #26)
> No, that's a completely different crash. If it's reproducible, I'd open a new
> bug.

Hi Neil, it's reproducible on machine cpq-dl380-01.rhts.eng.bos.redhat.com with kernel 2.6.9-89.ELsmp:
--------------------
Code:  Bad EIP value.
Unable to handle kernel paging request at virtual address e09074e2
 printing eip:
e09074e2
*pde = 00000000
Recursive die() failure, output suppressed
 <0>Fatal exception: panic in 5 seconds

Kernel panic - not syncing: Fatal exception
------------[ cut here ]------------
kernel BUG at kernel/panic.c:77!
invalid operand: 0000 [#3]
SMP 
Modules linked in: netconsole netdump md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc cpufreq_powersave loop button battery ac e100 3c59x mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod cpqarray sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c0122d0a>]    Not tainted VLI
EFLAGS: 00010286   (2.6.9-89.ELsmp) 
EIP is at panic+0x47/0x166
eax: 0000002f   ebx: d87df000   ecx: d87dfb40   edx: c02ef568
esi: c02e7c13   edi: c02e7bc7   ebp: c02ef1e7   esp: d87dfb48
ds: 007b   es: 007b   ss: 0068
Process bash (pid: 5305, threadinfo=d87df000 task=deb6c790)
Stack: d87df000 c01060d0 c02e7c3b 00004890 c0123745 c0425cf3 00000006 00000013 
       c012365d c02ef17e 00000000 c02ef17e 00000000 c0007820 00007820 c011bad9 
       c02ef1d6 00000000 c02f396f e09074e2 c02ef1c3 c02ef1a8 e09074e2 00000000 
Call Trace:
 [<c01060d0>] die+0x164/0x16b
 [<c0123745>] release_console_sem+0x75/0xa9
 [<c012365d>] vprintk+0x136/0x14a
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<c01d4859>] vgacon_scroll+0x182/0x199
 [<c020e0d5>] scrup+0x63/0xce
 [<c020e6e3>] complement_pos+0x12/0x144
 [<c020eb8a>] set_cursor+0x62/0x6e
 [<c0211bc4>] vt_console_print+0x286/0x2a5
 [<c021193e>] vt_console_print+0x0/0x2a5
 [<c0123307>] __call_console_drivers+0x36/0x40
 [<c012341f>] call_console_drivers+0xb6/0xd8
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02de3db>] error_code+0x2f/0x38
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<c0135a0a>] try_crashdump+0x31/0x33
 [<c010604e>] die+0xe2/0x16b
 [<c012365d>] vprintk+0x136/0x14a
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<c01d4830>] vgacon_scroll+0x159/0x199
 [<c01d4859>] vgacon_scroll+0x182/0x199
 [<c020e0d5>] scrup+0x63/0xce
 [<c020e6e3>] complement_pos+0x12/0x144
 [<c020eb8a>] set_cursor+0x62/0x6e
 [<c0211bc4>] vt_console_print+0x286/0x2a5
 [<c021193e>] vt_console_print+0x0/0x2a5
 [<c0123307>] __call_console_drivers+0x36/0x40
 [<c012341f>] call_console_drivers+0xb6/0xd8
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02de3db>] error_code+0x2f/0x38
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
 [<c0135a0a>] try_crashdump+0x31/0x33
 [<c010604e>] die+0xe2/0x16b
 [<c012365d>] vprintk+0x136/0x14a
 [<c011bad9>] do_page_fault+0x3f0/0x5c6
 [<c0213238>] sysrq_handle_crash+0x0/0x8
 [<c011dc49>] try_to_wake_up+0x288/0x293
 [<c012ac01>] __mod_timer+0x101/0x10b
 [<c021290b>] poke_blanked_console+0xa1/0xac
 [<c0211bd2>] vt_console_print+0x294/0x2a5
 [<c021193e>] vt_console_print+0x0/0x2a5
 [<c0123307>] __call_console_drivers+0x36/0x40
 [<c011b6e9>] do_page_fault+0x0/0x5c6
 [<c02de3db>] error_code+0x2f/0x38
 [<c0213238>] sysrq_handle_crash+0x0/0x8
 [<c02133d0>] __handle_sysrq+0x62/0xd9
 [<c018f50c>] write_sysrq_trigger+0x23/0x29
 [<c015d5a7>] vfs_write+0xb6/0xe2
 [<c015d671>] sys_write+0x3c/0x62
 [<c02dd8e3>] syscall_call+0x7/0xb
 [<c02d007b>] xfrm_sk_policy_lookup+0xc1/0x3ca
--------------------
Please refer to attachment 473187 [details] for detailed info.

Comment 29 Dayong Tian 2011-01-13 09:12:28 UTC
Hi Neil, I reproduced the panic with kernel 2.6.9-96.ELsmp and filed bug 669302. Whenever I try to reproduce the panic described in this bug, I get the panic described in bug 669302 instead. How should I verify this bug? Thanks!

Comment 30 Neil Horman 2011-01-13 12:13:32 UTC
This bug has nothing to do with netdump (which, it appears, is how you are trying to verify it). If you want to verify it, set up netconsole, stream messages across it by generating printk events in the kernel, and observe that the system doesn't lock up.
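
The thread does not show the patch itself, so the following is only a hypothetical user-space sketch in C of the general pattern suggested by the attachment title "patch to prevent tx recursion" and by the netconsole hint above: if a log message printed while the driver holds its lock is sent back out through the same NIC, the transmit path re-enters the driver and tries to take that lock again. All names here (`struct fake_nic`, `fake_start_xmit`, `fake_log_over_network`, `fake_interrupt`, the `TX_*` codes, and the `use_guard` switch) are assumptions for illustration and do not come from the 3c59x driver.

```c
/* Hypothetical return codes for the model transmit routine. */
#define TX_OK        0  /* frame was sent */
#define TX_BUSY      1  /* caller should retry later */
#define TX_DEADLOCK -1  /* lock already held: the real system would hang or panic */

/* Toy model of a NIC driver's private state. */
struct fake_nic {
    int lock_held;     /* models the driver's private spinlock */
    int in_interrupt;  /* guard flag: set while the interrupt handler runs */
    int use_guard;     /* 1 = transmit path checks the guard flag */
    int sent;          /* frames transmitted */
    int deferred;      /* frames pushed back for a later retry */
};

/* Transmit path: needs the driver lock to queue a frame. */
static int fake_start_xmit(struct fake_nic *nic)
{
    if (nic->use_guard && nic->in_interrupt) {
        /* We were re-entered from the interrupt handler: back off. */
        nic->deferred++;
        return TX_BUSY;
    }
    if (nic->lock_held)
        return TX_DEADLOCK;  /* recursive lock attempt */
    nic->lock_held = 1;
    nic->sent++;
    nic->lock_held = 0;
    return TX_OK;
}

/* Models a kernel log message being streamed over the network. */
static int fake_log_over_network(struct fake_nic *nic)
{
    return fake_start_xmit(nic);
}

/* Interrupt handler: takes the lock, then logs a warning while holding it. */
static int fake_interrupt(struct fake_nic *nic)
{
    int rc;

    nic->lock_held = 1;
    nic->in_interrupt = 1;
    rc = fake_log_over_network(nic);  /* e.g. a "Too much work" warning */
    nic->in_interrupt = 0;
    nic->lock_held = 0;
    return rc;
}
```

With `use_guard` set to 0 the model interrupt handler returns `TX_DEADLOCK`, and with `use_guard` set to 1 it returns `TX_BUSY` and counts the frame as deferred. Whether the shipped fix uses a flag of this kind is not stated in this report.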

Comment 32 errata-xmlrpc 2011-02-16 16:05:44 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0263.html

