Created attachment 385880 [details]
Console log

During regular testing, the machine failed in the network driver for the 3Com NIC:

eth0: Too much work in interrupt, status 8401.
<0>Kernel panic - not syncing: drivers/net/3c59x.c:2265: spin_lock(drivers/net/3c59x.c:dfe2b1f4) already locked by drivers/net/3c59x.c/2419

See the attachment for the full console log.
Created attachment 386835 [details]
patch to prevent tx recursion

Please give this patch a try, and let me know the results. Thanks!
I have no idea how to test it :( Any thoughts?
sigh, Vitaly is no longer with us. I'll test this myself
I've sent this upstream for review.
Davem didn't want this patch for upstream, so it's back to the drawing board here.
sent a new patch attempt upstream
Another instance of this bug in testing: http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=16977668 Neil: Has your latest patch been accepted upstream?
sure has: http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=aa25ab7d943a5e1e6bcc2a65ff6669144f5b5d60 I've also posted it internally for this bug.
Committed in 89.39.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/
Created attachment 472983 [details] Panic on January 12
No, that's a completely different crash. If it's reproducible, I'd open up a new bug.
Created attachment 473187 [details] panic on January 13th
(In reply to comment #26)
> No, that's a completely different crash. If it's reproducible, I'd open up a new
> bug.

Hi Neil, it's reproducible on machine cpq-dl380-01.rhts.eng.bos.redhat.com with kernel 2.6.9-89.ELsmp:

--------------------
Code: Bad EIP value.
Unable to handle kernel paging request at virtual address e09074e2
printing eip:
e09074e2
*pde = 00000000
Recursive die() failure, output suppressed
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
------------[ cut here ]------------
kernel BUG at kernel/panic.c:77!
invalid operand: 0000 [#3]
SMP
Modules linked in: netconsole netdump md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc cpufreq_powersave loop button battery ac e100 3c59x mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod cpqarray sd_mod scsi_mod
CPU: 1
EIP: 0060:[<c0122d0a>] Not tainted VLI
EFLAGS: 00010286 (2.6.9-89.ELsmp)
EIP is at panic+0x47/0x166
eax: 0000002f ebx: d87df000 ecx: d87dfb40 edx: c02ef568
esi: c02e7c13 edi: c02e7bc7 ebp: c02ef1e7 esp: d87dfb48
ds: 007b es: 007b ss: 0068
Process bash (pid: 5305, threadinfo=d87df000 task=deb6c790)
Stack: d87df000 c01060d0 c02e7c3b 00004890 c0123745 c0425cf3 00000006 00000013
       c012365d c02ef17e 00000000 c02ef17e 00000000 c0007820 00007820 c011bad9
       c02ef1d6 00000000 c02f396f e09074e2 c02ef1c3 c02ef1a8 e09074e2 00000000
Call Trace:
[<c01060d0>] die+0x164/0x16b
[<c0123745>] release_console_sem+0x75/0xa9
[<c012365d>] vprintk+0x136/0x14a
[<c011bad9>] do_page_fault+0x3f0/0x5c6
[<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
[<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
[<c01d4859>] vgacon_scroll+0x182/0x199
[<c020e0d5>] scrup+0x63/0xce
[<c020e6e3>] complement_pos+0x12/0x144
[<c020eb8a>] set_cursor+0x62/0x6e
[<c0211bc4>] vt_console_print+0x286/0x2a5
[<c021193e>] vt_console_print+0x0/0x2a5
[<c0123307>] __call_console_drivers+0x36/0x40
[<c012341f>] call_console_drivers+0xb6/0xd8
[<c011b6e9>] do_page_fault+0x0/0x5c6
[<c02de3db>] error_code+0x2f/0x38
[<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
[<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
[<c0135a0a>] try_crashdump+0x31/0x33
[<c010604e>] die+0xe2/0x16b
[<c012365d>] vprintk+0x136/0x14a
[<c011bad9>] do_page_fault+0x3f0/0x5c6
[<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
[<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
[<c01d4830>] vgacon_scroll+0x159/0x199
[<c01d4859>] vgacon_scroll+0x182/0x199
[<c020e0d5>] scrup+0x63/0xce
[<c020e6e3>] complement_pos+0x12/0x144
[<c020eb8a>] set_cursor+0x62/0x6e
[<c0211bc4>] vt_console_print+0x286/0x2a5
[<c021193e>] vt_console_print+0x0/0x2a5
[<c0123307>] __call_console_drivers+0x36/0x40
[<c012341f>] call_console_drivers+0xb6/0xd8
[<c011b6e9>] do_page_fault+0x0/0x5c6
[<c02de3db>] error_code+0x2f/0x38
[<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
[<e09074e2>] netpoll_start_netdump+0x0/0xf8 [netdump]
[<c0135a0a>] try_crashdump+0x31/0x33
[<c010604e>] die+0xe2/0x16b
[<c012365d>] vprintk+0x136/0x14a
[<c011bad9>] do_page_fault+0x3f0/0x5c6
[<c0213238>] sysrq_handle_crash+0x0/0x8
[<c011dc49>] try_to_wake_up+0x288/0x293
[<c012ac01>] __mod_timer+0x101/0x10b
[<c021290b>] poke_blanked_console+0xa1/0xac
[<c0211bd2>] vt_console_print+0x294/0x2a5
[<c021193e>] vt_console_print+0x0/0x2a5
[<c0123307>] __call_console_drivers+0x36/0x40
[<c011b6e9>] do_page_fault+0x0/0x5c6
[<c02de3db>] error_code+0x2f/0x38
[<c0213238>] sysrq_handle_crash+0x0/0x8
[<c02133d0>] __handle_sysrq+0x62/0xd9
[<c018f50c>] write_sysrq_trigger+0x23/0x29
[<c015d5a7>] vfs_write+0xb6/0xe2
[<c015d671>] sys_write+0x3c/0x62
[<c02dd8e3>] syscall_call+0x7/0xb
[<c02d007b>] xfrm_sk_policy_lookup+0xc1/0x3ca
--------------------

Please refer to attachment 473187 [details] for detailed info.
Hi Neil, I reproduced the panic with kernel 2.6.9-96.ELsmp and filed bug 669302. Whenever I try to reproduce the panic described in this bug, I get the panic described in bug 669302 instead. How should I verify this bug? Thanks!
This bug has nothing to do with netdump (which, it appears, is how you are trying to verify it). If you want to verify it, set up netconsole, stream messages across it by generating printk events in the kernel, and observe that the system doesn't lock up.
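
One possible way to generate the printk events, as a rough sketch only: write a harmless SysRq key to /proc/sysrq-trigger in a loop, so that each write makes the kernel print through every registered console, including netconsole. The function name generate_printk_load and its parameters are made up for illustration and are not part of any existing tool. The sketch assumes netconsole is already configured, that it is run as root, and that a Python 3 interpreter is available on the test machine. The key 't' (dump task states) is an assumption about what produces enough output; 'c' is refused because it crashes the machine on purpose.

```python
def generate_printk_load(rounds, key="t", path="/proc/sysrq-trigger"):
    """Write `key` to the SysRq trigger file `rounds` times.

    Hypothetical helper for illustration. Returns the number of writes done.
    """
    if key == "c":
        # 'c' is the deliberate-crash key, which is not what this test wants.
        raise ValueError("'c' crashes the kernel; pick another SysRq key")
    writes = 0
    for _ in range(rounds):
        # Reopen the file each round so every iteration is its own write().
        with open(path, "w") as trigger:
            trigger.write(key)
        writes += 1
    return writes
```

Calling generate_printk_load(1000) as root while watching the netconsole receiver should show a steady stream of messages. If the box keeps responding on eth0 afterwards, that suggests the lockup is gone, though how many rounds are enough is a guess.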
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0263.html