Bug 194055

Summary: netdump not working with 3c59x card
Product: Red Hat Enterprise Linux 4 Reporter: William Cohen <wcohen>
Component: kernel Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA QA Contact: Brian Brock <bbrock>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: bugproxy, dchapman, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: RHBA-2007-0304 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-05-08 01:46:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 198694    
Attachments:
Description                                                 Flags
This is the upstream fix for the netpoll recursion problem  none
rhel4 port of the recursion patch                           none

Description William Cohen 2006-06-05 14:05:17 UTC
Description of problem:

Netdump does not work with a 3c59x card in the machine. The same machine
generates a netdump with an e1000 card using Neil Horman's patched kernel. With
the 3c59x card, the following kernel panic appears after the handshake message:

<0>Kernel panic - not syncing: drivers/net/3c59x.c:2265: spin_lock(drivers/net/9

Version-Release number of selected component (if applicable):

#uname -a
Linux slingshot.devel.redhat.com 2.6.9-39.EL.bz193688 #1 Fri Jun 2 15:12:49 EDT
2006 i686 athlon i386 GNU/Linux


How reproducible:

Always


Steps to Reproduce:
1. Set up the netdump client and server. Verify that netdump works with an
e1000 (or other working) Ethernet card.
2. Replace the Ethernet card with a 3c59x card.
3. Force a dump with "echo c > /proc/sysrq-trigger".
  
Actual results:

A second kernel panic, distinct from the one triggered by
"echo c > /proc/sysrq-trigger", which kills netdump on the client machine.

Expected results:

Get a dump on the server.

Additional info:

Output from the netdump client machine:

<1>kernel BUG at kernel/panic.c:75!
<1>invalid operand: 0000 [#3]
Modules linked in: md5 ipv6 parport_pc lp parport netconsole netdump autofs4
i2d
CPU:    0
EIP:    0060:[<c0123db2>]    Not tainted VLI
EFLAGS: 00010282   (2.6.9-39.EL.bz193688)
EIP is at panic+0x47/0x142
eax: 0000003c   ebx: c03ff000   ecx: c032a506   edx: c03ffe68
esi: 00000000   edi: c03230bd   ebp: c03fefbc   esp: c03ffe70
ds: 007b   es: 007b   ss: 0068
Process bash (pid: 3997, threadinfo=c03ff000 task=d8b1c600)
Stack: c03ff000 c0106926 c032305b e0839600 00000246 c0123db2 00000246 c0123db2
       00000000 c03ffecc c0139dcb c03fff64 00000000 c03ffecc c0106c7f c03fff64
       c03230bd 00000000 00000006 00000004 00000002 c03fff3b ffffffff 00000004
Call Trace:
 [<c0106926>] die+0x224/0x22b
 [<c0123db2>] panic+0x47/0x142
 [<c0123db2>] panic+0x47/0x142
 [<c0139dcb>] search_exception_tables+0x1f/0x21
 [<c0106c7f>] do_invalid_op+0xcf/0xf2
 [<c0123db2>] panic+0x47/0x142
 [<c011fbba>] scheduler_tick+0x1b/0x4aa
 [<c011f480>] activate_task+0x53/0x5f
 [<c012008b>] __wake_up_common+0x36/0x51
 [<c01ea6b9>] __delay+0x9/0xa
 [<c0250701>] serial8250_console_write+0x16c/0x1b2
 [<c0106bb0>] do_invalid_op+0x0/0xf2
 [<c0319713>] error_code+0x2f/0x38
 [<c0123db2>] panic+0x47/0x142
 [<e089b995>] boomerang_interrupt+0x94/0x3fc [3c59x]
 [<c0107ec8>] handle_IRQ_event+0x25/0x4f
 [<c0108896>] do_IRQ+0x18a/0x2bf
 =======================
 [<c0319654>] common_interrupt+0x18/0x20
 [<c0129a84>] __do_softirq+0x2c/0x79
 [<c010940e>] do_softirq+0x46/0x4d
 =======================
 [<c0119364>] smp_apic_timer_interrupt+0x8d/0x8f
 [<c0319676>] apic_timer_interrupt+0x1a/0x20
 [<c011aa6e>] disable_IO_APIC+0x0/0xa
 [<c0117271>] machine_restart+0x17/0x83
 [<e09ee6b0>] netpoll_start_netdump+0x36/0xd4 [netdump]
 [<c013d37f>] try_crashdump+0x31/0x33
 [<c0106843>] die+0x141/0x22b
 [<c0139dcb>] search_exception_tables+0x1f/0x21
 [<c0106c7f>] do_invalid_op+0xcf/0xf2
 [<c0123db2>] panic+0x47/0x142
 [<c01ea6b9>] __delay+0x9/0xa
 [<c0250701>] serial8250_console_write+0x16c/0x1b2
 [<c0106bb0>] do_invalid_op+0x0/0xf2
 [<c0319713>] error_code+0x2f/0x38
 [<c0123db2>] panic+0x47/0x142
 [<e089b42a>] boomerang_start_xmit+0x282/0x3a0 [3c59x]
 [<c02c8f0f>] netpoll_send_skb+0x186/0x22f
 [<c02c94d4>] arp_reply+0x2da/0x2e2
 [<c02c9537>] __netpoll_rx+0x5b/0x2ab
 [<c02bcd97>] netif_rx+0xb3/0x26e
 [<e089c22e>] boomerang_rx+0x284/0x3f3 [3c59x]
 [<e089b901>] boomerang_interrupt+0x0/0x3fc [3c59x]
 [<e089ba95>] boomerang_interrupt+0x194/0x3fc [3c59x]
 [<e089b901>] boomerang_interrupt+0x0/0x3fc [3c59x]
 [<e0899028>] poll_vortex+0x28/0x2d [3c59x]
 [<c02c8a5f>] netpoll_poll_dev+0x18/0x30
 [<e09ee5e2>] netdump_startup_handshake+0x7f/0x10d [netdump]
 [<c01ea6b9>] __delay+0x9/0xa
 [<c0250701>] serial8250_console_write+0x16c/0x1b2
 [<e09e9000>] write_msg+0x0/0x16d [netconsole]
 [<c012466a>] crashdump_call_console_drivers+0x27/0x31
 [<e09ee7cd>] netpoll_netdump+0x7f/0x478 [netdump]
 [<c023bde3>] sysrq_handle_crash+0x0/0x8
 [<e09ee74e>] netpoll_netdump+0x0/0x478 [netdump]
 [<e09ee745>] netpoll_start_netdump+0xcb/0xd4 [netdump]
 =======================
 [<c013d37f>] try_crashdump+0x31/0x33
 [<c0106843>] die+0x141/0x22b
 [<c011db59>] do_page_fault+0x380/0x4dc
 [<c023bde3>] sysrq_handle_crash+0x0/0x8
 [<c018bb38>] inode_setattr+0x18c/0x195
 [<c02c8f0f>] netpoll_send_skb+0x186/0x22f
 [<e09e9000>] write_msg+0x0/0x16d [netconsole]
 [<c011d7d9>] do_page_fault+0x0/0x4dc
 [<c0319713>] error_code+0x2f/0x38
 [<c023bde3>] sysrq_handle_crash+0x0/0x8
 [<c023c114>] __handle_sysrq+0x58/0xc6
 [<c01ad8d1>] write_sysrq_trigger+0x23/0x29
 [<c016c3bd>] vfs_write+0xb6/0xe2
 [<c016c487>] sys_write+0x3c/0x62
 [<c0318c97>] syscall_call+0x7/0xb
Code: 40 c0 e8 53 5c 0c 00 68 60 6f 40 c0 68 06 a5 32 c0 e8 64 0b 00 00 83 c4 0

Comment 1 Neil Horman 2006-06-14 11:34:20 UTC
Created attachment 130825 [details]
This is the upstream fix for the netpoll recursion problem

This is caused by the rx path in the network stack calling into the tx path
(through __netpoll_rx->arp_reply).  My patch fixes it by queueing ARP frames
instead and replying to them later, on the return up the tx path.  It's been
accepted upstream, and I'll backport it to RHEL4 shortly.
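
A minimal sketch of the recursion and of the queueing approach described
above, written as a toy Python model rather than kernel code. The class ToyNic,
its methods, and the frame strings are hypothetical names invented for
illustration; only the call order (poll -> interrupt handler -> rx -> ARP
reply -> xmit) is taken from the trace in the description. It assumes the
interrupt and transmit paths contend for a single non-recursive lock, which the
trace suggests but the report does not spell out. It is not the attached patch.

```python
from collections import deque


class RecursiveLockError(RuntimeError):
    """Stands in for the spin_lock panic quoted in the report."""


class ToyNic:
    """Toy driver whose interrupt and transmit paths share one lock."""

    def __init__(self, defer_arp):
        self.defer_arp = defer_arp  # True models "queue now, reply later"
        self.locked = False         # non-recursive, like a kernel spinlock
        self.rx_ring = deque()      # frames waiting to be received
        self.arp_queue = deque()    # ARP requests parked by the rx path
        self.sent = []              # frames that made it out

    def _lock(self):
        if self.locked:
            raise RecursiveLockError("lock taken twice in one call chain")
        self.locked = True

    def _unlock(self):
        self.locked = False

    def start_xmit(self, frame):
        # Transmit path: needs the driver lock.
        self._lock()
        try:
            self.sent.append(frame)
        finally:
            self._unlock()

    def netpoll_rx(self, frame):
        # Receive hook: called while interrupt() still holds the lock.
        if frame != "arp-request":
            return
        if self.defer_arp:
            self.arp_queue.append(frame)   # fixed behaviour: just park it
        else:
            self.start_xmit("arp-reply")   # original behaviour: rx calls tx

    def interrupt(self):
        # Interrupt handler: holds the lock while draining the rx ring.
        self._lock()
        try:
            while self.rx_ring:
                self.netpoll_rx(self.rx_ring.popleft())
        finally:
            self._unlock()

    def poll(self):
        # Polling entry point used while dumping.
        self.interrupt()
        # Deferred replies go out only after the rx path dropped the lock.
        while self.arp_queue:
            self.arp_queue.popleft()
            self.start_xmit("arp-reply")


def run(defer_arp):
    """Feed one ARP request through the model and report what happened."""
    nic = ToyNic(defer_arp)
    nic.rx_ring.append("arp-request")
    try:
        nic.poll()
    except RecursiveLockError as exc:
        return "panic: " + str(exc)
    return "sent: " + ", ".join(nic.sent)


if __name__ == "__main__":
    print("reply from rx path:", run(defer_arp=False))
    print("reply deferred:    ", run(defer_arp=True))
```

In this model the only difference between the two runs is where the ARP reply
is transmitted: inside the receive hook, where the lock is still held, or after
the poll loop has returned from the interrupt handler.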

Comment 2 Neil Horman 2006-06-14 14:37:49 UTC
Created attachment 130860 [details]
rhel4 port of the recursion patch

Comment 3 Neil Horman 2006-06-21 19:18:31 UTC
*** Bug 168733 has been marked as a duplicate of this bug. ***

Comment 4 RHEL Program Management 2006-09-07 19:15:43 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 RHEL Program Management 2006-09-07 19:16:17 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Jason Baron 2006-09-15 19:13:33 UTC
Committed in stream U5 build 42.11. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 8 Jeff Moyer 2006-09-19 15:52:35 UTC
*** Bug 142269 has been marked as a duplicate of this bug. ***

Comment 10 Salina Chu 2006-09-20 19:34:42 UTC
Thanks for opening this bug up to IBM.

I am switching the mirror of the IBM bug (which was 142269 and has been dup'ed
to this bug) over to this bug.

Salina Chu
LTC screen team

Comment 11 IBM Bug Proxy 2006-09-29 07:36:18 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|DEFERRED                    |OPEN




------- Additional Comments From whchang.com  2006-09-29 03:32 EDT -------
Hello,

I have verified that kernel 2.6.9-42.12.EL fixes this issue in the netpoll
layer.

Since the main goal of pushing the NAPI backport patch was to fix the netdump
problem, no further effort will be made, as the bug is indeed fixed in the upper
layer. I'll re-open this bug and then close it.

Comment 12 IBM Bug Proxy 2006-09-29 07:36:49 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|OPEN                        |ASSIGNED




------- Additional Comments From whchang.com  2006-09-29 03:32 EDT -------
Accepting the bug again in order to close it... 

Comment 13 IBM Bug Proxy 2006-10-02 13:12:23 UTC
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ACCEPTED                    |CLOSED
             Impact|------                      |Functionality




------- Additional Comments From sglass.com  2006-10-02 09:08 EDT -------
Closing the bug for you... 

Comment 18 Red Hat Bugzilla 2007-05-08 01:46:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html