Bug 194055 - netdump not working with 3c59x card
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i686 Linux
Priority: medium Severity: medium
Assigned To: Neil Horman
QA Contact: Brian Brock
Duplicates: 142269 168733
Depends On:
Blocks: 198694
Reported: 2006-06-05 10:05 EDT by William Cohen
Modified: 2007-11-30 17:07 EST

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-07 21:46:44 EDT


Attachments
This is the upstream fix for the netpoll recursion problem (2.13 KB, patch)
2006-06-14 07:34 EDT, Neil Horman
rhel4 port of the recursion patch (2.29 KB, patch)
2006-06-14 10:37 EDT, Neil Horman

Description William Cohen 2006-06-05 10:05:17 EDT
Description of problem:

Netdump does not work with a 3c59x card in the machine. The same machine
generates a netdump with an e1000 card using Neil Horman's patched kernel. With
the 3c59x card, the following kernel panic appears after the handshake message:

<0>Kernel panic - not syncing: drivers/net/3c59x.c:2265: spin_lock(drivers/net/9

Version-Release number of selected component (if applicable):

#uname -a
Linux slingshot.devel.redhat.com 2.6.9-39.EL.bz193688 #1 Fri Jun 2 15:12:49 EDT
2006 i686 athlon i386 GNU/Linux


How reproducible:

Always


Steps to Reproduce:
1. Set up the netdump client and server. Verify that netdump works with an
e1000 (or other working) Ethernet card.
2. Replace the Ethernet card with a 3c59x card.
3. Force a dump with "echo c > /proc/sysrq-trigger".
  
Actual results:

A second kernel panic, distinct from the one triggered by
"echo c > /proc/sysrq-trigger", kills netdump on the client machine.

Expected results:

Get a dump on the server.

Additional info:

Output from the netdump client machine:

<1>kernel BUG at kernel/panic.c:75!
<1>invalid operand: 0000 [#3]
Modules linked in: md5 ipv6 parport_pc lp parport netconsole netdump autofs4
i2d
CPU:    0
EIP:    0060:[<c0123db2>]    Not tainted VLI
EFLAGS: 00010282   (2.6.9-39.EL.bz193688)
EIP is at panic+0x47/0x142
eax: 0000003c   ebx: c03ff000   ecx: c032a506   edx: c03ffe68
esi: 00000000   edi: c03230bd   ebp: c03fefbc   esp: c03ffe70
ds: 007b   es: 007b   ss: 0068
Process bash (pid: 3997, threadinfo=c03ff000 task=d8b1c600)
Stack: c03ff000 c0106926 c032305b e0839600 00000246 c0123db2 00000246 c0123db2
       00000000 c03ffecc c0139dcb c03fff64 00000000 c03ffecc c0106c7f c03fff64
       c03230bd 00000000 00000006 00000004 00000002 c03fff3b ffffffff 00000004
Call Trace:
 [<c0106926>] die+0x224/0x22b
 [<c0123db2>] panic+0x47/0x142
 [<c0123db2>] panic+0x47/0x142
 [<c0139dcb>] search_exception_tables+0x1f/0x21
 [<c0106c7f>] do_invalid_op+0xcf/0xf2
 [<c0123db2>] panic+0x47/0x142
 [<c011fbba>] scheduler_tick+0x1b/0x4aa
 [<c011f480>] activate_task+0x53/0x5f
 [<c012008b>] __wake_up_common+0x36/0x51
 [<c01ea6b9>] __delay+0x9/0xa
 [<c0250701>] serial8250_console_write+0x16c/0x1b2
 [<c0106bb0>] do_invalid_op+0x0/0xf2
 [<c0319713>] error_code+0x2f/0x38
 [<c0123db2>] panic+0x47/0x142
 [<e089b995>] boomerang_interrupt+0x94/0x3fc [3c59x]
 [<c0107ec8>] handle_IRQ_event+0x25/0x4f
 [<c0108896>] do_IRQ+0x18a/0x2bf
 =======================
 [<c0319654>] common_interrupt+0x18/0x20
 [<c0129a84>] __do_softirq+0x2c/0x79
 [<c010940e>] do_softirq+0x46/0x4d
 =======================
 [<c0119364>] smp_apic_timer_interrupt+0x8d/0x8f
 [<c0319676>] apic_timer_interrupt+0x1a/0x20
 [<c011aa6e>] disable_IO_APIC+0x0/0xa
 [<c0117271>] machine_restart+0x17/0x83
 [<e09ee6b0>] netpoll_start_netdump+0x36/0xd4 [netdump]
 [<c013d37f>] try_crashdump+0x31/0x33
 [<c0106843>] die+0x141/0x22b
 [<c0139dcb>] search_exception_tables+0x1f/0x21
 [<c0106c7f>] do_invalid_op+0xcf/0xf2
 [<c0123db2>] panic+0x47/0x142
 [<c01ea6b9>] __delay+0x9/0xa
 [<c0250701>] serial8250_console_write+0x16c/0x1b2
 [<c0106bb0>] do_invalid_op+0x0/0xf2
 [<c0319713>] error_code+0x2f/0x38
 [<c0123db2>] panic+0x47/0x142
 [<e089b42a>] boomerang_start_xmit+0x282/0x3a0 [3c59x]
 [<c02c8f0f>] netpoll_send_skb+0x186/0x22f
 [<c02c94d4>] arp_reply+0x2da/0x2e2
 [<c02c9537>] __netpoll_rx+0x5b/0x2ab
 [<c02bcd97>] netif_rx+0xb3/0x26e
 [<e089c22e>] boomerang_rx+0x284/0x3f3 [3c59x]
 [<e089b901>] boomerang_interrupt+0x0/0x3fc [3c59x]
 [<e089ba95>] boomerang_interrupt+0x194/0x3fc [3c59x]
 [<e089b901>] boomerang_interrupt+0x0/0x3fc [3c59x]
 [<e0899028>] poll_vortex+0x28/0x2d [3c59x]
 [<c02c8a5f>] netpoll_poll_dev+0x18/0x30
 [<e09ee5e2>] netdump_startup_handshake+0x7f/0x10d [netdump]
 [<c01ea6b9>] __delay+0x9/0xa
 [<c0250701>] serial8250_console_write+0x16c/0x1b2
 [<e09e9000>] write_msg+0x0/0x16d [netconsole]
 [<c012466a>] crashdump_call_console_drivers+0x27/0x31
 [<e09ee7cd>] netpoll_netdump+0x7f/0x478 [netdump]
 [<c023bde3>] sysrq_handle_crash+0x0/0x8
 [<e09ee74e>] netpoll_netdump+0x0/0x478 [netdump]
 [<e09ee745>] netpoll_start_netdump+0xcb/0xd4 [netdump]
 =======================
 [<c013d37f>] try_crashdump+0x31/0x33
 [<c0106843>] die+0x141/0x22b
 [<c011db59>] do_page_fault+0x380/0x4dc
 [<c023bde3>] sysrq_handle_crash+0x0/0x8
 [<c018bb38>] inode_setattr+0x18c/0x195
 [<c02c8f0f>] netpoll_send_skb+0x186/0x22f
 [<e09e9000>] write_msg+0x0/0x16d [netconsole]
 [<c011d7d9>] do_page_fault+0x0/0x4dc
 [<c0319713>] error_code+0x2f/0x38
 [<c023bde3>] sysrq_handle_crash+0x0/0x8
 [<c023c114>] __handle_sysrq+0x58/0xc6
 [<c01ad8d1>] write_sysrq_trigger+0x23/0x29
 [<c016c3bd>] vfs_write+0xb6/0xe2
 [<c016c487>] sys_write+0x3c/0x62
 [<c0318c97>] syscall_call+0x7/0xb
Code: 40 c0 e8 53 5c 0c 00 68 60 6f 40 c0 68 06 a5 32 c0 e8 64 0b 00 00 83 c4 0
Comment 1 Neil Horman 2006-06-14 07:34:20 EDT
Created attachment 130825
This is the upstream fix for the netpoll recursion problem

This is caused by the rx path in the network stack calling into the tx path
(through __netpoll_rx->arp_reply).  My patch fixes it by queueing ARP frames
instead and replying to them later in the return up the tx path.  It's been
accepted upstream, and I'll backport it to RHEL4 shortly.
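
For readers without the attachments to hand, the idea described above can be
sketched in user space. This is only an illustration, not the attached patch:
struct fake_dev, dev_xmit, rx_arp_request and dev_poll are hypothetical names,
a plain bool stands in for the driver's spinlock, and the point at which the
queue is drained (right after the lock is dropped) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define ARP_QUEUE_MAX 8

/* Hypothetical stand-in for a NIC driver with a non-recursive lock. */
struct fake_dev {
    bool lock_held;                /* models the driver's spinlock */
    int  frames_sent;              /* frames that reached the "wire" */
    int  recursion_hits;           /* tx entered while the lock was held */
    int  arp_queue[ARP_QUEUE_MAX]; /* deferred ARP request ids */
    int  arp_queued;               /* number of entries in arp_queue */
};

/* Transmit one frame. A real kernel with spinlock debugging would
 * panic on the re-entrant case; here it is only counted. */
static void dev_xmit(struct fake_dev *dev, int frame_id)
{
    (void)frame_id;
    if (dev->lock_held) {
        dev->recursion_hits++;
        return;
    }
    dev->lock_held = true;
    dev->frames_sent++;
    dev->lock_held = false;
}

/* Called from the rx path for each ARP request seen while polling. */
static void rx_arp_request(struct fake_dev *dev, int req_id, bool defer)
{
    if (!defer) {
        dev_xmit(dev, req_id);     /* old behaviour: rx calls straight into tx */
        return;
    }
    if (dev->arp_queued < ARP_QUEUE_MAX)
        dev->arp_queue[dev->arp_queued++] = req_id; /* reply later */
}

/* Poll the device: rx runs under the lock, queued replies go out after. */
static void dev_poll(struct fake_dev *dev, const int *reqs, int n, bool defer)
{
    assert(!dev->lock_held);       /* the poll itself must not be re-entered */

    dev->lock_held = true;
    for (int i = 0; i < n; i++)
        rx_arp_request(dev, reqs[i], defer);
    dev->lock_held = false;

    /* Drain deferred ARP replies now that the lock is free. */
    for (int i = 0; i < dev->arp_queued; i++)
        dev_xmit(dev, dev->arp_queue[i]);
    dev->arp_queued = 0;
}
```

Polling three ARP requests with defer set to false records three re-entrant
transmit attempts and sends nothing. With defer set to true, all three replies
go out after the lock is released and no re-entry is recorded.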
Comment 2 Neil Horman 2006-06-14 10:37:49 EDT
Created attachment 130860
rhel4 port of the recursion patch
Comment 3 Neil Horman 2006-06-21 15:18:31 EDT
*** Bug 168733 has been marked as a duplicate of this bug. ***
Comment 4 RHEL Product and Program Management 2006-09-07 15:15:43 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Jason Baron 2006-09-15 15:13:33 EDT
Committed in stream U5 build 42.11. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 8 Jeffrey Moyer 2006-09-19 11:52:35 EDT
*** Bug 142269 has been marked as a duplicate of this bug. ***
Comment 10 Salina Chu 2006-09-20 15:34:42 EDT
Thanks for opening this bug up to IBM.

I am switching the mirror of the IBM bug (previously 142269, which has been
dup'ed) to this bug.

Salina Chu
LTC screen team
Comment 11 IBM Bug Proxy 2006-09-29 03:36:18 EDT
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|DEFERRED                    |OPEN




------- Additional Comments From whchang@tw.ibm.com  2006-09-29 03:32 EDT -------
Hello,

I have verified that kernel 2.6.9-42.12.EL fixes this issue at the netpoll
layer.

Since the main goal of pushing the NAPI backport patch was to fix the netdump
problem, no further effort will be made, as the bug is indeed fixed from the
upper layer. I'll re-open this bug and then close it.
Comment 12 IBM Bug Proxy 2006-09-29 03:36:49 EDT
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|OPEN                        |ASSIGNED




------- Additional Comments From whchang@tw.ibm.com  2006-09-29 03:32 EDT -------
Accepting the bug again in order to close it... 
Comment 13 IBM Bug Proxy 2006-10-02 09:12:23 EDT
changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ACCEPTED                    |CLOSED
             Impact|------                      |Functionality




------- Additional Comments From sglass@us.ibm.com  2006-10-02 09:08 EDT -------
Closing the bug for you... 
Comment 18 Red Hat Bugzilla 2007-05-07 21:46:44 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html