Bug 599476 - bnx2x driver dumps logs. Network unusable. [NEEDINFO]
Summary: bnx2x driver dumps logs. Network unusable.
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: x86_64 Linux
low
medium
Target Milestone: rc
: ---
Assignee: Michal Schmidt
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2010-06-03 10:20 UTC by Linux engineering teams - Veritas
Modified: 2013-12-11 13:55 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-11 13:55:04 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
sgruszka: needinfo? (udas)


Attachments
Console messages from the bnx2x driver (412.89 KB, text/plain)
2010-06-03 10:20 UTC, Linux engineering teams - Veritas
debug tool (4.69 MB, application/x-gzip)
2010-06-30 13:07 UTC, Dmitry Kravkov
example of running the debug tool (4.82 KB, text/plain)
2010-06-30 13:10 UTC, Dmitry Kravkov

Description Linux engineering teams - Veritas 2010-06-03 10:20:14 UTC
Created attachment 419311
Console messages from the bnx2x driver

Description of problem:
The bnx2x driver dumps verbose messages to the console prefixed with "bnx2x_panic_dump". After that, the network becomes unusable, which eventually causes the Symantec clusterware to panic the nodes.
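To confirm the symptom on a node and collect the relevant lines for an attachment, the logs can be searched for the "bnx2x_panic_dump" prefix. This is a minimal sketch, assuming the messages also reach a log file (such as /var/log/messages) or the dmesg buffer. The function names are hypothetical helpers, not part of any Red Hat or Broadcom tool:

```shell
# Hypothetical helpers for spotting the "bnx2x_panic_dump" messages described
# in this report. Each reads the log file named in $1, or standard input when
# no file is given ("-" tells grep to read stdin).

# Print every matching line, prefixed with its line number.
bnx2x_panic_lines() {
    grep -n 'bnx2x_panic_dump' "${1:--}"
}

# Print how many lines match (0 means the symptom was not logged).
bnx2x_panic_count() {
    grep -c 'bnx2x_panic_dump' "${1:--}"
}
```

For example: `bnx2x_panic_lines /var/log/messages` or `dmesg | bnx2x_panic_count`.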

Version-Release number of selected component (if applicable):
(Linux)(c1062-hpblade1) ~{1} uname -a
Linux c1062-hpblade1.engba.symantec.com 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
(Linux)(c1062-hpblade1) ~{2} modinfo bnx2x
filename:       /lib/modules/2.6.18-164.el5/kernel/drivers/net/bnx2x.ko
version:        1.48.105
license:        GPL
description:    Broadcom NetXtreme II BCM57710/57711/57711E Driver
author:         Eliezer Tamir
srcversion:     6D030D52DFD981356EEC2BE
alias:          pci:v000014E4d00001650sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Fsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Esv*sd*bc*sc*i*
depends:        
vermagic:       2.6.18-164.el5 SMP mod_unload gcc-4.1
parm:           multi_mode: Use per-CPU queues (int)
parm:           disable_tpa: Disable the TPA (LRO) feature (int)
parm:           int_mode: Force interrupt mode (1 INT#x; 2 MSI) (int)
parm:           poll: Use polling (for debug) (int)
parm:           mrrs: Force Max Read Req Size (0..3) (for debug) (int)
parm:           debug: Default debug msglevel (int)
module_sig:     883f3504a8b7cb4bd273d74512bb112f59309f6c2877f55cb62445a8c8af7cdaf1e559983bdb09f5b1b3fa9aca58a8f88267866b046e346ebe3bbaa
(Linux)(c1062-hpblade1) ~{3} 


How reproducible:
Occurs under relatively heavy network load. Easily reproducible with Symantec clusterware in large configurations.
  
Actual results:
The machine shows bnx2x messages on the console, and the network becomes unusable.

Additional info:
Attaching the messages seen.

Symantec contact: udas@veritas.com

Comment 1 Stanislaw Gruszka 2010-06-07 12:24:37 UTC
This bug is most likely the same issue as bug 516090, which is already fixed by a driver update. Try an up-to-date kernel such as 2.6.18-194.3.1.el5, or the kernels from http://people.redhat.com/jwilson/el5/

Comment 2 Stanislaw Gruszka 2010-06-09 12:57:17 UTC
Any comments on the above?

Comment 3 Linux engineering teams - Veritas 2010-06-24 10:29:37 UTC
The latest kernel from the above link did not work for us. Similar log entries were observed.

Symantec contact: udas@veritas.com

Comment 4 Stanislaw Gruszka 2010-06-24 11:27:31 UTC
We have another bnx2x panic bug that happens on RHEL 5.4 and on up-to-date RHEL 5 kernels.

Comment 5 Stanislaw Gruszka 2010-06-24 11:28:23 UTC
@Veritas, did blacklisting the cnic and bnx2i modules help?
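A minimal sketch of the blacklisting asked about here, assuming RHEL 5's modprobe reads extra configuration files from /etc/modprobe.d/. The helper name and the target file name are hypothetical:

```shell
# Hypothetical helper: write a modprobe configuration file (path in $1) that
# stops the cnic and bnx2i modules from being auto-loaded by alias.
write_cnic_blacklist() {
    cat > "$1" <<'EOF'
# Temporarily keep the Broadcom offload modules out while testing bnx2x.
blacklist bnx2i
blacklist cnic
EOF
}
```

Run it as root, for example `write_cnic_blacklist /etc/modprobe.d/blacklist-cnic`, and reboot so neither module is loaded at boot. A `blacklist` line only stops alias-based auto-loading; a module can still be loaded explicitly or as a dependency of another module. An `install cnic /bin/true` line is the usual way to block it completely.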

Comment 6 Stanislaw Gruszka 2010-06-30 12:13:15 UTC
Please try kernel 204.el5; it includes a cnic fix for a bug that can cause a bnx2x panic.
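To check whether a node already runs build 204 or later, read the build number from the kernel release string. This is a minimal sketch, assuming release strings of the form 2.6.18-NNN[.x.y].el5 (as in this report) and assuming that later builds also contain the fix. The helper names are hypothetical:

```shell
# Hypothetical helper: print the RHEL 5 build number from a kernel release
# string such as "2.6.18-164.el5" (-> 164) or "2.6.18-194.3.1.el5" (-> 194).
# Uses the running kernel (uname -r) when no argument is given.
el5_build() {
    rel=${1:-$(uname -r)}
    rel=${rel#2.6.18-}           # strip the upstream base version
    printf '%s\n' "${rel%%.*}"   # keep everything before the first dot
}

# Succeeds when the release is build 204 or later, i.e. at least the
# 204.el5 kernel suggested above (assumes later builds carry the fix).
has_cnic_fix() {
    [ "$(el5_build "$1")" -ge 204 ]
}
```

On the reporter's 2.6.18-164.el5 kernel, `el5_build` prints 164, so `has_cnic_fix` fails.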

If it does not help, we will probably need more information. I'm not sure whether this bnx2x crash dump contains enough information for Broadcom to fix the issue.

@Broadcom, do you want any more info to track down this bnx2x panic?

Comment 7 Dmitry Kravkov 2010-06-30 13:07:37 UTC
Created attachment 427988
debug tool

debugging tool

Comment 8 Dmitry Kravkov 2010-06-30 13:09:15 UTC
Please provide us with the output of the "grcDump" command from the attached debug tool.
Building it:
> tar xf edebug_linux_ver_0.1.4.tar.gz
> cd edebug_0.1.4/
> make
> ./load.sh

Using it:
The tool shows a list of the BCM5771x devices on the system. Select the one that caused the crash with "device X" (you can recognize it by its PCI bus address or MAC address). Then run "grcDump regs.dump", exit the application with the "exit" command, and upload the generated regs.dump file.
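To match the tool's device list against the failing port, you can read the PCI bus address and MAC address of every bnx2x-driven interface from sysfs. This is a minimal sketch, assuming the standard /sys/class/net layout: each interface has a device symlink to its PCI device, and that device's driver symlink names the bound driver. The function name is hypothetical:

```shell
# Hypothetical helper: list network interfaces bound to the bnx2x driver as
# "<interface> <PCI bus address> <MAC address>". Reads /sys/class/net by
# default; another directory with the same layout can be passed in $1.
bnx2x_ports() {
    netdir=${1:-/sys/class/net}
    for dev in "$netdir"/*; do
        # Skip virtual interfaces (such as lo) that have no backing device.
        [ -e "$dev/device/driver" ] || continue
        # The driver symlink's last path component is the driver name.
        drv=$(basename "$(readlink "$dev/device/driver")")
        [ "$drv" = bnx2x ] || continue
        # The device symlink's last path component is the PCI address.
        pci=$(basename "$(readlink "$dev/device")")
        mac=$(cat "$dev/address")
        printf '%s %s %s\n' "$(basename "$dev")" "$pci" "$mac"
    done
}
```

Run `bnx2x_ports` on the affected node, then compare the PCI or MAC address of the interface that logged the panic with the devices the tool lists.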

Thanks

Comment 9 Dmitry Kravkov 2010-06-30 13:10:43 UTC
Created attachment 427990
example of running the debug tool

Comment 10 Linux engineering teams - Veritas 2010-07-07 19:45:50 UTC
Our reproduction setup is no longer available. We will get back to you when it becomes available again. However, the issue is easily reproduced on HP blades.

Comment 11 Michal Schmidt 2013-12-11 13:55:04 UTC
This BZ has had the needinfo? flag set for more than 3 years. Closing.

