Bug 599476

Summary: bnx2x driver dumps logs. Network unusable.
Product: Red Hat Enterprise Linux 5
Reporter: Linux engineering teams - Veritas <linux26port>
Component: kernel
Assignee: Michal Schmidt <mschmidt>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.4
CC: Dmitry.Kravkov, eilong, gideonn, mschmidt, udas, vladz
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2013-12-11 13:55:04 UTC
Attachments:
Description                               Flags
Console messages from the bnx2x driver    none
debug tool                                none
example of running the debug tool         none

Description Linux engineering teams - Veritas 2010-06-03 10:20:14 UTC
Created attachment 419311
Console messages from the bnx2x driver

Description of problem:
The bnx2x driver dumps verbose messages prefixed with "bnx2x_panic_dump" to the console. After that, the network becomes unusable. As a result, the Symantec clusterware eventually panics the nodes.

Version-Release number of selected component (if applicable):
(Linux)(c1062-hpblade1) ~{1} uname -a
Linux c1062-hpblade1.engba.symantec.com 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
(Linux)(c1062-hpblade1) ~{2} modinfo bnx2x
filename:       /lib/modules/2.6.18-164.el5/kernel/drivers/net/bnx2x.ko
version:        1.48.105
license:        GPL
description:    Broadcom NetXtreme II BCM57710/57711/57711E Driver
author:         Eliezer Tamir
srcversion:     6D030D52DFD981356EEC2BE
alias:          pci:v000014E4d00001650sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Fsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Esv*sd*bc*sc*i*
depends:        
vermagic:       2.6.18-164.el5 SMP mod_unload gcc-4.1
parm:           multi_mode: Use per-CPU queues (int)
parm:           disable_tpa: Disable the TPA (LRO) feature (int)
parm:           int_mode: Force interrupt mode (1 INT#x; 2 MSI) (int)
parm:           poll: Use polling (for debug) (int)
parm:           mrrs: Force Max Read Req Size (0..3) (for debug) (int)
parm:           debug: Default debug msglevel (int)
module_sig:     883f3504a8b7cb4bd273d74512bb112f59309f6c2877f55cb62445a8c8af7cdaf1e559983bdb09f5b1b3fa9aca58a8f88267866b046e346ebe3bbaa
(Linux)(c1062-hpblade1) ~{3} 


How reproducible:
Occurs under relatively heavy network load. Easily reproducible with Symantec clusterware in large configurations.
  
Actual results:
The machine prints bnx2x messages on the console, and the network becomes unusable.

Additional info:
The observed messages are attached.

Symantec contact: udas

Comment 1 Stanislaw Gruszka 2010-06-07 12:24:37 UTC
This is most likely the same issue as bug 516090, which has already been fixed by a driver update. Please try an up-to-date kernel such as 2.6.18-194.3.1.el5, or the kernels from http://people.redhat.com/jwilson/el5/

Comment 2 Stanislaw Gruszka 2010-06-09 12:57:17 UTC
Any comments on above?

Comment 3 Linux engineering teams - Veritas 2010-06-24 10:29:37 UTC
The latest kernel from the above link did not work for us; similar log entries were observed.

Symantec contact: udas

Comment 4 Stanislaw Gruszka 2010-06-24 11:27:31 UTC
We have another bnx2x panic that happens both on RHEL 5.4 and on up-to-date RHEL 5 kernels.

Comment 5 Stanislaw Gruszka 2010-06-24 11:28:23 UTC
@Veritas, did blacklisting the cnic and bnx2i modules help?
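For anyone trying the workaround asked about in Comment 5, the blacklisting step can be sketched as below. This is a hedged sketch, not a procedure given in this bug: the helper name blacklist_modules is hypothetical, and it assumes RHEL 5 reads modprobe configuration from /etc/modprobe.d/blacklist. A "blacklist" entry only stops a module from being loaded automatically through its aliases; a module that is requested explicitly, pulled in as a dependency, or already loaded is not affected, so the modules still have to be unloaded (bnx2i first, then cnic, since bnx2i depends on cnic) or the host rebooted.

```shell
#!/bin/sh
# Hypothetical helper (not from this bug): append "blacklist <module>" lines
# to a modprobe configuration file, skipping entries that already exist.
# Assumption: on RHEL 5 the target file is /etc/modprobe.d/blacklist.
blacklist_modules() {
    conf_file="$1"
    shift
    for mod in "$@"; do
        # -q: no output, -x: the whole line must match exactly
        if ! grep -qx "blacklist $mod" "$conf_file" 2>/dev/null; then
            echo "blacklist $mod" >> "$conf_file"
        fi
    done
}

# Example (as root), followed by a reboot or "rmmod bnx2i; rmmod cnic":
#   blacklist_modules /etc/modprobe.d/blacklist cnic bnx2i
```

Running the helper twice is harmless, because existing entries are detected and not duplicated.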

Comment 6 Stanislaw Gruszka 2010-06-30 12:13:15 UTC
Please try kernel 204.el5; it includes a cnic fix for a bug that can cause a bnx2x panic.

If it does not help, we will probably need more information. I'm not sure whether this bnx2x crash dump contains enough information for Broadcom to fix the issue.

@Broadcom, do you need any more information to track down this bnx2x panic?

Comment 7 Dmitry Kravkov 2010-06-30 13:07:37 UTC
Created attachment 427988
debug tool

debugging tool

Comment 8 Dmitry Kravkov 2010-06-30 13:09:15 UTC
Please provide the output of the "grcDump" command from the attached debug tool.
Building it:
> tar xf edebug_linux_ver_0.1.4.tar.gz
> cd edebug_0.1.4/
> make
> ./load.sh

Using it:
The tool shows a list of the BCM5771x devices on the system. Select the one that caused the crash with "device X" (you can identify it by its PCI bus address or MAC address). Then run the command "grcDump regs.dump", exit the application with the "exit" command, and upload the generated regs.dump file.

Thanks

Comment 9 Dmitry Kravkov 2010-06-30 13:10:43 UTC
Created attachment 427990
example of running the debug tool

Comment 10 Linux engineering teams - Veritas 2010-07-07 19:45:50 UTC
Our reproduction setup is no longer available. We will get back to you when it becomes available again. However, the issue is easily reproduced on HP blades.

Comment 11 Michal Schmidt 2013-12-11 13:55:04 UTC
This BZ has had the needinfo? flag set for more than 3 years. Closing.

Comment 12 Red Hat Bugzilla 2023-09-14 01:21:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days