Red Hat Bugzilla – Bug 214526
sporadic panic in bnx2 module
Last modified: 2014-06-29 18:58:05 EDT
Description of problem:
We're seeing occasional kernel panics in the bnx2 module's bnx2_poll function.
I have a partial backtrace for the panic:
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
DWARF2 unwinder stuck at error_exit+0x0/0x84
Leftover inexact backtrace:
<IRQ> [<ffffffff88175e8c>] :bnx2:bnx2_poll+0xf9/0xb7b
Kernel panic - not syncing: Aiee, killing interrupt handler!
The machine is a Dell PowerEdge 1950 with dual 2.66GHz Woodcrest Xeon CPUs and
16GB of RAM. The kernel only has one non-Red Hat patch applied to it, which
backs out the following change in order to fix automount /net trouble:
Does the backtrace ring any bells? I tried to trace it down myself, but don't
know how to get gdb to read debuginfo symbols for kernel modules in the
kernel-debuginfo package. Any pointers to docs on that would be greatly appreciated.
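As an aside on the gdb question: the usual approach is gdb's add-symbol-file command, pointing it at the module's debuginfo file and the module's .text load address (readable from /sys/module/bnx2/sections/.text on the running machine). A sketch that just assembles the gdb command, using an example address and an assumed debuginfo path:

```shell
# Hypothetical sketch: building the gdb command to load bnx2 module symbols.
# On the affected machine, read the real load address with:
#   cat /sys/module/bnx2/sections/.text
TEXT_ADDR=0xffffffff88170000   # example value, not from a real machine
DEBUGINFO=/usr/lib/debug/lib/modules/$(uname -r)/kernel/drivers/net/bnx2.ko.debug

# Run this inside gdb against vmlinux:
echo "add-symbol-file ${DEBUGINFO} ${TEXT_ADDR}"
```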
Version-Release number of selected component (if applicable):
How reproducible:
Intermittent; one or two panics a day across a farm of about 200 machines.
Steps to Reproduce:
1. Boot machine
2. Let it chew through various rendering tasks
3. Read the stack trace when it eventually panics
(Not a very good list of steps here, sorry.)
Actual results:
A kernel panic.
I've been seeing this recently on some bnx2 hardware. Can you please attach
`lspci -vvv` output so I can understand which bnx2 hardware is on the system?
Created attachment 141795 [details]
lspci -vv output for the machine suffering from bnx2 segfaults
Here you go. Thankfully we haven't seen one of these panics since submitting
the bug report, but we haven't changed anything that would have fixed them. I'd
still like to find a cause if we can.
Thanks for sending that output. I've been investigating panics like these on
other kernels and will let you know when we come up with a solution there since
it should apply here as well.
Please let me know if you continue to see this panic or if you come up with a
reliable way to reproduce it.
No problem. I was incorrect about not having seen it since reporting the
bug--we actually catch seven or eight of them a day. The admins responsible for
the farm have just been rebooting the machines and not telling me about it. :)
So, if there's any other information I can provide, please let me know!
So far we've found no pattern to the panics.
Created attachment 142215 [details]
Currently we are still collecting data for the bnx2 crash and using the
Do you need me to roll a test kernel with this patch or would you be willing to
build one yourself?
I'm happy to build it myself. Thanks, though! It'll probably be a couple of days
before we can install it on a significant number of machines, but I'll get the
We finally had a panic on a machine with this patch installed. I don't see any
output from the patch in the messages file from before the crash; would it have
been logged to disk anywhere else before the machine froze up?
I'm hoping a serial console wouldn't have been required to catch the message; we
have hundreds of these machines, and attaching serial consoles to a number of
them large enough to catch a panic soon would be pretty difficult.
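As an aside (not from the thread): one common way to capture panic output from many machines without serial consoles is the kernel's netconsole module, which forwards console messages over UDP to a log collector. A hypothetical invocation with placeholder addresses:

```shell
# Hypothetical netconsole setup; all addresses, ports, and the interface
# name are placeholders for this sketch.
# Parameter format: [src-port]@[src-ip]/[dev],[tgt-port]@[tgt-ip]/[tgt-mac]
modprobe netconsole \
    netconsole=6665@10.0.0.17/eth0,6666@10.0.0.2/00:11:22:33:44:55
```

On the receiving host, something like `nc -u -l 6666` (or a syslog daemon) collects the messages, so panic output from hundreds of machines can land in one place.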
The output probably did go to the serial port, but that's OK. I've been working
this issue with some others on a different release and arch and the following
patch has produced good results:
This came as a suggestion from the upstream maintainer based on the output from
the patch in Comment #7.
Based on the other feedback I've gotten it seems this should probably resolve
your issue. I realize that installing yet another kernel on that many machines
is non-trivial, but based on the results from others it seems like a good
candidate to resolve the panics. Please let me know if this resolves your issue.
This patch looks like the final one that will resolve your issue:
Any chance you were able to verify the patch in comment #11?
We have the patch active on a test group of render machines, and so far things
are looking good. We're going to increase the number of machines running it,
so I should have a more definitive answer soon.
Thanks for checking in! I'll update again when I have more info.
Sounds good, Lars. The patch for this will appear in 2.6.20.