Bug 601136 - skge deadlocks during initialization in kdump kernel
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: x86_64 Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Michal Schmidt
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2010-06-07 06:11 EDT by Linux engineering teams - Veritas
Modified: 2013-12-11 08:56 EST

Doc Type: Bug Fix
Last Closed: 2013-12-11 08:56:12 EST

Attachments
Console log messages (14.10 KB, text/plain), 2010-06-07 06:11 EDT, Linux engineering teams - Veritas
lspci -vv output (27.08 KB, text/plain), 2010-06-07 06:13 EDT, Linux engineering teams - Veritas
modinfo skge (1.08 KB, text/plain), 2010-06-07 06:14 EDT, Linux engineering teams - Veritas

Description Linux engineering teams - Veritas 2010-06-07 06:11:14 EDT
Created attachment 421769: Console log messages

Description of problem: 
The skge driver hits an invalid opcode on kernel 2.6.18-164.el5 (x86_64 architecture)
in skb_checksum_help, called from dev_queue_xmit, with:
Kernel BUG at net/core/dev.c:1266
invalid opcode: 0000 [1] SMP 

Version-Release number of selected component (if applicable):
The uname -a output shows:
Linux vcslx300.vxindia.veritas.com 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
After raw Ethernet packet traffic is sent over the skge interface, errors show up. The ensuing panic does not produce a vmcore because the crash kernel environment itself hits soft lockup issues in the same driver. Eventually the machine freezes.

Actual results:
The relevant console log excerpt, modinfo output, and lspci output are attached.

Additional info:
Symantec contact: ashutosh_dubey@symantec.com
Comment 1 Linux engineering teams - Veritas 2010-06-07 06:13:16 EDT
Created attachment 421770: lspci -vv output
Comment 2 Linux engineering teams - Veritas 2010-06-07 06:14:00 EDT
Created attachment 421773: modinfo skge
Comment 3 Michal Schmidt 2010-06-10 11:58:07 EDT
In the lspci output I see the system has three NICs:
 2x Intel 82541GI (e1000)
 1x SysKonnect (skge)

Did the test case consist of forwarding traffic from one of the Intel NICs to the SysKonnect interface?

The first kernel BUG in the console log suggests that something was being received on the e1000 NIC and then passed to the module "llt" (which I am not familiar with). llt then attempted to send something (a response? a forwarded packet?), but the skb it produced did not pass one of the BUG_ON(...) checks in skb_checksum_help().

So the original crash was caused by the llt module, or possibly by e1000. Nothing implicates skge so far.
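
To illustrate the kind of sanity check a checksum helper performs on an outgoing packet, here is a minimal user-space sketch. It is not the kernel code: the report does not say which BUG_ON(...) condition fired, and the struct toy_pkt type, its fields, and toy_csum_offsets_ok() are hypothetical names made up for this example. The idea is only that the checksum start and the 2-byte checksum field must both lie inside the packet; a sender that hands over a packet with inconsistent offsets would trip a check of this general shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet descriptor; NOT struct sk_buff. */
struct toy_pkt {
    size_t len;          /* total bytes in the packet */
    size_t csum_start;   /* offset where checksumming begins */
    size_t csum_offset;  /* offset of the 16-bit checksum field,
                            relative to csum_start */
};

/*
 * Returns true when the checksum offsets are consistent with the
 * packet length. A kernel helper would typically BUG() instead of
 * returning false; this sketch only reports the result.
 */
static bool toy_csum_offsets_ok(const struct toy_pkt *p)
{
    assert(p != NULL);

    /* Checksumming cannot start past the end of the packet. */
    if (p->csum_start > p->len)
        return false;

    /* The 2-byte checksum field must fit in the remaining bytes. */
    size_t remaining = p->len - p->csum_start;
    if (remaining < 2 || p->csum_offset > remaining - 2)
        return false;

    return true;
}
```

A caller would run such a check before computing the checksum, so that a malformed packet from an upper layer is caught at the point of transmission rather than corrupting memory.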

Then kexec/kdump kicked in and started the backup kernel in an attempt to collect a vmcore. When it loaded the skge driver, it deadlocked. This does indeed look like a bug in skge - it received an interrupt before it was prepared to handle it correctly (the spinlock it tried to get in the ISR was not initialized yet).
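
The ordering problem can be sketched in user space as follows. This is a toy model under stated assumptions, not the skge source: struct toy_lock, struct toy_nic, and every toy_* and probe_* function are hypothetical names, and the "lock" only records whether it was initialized instead of actually spinning. The model assumes that the device still has an interrupt pending from the crashed kernel, so the handler runs as soon as it is registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a spinlock; "initialized" models spin_lock_init(). */
struct toy_lock {
    bool initialized;
    bool held;
};

/* Hypothetical device state; NOT the real skge structures. */
struct toy_nic {
    struct toy_lock hw_lock;
    bool irq_pending;      /* interrupt left asserted by the crashed kernel */
    int  irqs_handled;     /* interrupts serviced under a valid lock */
    int  irqs_on_bad_lock; /* interrupts that found the lock unusable */
};

static void toy_lock_init(struct toy_lock *l)
{
    l->initialized = true;
    l->held = false;
}

/* Returns false in the case where a real spinlock could spin forever. */
static bool toy_lock_acquire(struct toy_lock *l)
{
    if (!l->initialized || l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->held = false;
}

/* Interrupt handler: needs the lock before touching device state. */
static void toy_isr(struct toy_nic *nic)
{
    assert(nic != NULL);
    if (!toy_lock_acquire(&nic->hw_lock)) {
        nic->irqs_on_bad_lock++;   /* the real driver would hang here */
        return;
    }
    nic->irq_pending = false;
    nic->irqs_handled++;
    toy_lock_release(&nic->hw_lock);
}

/* Models handler registration: a pending interrupt fires immediately. */
static void toy_request_irq(struct toy_nic *nic)
{
    if (nic->irq_pending)
        toy_isr(nic);
}

/* Fragile order: the handler can run before the lock exists. */
static void probe_irq_first(struct toy_nic *nic)
{
    toy_request_irq(nic);
    toy_lock_init(&nic->hw_lock);
}

/* Robust order: everything the handler uses is set up first. */
static void probe_lock_first(struct toy_nic *nic)
{
    toy_lock_init(&nic->hw_lock);
    toy_request_irq(nic);
}
```

With a pending interrupt, probe_irq_first() leaves the interrupt unserviced and counts one hit on the unusable lock, while probe_lock_first() services it normally. In a normal boot the device is usually quiescent when the driver loads, which is one plausible reason such an ordering issue would only show up in the kdump kernel.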

I'll see how I can make skge initialization more robust in the kdump environment.
However, there's not much I could do about the original BUG which involves the llt module.
Comment 4 Michal Schmidt 2013-12-11 08:56:12 EST
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.
If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business
justification in order to re-open it.
