Bug 601136 - skge deadlocks during initialization in kdump kernel
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Michal Schmidt
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-06-07 10:11 UTC by Linux engineering teams - Veritas
Modified: 2013-12-11 13:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-12-11 13:56:12 UTC
Target Upstream Version:
Embargoed:


Attachments
Console log messages (14.10 KB, text/plain)
2010-06-07 10:11 UTC, Linux engineering teams - Veritas
lspci -vv output (27.08 KB, text/plain)
2010-06-07 10:13 UTC, Linux engineering teams - Veritas
modinfo skge (1.08 KB, text/plain)
2010-06-07 10:14 UTC, Linux engineering teams - Veritas

Description Linux engineering teams - Veritas 2010-06-07 10:11:14 UTC
Created attachment 421769
Console log messages

Description of problem: 
The skge driver hits an invalid opcode in kernel 2.6.18-164.el5 on x86_64,
in skb_checksum_help() called from dev_queue_xmit(), with:
Kernel BUG at net/core/dev.c:1266
invalid opcode: 0000 [1] SMP 

Version-Release number of selected component (if applicable):
The uname -a output shows:
Linux vcslx300.vxindia.veritas.com 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Errors appear after sending raw Ethernet packet traffic over the skge interface. The ensuing panic does not produce a vmcore because the crash kernel itself hits soft lockups in the same driver. Eventually the machine freezes.

Actual results:
The relevant console log excerpt, modinfo output, and lspci output are attached.

Additional info:
Symantec contact: ashutosh_dubey

Comment 1 Linux engineering teams - Veritas 2010-06-07 10:13:16 UTC
Created attachment 421770
lspci -vv output

Comment 2 Linux engineering teams - Veritas 2010-06-07 10:14:00 UTC
Created attachment 421773
modinfo skge

Comment 3 Michal Schmidt 2010-06-10 15:58:07 UTC
In the lspci output I see the system has three NICs:
 2x Intel 82541GI (e1000)
 1x SysKonnect (skge)

Did the test case consist of forwarding traffic from one of the Intel NICs to the SysKonnect interface?

The first kernel BUG in the console log suggests that something was being received on the e1000 NIC and then passed to the module "llt" (which I am not familiar with). llt then attempted to send something (a response? a forwarded packet?), but the skb it produced did not pass one of the BUG_ON(...) checks in skb_checksum_help().

So the original crash was caused by the llt module, or possibly by e1000. Nothing implicates skge so far.

Then kexec/kdump kicked in and started the backup kernel in an attempt to collect a vmcore. When it loaded the skge driver, it deadlocked. This does indeed look like a bug in skge - it received an interrupt before it was prepared to handle it correctly (the spinlock it tried to get in the ISR was not initialized yet).

I'll see how I can make skge initialization more robust in the kdump environment.
However, there's not much I could do about the original BUG which involves the llt module.
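
The skge failure described in this comment is an initialization-ordering problem: the interrupt handler became reachable before the lock it takes had been initialized. The following is a minimal user-space sketch in C of that ordering hazard, not the skge source. All names (toy_lock, toy_nic, toy_isr, probe_fragile, probe_robust) are hypothetical, and the toy lock only reports failure where a real spinlock might spin forever. The irq_pending flag is an assumption standing in for a device that the crashed kernel left active.

```c
#include <stdbool.h>

/* Toy stand-in for a kernel spinlock (hypothetical, user-space only).
 * "initialized" models whether the lock has been set up yet. */
struct toy_lock {
    bool initialized;
    bool held;
};

/* Hypothetical per-device state, loosely modelled on a NIC driver. */
struct toy_nic {
    struct toy_lock hw_lock;
    bool irq_registered;
    int irqs_handled;
};

static void toy_lock_init(struct toy_lock *l)
{
    l->initialized = true;
    l->held = false;
}

/* Returns false in the case where a real lock acquisition could hang. */
static bool toy_lock_acquire(struct toy_lock *l)
{
    if (!l->initialized || l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->held = false;
}

/* Interrupt handler: takes the device lock before touching state. */
static bool toy_isr(struct toy_nic *nic)
{
    if (!toy_lock_acquire(&nic->hw_lock))
        return false;               /* modelled deadlock */
    nic->irqs_handled++;
    toy_lock_release(&nic->hw_lock);
    return true;
}

/* Registering the handler makes it callable at once; a device left
 * active by the previous kernel may already have an interrupt pending. */
static bool toy_register_irq(struct toy_nic *nic, bool irq_pending)
{
    nic->irq_registered = true;
    return irq_pending ? toy_isr(nic) : true;
}

/* Fragile order: handler is live before the lock is initialized. */
static bool probe_fragile(struct toy_nic *nic, bool irq_pending)
{
    bool ok = toy_register_irq(nic, irq_pending);
    toy_lock_init(&nic->hw_lock);
    return ok;
}

/* Robust order: everything the handler uses is ready first. */
static bool probe_robust(struct toy_nic *nic, bool irq_pending)
{
    toy_lock_init(&nic->hw_lock);
    return toy_register_irq(nic, irq_pending);
}
```

Starting from a zeroed struct toy_nic, probe_fragile(&nic, true) returns false (the modelled deadlock), while probe_robust(&nic, true) returns true and counts one handled interrupt. Whether reordering alone would be sufficient for skge in the kdump kernel is not established in this report.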

Comment 4 Michal Schmidt 2013-12-11 13:56:12 UTC
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.
If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business
justification in order to re-open it.

