Bug 601136

Summary: skge deadlocks during initialization in kdump kernel
Product: Red Hat Enterprise Linux 5
Reporter: Linux engineering teams - Veritas <linux26port>
Component: kernel
Assignee: Michal Schmidt <mschmidt>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Docs Contact:
Priority: low
Version: 5.4
CC: jwilson
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-12-11 13:56:12 UTC
Type: ---
Attachments:
Description            Flags
Console log messages   none
lspci -vv output       none
modinfo skge           none

Description Linux engineering teams - Veritas 2010-06-07 10:11:14 UTC
Created attachment 421769 [details]
Console log messages

Description of problem: 
The skge driver hits an invalid opcode on kernel 2.6.18-164.el5 (x86_64)
in skb_checksum_help(), called from dev_queue_xmit(), with:
Kernel BUG at net/core/dev.c:1266
invalid opcode: 0000 [1] SMP 

Version-Release number of selected component (if applicable):
The uname -a output shows:
Linux vcslx300.vxindia.veritas.com 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:
Errors appear after sending raw Ethernet packet traffic over the skge interface. The ensuing panic does not produce a vmcore because the crash kernel environment itself hits soft lockups in the same driver. Eventually the machine freezes.

Actual results:
The relevant console log excerpt, modinfo output, and lspci output are attached.

Additional info:
Symantec contact: ashutosh_dubey

Comment 1 Linux engineering teams - Veritas 2010-06-07 10:13:16 UTC
Created attachment 421770 [details]
lspci -vv output

Comment 2 Linux engineering teams - Veritas 2010-06-07 10:14:00 UTC
Created attachment 421773 [details]
modinfo skge

Comment 3 Michal Schmidt 2010-06-10 15:58:07 UTC
In the lspci output I see the system has three NICs:
 2x Intel 82541GI (e1000)
 1x SysKonnect (skge)

Did the test case consist of forwarding traffic from one of the Intel NICs to the SysKonnect interface?

The first kernel BUG in the console log suggests that something was being received on the e1000 NIC and then passed to the module "llt" (which I am not familiar with). llt then attempted to send something (a response? a forwarded packet?), but the skb it produced did not pass one of the BUG_ON(...) checks in skb_checksum_help().

So the original crash was caused by the llt module, or possibly by e1000. Nothing implicates skge so far.
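
To illustrate the kind of sanity check involved, here is a minimal userspace sketch in Python. It is not the kernel code: the function name, the parameters, and the exact conditions are assumptions chosen for illustration, and the actual BUG_ON() conditions in the 2.6.18-164.el5 source are not quoted in this report. The idea is only that a checksum helper must confirm the offsets carried by the packet lie inside the packet data before it reads and writes there.

```python
def checksum_offsets_ok(data_len, csum_start, csum_field):
    """Return True if a 16-bit checksum can be computed and stored in bounds.

    All names are hypothetical, for illustration only:
      data_len   -- number of bytes of packet data
      csum_start -- offset at which checksumming begins
      csum_field -- offset, relative to csum_start, of the 16-bit result field
    """
    # The starting point must lie inside the packet data.
    if csum_start < 0 or csum_start > data_len:
        return False

    # There must be something left to checksum after the starting point.
    remaining = data_len - csum_start
    if remaining <= 0:
        return False

    # The two-byte result field must fit in the remaining data.
    return 0 <= csum_field and csum_field + 2 <= remaining
```

For example, `checksum_offsets_ok(60, 34, 16)` passes, while `checksum_offsets_ok(60, 80, 16)` fails because the start offset lies beyond the data. A hand-built packet that requests a checksum but leaves such offsets unset or out of range would fail a check of this kind.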

Then kexec/kdump kicked in and started the backup kernel in an attempt to collect a vmcore. When it loaded the skge driver, it deadlocked. This does indeed look like a bug in skge - it received an interrupt before it was prepared to handle it correctly (the spinlock it tried to get in the ISR was not initialized yet).

I'll see how I can make skge initialization more robust in the kdump environment.
However, there's not much I could do about the original BUG which involves the llt module.
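
The ordering problem described above can be modelled in userspace. The following Python sketch is a toy model, not skge code: FakeNic, isr, probe_fragile and probe_robust are hypothetical names, and raising an exception stands in for what the console log shows as a lockup. It assumes the device may already have an interrupt pending when the kdump kernel probes it, because the crashed kernel never shut it down.

```python
class FakeNic:
    """Toy stand-in for a NIC that the crashed kernel may have left running."""

    def __init__(self, irq_pending):
        self.irq_pending = irq_pending  # interrupt already asserted at probe time
        self.lock_ready = False         # models the lock having been initialized
        self.handler = None
        self.log = []

    def init_lock(self):
        self.lock_ready = True
        self.log.append("lock initialized")

    def quiesce(self):
        # Models resetting the hardware or masking its interrupt sources.
        self.irq_pending = False
        self.log.append("device quiesced")

    def request_irq(self, handler):
        self.handler = handler
        self.log.append("handler registered")
        # A pending interrupt is delivered as soon as a handler exists.
        if self.irq_pending:
            self.irq_pending = False
            handler(self)


def isr(nic):
    """Interrupt handler: needs the lock to be usable before it runs."""
    if not nic.lock_ready:
        # A real CPU would not raise here; this models the hang.
        raise RuntimeError("ISR took an uninitialized lock")
    nic.log.append("interrupt handled")


def probe_fragile(nic):
    # Handler is live before the lock it depends on exists.
    nic.request_irq(isr)
    nic.init_lock()


def probe_robust(nic):
    # Set up everything the handler needs, silence the device, then register.
    nic.init_lock()
    nic.quiesce()
    nic.request_irq(isr)
```

In this model, probe_fragile works on a device with no pending interrupt (the usual cold-boot case) and fails only when an interrupt is already pending, which matches the problem appearing in the kdump kernel. probe_robust succeeds in both cases.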

Comment 4 Michal Schmidt 2013-12-11 13:56:12 UTC
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.
If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business
justification in order to re-open it.