Created attachment 421769
Console log messages
Description of problem:
The skge driver hits an invalid opcode on kernel 2.6.18-164.el5 (x86_64)
in skb_checksum_help, called from dev_queue_xmit, with:
Kernel BUG at net/core/dev.c:1266
invalid opcode: 0000  SMP
Version-Release number of selected component (if applicable):
The uname -a output shows:
Linux vcslx300.vxindia.veritas.com 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
After raw Ethernet packet traffic is sent over the skge interface, errors show up. The ensuing panic does not produce a vmcore because the crash kernel itself hits soft lockups in the same driver. Eventually the machine freezes.
The relevant console log excerpt, modinfo output, and lspci output are attached.
Symantec contact: email@example.com
Created attachment 421770
lspci -vv output
Created attachment 421773
In the lspci output I see the system has three NICs:
2x Intel 82541GI (e1000)
1x SysKonnect (skge)
Did the test case consist of forwarding traffic from one of the Intel NICs to the SysKonnect interface?
The first kernel BUG in the console log suggests that something was being received on the e1000 NIC and then passed to the module "llt" (which I am not familiar with). llt then attempted to send something (a response? a forwarded packet?), but the skb it produced did not pass one of the BUG_ON(...) checks in skb_checksum_help().
So the original crash was caused by the llt module, or possibly by e1000. Nothing implicates skge so far.
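To illustrate the kind of check that can fire in skb_checksum_help(), here is a minimal userspace sketch. It is not the kernel source: struct pkt_model, its fields, and checksum_offsets_sane() are hypothetical names, and the conditions are an assumption about the general shape of such checks (the offset where checksumming starts must lie inside the packet, and the checksum field must fit in the data that follows). The report does not say which check fired at net/core/dev.c:1266.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the sk_buff fields involved. */
struct pkt_model {
    size_t    len;        /* total bytes of packet data */
    ptrdiff_t csum_start; /* offset at which checksumming should begin */
    ptrdiff_t csum_field; /* offset of the 16-bit checksum field,
                             relative to csum_start */
};

/*
 * Returns true when a software checksum fallback could run safely on
 * this packet. In the kernel a failed check of this kind is a BUG_ON(),
 * which stops the machine instead of returning an error.
 */
static bool checksum_offsets_sane(const struct pkt_model *p)
{
    /* The start of the checksummed region must lie inside the packet. */
    if (p->csum_start < 0 || (size_t)p->csum_start >= p->len)
        return false;

    /* Bytes available from csum_start to the end of the packet. */
    ptrdiff_t room = (ptrdiff_t)p->len - p->csum_start;

    /* The 2-byte checksum field must fit inside that region. */
    if (p->csum_field < 0 || p->csum_field + 2 > room)
        return false;

    return true;
}
```

With a 64-byte packet, a start offset of 34 and a field offset of 16 pass, while a start offset of 80 fails. A module that builds its own skbs and requests checksum offload without setting these offsets consistently could trip a check like this, though the log alone does not confirm that this is what llt did.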
Then kexec/kdump kicked in and started the backup kernel in an attempt to collect a vmcore. When the backup kernel loaded the skge driver, it deadlocked. This does indeed look like a bug in skge: the driver received an interrupt before it was prepared to handle it correctly (the spinlock it tried to take in the ISR was not initialized yet).
I'll see how I can make skge initialization more robust in the kdump environment.
However, there's not much I could do about the original BUG which involves the llt module.
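For the kdump deadlock described above, the ordering problem can be sketched in userspace C. This is a model only, not the skge source: struct dev_model and every function below are hypothetical. In a real driver the corresponding steps would be spin_lock_init() and request_irq(), and the assumption here is that a device left active by the crashed kernel may raise an interrupt as soon as a handler is registered.

```c
#include <stdbool.h>

/* Hypothetical model of a driver's per-device state. */
struct dev_model {
    bool lock_ready;  /* stands in for spin_lock_init() having run */
    bool irq_enabled; /* stands in for the handler being registered */
    int  handled;     /* interrupts handled with a valid lock */
    int  unsafe;      /* interrupts that arrived before the lock was ready */
};

static void model_init_lock(struct dev_model *d)  { d->lock_ready = true; }
static void model_enable_irq(struct dev_model *d) { d->irq_enabled = true; }

/* Simulate the device raising an interrupt. */
static void model_raise_irq(struct dev_model *d)
{
    if (!d->irq_enabled)
        return;          /* no handler registered, nothing runs */
    if (!d->lock_ready) {
        d->unsafe++;     /* a real ISR would spin on an uninitialized lock */
        return;
    }
    d->handled++;
}

/* Fragile order: the handler is registered before the lock exists. */
static struct dev_model probe_fragile(void)
{
    struct dev_model d = {0};
    model_enable_irq(&d);
    model_raise_irq(&d); /* interrupt left pending by the crashed kernel */
    model_init_lock(&d);
    return d;
}

/* Robust order: all state the ISR touches is set up first. */
static struct dev_model probe_robust(void)
{
    struct dev_model d = {0};
    model_init_lock(&d);
    model_enable_irq(&d);
    model_raise_irq(&d);
    return d;
}
```

In this model probe_fragile() records one unsafe interrupt, while probe_robust() handles the same interrupt with the lock already initialized. Quiescing the device before registering the handler would be a complementary measure, but that is outside what this sketch models.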
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.
If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business
justification in order to re-open it.