Red Hat Bugzilla – Bug 496248
When the network is initialized by e1000e driver, I lose connection to the IPMI card
Last modified: 2012-01-20 18:26:46 EST
Description of problem:
The card still responds correctly using the proprietary Supermicro tools, and also with ipmitool. Just not remotely.
Version-Release number of selected component (if applicable):
e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
Motherboard SuperMicro and SuperMicro IPMI card
Steps to Reproduce:
1. Start CentOS 5.3
2. Remote connection with OpenIPMI tool
Actual results: Non functional
If I understand it correctly, this is a kernel problem.
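For anyone trying to reproduce this, the remote check can be scripted. The following is a minimal sketch, assuming ipmitool is installed on a second machine and the IPMI card answers on the LAN interface; the host name, user, and password are placeholders, and the function names are made up for illustration. The original report used the OpenIPMI tool for the remote connection, so using ipmitool here is a substitution.

```python
import subprocess


def build_ipmi_check(host, user, password, interface="lanplus"):
    """Return the ipmitool argument list for a remote 'chassis status' query."""
    return [
        "ipmitool",
        "-I", interface,   # "lanplus" for IPMI 2.0, "lan" for IPMI 1.5
        "-H", host,        # address of the IPMI card (placeholder)
        "-U", user,
        "-P", password,
        "chassis", "status",
    ]


def bmc_reachable(host, user, password, interface="lanplus", timeout=15):
    """Return True if the remote query succeeds, False on error or timeout."""
    cmd = build_ipmi_check(host, user, password, interface)
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ipmitool missing, or the card never answered
        return False
    return result.returncode == 0
```

Running `bmc_reachable(...)` once before the driver loads and once after should show whether the card drops off the network at that point.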
The report is correct. I see the same problem with a Supermicro PDSMI+ motherboard and an AOC-IPMI20-E IPMI card. I already had the IPMI card returned and replaced, in vain. When I reboot into an older kernel (2.6.18-92.1.18.el5xen), IPMI starts working again.
See commit eb7c3adb1ca92450870dbb0d347fc986cd5e2af4, which is included in kernel patch-2.6.28-rc5-git2. This should urgently be backported to 2.6.18 and noted in the errata.
What is the current status? Will the fix for this issue be included in the next kernel for RHEL?
I still have the same problem with the latest kernel.
Kernel 2.6.18-128.1.14.el5xen still seems to contain the broken driver version 0.3.3.3-k4, so I guess nothing has changed.
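To confirm which driver version a given kernel actually loads, the output of `ethtool -i` can be parsed. This is a rough sketch, assuming ethtool is installed and the interface is named eth0 (both assumptions; the helper names are hypothetical):

```python
import subprocess


def parse_ethtool_info(text):
    """Turn 'key: value' lines from 'ethtool -i' into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as a PCI
        # bus address (which contains colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(interface="eth0"):
    """Run 'ethtool -i <interface>' and return the parsed fields."""
    result = subprocess.run(
        ["ethtool", "-i", interface],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_ethtool_info(result.stdout)
```

On an affected system, `driver_info()["version"]` would be expected to report the 0.3.3.3-k4 version mentioned in this thread.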
Work-around in http://bugs.centos.org/view.php?id=3477
I'm actually seeing this behavior with 2.6.18-164.el5.x86_64, the bnx2 driver that comes with it (1.9.3 I believe), and the Dell R710 platform.
The system comes up just fine with SOL over IPMI and as soon as the bnx2 driver takes over I lose IPMI. If I ssh to the machine and /sbin/reboot, as soon as the driver is unloaded, IPMI comes back.
I've downloaded and installed Broadcom's latest netxtreme2 driver (1.9.20b5).
I've flashed the BIOS to the latest version (1.2.6) as well as the iDRAC firmware (to 1.20.1). None of it has helped much.
(In reply to comment #7)
> I'm actually seeing this behavior with 2.6.18-164.el5.x86_64, the bnx2
> driver that comes with it (1.9.3 I believe), and the Dell R710 platform.
This bug is specific to hardware driven by the e1000e driver. You've got a similar-but-different problem, which should be filed under another bug.
Actually this turned out to be Ganglia+Multicast for me. The Dell iDRAC is running some OS (Linux?) that was attempting to process the multicast traffic. Turn off Ganglia and the shared bnx2 connection does the right thing.
Well, it looks like things have improved. Kernel 2.6.18-164.9.1.el5xen seems to contain a version of the e1000e kernel driver that supports the CrcStripping option. So on SuperMicro boards, one needs to add a file in the /etc/modprobe.d directory containing the line
options e1000e CrcStripping=0
Can this be added to the errata?
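For reference, the option from the comment above can be put in place with a few lines of script. This is only a sketch: the file name e1000e.conf is an assumption (the comment only says "a file in the /etc/modprobe.d directory"), the function name is made up, and writing to /etc/modprobe.d requires root.

```python
import os

# The exact line quoted in this thread.
OPTION_LINE = "options e1000e CrcStripping=0\n"


def write_e1000e_option(conf_dir="/etc/modprobe.d", filename="e1000e.conf"):
    """Write the CrcStripping option to conf_dir/filename and return the path.

    The file name is hypothetical; any existing file of that name
    is overwritten.
    """
    path = os.path.join(conf_dir, filename)
    with open(path, "w") as handle:
        handle.write(OPTION_LINE)
    return path
```

Module options are read when the module is loaded, so the e1000e driver has to be reloaded (or the machine rebooted) before the setting takes effect.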
The CrcStripping option was actually added in the 5.4 kernels, so this bug is already fixed, as far as I can see. We don't typically update an erratum after it has already been released. This could be a candidate for a knowledgebase article, but I don't know offhand how to make that happen...