Bug 500667 - Hardware Error bringing up e1000e interface with jumbo frames
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
Reported: 2009-05-13 12:07 EDT by Orion Poplawski
Modified: 2014-06-29 19:01 EDT

Doc Type: Bug Fix
Last Closed: 2010-04-22 09:57:31 EDT
Attachments: None

Description Orion Poplawski 2009-05-13 12:07:16 EDT
Description of problem:

Since kernel-2.6.18-128.1.10.el5, I'm seeing the following at boot about 25% of the time:

May 13 09:13:41 castor kernel: e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
May 13 09:13:41 castor kernel: e1000e: Copyright (c) 1999-2008 Intel Corporation.
May 13 09:13:41 castor kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 44 (level, low) -> IRQ 185
May 13 09:13:41 castor kernel: intel_rng: FWH not detected
May 13 09:13:41 castor kernel: eth0: (PCI Express:2.5GB/s:Width x4) 00:30:48:7f:45:1a
May 13 09:13:41 castor kernel: eth0: Intel(R) PRO/1000 Network Connection
May 13 09:13:41 castor kernel: eth0: MAC: 4, PHY: 5, PBA No: 2050ff-0ff
May 13 09:13:41 castor kernel: GSI 25 sharing vector 0x6A and IRQ 25
May 13 09:13:41 castor kernel: ACPI: PCI Interrupt 0000:05:00.1[B] -> GSI 40 (level, low) -> IRQ 106
May 13 09:13:41 castor kernel: GSI 26 sharing vector 0x7A and IRQ 26
May 13 09:13:41 castor kernel: ACPI: PCI Interrupt 0000:00:1f.3[C] -> GSI 18 (level, low) -> IRQ 122
May 13 09:13:41 castor kernel: 0000:05:00.1: Hardware Error
May 13 09:13:41 castor kernel: eth1: (PCI Express:2.5GB/s:Width x4) 00:30:48:7f:45:1b
May 13 09:13:41 castor kernel: eth1: Intel(R) PRO/1000 Network Connection
May 13 09:13:41 castor kernel: eth1: MAC: 4, PHY: 5, PBA No: 2050ff-0ff
May 13 09:13:42 castor kernel: eth0: Link is Up 1000 Mbps Full Duplex, Flow Control: None
May 13 09:13:42 castor kernel: eth1: changing MTU from 1500 to 8982
May 13 09:13:42 castor kernel: eth1: Hardware Error
May 13 09:13:44 castor kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready

And eth1 does not work.

05:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
        Subsystem: Super Micro Computer Inc Unknown device 1096
        Flags: bus master, fast devsel, latency 0, IRQ 186
        Memory at d8060000 (32-bit, non-prefetchable) [size=128K]
        Memory at d8040000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 2020 [size=32]
        [virtual] Expansion ROM at d8310000 [disabled] [size=64K]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 1a-45-7f-ff-ff-48-30-00

I'm backing off to 2.6.18-128.1.6.el5 for now.
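
To measure how often the failure occurs across boots, the two failure messages seen in this report can be counted from the system log. The following is a minimal sketch: the regular expressions are derived only from the log lines quoted in this bug (not from the driver source), and the function name is hypothetical.

```python
import re
from collections import Counter

# Patterns taken from the messages quoted in this report; other kernel
# versions may word these messages differently (assumption).
HW_ERROR = re.compile(r"kernel: (?P<dev>\S+): Hardware Error")
PROBE_FAIL = re.compile(
    r"e1000e: probe of (?P<dev>\S+) failed with error (?P<err>-?\d+)"
)

def count_e1000e_failures(lines):
    """Return a Counter keyed by (device, kind) for each failure line."""
    failures = Counter()
    for line in lines:
        match = HW_ERROR.search(line)
        if match:
            failures[(match.group("dev"), "hardware_error")] += 1
            continue
        match = PROBE_FAIL.search(line)
        if match:
            failures[(match.group("dev"), "probe_failed")] += 1
    return failures
```

Any iterable of log lines can be passed in, for example an open file handle on /var/log/messages. The device key is whatever name the kernel printed, so the same port may appear both as a PCI address and as an interface name.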
Comment 1 Andy Gospodarek 2009-05-20 11:55:17 EDT
There were no changes specifically to e1000e between -128.1.6 and -128.1.10, so this is a bit odd.  Is there a particular MTU between 1500 and 8982 that seems to work close to 100% of the time, as far as you can tell?  I'd be curious how consistently exactly 4000 or 8000 works.
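
One way to gather the requested data is to try each candidate size several times and record how often it succeeds. This is a rough sketch under stated assumptions: it uses the iproute2 command `ip link set dev IFACE mtu N`, needs root, and treats a zero exit status as success. Whether a non-zero exit status actually accompanies the "Hardware Error" message is an assumption, so the kernel log should be checked as well, and since the original failure shows up at boot, repeated reboots may still be needed. The function names are hypothetical.

```python
import subprocess

def set_mtu(iface, mtu):
    """Try to set the MTU with iproute2; return True if the command exits 0."""
    result = subprocess.run(
        ["ip", "link", "set", "dev", iface, "mtu", str(mtu)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode == 0

def mtu_success_rates(iface, mtus, attempts, try_mtu=set_mtu):
    """Return {mtu: fraction of attempts for which try_mtu reported success}."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    rates = {}
    for mtu in mtus:
        succeeded = sum(1 for _ in range(attempts) if try_mtu(iface, mtu))
        rates[mtu] = succeeded / attempts
    return rates
```

For example, `mtu_success_rates("eth1", [1500, 4000, 8000, 8982], attempts=10)` would cover the sizes mentioned in this bug. The `try_mtu` parameter exists so a different check (such as one that also inspects the kernel log) can be substituted.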
Comment 2 Orion Poplawski 2009-05-27 12:11:57 EDT
Also seeing:

e1000e: probe of 0000:05:00.1 failed with error -2

and no presence of eth1 at all.

I'll try an MTU of 1500 for a while and see if that makes any difference.
Comment 4 Andy Gospodarek 2009-10-19 14:18:15 EDT
Has RHEL 5.4 been tried, and does it resolve this problem?
Comment 5 Andy Gospodarek 2010-04-22 09:57:31 EDT
Several errors related to the system PHY that produced failures like this:

e1000e: probe of 0000:04:00.1 failed with error -2

were fixed in RHEL 5.5.  There were a few other times when we saw this error and it was fixed by a BIOS update.  Please update to the latest kernel and re-open this bug if the problem persists.
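
To check whether a machine already runs a kernel at or beyond a given release, the release strings can be compared numerically. This is a minimal sketch with hypothetical function names. The threshold `2.6.18-194.el5` used in the usage note is an assumption about which kernel shipped with RHEL 5.5 and is not stated in this bug; verify it against the Red Hat errata before relying on it.

```python
import re

def release_key(release):
    """Turn a release such as '2.6.18-128.1.10.el5' into a tuple of ints.

    Everything from '.el' onward (for example 'el5', 'el5PAE') is ignored.
    """
    numeric_part = release.split(".el")[0]
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))

def is_at_least(release, minimum):
    """Return True if release is the same as or newer than minimum."""
    return release_key(release) >= release_key(minimum)
```

For example, `is_at_least(platform.release(), "2.6.18-194.el5")` (after `import platform`) compares the running kernel against the assumed RHEL 5.5 baseline.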
