Description of problem:

Since kernel-2.6.18-128.1.10.el5, I'm seeing the following at boot about 25% of the time:

May 13 09:13:41 castor kernel: e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k4
May 13 09:13:41 castor kernel: e1000e: Copyright (c) 1999-2008 Intel Corporation.
May 13 09:13:41 castor kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 44 (level, low) -> IRQ 185
May 13 09:13:41 castor kernel: intel_rng: FWH not detected
May 13 09:13:41 castor kernel: eth0: (PCI Express:2.5GB/s:Width x4) 00:30:48:7f:45:1a
May 13 09:13:41 castor kernel: eth0: Intel(R) PRO/1000 Network Connection
May 13 09:13:41 castor kernel: eth0: MAC: 4, PHY: 5, PBA No: 2050ff-0ff
May 13 09:13:41 castor kernel: GSI 25 sharing vector 0x6A and IRQ 25
May 13 09:13:41 castor kernel: ACPI: PCI Interrupt 0000:05:00.1[B] -> GSI 40 (level, low) -> IRQ 106
May 13 09:13:41 castor kernel: GSI 26 sharing vector 0x7A and IRQ 26
May 13 09:13:41 castor kernel: ACPI: PCI Interrupt 0000:00:1f.3[C] -> GSI 18 (level, low) -> IRQ 122
May 13 09:13:41 castor kernel: 0000:05:00.1: Hardware Error
May 13 09:13:41 castor kernel: eth1: (PCI Express:2.5GB/s:Width x4) 00:30:48:7f:45:1b
May 13 09:13:41 castor kernel: eth1: Intel(R) PRO/1000 Network Connection
May 13 09:13:41 castor kernel: eth1: MAC: 4, PHY: 5, PBA No: 2050ff-0ff
May 13 09:13:42 castor kernel: eth0: Link is Up 1000 Mbps Full Duplex, Flow Control: None
May 13 09:13:42 castor kernel: eth1: changing MTU from 1500 to 8982
May 13 09:13:42 castor kernel: eth1: Hardware Error
May 13 09:13:44 castor kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready

And eth1 does not work.

05:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet Controller (Copper) (rev 01)
        Subsystem: Super Micro Computer Inc Unknown device 1096
        Flags: bus master, fast devsel, latency 0, IRQ 186
        Memory at d8060000 (32-bit, non-prefetchable) [size=128K]
        Memory at d8040000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 2020 [size=32]
        [virtual] Expansion ROM at d8310000 [disabled] [size=64K]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable+
        Capabilities: [e0] Express Endpoint IRQ 0
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 1a-45-7f-ff-ff-48-30-00

I'm backing off to 2.6.18-128.1.6.el5 for now.
There were no changes specifically to e1000e between -128.1.6 and -128.1.10, so this is a bit odd. Is there a particular MTU between 1500 and 8982 that seems to work close to 100% of the time, as far as you can tell? I'd be curious how consistently exactly 4000 or 8000 works.
Also seeing:

e1000e: probe of 0000:05:00.1 failed with error -2

and no presence of eth1 at all. I'll try an MTU of 1500 for a bit and see if that makes any difference.
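
To put a number on how often this happens with a given MTU, the failures can be tallied from the system log. The following is only a rough sketch, assuming the syslog format quoted in this report. The function name tally_failures is hypothetical, and the patterns cover just the two failure messages seen here ("Hardware Error" and "probe of ... failed with error"). It counts each e1000e driver banner as one driver load, which is an approximation of one boot.

```python
import re

# Banner printed once each time the e1000e driver loads (as quoted in this report).
DRIVER_BANNER = re.compile(r"kernel: e1000e: Intel\(R\) PRO/1000 Network Driver")

# The two failure messages reported in this thread; other messages are not covered.
FAILURE_PATTERNS = [
    re.compile(r"kernel: \S+: Hardware Error"),
    re.compile(r"e1000e: probe of \S+ failed with error -?\d+"),
]


def tally_failures(lines):
    """Return (driver_loads, loads_with_failure) for an iterable of syslog lines."""
    loads = 0
    failed_loads = 0
    current_failed = False  # whether the current driver load was already counted as failed
    for line in lines:
        if DRIVER_BANNER.search(line):
            loads += 1
            current_failed = False
        elif loads and not current_failed:
            if any(p.search(line) for p in FAILURE_PATTERNS):
                failed_loads += 1
                current_failed = True
    return loads, failed_loads
```

It could be run over the log with something like tally_failures(open("/var/log/messages")); the log path is an assumption and depends on the syslog configuration.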
Has RHEL 5.4 been tried, and does it resolve this problem?
Several errors related to the system PHY that produced failures like this:

e1000e: probe of 0000:04:00.1 failed with error -2

were fixed in RHEL 5.5. On a few other occasions, this error was fixed by a BIOS update. Please update to the latest kernel and re-open this bug if the problem persists.