Description of problem:
After installing an HP ProLiant DL380 G3 with RHEL4 U4, initialization of eth0 (the first integrated BCM5703X) fails intermittently. Kernel messages for a successful startup look like:

Aug 31 13:02:29 test02 kernel: tg3.c:v3.52-rh (Mar 06, 2006)
Aug 31 13:02:29 test02 kernel: ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 29 (level, low) -> IRQ 193
Aug 31 13:02:29 test02 kernel: eth0: Tigon3 [partno(TBD) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:69:ee:79
Aug 31 13:02:29 test02 kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
Aug 31 13:02:29 test02 kernel: eth0: dma_rwctrl[769f4000]
Aug 31 13:02:29 test02 kernel: ACPI: PCI interrupt 0000:02:02.0[A] -> GSI 31 (level, low) -> IRQ 201
Aug 31 13:02:29 test02 kernel: eth1: Tigon3 [partno(TBD) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:69:ee:78
Aug 31 13:02:29 test02 kernel: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
Aug 31 13:02:29 test02 kernel: eth1: dma_rwctrl[769f4000]

but sometimes it fails with the following kernel messages:

Aug 31 13:21:26 test02 kernel: tg3.c:v3.52-rh (Mar 06, 2006)
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 29 (level, low) -> IRQ 193
Aug 31 13:21:26 test02 kernel: tg3: Could not obtain valid ethernet address, aborting.
Aug 31 13:21:26 test02 kernel: tg3: probe of 0000:02:01.0 failed with error -22
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:02:02.0[A] -> GSI 31 (level, low) -> IRQ 201
Aug 31 13:21:26 test02 kernel: eth0: Tigon3 [partno(TBD) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:69:ee:78
Aug 31 13:21:26 test02 kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
Aug 31 13:21:26 test02 kernel: eth0: dma_rwctrl[769f4000]
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:00:0f.2[A] -> GSI 10 (level, low) -> IRQ 10

and then the initialization of eth0 fails, because eth0 is now the second integrated NIC:

ifup: Device eth0 has different MAC address than expected, ignoring.

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-42.EL and kernel-smp-2.6.9-42.0.2.EL

How reproducible:
Reboot the server several times and you get a success, failure, success, failure sequence.

Steps to Reproduce:
1. Reboot the server
2. Watch the network startup and /var/log/messages

Actual results:
eth0 is not always initialized.

Expected results:
eth0 should always be initialized.

Additional info:
We have tried tg3.ko from kernel-smp-2.6.9-34.0.2.EL and bcm5700-8.3.17c-1 provided by HP, and they work fine.
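The failure chain described above (the first NIC fails its MAC check, the second NIC is then registered as eth0, and ifup rejects it because its MAC does not match the configured one) can be modeled in a few lines. This is a minimal Python sketch, not tg3 or initscripts code. It assumes the driver applies the kernel's usual `is_valid_ether_addr()` rule (address neither all zeros nor multicast), and the all-zeros MAC used for the failing boot is hypothetical, since the report does not say what value was read. The helpers `probe()` and `ifup_check()` are illustrative names only.

```python
import errno


def fmt_mac(mac: bytes) -> str:
    return ":".join(f"{b:02x}" for b in mac)


def is_valid_ether_addr(mac: bytes) -> bool:
    # Same rule as the kernel helper of this name: a usable station address
    # is neither all zeros nor multicast (low bit of the first octet set).
    return len(mac) == 6 and any(mac) and not (mac[0] & 0x01)


def probe(nics):
    """Name NICs eth0, eth1, ... in probe order, skipping ones that fail.

    `nics` is a list of (pci_address, mac_bytes) in PCI bus order.
    Returns (names, failures): names maps "ethN" -> (pci_address, mac_bytes),
    failures lists (pci_address, error_code) for NICs that were not registered.
    """
    names, failures = {}, []
    for pci_addr, mac in nics:
        if not is_valid_ether_addr(mac):
            # -EINVAL is -22 on Linux, matching "failed with error -22"
            failures.append((pci_addr, -errno.EINVAL))
            continue
        names[f"eth{len(names)}"] = (pci_addr, mac)
    return names, failures


def ifup_check(names, dev, expected_hwaddr):
    """Rough model of ifup comparing the live MAC with HWADDR in ifcfg-<dev>."""
    if dev not in names:
        return None  # device absent; real ifup behavior is not modeled here
    if fmt_mac(names[dev][1]) != expected_hwaddr.lower():
        return f"Device {dev} has different MAC address than expected, ignoring."
    return "ok"


# PCI addresses and MACs from the logs above; the all-zeros read is hypothetical.
good_boot = [("0000:02:01.0", bytes.fromhex("000bcd69ee79")),
             ("0000:02:02.0", bytes.fromhex("000bcd69ee78"))]
bad_boot = [("0000:02:01.0", bytes(6)),
            ("0000:02:02.0", bytes.fromhex("000bcd69ee78"))]

for label, boot in (("good", good_boot), ("bad", bad_boot)):
    names, failures = probe(boot)
    print(label, failures, ifup_check(names, "eth0", "00:0B:CD:69:EE:79"))
```

The point of the model is that a single failed probe shifts every later interface name down by one, which is why the HWADDR check, rather than the driver, is what finally reports the error for eth0.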
Created attachment 135373 (/var/log/messages)
A later tg3 update is available in the test kernels here: http://people.redhat.com/linville/kernels/rhel4/ Please give those a try and post the results here...thanks!
Created attachment 136138 (Console log)

Hi John,

I was unable to boot the server with this test kernel: as you will see in the attached console log, the cciss driver didn't initialize correctly.
Created attachment 136139 (Console log)

This is the console log for a successful boot with 2.6.9-42.0.2.EL. The issue seems to be related to PCI interrupts. Here is the lspci output; I hope it helps:

00:00.0 Host bridge: Broadcom CMIC-WS Host Bridge (GC-LE chipset) (rev 13)
00:00.1 Host bridge: Broadcom CMIC-WS Host Bridge (GC-LE chipset)
00:00.2 Host bridge: Broadcom CMIC-LE
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 01)
00:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out Processor (rev 01)
00:0f.0 ISA bridge: Broadcom CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: Broadcom CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: Broadcom CSB5 LPC bridge
00:10.0 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
00:10.2 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
00:11.0 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
00:11.2 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
01:03.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532 (rev 01)
02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
06:01.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532 (rev 01)
06:1e.0 PCI Hot-plug controller: Compaq Computer Corporation PCI Hotplug Controller (rev 14)
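Since PCI interrupt routing is the suspect, one way to compare a good boot with a bad one is to pull the ACPI routing lines and any failed probes out of /var/log/messages. This is a minimal Python sketch that assumes only the message formats quoted in this report; the function names are hypothetical, and the sample strings are copied from the logs in the description.

```python
import re

# Format seen in this report, e.g.
# "ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 29 (level, low) -> IRQ 193"
IRQ_RE = re.compile(
    r"ACPI: PCI interrupt (?P<dev>\S+?)\[(?P<pin>[A-D])\] -> GSI (?P<gsi>\d+) "
    r"\((?P<trigger>\w+), (?P<polarity>\w+)\) -> IRQ (?P<irq>\d+)"
)
# e.g. "tg3: probe of 0000:02:01.0 failed with error -22"
PROBE_FAIL_RE = re.compile(
    r"(?P<driver>\S+): probe of (?P<dev>\S+) failed with error (?P<err>-?\d+)"
)


def irq_routing(log_text):
    """Map each PCI address to (pin, GSI, trigger, polarity, IRQ)."""
    return {
        m["dev"]: (m["pin"], int(m["gsi"]), m["trigger"], m["polarity"], int(m["irq"]))
        for m in IRQ_RE.finditer(log_text)
    }


def failed_probes(log_text):
    """Map each PCI address whose probe failed to (driver, error code)."""
    return {m["dev"]: (m["driver"], int(m["err"])) for m in PROBE_FAIL_RE.finditer(log_text)}


def diff_routing(good, bad):
    """Devices present in both boots whose interrupt routing differs."""
    common = sorted(set(good) & set(bad))
    return {d: (good[d], bad[d]) for d in common if good[d] != bad[d]}


GOOD_BOOT = """\
Aug 31 13:02:29 test02 kernel: ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 29 (level, low) -> IRQ 193
Aug 31 13:02:29 test02 kernel: ACPI: PCI interrupt 0000:02:02.0[A] -> GSI 31 (level, low) -> IRQ 201
"""
BAD_BOOT = """\
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 29 (level, low) -> IRQ 193
Aug 31 13:21:26 test02 kernel: tg3: probe of 0000:02:01.0 failed with error -22
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:02:02.0[A] -> GSI 31 (level, low) -> IRQ 201
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:00:0f.2[A] -> GSI 10 (level, low) -> IRQ 10
"""

if __name__ == "__main__":
    print("failed probes:", failed_probes(BAD_BOOT))
    print("routing changes:", diff_routing(irq_routing(GOOD_BOOT), irq_routing(BAD_BOOT)))
```

For real use, read the relevant boot sections of /var/log/messages into strings and pass them in. `diff_routing()` compares only devices that appear in both boots, so a truncated excerpt does not produce spurious differences.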
Please test the kernels listed here: http://people.redhat.com/agospoda/#rhel4 and let me know if they resolve the issue. You *may* encounter the same problem with this build that you had with Linville's; if you do, see the instructions below for rebuilding it: http://kbase.redhat.com/faq/FAQ_80_4969.shtm
This kernel works fine.
Had the same problem with an HP xw6000 workstation:

05:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet (rev 02)

Tried the 2.6.9-42.15.EL.gsstest.100320060 kernel. This also works fine.
I had the same problem on an HP DL380 G3 running Update 4, but 2.6.9-42.22.EL works. I noticed this after a reinstall to upgrade from RHEL3. A different DL380 G3, running 2.4.21-15.ELsmp (upgrading to -47 did not help), had the same problem, but on both interfaces, and it worked after a cold boot. I switched to HP's bcm5700 driver on this machine. I have other DL380s that do not have the problem. These two trouble machines had been fine for a couple of years (and many reboots). Is it known why this happens? Should I expect my DL380s to come back up without networking at any random reboot?
Had the same problem with an HP DL380 G3 running RHEL 4.4 in production. When will it be fixed in the distributed RH kernel? It has completely blocked our RH cluster, and I don't want to recompile all the cluster modules (GFS, etc.) for each test kernel I use.
Did you test any of the kernels in the previous comments? Do they fix the issues you are seeing?
Yes, the 2.6.9-42.15.EL.gsstest.100320060 kernel is OK. No problems with it. Thanks.
*** Bug 216871 has been marked as a duplicate of this bug. ***
Any idea when a fixed errata kernel will be released?