Bug 204916 - tg3: Could not obtain valid ethernet address, aborting.
Summary: tg3: Could not obtain valid ethernet address, aborting.
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: John W. Linville
QA Contact: Brian Brock
URL:
Whiteboard:
Keywords:
Duplicates: 216871
Depends On:
Blocks: 229570
 
Reported: 2006-09-01 11:53 UTC by Juanjo Villaplana
Modified: 2018-10-19 21:15 UTC
CC List: 6 users
Clone Of:
Last Closed: 2007-02-13 19:02:33 UTC


Attachments (Terms of Use)
/var/log/messages (31.13 KB, application/octet-stream)
2006-09-01 11:53 UTC, Juanjo Villaplana
Console log (11.75 KB, text/plain)
2006-09-13 07:13 UTC, Juanjo Villaplana
Console log (9.98 KB, text/plain)
2006-09-13 07:27 UTC, Juanjo Villaplana

Description Juanjo Villaplana 2006-09-01 11:53:12 UTC
Description of problem:

After installing RHEL4 U4 on an HP ProLiant DL380 G3, initialization of eth0
(the first integrated BCM5703X) fails intermittently.

Kernel messages for a successful startup look like this:

Aug 31 13:02:29 test02 kernel: tg3.c:v3.52-rh (Mar 06, 2006)
Aug 31 13:02:29 test02 kernel: ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 29 (level, low) -> IRQ 193
Aug 31 13:02:29 test02 kernel: eth0: Tigon3 [partno(TBD) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:69:ee:79
Aug 31 13:02:29 test02 kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
Aug 31 13:02:29 test02 kernel: eth0: dma_rwctrl[769f4000]
Aug 31 13:02:29 test02 kernel: ACPI: PCI interrupt 0000:02:02.0[A] -> GSI 31 (level, low) -> IRQ 201
Aug 31 13:02:29 test02 kernel: eth1: Tigon3 [partno(TBD) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:69:ee:78
Aug 31 13:02:29 test02 kernel: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
Aug 31 13:02:29 test02 kernel: eth1: dma_rwctrl[769f4000]

but startup sometimes fails with the following kernel messages:

Aug 31 13:21:26 test02 kernel: tg3.c:v3.52-rh (Mar 06, 2006)
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:02:01.0[A] -> GSI 29 (level, low) -> IRQ 193
Aug 31 13:21:26 test02 kernel: tg3: Could not obtain valid ethernet address, aborting.
Aug 31 13:21:26 test02 kernel: tg3: probe of 0000:02:01.0 failed with error -22
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:02:02.0[A] -> GSI 31 (level, low) -> IRQ 201
Aug 31 13:21:26 test02 kernel: eth0: Tigon3 [partno(TBD) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0b:cd:69:ee:78
Aug 31 13:21:26 test02 kernel: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] TSOcap[1]
Aug 31 13:21:26 test02 kernel: eth0: dma_rwctrl[769f4000]
Aug 31 13:21:26 test02 kernel: ACPI: PCI interrupt 0000:00:0f.2[A] -> GSI 10 (level, low) -> IRQ 10

and the initialization of eth0 fails, because eth0 is now the second integrated NIC:

ifup: Device eth0 has different MAC address than expected, ignoring.
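The failure chain above can be illustrated with a minimal Python sketch. It assumes that tg3 applies the usual Linux validity rule for Ethernet addresses, as implemented by is_valid_ether_addr(): the address must not be all zeros and must not have the multicast bit set. This would be consistent with the "error -22" (-EINVAL) in the log. The assign_names() helper is a hypothetical, simplified model of probe-order ethN naming, and hwaddr_matches() is a hypothetical stand-in for the check ifup presumably makes against HWADDR in ifcfg-eth0. The all-zero address used for 0000:02:01.0 is a placeholder, because the log does not show which address tg3 actually read.

```python
# Minimal sketch (assumption): mirrors the Linux is_valid_ether_addr() rule and a
# simplified, hypothetical model of probe-order ethN naming. Not the tg3 driver code.


def parse_mac(text):
    """Parse 'aa:bb:cc:dd:ee:ff' (any case) into a tuple of six integers."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {text!r}")
    return tuple(int(part, 16) for part in parts)


def is_valid_ether_addr(mac):
    """Valid means: not all zeros, and multicast bit (LSB of first octet) clear."""
    return any(mac) and not (mac[0] & 0x01)


def assign_names(probed):
    """Hypothetical helper: name devices eth0, eth1, ... in probe order,
    skipping any device whose probe fails the address check."""
    names = {}
    for pci_slot, mac in probed:
        if not is_valid_ether_addr(parse_mac(mac)):
            print(f"probe of {pci_slot} failed: invalid ethernet address")
            continue
        names[f"eth{len(names)}"] = (pci_slot, mac)
    return names


def hwaddr_matches(expected, actual):
    """Hypothetical stand-in for ifup's HWADDR comparison (case-insensitive)."""
    return parse_mac(expected) == parse_mac(actual)


if __name__ == "__main__":
    # Failing boot: the first NIC's address is a placeholder all-zero value.
    probed = [
        ("0000:02:01.0", "00:00:00:00:00:00"),
        ("0000:02:02.0", "00:0b:cd:69:ee:78"),
    ]
    names = assign_names(probed)
    slot, mac = names["eth0"]
    print(f"eth0 -> {slot} {mac}")
    # ifcfg-eth0 presumably records the first NIC's address as HWADDR.
    if not hwaddr_matches("00:0B:CD:69:EE:79", mac):
        print("Device eth0 has different MAC address than expected, ignoring.")
```

Running the script shows eth0 bound to 0000:02:02.0, the second NIC, followed by the same mismatch message that ifup printed. This matches the report: one failed probe leaves eth0 unconfigured, and eth1 does not appear at all.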


Version-Release number of selected component (if applicable):

kernel-smp-2.6.9-42.EL and kernel-smp-2.6.9-42.0.2.EL


How reproducible:

Rebooting the server several times produces an alternating success, failure,
success, failure sequence.

Steps to Reproduce:
1. Reboot the server
2. Watch the network startup and /var/log/messages
  
Actual results:

eth0 is not always initialized.


Expected results:

eth0 should always be initialized.

Additional info:

We have tried tg3.ko from kernel-smp-2.6.9-34.0.2.EL and the bcm5700-8.3.17c-1
driver provided by HP, and both work fine.

Comment 1 Juanjo Villaplana 2006-09-01 11:53:12 UTC
Created attachment 135373
/var/log/messages

Comment 2 John W. Linville 2006-09-12 12:00:28 UTC
A later tg3 update is available in the test kernels here:

   http://people.redhat.com/linville/kernels/rhel4/

Please give those a try and post the results here...thanks!

Comment 3 Juanjo Villaplana 2006-09-13 07:13:26 UTC
Created attachment 136138
Console log

Hi John,

I was unable to boot the server with this test kernel. As the attached console
log shows, the cciss driver didn't initialize correctly.

Comment 4 Juanjo Villaplana 2006-09-13 07:27:47 UTC
Created attachment 136139
Console log

This is the console log for a successful boot with 2.6.9-42.0.2.EL.

The issue seems to be related to PCI interrupts. Here is the lspci output; I
hope it helps:

00:00.0 Host bridge: Broadcom CMIC-WS Host Bridge (GC-LE chipset) (rev 13)
00:00.1 Host bridge: Broadcom CMIC-WS Host Bridge (GC-LE chipset)
00:00.2 Host bridge: Broadcom CMIC-LE
00:03.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:04.0 System peripheral: Compaq Computer Corporation Integrated Lights Out Controller (rev 01)
00:04.2 System peripheral: Compaq Computer Corporation Integrated Lights Out Processor (rev 01)
00:0f.0 ISA bridge: Broadcom CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: Broadcom CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 Host bridge: Broadcom CSB5 LPC bridge
00:10.0 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
00:10.2 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
00:11.0 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
00:11.2 Host bridge: Broadcom CIOB-X2 PCI-X I/O Bridge (rev 05)
01:03.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532 (rev 01)
02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
06:01.0 RAID bus controller: Compaq Computer Corporation Smart Array 5i/532 (rev 01)
06:1e.0 PCI Hot-plug controller: Compaq Computer Corporation PCI Hotplug Controller (rev 14)

Comment 5 Andy Gospodarek 2006-10-09 16:58:46 UTC
Please test the kernels listed here:

http://people.redhat.com/agospoda/#rhel4

and let me know if they resolve the issue. You *may* encounter the same problem
with this build that you had with Linville's; if you do, see the instructions
below for rebuilding it:

http://kbase.redhat.com/faq/FAQ_80_4969.shtm

Comment 7 Juanjo Villaplana 2006-10-13 09:18:25 UTC
This kernel works fine.

Comment 8 Chris Verhoef 2006-10-13 13:53:11 UTC
I had the same problem with an HP xw6000 workstation:

05:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702X Gigabit
Ethernet (rev 02)

Tried the 2.6.9-42.15.EL.gsstest.100320060 kernel. This also works fine.


Comment 10 Chuck Berg 2006-11-02 23:34:04 UTC
I had the same problem on an HP DL380 G3 running Update 4, but 2.6.9-42.22.EL
works. I noticed this after a re-install to upgrade from RHEL3.

A different DL380 G3, running 2.4.21-15.ELsmp (upgrading to -47 did not help), had
the same problem, but on both interfaces, and it worked after a cold boot. I
switched to HP's bcm5700 driver on this machine.

I have other DL380s that do not have this problem. These two problem
machines were fine for a couple of years (and many reboots).

Is it known why this happens? Should I expect my DL380s to come back up without
networking at any random reboot?

Comment 11 Ettore Virzi 2006-11-14 17:29:25 UTC
We had the same problem with an HP DL380 G3 running RHEL 4.4 in production.
When will it be fixed in the distributed RH kernel?

It has completely blocked our RH cluster, and I don't want to recompile all the
cluster modules (GFS, etc.) for each test kernel I use.



Comment 12 John W. Linville 2006-11-14 17:59:51 UTC
Did you test any of the kernels in the previous comments?  Do they fix the 
issues you are seeing?

Comment 13 Ettore Virzi 2006-11-20 09:56:39 UTC
Yes, the 2.6.9-42.15.EL.gsstest.100320060 kernel is OK.
No problems with it.
Thanks

Comment 14 John W. Linville 2006-12-05 15:13:44 UTC
*** Bug 216871 has been marked as a duplicate of this bug. ***

Comment 16 Dennixx 2007-01-04 20:16:16 UTC
Any idea when a fixed errata kernel will be released?

