Bug 75267

Summary: Tigon3 (3C996B-T) NIC does not start properly
Product: [Retired] Red Hat Linux
Reporter: petr
Component: kernel
Assignee: Jeff Garzik <jgarzik>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: 8.0
CC: anne.possoz, davem, ola, peterm
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2003-03-04 20:14:09 UTC
Type: ---

Description petr 2002-10-06 15:49:26 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.2b) Gecko/20020921

Description of problem:
Tigon3 (3COM 3C996B-T) network interface card goes up and down during startup.
As a result, the network is not initialized.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Boot Red Hat Linux 8.0.


Actual Results:  Boot screen displays:
Determining IP information for eth0... failed; no link present. Check cable?

In the log can be seen:
Oct  6 17:03:09 petr6 kernel: eth0: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)]
(PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:04:76:dd:34:67
Oct  6 17:03:09 petr6 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Oct  6 17:03:09 petr6 kernel: tg3: eth0: Flow control is on for TX and on for RX.
Oct  6 17:03:09 petr6 kernel: tg3: eth0: Link is down.
Oct  6 17:03:03 petr6 ifup: Determining IP information for eth0... 
Oct  6 17:03:08 petr6 network: Bringing up interface eth0:  failed 




Expected Results:  I'd expect the card to be initialized correctly.

Additional info:

When I do rmmod tg3 and modprobe tg3 later, everything works smoothly:
Oct  6 17:12:20 petr6 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Oct  6 17:12:20 petr6 kernel: tg3: eth0: Flow control is on for TX and on for RX.
Oct  6 17:12:27 petr6 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
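
For anyone who wants to script the reload workaround above, here is a minimal
sketch. The function names, the DRY_RUN variable, and the use of ifdown/ifup
around the module reload are assumptions for illustration, not something taken
from this report. It needs root when run for real.

```shell
# Print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: unload and reload the NIC driver, then bring the
# interface back up. Defaults match the reporter's setup (tg3, eth0).
reload_nic_driver() {
    module=${1:-tg3}
    iface=${2:-eth0}
    run ifdown "$iface" || true   # the interface may already be down
    run rmmod "$module" &&
        run modprobe "$module" &&
        run ifup "$iface"
}
```

Setting DRY_RUN=1 before calling reload_nic_driver tg3 eth0 only prints the
four commands, which is a safe way to check the sequence first.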

The same system multi-boots Red Hat 6.2, 7.3, W98, W2K, and WXP, with no
problems so far. This is what I see in Red Hat Linux 6.2 with the bcm5700 driver:

Oct  4 18:00:45 petr6 kernel: eth0: 3Com 3C996B Gigabit Server NIC found at mem
fea70000, IRQ 10, node addr 000476dd3467 
Oct  4 18:00:45 petr6 kernel: eth0: Broadcom BCM5701 Integrated Copper
transceiver found 
Oct  4 18:00:45 petr6 kernel: eth0: Rx Checksum ON 
Oct  4 18:00:45 petr6 kernel: bcm5700: eth0 NIC Link is UP, 1000 Mbps full duplex 
Oct  4 18:00:45 petr6 kernel: bcm5700: eth0 NIC Link is Down 
Oct  4 18:00:45 petr6 kernel: bcm5700: eth0 NIC Link is Up, 1000 Mbps full duplex 
Oct  4 18:00:32 petr6 ifup: Determining IP information for eth0... 
Oct  4 18:00:44 petr6 network: Bringing up interface eth0 succeeded 
Oct  4 18:00:47 petr6 kernel: bcm5700: eth0 NIC Link is UP, 1000 Mbps full duplex 
Oct  4 18:00:48 petr6 kernel: bcm5700: eth0 NIC Link is Down 
Oct  4 18:00:50 petr6 kernel: bcm5700: eth0 NIC Link is Up, 1000 Mbps full duplex

Comment 1 Arjan van de Ven 2002-10-07 08:00:53 UTC
Could you try increasing the timeout (say, doubling it) in the code that tries
to find the link?
This is the "sleep" command in the file
/etc/sysconfig/network-scripts/network-functions on line 136 (the
check_link_down function).
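
As an illustration of the alternative to one fixed sleep, the link could be
polled until it comes up or a timeout expires. This is only a sketch: it is
not the actual check_link_down code from network-functions, the names
wait_for_link and link_ok are hypothetical, and the link_ok check assumes
mii-tool is installed and reports "link ok" for this card.

```shell
# Poll a link-check command once per second until it succeeds or the
# timeout (in seconds) expires.
# Usage: wait_for_link TIMEOUT CHECK_COMMAND [ARGS...]
wait_for_link() {
    timeout=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@"; then
            return 0            # link reported up
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    "$@"                        # one final check after the last sleep
}

# Example link check (an assumption): look for "link ok" in mii-tool output.
link_ok() {
    mii-tool "$1" 2>/dev/null | grep -q 'link ok'
}
```

With this, "wait_for_link 15 link_ok eth0" returns as soon as the link is up
and gives up after roughly 15 seconds, so a fast link is not delayed by a
timeout chosen for a slow one.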

Comment 2 petr 2002-10-07 08:34:20 UTC
OK, I increased sleep 5 to sleep 10 and now 8 of 10 boots are successful, 2
unsuccessful. Increasing the delay further would probably help, but testing
that takes time. Either way, the required delay seems to be much longer than
in Red Hat Linux 6.2 or 7.3.


Comment 3 ola 2002-10-15 14:56:11 UTC
We also have similar problems with the tg3 driver.
Even if the interface comes up, it goes down again within a couple of hours.

System: IBM x305
NIC: Built-in Broadcom.


Oct 13 19:46:29 ns kernel: tg3.c:v1.0 (Jul 19, 2002)
Oct 13 19:46:29 ns kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
Oct 13 19:46:29 ns kernel: tg3: eth0: Flow control is on for TX and on for RX.
Oct 13 19:46:29 ns kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Oct 13 19:46:29 ns kernel: tg3: eth1: Flow control is on for TX and on for RX.
Oct 13 19:46:29 ns kernel: tg3: eth1: Link is down.
Oct 13 19:46:29 ns kernel: tg3: eth0: Link is down.
Oct 13 19:46:29 ns kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Oct 13 19:46:29 ns kernel: tg3: eth1: Flow control is on for TX and on for RX.
Oct 13 19:46:29 ns kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
Oct 13 19:46:29 ns kernel: tg3: eth0: Flow control is on for TX and on for RX.
Oct 15 11:59:27 ns kernel: tg3: eth0: Link is down.
Oct 15 12:34:06 ns kernel: tg3: eth1: Link is down.
Oct 15 12:34:09 ns kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Oct 15 12:34:09 ns kernel: tg3: eth1: Flow control is off for TX and off for RX.
Oct 15 12:51:59 ns kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Oct 15 12:51:59 ns kernel: tg3: eth1: Flow control is off for TX and off for RX.
Oct 15 12:51:59 ns kernel: tg3: eth1: Link is down.
Oct 15 12:52:02 ns kernel: tg3: eth1: Link is up at 10 Mbps, half duplex.
Oct 15 12:52:02 ns kernel: tg3: eth1: Flow control is off for TX and off for RX.
Oct 15 16:08:44 ns kernel: tg3.c:v1.0 (Jul 19, 2002)
Oct 15 16:08:46 ns kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
Oct 15 16:08:46 ns kernel: tg3: eth1: Flow control is off for TX and off for RX.
Oct 15 16:08:47 ns kernel: tg3: eth1: Link is down.
Oct 15 16:08:49 ns kernel: tg3: eth1: Link is up at 10 Mbps, half duplex.
Oct 15 16:08:49 ns kernel: tg3: eth1: Flow control is off for TX and off for RX.


Comment 4 ola 2002-10-15 15:10:48 UTC
Same problem on RH 7.3 with the stock and latest errata kernels.
I also suggest you upgrade the priority and take the IBM x305 (and others with
the same gigabit chip) off the certified hardware list.


Comment 5 Arjan van de Ven 2002-10-18 11:00:00 UTC
The erratum kernel we released yesterday has major tg3 updates. Do they fix the
issue for you?

Comment 6 petr 2002-10-18 13:17:39 UTC
kernel-2.4.18-17.8.0.i686.rpm: the behavior changed, but it is worse. With
sleep 10 the network no longer starts up as it did with the previous kernel;
the value has to be increased to at least sleep 15.
This applies to a full reboot of the system. Running /etc/init.d/network stop
and then start works correctly even with sleep 1.


Comment 7 petr 2002-10-19 19:45:58 UTC
I replaced the tg3 driver with the bcm5700 driver and decreased the timeout to
its original value of 5 seconds, and everything works perfectly.
Maybe the tg3 driver should be removed from the autoconfiguration?
And what is the difference? Why are there 2 drivers for the same NIC?

Version 2.2.30 of the Broadcom bcm5700 driver includes the following note:

Note 2: If loading the driver on Red Hat 7.3, Red Hat 2.1 AS, and other newer
Red Hat kernels and patches, it is necessary to unload the tg3 driver first if
it is loaded. While tg3 is a fully functioning driver written by Red Hat et al,
Broadcom recommends users to use the bcm5700 driver written and tested by
Broadcom. Use ifconfig to bring down all eth# interfaces used by tg3 and do
the following to unload the tg3 driver:

rmmod tg3

It may also be necessary to manually edit the file /etc/modules.conf to
change interface alias names from tg3 to bcm5700. Example:

alias eth0 tg3

Replace tg3 with bcm5700:

alias eth0 bcm5700

And it is true, at least in my case bcm5700 behaves much better than tg3.
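
The alias change from the Broadcom note can be scripted. The following is a
rough sketch, assuming the aliases are written one per line as "alias ethN
tg3"; the function name and the .bak backup suffix are hypothetical.

```shell
# Rewrite "alias ethN tg3" lines to "alias ethN bcm5700" in a
# modules.conf-style file, keeping a backup copy next to it.
# Usage: switch_alias_to_bcm5700 [FILE]   (default: /etc/modules.conf)
switch_alias_to_bcm5700() {
    conf=${1:-/etc/modules.conf}
    cp "$conf" "$conf.bak" || return 1
    # Only lines of the exact form "alias eth<number> tg3" are changed.
    sed 's/^\(alias[[:space:]]\{1,\}eth[0-9]\{1,\}[[:space:]]\{1,\}\)tg3[[:space:]]*$/\1bcm5700/' \
        "$conf.bak" > "$conf"
}
```

Aliases that point at other drivers are left untouched, and the original file
stays available as modules.conf.bak in case tg3 needs to be restored.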


Comment 8 David Miller 2002-10-19 23:34:34 UTC
Actually, it's perfectly valid for a link to take 10 to 15
seconds to come up.  How long this process takes to complete
is very hardware and driver dependent.

So the real bug is in the mechanisms used by the dhcp client
we ship currently.  We have to wait until it actually completes
the query to the dhcp server, and this means waiting for however
long the slowest link could take to come up.

A proper dhcp client implementation would work asynchronously
and therefore work regardless of how long a link state would
take to arrive.
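
To illustrate the asynchronous idea in shell terms only: a background waiter
can start the client whenever the link finally arrives, instead of blocking
boot on a fixed timeout. This is a sketch, not how the shipped dhcp client or
initscripts behave; run_when_link_up is a hypothetical name, and the
link-check command passed to it is supplied by the caller.

```shell
# Wait in the background for however long the link takes to come up,
# then run the given command (for example, the DHCP client).
# Usage: run_when_link_up CHECK_COMMAND COMMAND [ARGS...]
run_when_link_up() {
    check=$1
    shift
    (
        until "$check"; do
            sleep 1             # link not up yet; poll again
        done
        "$@"                    # link is up: start the real work
    ) &
}
```

The caller returns immediately, so the rest of the boot sequence is not held
up by a slow link, and the command still runs once the link is present.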

The tg3 driver is just fine. It works; it just takes longer to
bring the link up, which is perfectly valid behavior.

As a side note it is unfortunate that after Broadcom agreed to help
us with the tg3 driver, which we will continue to ship and use by
default, they instead recommend de-installing our driver in their
documentation.  Thanks for bringing this to our attention, it was
news to us.



Comment 9 Jeff Garzik 2003-01-20 20:57:13 UTC
Ok, some of these reports have actually been fixed in more recently posted rpms.

Just to get everybody on the latest page, please use "aragorn2" test rpms,
posted at http://people.redhat.com/jgarzik/pub/

This is the latest Red Hat errata kernel for 7.x/8.x, with the recent tg3 bug fixes.