Bug 116704

Summary: (NET TG3) Broadcom Ethernet Adapter Locks Up
Product: Red Hat Enterprise Linux 3 Reporter: Mark Donner <mark>
Component: kernel Assignee: David Miller <davem>
Status: CLOSED WONTFIX QA Contact:
Severity: high Docs Contact:
Priority: high    
Version: 3.0 CC: bammi, davem, jgarzik, john, mvoelker, petrides, tofranci
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2007-10-19 19:29:52 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Mark Donner 2004-02-24 15:31:24 UTC
Description of problem: 
The system locks up when using the onboard Broadcom 5704 Gigabit
Ethernet adapters of a Tyan S2880 Thunder K8S motherboard.  The
adapters are recognized by the system during the install (or by
kudzu later) and the tg3 driver is loaded.  They become active, but
within about 10 minutes the system locks up and has to be rebooted.
Several versions of Broadcom's bcm5700 driver, up to and including
version 7.1.22, have been compiled and installed on the system.  When
the system is changed to use this driver, it locks up solid when
activating eth0 during boot.  The result is the same with many
previous versions of the driver.  This also happens with Red Hat 9.
The system has 4 GB of RAM and uses a 3Ware SATA controller, as the
on-board Promise SATA was not recognized by any installation of Red
Hat since version 8.


Version-Release number of selected component (if applicable):


How reproducible: Fails every time


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Phil Knirsch 2004-03-05 14:42:32 UTC
Most likely a kernel driver problem, reassigning to correct component.

Read ya, Phil

Comment 2 Mark T. Voelker 2004-07-07 19:52:02 UTC
We have seen this same issue with tg3 on a Tyan S2882 motherboard,
which also uses the Broadcom BCM5704.  The box froze completely
(Magic SysRq wasn't able to do anything) at apparently random
intervals.  There were no error messages on screen, in
/var/log/messages, in dmesg, or anywhere else that we could see.
Usually we were able to crash the box within an hour or two by SSHing
from the afflicted machine to another server and running some commands
that produce a lot of output.  The network load on the BCM5704 was
fairly low though, so I'm not sure why this would trigger the freeze.
After switching from the tg3 driver to the bcm5700 driver from
Broadcom the crashes went away.  We understand, of course, that Red
Hat does not support the bcm5700 driver and would rather put more
effort into fixing the tg3 driver than have to support a vendor-built
driver.  But until this bug is fixed it's the only way we can keep the
server alive.

Comment 3 Tymm Twillman 2004-07-26 18:47:48 UTC
I believe we've now run into this issue as well, in addition to the
autonegotiation problems with the tg3 drivers / Cisco switches
mentioned in other reports.  Wondering if the bcm5700 drivers are
still the best solution, and if anyone has had problems with them
under RHEL3 U2... 

Comment 4 Mark T. Voelker 2004-07-26 19:33:52 UTC
Tymm--we tried U2 and still had the same problems, so we're still
using the bcm5700 drivers for now.

Comment 5 Tom Rockwell 2004-08-09 04:18:47 UTC
I have a system with a Tyan K8W connected to a 100Mb network.  Under
heavy network traffic the system hangs and requires a power cycle to
recover.  I've tried the tg3 and Broadcom's drivers and various Linux
kernel versions.  It hangs repeatably under very heavy network load,
for instance when running bonnie on an NFS-mounted directory.
Otherwise the system seems ok.

However, the system also hangs under high network load when running
the Windows 2003 64 bit beta.

I'm planning to RMA the motherboard.

Comment 6 Ray Van Dolson 2004-10-19 14:37:07 UTC
We're running quite a few HP DL140-based servers that all have the
Broadcom Gigabit chipset in them.  These servers are running as PPTP
concentrators using PoPToP.  All of them were mysteriously locking up
at random and I couldn't figure it out... someone mentioned a problem
with Broadcom NICs and recommended either replacing the NICs with
Intel ones (not a great option in 1U servers) or disabling APIC.  Our
problem seems to go away when we boot with nosmp noapic ... this is
using the bcm5700 driver which I grabbed from HP's website.

Using 2.4.21-20ELsmp (w/ MPPE patch from PPP 2.4.3)

Comment 7 Jeff Garzik 2005-02-18 07:40:52 UTC
Is this still happening with the tg3 driver, in RHEL3?


Comment 8 Alan Jay 2005-10-29 08:21:49 UTC
Is this still a problem with RHEL4, or has the bug been fixed for RHEL3?

We are just installing machines with the Tyan S2882 motherboard and two
Broadcom gigabit adapters.  The tg3 driver appears to work in normal use, but
when we load the ethernet port the machine hangs without any error messages, and
we wondered if we were seeing this bug?

Comment 9 Steve Netting 2005-11-29 16:34:06 UTC
We're experiencing identical problems here with HP DL-350s and RHEL4 (latest
kernel/patches).

Any update on this issue?   

Comment 10 Jwahar Bammi 2006-02-02 18:52:46 UTC
I am seeing the same problem on a Shuttle box, with the same Broadcom adapter
onboard, with Fedora Core 5 Test2.  (I never hit it in Fedora Core 4; FC4 had
the other problem with Broadcom, when the buffer was allocated above the 1Gig
address range.)

Comment 11 Paul Schubert 2006-10-03 01:32:21 UTC
I am seeing a similar problem on FC4 kernel 2.6.17-1.2142_FC4smp.  This
machine has a Gigabyte motherboard with on-board Broadcom 57xx GigE NIC.  I
only get the problem when using BitTorrent.  I have tried different BT clients
and they all exhibit the same behaviour.  The eth0 port just locks up every
few minutes, then resets itself.  The machine is cabled into a Cisco switch.  I
have changed switch ports and cables and tried 100baseT; none of it makes any
difference.  Broadcom sent me an ISO of a boot CD with FreeDOS and some diags,
which I ran, and everything passed ok.  They have suggested it may be an
interrupt conflict with the PCI RAID card.

Sample log:
Oct  2 10:02:15 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct  2 10:02:15 localhost kernel: tg3: eth0: transmit timed out, resetting
Oct  2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Oct  2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Oct  2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Oct  2 10:02:15 localhost kernel: tg3: eth0: Link is down.
Oct  2 10:02:19 localhost kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Oct  2 10:02:19 localhost kernel: tg3: eth0: Flow control is off for TX and off for RX.
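
For anyone trying to gauge how often this happens on their own box, a rough
sketch of a log summarizer may help.  It is not part of the original report:
the function name and regular expressions are illustrative, it assumes each
kernel message sits on a single line (unlike the wrapped paste above), and it
only recognizes the two message formats quoted in the sample log.

```python
import re
from collections import Counter

# Patterns are based solely on the messages quoted in the sample log;
# other kernel or tg3 versions may word these messages differently.
TIMEOUT_RE = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")
STOP_BLOCK_RE = re.compile(
    r"tg3_stop_block timed out, ofs=([0-9a-f]+)\s+enable_bit=(\d+)"
)


def summarize_tg3_log(text):
    """Count watchdog timeouts per interface and stop_block timeouts per offset."""
    timeouts = Counter()
    stalled = Counter()
    for line in text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
        match = STOP_BLOCK_RE.search(line)
        if match:
            stalled[match.group(1)] += 1
    return timeouts, stalled


# The sample log from this comment, with the wrapped lines rejoined.
SAMPLE = """\
Oct  2 10:02:15 localhost kernel: NETDEV WATCHDOG: eth0: transmit timed out
Oct  2 10:02:15 localhost kernel: tg3: eth0: transmit timed out, resetting
Oct  2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Oct  2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Oct  2 10:02:15 localhost kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Oct  2 10:02:15 localhost kernel: tg3: eth0: Link is down.
"""

if __name__ == "__main__":
    timeouts, stalled = summarize_tg3_log(SAMPLE)
    print("watchdog timeouts per interface:", dict(timeouts))
    print("stop_block timeouts per offset:", dict(stalled))
```

To run it against a real log, the contents of /var/log/messages could be read
into a string and passed to summarize_tg3_log() in place of SAMPLE.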


Comment 12 RHEL Program Management 2007-10-19 19:29:52 UTC
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.