Bug 103267 - tg3 locks up under heavy traffic, recovers
Product: Red Hat Linux
Classification: Retired
Component: kernel
Hardware: i386 Linux
Priority: medium
Severity: medium
Assigned To: Arjan van de Ven
Reported: 2003-08-28 07:15 EDT by Jan Iven
Modified: 2007-04-18 12:57 EDT

Doc Type: Bug Fix
Last Closed: 2004-09-30 11:41:29 EDT
Attachments: None
Description Jan Iven 2003-08-28 07:15:36 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030703

Description of problem:
Under intensive traffic on a dual 2.4 GHz Xeon, the tg3 driver locks up after
~90 minutes. We get messages like

NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2

with long enough timeouts that NFS/AFS connections get timed out. The system
eventually recovers by itself (which is good, btw).
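
To compare how often the lockup hits across machines or kernels, the messages above can be tallied from a saved kernel log. The following is a minimal sketch, assuming the log lines have exactly the format quoted in this report and that the `ofs=` value is hexadecimal (suggested by `c00`). The function name `summarize_tg3_log` and the `SAMPLE` list are illustrative only.

```python
import re
from collections import Counter

# Patterns modelled on the messages quoted in this report.
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")
STOP_BLOCK_RE = re.compile(
    r"tg3: tg3_stop_block timed out, ofs=([0-9a-fA-F]+) enable_bit=([0-9a-fA-F]+)"
)


def summarize_tg3_log(lines):
    """Count watchdog timeouts per interface and stop_block timeouts per offset."""
    watchdog = Counter()
    stop_block = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            watchdog[match.group(1)] += 1
            continue
        match = STOP_BLOCK_RE.search(line)
        if match:
            # Assumption: the offset is printed in hex without a 0x prefix.
            stop_block[int(match.group(1), 16)] += 1
    return watchdog, stop_block


# The six lines quoted in the description, used here as sample input.
SAMPLE = [
    "NETDEV WATCHDOG: eth0: transmit timed out",
    "tg3: eth0: transmit timed out, resetting",
    "tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2",
]

if __name__ == "__main__":
    watchdog, stop_block = summarize_tg3_log(SAMPLE)
    for iface, count in watchdog.items():
        print(f"{iface}: {count} watchdog timeout(s)")
    for ofs, count in sorted(stop_block.items()):
        print(f"offset 0x{ofs:04x}: {count} stop_block timeout(s)")
```

In practice the input would be the lines of a saved `dmesg` or syslog file rather than `SAMPLE`.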

Version-Release number of selected component (if applicable):
2.4.20-18.7.cernsmp (recompiled at CERN), tg3.c: 1.5

How reproducible:

Steps to Reproduce:
1. Criss-cross traffic inside a farm is enough to produce this. We use
memory-to-memory transfers for tests.
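
The report does not name the tool used for the memory-to-memory transfers. As a rough illustration of the idea only, here is a hypothetical Python sketch in which the sender streams a buffer held in memory over TCP and the receiver discards the data, so no disk I/O is involved. It runs over loopback for simplicity; the names `sink`, `mem_to_mem_transfer` and `CHUNK` are assumptions made for this example.

```python
import socket
import threading

CHUNK = 64 * 1024  # bytes per send call; an arbitrary choice for this sketch


def sink(server_sock, result):
    """Accept one connection, discard everything received, record the byte count."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # peer closed the connection
                break
            total += len(data)
    result["bytes"] = total


def mem_to_mem_transfer(total_bytes, host="127.0.0.1"):
    """Send total_bytes from an in-memory buffer to a discarding receiver over TCP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=sink, args=(server, result))
    receiver.start()

    payload = bytes(CHUNK)  # zero-filled buffer, never touches disk
    sent = 0
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n

    receiver.join()
    server.close()
    return result["bytes"]


if __name__ == "__main__":
    received = mem_to_mem_transfer(10 * 1024 * 1024)
    print(f"received {received} bytes")
```

To load a real NIC rather than the loopback interface, the receiving half would have to run on a different host from the sender, with several such pairs running in both directions to approximate criss-cross traffic.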


Additional info:

I have tried this as well with the tg3 driver from the current "severn" beta
(tg3.c: 1.6), recompiled inside the 2.4.20-18 kernel. Same problem, same
frequency. Will try to repeat with "severn" next week.

# lspci -vvv -s 03:01.0
03:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 12)
        Subsystem: 3Com Corporation 3C996-T 1000BaseTX
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 (16000ns min), cache line size 08
        Interrupt: pin A routed to IRQ 48
        Region 0: Memory at fc200000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] PCI-X non-bridge device.
                Command: DPERE- ERO- RBC=0 OST=0
                Status: Bus=0 Dev=0 Func=0 64bit- 133MHz- SCD- USC-, DC=simple,
        Capabilities: [48] Power Management version 2
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
                Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
        Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
                Address: 08d90c80d0a44488  Data: 0002
Comment 1 Jan Iven 2003-08-29 09:53:12 EDT
FYI, the bcm5700 driver does NOT work in this case; it falls over completely
(kernel panic).
Comment 2 Jan Iven 2003-09-12 07:36:04 EDT
Next try, this time with the 2.4.21-20.1.2024.2.1.nptlsmp kernel from
severn-beta1. Lockup is a lot quicker, first machine down after 30 seconds or so.

I have tried with "tg3_debug=0x7fff" (and reloading the driver), but it doesn't
seem to make any difference in terms of verbosity.

Comment 3 Robert Binz 2004-09-14 16:46:13 EDT
I was getting the same error with a Sun V20z server with RHES 3 loaded
using the tg3 driver.  I have installed the BCM Linux driver v 7.3.5
from http://www.broadcom.com/drivers/downloaddrivers.php and then used
the following to set my network card:
ethtool -s eth0 speed 100 duplex full autoneg off
(The Cisco switch is set to 100baseT-FD)

With these two changes, I have been able to run a 36-hour test with 495
concurrent users with no errors.
Comment 4 Bugzilla owner 2004-09-30 11:41:29 EDT
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/
