Bug 123218 - (IT_47194) RHEL3_U4 kernel reports tg3_stop_block timed out and network interface stops responding
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All Linux
Priority: medium
Severity: high
Assigned To: John W. Linville
Blocks: 132991
Reported: 2004-05-13 17:41 EDT by Matthew Crawford
Modified: 2007-11-30 17:07 EST (History)
Doc Type: Bug Fix
Last Closed: 2005-05-11 16:26:31 EDT
Description Matthew Crawford 2004-05-13 17:41:25 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/124 (KHTML, like Gecko) Safari/125.1

Description of problem:

On a running HP DL380 G3, the Ethernet interface using the tg3 driver stops responding at random intervals. Restarting the interface (service network restart) restores network service. When the interface stops responding, syslog records the following messages over and over again:

May 13 21:03:52 mail3 kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 13 21:03:52 mail3 kernel: tg3: eth0: transmit timed out, resetting
May 13 21:03:52 mail3 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2


Below are the details of the interfaces from the messages file at startup:

May 13 21:14:38 mail3 kernel: tg3.c:v2.3 (November 5, 2003)
May 13 21:14:38 mail3 kernel: eth0: Tigon3 [partno(NA) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0e:7f:21:0e:4d
May 13 21:14:38 mail3 kernel: eth1: Tigon3 [partno(NA) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0e:7f:21:0e:4c
May 13 21:14:38 mail3 kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
May 13 21:14:38 mail3 kernel: tg3: eth0: Flow control is off for TX and off for RX.
May 13 21:14:38 mail3 kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
May 13 21:14:38 mail3 kernel: tg3: eth1: Flow control is off for TX and off for RX.

Version-Release number of selected component (if applicable):
kernel-smp-2.4.21-9.0.1.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Have system that requires tg3 network driver 
2. Load traffic on interface
3. Wait some time (2h-5d) for interface to stop responding
    

Actual Results:  The messages file reports the following error messages from the tg3 driver over and over again, and the interface stops responding until a restart is performed.

May 13 21:03:52 mail3 kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 13 21:03:52 mail3 kernel: tg3: eth0: transmit timed out, resetting
May 13 21:03:52 mail3 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2

Expected Results:  The interface should never just stop responding.

Additional info:


HP G3 DL380 (3.06G Dual Proc).

May 13 21:14:38 mail3 kernel: tg3.c:v2.3 (November 5, 2003)
May 13 21:14:38 mail3 kernel: eth0: Tigon3 [partno(NA) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0e:7f:21:0e:4d
May 13 21:14:38 mail3 kernel: eth1: Tigon3 [partno(NA) rev 1002 PHY(5703)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0e:7f:21:0e:4c
May 13 21:14:38 mail3 kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
May 13 21:14:38 mail3 kernel: tg3: eth0: Flow control is off for TX and off for RX.
May 13 21:14:38 mail3 kernel: tg3: eth1: Link is up at 100 Mbps, full duplex.
May 13 21:14:38 mail3 kernel: tg3: eth1: Flow control is off for TX and off for RX.
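
For anyone triaging similar logs, the repeated messages in this report can be tallied per register offset. The following is only a sketch and not part of the original report. The block names in TG3_BLOCKS are recalled from tg3.h in the upstream Linux driver and are an assumption to verify against the header in your own kernel source; summarize and describe are hypothetical helper names.

```python
import re
from collections import Counter

# ASSUMPTION: offset-to-name mapping recalled from tg3.h in the upstream
# Linux driver. Check it against the kernel source you are actually running.
TG3_BLOCKS = {
    0x0C00: "SNDDATAI_MODE",
    0x1400: "SNDBDS_MODE",
    0x1800: "SNDBDI_MODE",
    0x2000: "RCVLPC_MODE",
    0x2400: "RCVDBDI_MODE",
    0x2C00: "RCVBDI_MODE",
    0x3400: "RCVLSC_MODE",
    0x4800: "RDMAC_MODE",
}

# Only the offset is captured, so lines where syslog or a mail client wrapped
# "enable_bit=2" onto the next line still match.
STOP_BLOCK = re.compile(r"tg3_stop_block timed out, ofs=([0-9a-fA-F]+)")


def summarize(lines):
    """Count tg3_stop_block timeouts per register offset."""
    counts = Counter()
    for line in lines:
        match = STOP_BLOCK.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts


def describe(counts):
    """Render the counts as short strings, sorted by offset."""
    return [
        f"ofs={ofs:x} {TG3_BLOCKS.get(ofs, 'unknown')} x{n}"
        for ofs, n in sorted(counts.items())
    ]


# Log lines copied from the description above.
sample = [
    "May 13 21:03:52 mail3 kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "May 13 21:03:52 mail3 kernel: tg3: eth0: transmit timed out, resetting",
    "May 13 21:03:52 mail3 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2",
    "May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2",
    "May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2",
    "May 13 21:03:53 mail3 kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2",
]

for entry in describe(summarize(sample)):
    print(entry)
```

To run it against a real log, pass an open file instead of the sample list, for example summarize(open("/var/log/messages")).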
Comment 1 Need Real Name 2004-07-22 08:56:53 EDT
I have a Red Hat Enterprise 3.1 Linux box on HP ML350R03 hardware, 
currently running the 2.4.21-9.0.1 smp kernel, with a whole load of 
tg3-based NICs (the unit is being used as a router, running Quagga).

The relevant lspci entries:

00:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5702X Gigabit Ethernet (rev 02)
02:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet (rev 12)
02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
05:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)
05:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5701 Gigabit Ethernet (rev 15)

From dmesg

eth1: Tigon3 [partno(N/A) rev 1002 PHY(5703)] (PCI:33MHz:32-bit) 10/100/1000BaseT Ethernet 00:0b:cd:cb:e1:74
divert: allocating divert_blk for eth2
eth2: Tigon3 [partno(BCM95700A6) rev 7102 PHY(5401)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:04:76:3b:61:94
divert: allocating divert_blk for eth3
eth3: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0a:5e:23:76:51
divert: allocating divert_blk for eth4
eth4: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0a:5e:23:77:08
divert: allocating divert_blk for eth5
eth5: Tigon3 [partno(3C996B-T) rev 0105 PHY(5701)] (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:0a:5e:23:76:5b

eth0 is an Intel e100 card, eth1 is on the motherboard, eth2-5 are 
PCI cards.

I am seeing the same tg3_stop_block timed out error message on eth3-5 
(eth0-eth2 are working fine) but the interfaces never transmit any 
traffic and eventually the timeout message occurs as follows:

tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
loop: loaded (max 8 devices)
tg3: eth3: Link is up at 100 Mbps, full duplex.
tg3: eth3: Flow control is off for TX and off for RX.
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: eth4: Link is up at 100 Mbps, full duplex.
tg3: eth4: Flow control is off for TX and off for RX.
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: eth4: Link is up at 100 Mbps, full duplex.
tg3: eth4: Flow control is off for TX and off for RX.
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
e100: eth0 NIC Link is Up 100 Mbps Full duplex
tg3: eth5: Link is up at 100 Mbps, full duplex.
tg3: eth5: Flow control is off for TX and off for RX.
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2

I have tested this using different switches, different network 
cables, and even on the 9.0.1 non-smp kernel and the 15.0.3 smp and 
non-smp kernels all with the same problem.
Comment 2 Robert Binz 2004-08-19 23:07:36 EDT
I am working with a Sun V20z system and RedHat ES3.  We are currently 
finding that this system and OS generate the following errors just 
before the NIC becomes brain dead and the system must be rebooted to 
function again.

NETDEV WATCHDOG: eth0: transmit timed out
tg3: eth0: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
tg3: eth0: Link is down.

Attempting to bring the NIC back up without rebooting does not result 
in the NIC becoming active again.
Comment 3 David Miller 2004-08-19 23:18:29 EDT
Do the people seeing this all have 4GB or more
of ram in the machine?  If not, how much ram do
you have?
Comment 4 Eric Paris 2004-08-20 12:59:55 EDT
The customer seeing the problem has 4 GB of RAM.  Dual Opteron system with BCM5703X NICs.
Comment 6 David Miller 2004-08-20 15:18:00 EDT
If the machine has that much memory, this problem is almost
certainly fixed by a patch which will go into the U4 update.
Comment 7 Herman Sheremetyev 2004-08-21 18:47:32 EDT
I just had this happen on one of my Dell Poweredge 2450 boxes running
RHEL ES3 with kernel 2.4.21-9.0.1.EL.  It's been running fine for 102
days now and this is the first time this has happened.  One thing
that's changed recently is I added an aliased IP to the interface and
it started receiving quite a bit more traffic.  The interface stopped
responding for about 12 minutes after which it's working fine again,
no restart was required.  The box I have only has 1GB of RAM.  Also,
my monitors alerted me that the box went down at 4:56pm, then came
back up at 5:09pm and these errors in syslog are at 5:08pm, right
before the interface came back rather than when it went down:

Aug 21 17:08:53 mail1 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Aug 21 17:08:53 mail1 kernel: tg3: eth0: transmit timed out, resetting
Aug 21 17:08:53 mail1 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
Aug 21 17:08:53 mail1 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
Aug 21 17:08:53 mail1 kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Aug 21 17:08:53 mail1 kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
Aug 21 17:08:53 mail1 kernel: tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Comment 8 Eric Paris 2004-08-25 11:00:07 EDT
I saw the problem without an aliased IP, so I do not believe that was
the issue.  The issue was also seen with a large number of connections
(a couple hundred simultaneous web requests).  The system was also
tested with gigs and gigs of scp traffic but a low number of
simultaneous requests, and no issue was found.
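
The observation above suggests that connection count, rather than raw throughput, is what triggers the hang. A rough way to generate that kind of load from a client machine is sketched below. This is an illustration only, not the test that was actually run: fetch and run_concurrent are hypothetical helper names, the URL is a placeholder, and 200 connections is simply the "couple hundred" figure mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch(url, timeout=10):
    """Fetch one URL; return True on HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # Timeouts, refused connections and HTTP errors all count as failures.
        return False


def run_concurrent(fetch_one, url, connections=200):
    """Issue `connections` requests from a thread pool and tally the outcomes.

    Returns a tuple (succeeded, failed).
    """
    with ThreadPoolExecutor(max_workers=connections) as pool:
        results = list(pool.map(fetch_one, [url] * connections))
    succeeded = sum(1 for ok in results if ok)
    return succeeded, len(results) - succeeded


# Example call (placeholder URL, replace with the server under test):
# print(run_concurrent(fetch, "http://server-under-test.example/", 200))
```

Passing the fetch function in as a parameter keeps the tally logic independent of the network, so a different request type (NFS reads, mail connections) could be swapped in without changing run_concurrent.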
Comment 9 Herman Sheremetyev 2004-08-25 11:15:17 EDT
That seems consistent with my setup as well.  The box ran for over 3
months with no problems doing very light web and mail services; it was
pushing through considerable traffic, but concurrent connections were
relatively few.  Once I moved a high-traffic website to the box, it
crashed within a few days.  I switched to the bcm5700 driver from
Broadcom to avoid the problem for the time being.  Is there any chance
of getting that driver included in the distribution, as there seems to
be no resolution in sight to the tg3 problems?
Comment 14 Tony Scholes 2004-09-14 10:05:56 EDT
I too have this problem, on a Dell PowerEdge with 2GB RAM... Running
RHEL3 Update 3 i.e. all patches to date applied...

Any progress on this, the system is about to go live, but it won't if
it ain't fixed...

Comment 19 Charles Duffy 2004-09-27 15:41:39 EDT
I have a machine with only 2GB of RAM which is showing the same
symptoms (currently running RHEL3U2). Error messages are only slightly
different:

Unexpected dirty buffer encountered at do_get_write_access:616 108:05 blocknr 0
tg3: eth1: transmit timed out, resetting
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: eth1: Link is down.

I'm going to be testing the RHEL3U3 kernel shortly. We have a number
of these machines fielded, and more hardware to build more. If we can
do anything to help (including testing the RHEL3U4 kernel), please let
me know.
Comment 21 Bradford Leak 2004-12-08 10:27:11 EST
I have also seen this recently on a Dell PE6650 running RHELAS3.0
Update 2 (2.4.21-15.ELsmp).  This system has 16GB RAM.
The system was fine until the interface was moved from a gigabit switch
to a 100 Mb switch.  Both switches are Cisco.

Here is the behavior I've experienced:
After some period of time, the interface stops responding.  
I log into the console and bring the interface down: 
# ifdown eth0
tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2
tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2

So far, I've been able to bring the interface back up with `ifup eth0`

On another occasion, I attempted to unload the tg3 driver with the
following results:
# rmmod tg3
unregister_netdevice: waiting for eth0 to become free.  Usage count=2
unregister_netdevice: waiting for eth0 to become free.  Usage count=2
This repeats in syslog indefinitely until the system reboots.

Comment 22 Michael Tonn 2004-12-22 12:46:10 EST
I am experiencing the same problem with a Dell PowerEdge 6650 with 
16gig of memory.  The system is running RedHat Release AS 3.0 Update 
2.
Comment 23 Wijnand 2005-01-05 10:21:39 EST
Hi we had the same problem

Redhat AS3.0 update 3
4 GB of mem
kernel 2.4.21-22
Dell Poweredge 6650

Dec 29 13:29:17 geel04 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Dec 29 13:29:17 geel04 kernel: tg3: eth0: transmit timed out, resetting
Dec 29 13:29:17 geel04 kernel: tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
Dec 29 13:29:17 geel04 kernel: tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Dec 29 13:29:17 geel04 kernel: tg3: eth0: Link is down.
Dec 29 13:29:21 geel04 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Dec 29 13:29:21 geel04 kernel: tg3: eth0: Flow control is off for TX and off for RX.
Dec 29 13:30:26 geel04 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Dec 29 13:30:26 geel04 kernel: tg3: eth0: transmit timed out, resetting
Dec 29 13:30:26 geel04 kernel: tg3: tg3_stop_block timed out, ofs=2c00 enable_bit=2
Dec 29 13:30:26 geel04 kernel: tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2
Dec 29 13:30:26 geel04 kernel: tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2
Dec 29 13:30:26 geel04 kernel: tg3: tg3_stop_block timed out, ofs=1800 enable_bit=2
Dec 29 13:30:26 geel04 kernel: tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2
Dec 29 13:30:27 geel04 kernel: tg3: eth0: Link is down.
Dec 29 13:30:30 geel04 kernel: tg3: eth0: Link is up at 1000 Mbps, full duplex.
Dec 29 13:30:30 geel04 kernel: tg3: eth0: Flow control is off for TX and off for RX.

We had to issue an /etc/rc.d/init.d/network restart to make it work 
again.  An ifdown and ifup would probably have worked as well.

This is the third time we have experienced this problem, and every 
time it happened during high NFS traffic; maybe that is a hint for 
some of you.  Our machine has 6 interfaces, of which 2 are Broadcom 
and 4 are Intel NICs.  I'm going to use the Intel interfaces, because 
to me this looks like a driver problem.

Wijnand Reimink
Netherlands
Comment 24 Kevin M. Myer 2005-02-13 18:39:20 EST
Similar problem here today.  A lightly loaded mail server (about 200
accounts), running Red Hat EL 3, U4 and kernel 2.4.21-27.0.1smp became
unreachable over the network.  Hardware is a PE 2650, and the dual
NICs are bonded using the bonding driver.  Base NIC driver is tg3.

Console access was still available and the following message scrolled
by  constantly:

tg3: eth0: transmit timed out, resetting

I was unable to login over the console (never got a prompt) and had to
resort to a remote reboot using the RAC management card.

Nothing is logged in /var/log/messages and dmesg got overwritten with
the reboot.
Comment 25 Kevin M. Myer 2005-02-13 18:42:27 EST
Forgot to add, since there appear to have been issues when a box has
more than 4 GB of RAM: this server has 2 GB of RAM.
Comment 28 John W. Linville 2005-02-18 17:04:30 EST
tg3 is taking a considerable update for U5.  I have test kernels w/
the update available here:

   http://people.redhat.com/linville/.bz123218

Please give these a try and post whether or not the problem persists.

Thanks!
Comment 29 John W. Linville 2005-02-25 15:10:45 EST
I have deleted the kernels referenced in comment 28 (storage issues).
 Let me know if anyone still needs them...
Comment 30 John W. Linville 2005-03-01 12:53:32 EST
FWIW, the patch in question is also included in the kernels here:

   http://people.redhat.com/linville/kernels/rhel3/

It isn't explicitly listed as a patch on that page, but it is there
just the same...
Comment 31 John W. Linville 2005-03-09 08:32:07 EST
Any word on re-creating the problem with the test kernels from comment 30?
Comment 32 Marc Michelsen 2005-03-29 15:59:57 EST
I have experienced this problem once on a Tyan 4882 quad opteron
with 16GB of ram running RHEL 4 (not 3_U4) running this kernel:
138 challenger% uname -a
Linux challenger 2.6.9-6.25.ELsmp #1 SMP Tue Mar 8 21:55:12 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
139 challenger%

Mar 24 13:27:54 challenger kernel: tg3: tg3_stop_block timed out, ofs=2000 enable_bit=2
Mar 24 13:27:54 challenger kernel: tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2
Mar 24 13:27:54 challenger kernel: tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2

service network restart fixed it.

Have the tg3 updates for RHEL-3 made it into the RHEL-4 kernel I'm using?
Comment 33 John W. Linville 2005-03-29 16:37:37 EST
Looks like 2.6.9-6.25 should have the 3.22rh tg3 driver.  "ethtool -i" should
confirm.

Marc, it would be helpful for procedural purposes if you could open a RHEL4
bugzilla for this issue.

In the meantime, I'll look more deeply into what may be causing this...
Comment 34 Marc Michelsen 2005-03-29 17:55:16 EST
Yes, it is:
[root@challenger]# ethtool -i eth0
driver: tg3
version: 3.22-rh
firmware-version:
bus-info: 0000:02:09.0
[root@challenger]#

And I just opened it for RHEL4
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152518
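
When checking many machines for the driver version, the "ethtool -i" output shown above is easy to parse. The following is a minimal sketch; parse_ethtool_info is a hypothetical helper name, and the sample text is the output pasted in this comment.

```python
def parse_ethtool_info(text):
    """Turn `ethtool -i` output into a dict of field name -> value."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only: bus-info values contain colons too.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Output copied from the comment above.
sample = """driver: tg3
version: 3.22-rh
firmware-version:
bus-info: 0000:02:09.0"""

info = parse_ethtool_info(sample)
print(info["driver"], info["version"])

# On a live system the text could come from the real command, for example:
#   import subprocess
#   text = subprocess.run(["ethtool", "-i", "eth0"],
#                         capture_output=True, text=True).stdout
```

Note that the tg3 driver in these reports leaves firmware-version empty, so an empty string there does not by itself indicate a problem.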
Comment 35 John W. Linville 2005-04-05 13:05:44 EDT
I have a patch to update the tg3 in RHEL3 to 3.25RH.  I have pre-built test 
kernels available here: 
 
   http://people.redhat.com/linville/kernels/rhel3/ 
 
Please give them a try to see if they help.  Please post the results.  Thanks! 
Comment 37 John W. Linville 2005-05-09 15:55:28 EDT
Closed due to lack of response.  Please reopen (with test results) if this 
continues to be a problem.  Thanks! 
Comment 40 John W. Linville 2005-05-10 15:08:29 EDT
Leaving this open for now as a RHEL3 twin of the still active bug 152518... 
Comment 42 John W. Linville 2005-05-11 16:26:31 EDT
Internal reports indicate that updating to later firmware for tg3 hardware has 
resolved this issue.  I'm closing this (again) on that basis.  Please re-open 
if this issue continues.  Please note that not all instances of the 
"tg3_stop_block timed out" message are problematic, so please limit any more 
activity in this bug to cases involving actual loss of network function.  
Thanks! 
Comment 44 Steve Wilson 2005-09-07 16:04:57 EDT
(In reply to comment #42)
What is the most current firmware version for the BCM5700?  We are
experiencing the lockup problem described here with this version:

# lspci | grep Broadcom
03:01.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)

# uname -a
Linux sutnas01 2.4.21-32.0.1.ELsmp #1 SMP Wed May 25 14:26:33 EDT 2005 i686 i686 i386 GNU/Linux

# ethtool -i eth0
driver: tg3
version: 3.22RH
firmware-version: 
bus-info: 03:01.0

This has been a thorn in our side for quite a while.
Comment 45 John W. Linville 2005-09-08 09:51:20 EDT
Steve, 
 
Have you tried the test kernels available at the location mentioned in comment 
30?  Please do so and post the results. 
 
Regarding firmware updates, you'll have to contact the provider of your card 
(or motherboard) to see what may be available.  Some have reported that a 
firmware update alleviates these problems. 
 
Finally, please note that the "tg3_stop_block timed out" message is not 
necessarily a problem, particularly if it only shows-up when the interface is 
going down anyway (like during an ifdown).  If you still see the message with 
the updated driver from the test kernels, please determine whether or not the 
card passes traffic after an ifup.  Thanks! 
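
One rough way to separate the two cases described above when reading a log is to check whether a burst of tg3_stop_block messages directly follows a "transmit timed out" line (the watchdog reset path) or appears on its own (for example during an ifdown). The sketch below is only a heuristic under that assumption, and classify_bursts is a hypothetical helper name. It does not replace the real test, which is whether the card passes traffic after an ifup.

```python
def classify_bursts(lines):
    """Group consecutive tg3_stop_block lines into bursts.

    Each burst records how many messages it contains and whether the line
    immediately before it was a "transmit timed out" message.
    """
    bursts = []
    after_tx_timeout = False
    current = None
    for line in lines:
        if "tg3_stop_block timed out" in line:
            if current is None:
                current = {"after_tx_timeout": after_tx_timeout, "count": 0}
                bursts.append(current)
            current["count"] += 1
        else:
            # Any other line ends the burst and updates the context flag.
            current = None
            after_tx_timeout = "transmit timed out" in line
    return bursts


# Watchdog-triggered case, copied from the original description.
watchdog_log = [
    "NETDEV WATCHDOG: eth0: transmit timed out",
    "tg3: eth0: transmit timed out, resetting",
    "tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2",
]

# Manual ifdown case, copied from comment 21.
ifdown_log = [
    "# ifdown eth0",
    "tg3: tg3_stop_block timed out, ofs=3400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=2400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=1400 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=c00 enable_bit=2",
    "tg3: tg3_stop_block timed out, ofs=4800 enable_bit=2",
]

print(classify_bursts(watchdog_log))
print(classify_bursts(ifdown_log))
```

A burst flagged after_tx_timeout is the pattern most reporters here saw alongside a dead interface; a burst without it is more likely the harmless variant, though only a traffic check confirms that.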
Comment 46 Steve Wilson 2005-09-11 12:31:08 EDT
Just tried to install kernel-smp-2.4.21-37.EL.jwltest.52.i686.rpm, but many
needed modules weren't compiled in resulting in a failed boot.  Is it possible
to get just the tg3 specific changes as source/patch?  Thanks for the
clarification on "tg3_stop_block timed out" - guess I usually associate these
with this issue since dmesg will usually show them when we experience the
problem.  Also, forgot to note our system type in my previous post:

System        : ProLiant DL380 G4
Serial No.    : (removed)
ROM version   : P51 12/02/2004
iLo present   : Yes
Embedded NICs : 2
        NIC1 MAC: 00:13:21:0c:46:b7
        NIC2 MAC: 00:13:21:0c:46:b6

Processor: 0
        Name         : Intel Xeon
        Stepping     : 1
        Speed        : 3200 MHz
        Bus          : 800 MHz
        Socket       : 1
        Level2 Cache : 1024 KBytes
        Status       : Ok

Processor total  : 1

Memory installed : 2048 MBytes
ECC supported    : Yes

It seems our DL360 G4s never have an issue, only the DL380 G4s.

The bcm5700 driver from Broadcom exhibits the same behavior, although dmesg
never shows a time out.  The bcm5700 also shows the firmware version:

# ethtool -i eth0
driver: bcm5700
version: 8.2.18
firmware-version: 5704-v3.27b
bus-info: 03:01.0

Thinking it's not related specifically to the tg3 driver since Broadcom's own
driver has the same issue.  Would really like to find the most current firmware,
but HP's site doesn't seem to list anything relevant.

Thanks for your interest in this issue.

Steve

Comment 47 Steve Wilson 2006-01-03 09:49:24 EST
I solved our tg3 locking problem by installing the HP firmware update found here:

http://h18004.www1.hp.com/support/files/server/us/download/23367.html

The package is hpnicfwupg-1.2.2-1.i386.rpm, and the system has been running
without issue for a couple of months now. YMMV.

Steve

