Bug 182215 - e1000_clean_tx_irq: Detected Tx Unit Hang with Kernel v2.6.15.4 and Intel PRO/1000 Fiber card
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: kernel
Version: 4
Hardware: i686 Linux
Priority: medium, Severity: medium
Assigned To: John W. Linville
QA Contact: Brian Brock
Reported: 2006-02-20 21:26 EST by Brett Wyer
Modified: 2007-11-30 17:11 EST (History)
Last Closed: 2006-07-11 10:25:13 EDT

Attachments
Config file I'm using to build 2.6.15.4 (28.37 KB, text/plain)
2006-02-21 20:22 EST, Brett Wyer
Description Brett Wyer 2006-02-20 21:26:24 EST
Description of problem:
After upgrading from 2.6.14.2 to 2.6.15.4, I started encountering the following
messages repeatedly in my /var/log/messages once eth0 was activated:

kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex
kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
kernel:   TDH                  <0>
kernel:   TDT                  <0>
kernel:   next_to_use          <2>
kernel:   next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:   dma                  <379e2e02>
kernel:   time_stamp           <fffcd0cc>
kernel:   next_to_watch        <0>
kernel:   jiffies              <fffcd896>
kernel:   next_to_watch.status <0>
kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
kernel:   TDH                  <0>
kernel:   TDT                  <0>
kernel:   next_to_use          <2>
kernel:   next_to_clean        <0>
kernel: buffer_info[next_to_clean]
kernel:   dma                  <379e2e02>
kernel:   time_stamp           <fffcd0cc>
kernel:   next_to_watch        <0>
kernel:   jiffies              <fffce066>
kernel:   next_to_watch.status <0>

This is apparently related to changes to the Intel e1000 driver since 6.0.60-k2.
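
As a reading aid for the two hang reports above, the sketch below subtracts time_stamp from jiffies for each one. The function name hang_age is made up for illustration; the hex values are copied from the log. Both reports carry the same time_stamp and dma values, which suggests the same buffer stayed queued between them, and TDH and TDT (the hardware transmit descriptor head and tail registers) both read 0 although next_to_use is 2.

```shell
#!/bin/bash
# Elapsed jiffies between a buffer's time_stamp and the jiffies value
# printed in a "Detected Tx Unit Hang" report.
# hang_age is a hypothetical helper name, not part of the driver or any tool.
# Both arguments are 32-bit hex values without the 0x prefix; the result is
# masked to 32 bits so a counter wrap between the two values is handled.
hang_age() {
    local time_stamp=$1 jiffies=$2
    echo $(( (0x$jiffies - 0x$time_stamp) & 0xffffffff ))
}

hang_age fffcd0cc fffcd896   # first report:  prints 1994
hang_age fffcd0cc fffce066   # second report: prints 3994
```

The two reports are 2000 jiffies apart. Converting these figures to seconds requires the kernel's HZ value, which the report does not state.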

Version-Release number of selected component (if applicable):
FC4 Kernel version 2.6.15.4

How reproducible:
Running in a Dell 6650, 2GB memory, Dual Xeon 1.4GHz
Network card--Intel Fiber Gig-E (from lspci):
15:01.0 Ethernet controller: Intel Corporation 82542 Gigabit Ethernet Controller
(rev 03)

Note that I also have a (tg3) ethernet (built-in) that works fine under both
revs of the kernel in this machine.

Steps to Reproduce:
1. Boot under 2.6.15.4 built with identical .config file from 2.6.14.2 with
e1000 support and the above card
Actual results:
Numerous e1000_clean_tx_irq: Detected Tx Unit Hang in /var/log/messages.  Fiber
NIC will not come up and allow transmissions.

Expected results:
No error messages, fiber NIC comes up and functions

Additional info:
[root@webserver proc]# lspci
00:00.0 Host bridge: Broadcom CMIC-HE (rev 22)
00:00.1 Host bridge: Broadcom CMIC-HE
00:00.2 Host bridge: Broadcom CMIC-HE
00:00.3 Host bridge: Broadcom CMIC-HE
00:03.0 SCSI storage controller: Adaptec AIC-7892P U160/m (rev 02)
00:04.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:05.0 Communication controller: NetMos Technology PCI 9835 Multi-I/O
Controller (rev 01)
00:0f.0 Host bridge: Broadcom CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: Broadcom CSB5 IDE Controller (rev 93)
00:0f.2 USB Controller: Broadcom OSB4/CSB5 OHCI USB Controller (rev 05)
00:0f.3 ISA bridge: Broadcom CSB5 LPC bridge
00:10.0 Host bridge: Broadcom CIOB30 (rev 03)
00:10.2 Host bridge: Broadcom CIOB30 (rev 03)
00:11.0 Host bridge: Broadcom CIOB30 (rev 03)
00:11.2 Host bridge: Broadcom CIOB30 (rev 03)
00:12.0 Host bridge: Broadcom CIOB30 (rev 03)
00:12.2 Host bridge: Broadcom CIOB30 (rev 03)
03:02.0 VGA compatible controller: nVidia Corporation NV5M64 [RIVA TNT2 Model
64/Model 64 Pro] (rev 15)
08:01.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5700 Gigabit
Ethernet (rev 14)
09:01.0 PCI bridge: Intel Corporation 21154 PCI-to-PCI Bridge
0a:00.0 PCI bridge: Intel Corporation 21154 PCI-to-PCI Bridge
0a:01.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 SCSI
Processor (rev 06)
0b:00.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 20)
15:01.0 Ethernet controller: Intel Corporation 82542 Gigabit Ethernet Controller
(rev 03)

-----------------------
Relevant section of dmesg during bootup of 2.6.14.2 (which works):

Intel(R) PRO/1000 Network Driver - version 6.0.60-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt 0000:15:01.0[A] -> GSI 29 (level, low) -> IRQ 177
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
tg3.c:v3.42 (Oct 3, 2005)
ACPI: PCI Interrupt 0000:08:01.0[A] -> GSI 17 (level, low) -> IRQ 185
eth1: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit)
10/100/1000BaseT Ethernet 00:06:5b:0e:52:eb
eth1: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] Split[0] WireSpeed[0] TSOcap[0]
eth1: dma_rwctrl[76ff000f]
netconsole: not configured, aborting

-----------------
Similar section in /var/log/messages under 2.6.15.4 which doesn't work:

Intel(R) PRO/1000 Network Driver - version 6.1.16-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt 0000:15:01.0[A] -> GSI 29 (level, low) -> IRQ 177
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
tg3.c:v3.47 (Dec 28, 2005)
ACPI: PCI Interrupt 0000:08:01.0[A] -> GSI 17 (level, low) -> IRQ 185
eth1: Tigon3 [partno(BCM95700A6) rev 7104 PHY(5411)] (PCI:66MHz:64-bit)
10/100/1000BaseT Ethernet
00:06:5b:0e:52:eb
eth1: RXcsums[1] LinkChgREG[1] MIirq[1] ASF[0] Split[0] WireSpeed[0] TSOcap[0]
eth1: dma_rwctrl[76ff000f]
netconsole: not configured, aborting
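
The only e1000 difference visible in the two excerpts above is the driver version banner (6.0.60-k2-NAPI versus 6.1.16-k2-NAPI). A minimal sketch for pulling that version out of a log, assuming the banner keeps the wording shown above; the function name e1000_version is made up for illustration:

```shell
#!/bin/bash
# Print the e1000 driver version found in kernel log text read from stdin.
# e1000_version is a hypothetical helper; it relies only on the banner line
# "Intel(R) PRO/1000 Network Driver - version <version>" seen in this report.
e1000_version() {
    sed -n 's/.*PRO\/1000 Network Driver - version \([^ ]*\).*/\1/p'
}

# Example with the banner from the failing 2.6.15.4 boot:
echo 'Intel(R) PRO/1000 Network Driver - version 6.1.16-k2-NAPI' | e1000_version
# prints 6.1.16-k2-NAPI
```

On a running system the same function could be fed from the kernel log, for example `dmesg | e1000_version`.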
Comment 1 John W. Linville 2006-02-21 10:38:07 EST
It sounds like you are using kernels from upstream which you have built 
yourself rather than actual Fedora kernels.  Is that true?  Please include the 
output of "uname -a" in your reply. 
 
Can this problem be seen on an actual Fedora kernel? 
 
Have you tried using the fedora-netdev kernels? 
 
   http://people.redhat.com/linville/kernels/fedora-netdev/ 
 
Does the problem appear there? 
Comment 2 Brett Wyer 2006-02-21 20:19:24 EST
John-

Thanks for the quick response.  Here's my dilemma: my boot partition is
ReiserFS, so I need a kernel that has it built-in.  If I use a canned kernel, I
end up with an "error 6 mounting reiserfs" and a kernel panic.  I'm certainly up
for suggestions here.

In answer to your other questions, I'm downloading the latest-and-greatest
kernel image from www.kernel.org, full sources.  Aside from stripping down the
options and building my hardware into the kernel instead of using modules (via
make xconfig), my configuration is pretty plain-Jane.  Additionally, I'm using a
.config file from my (working) 2.6.14.2 to build my 2.6.15 image.

Here's my uname -a output:

Linux webserver.wyerdom.local 2.6.14.2 #2 SMP PREEMPT Thu Jan 5 20:29:33 CST
2006 i686 i686 i386 GNU/Linux

I'd be happy to try a canned 2.6.15 kernel if I can get one that has ReiserFS
built into it.

Oh...  I'm attaching a copy of my .config in case that might help.

Brett
Comment 3 Brett Wyer 2006-02-21 20:22:56 EST
Created attachment 124995 [details]
Config file I'm using to build 2.6.15.4

Here's a copy of my .config file.  I took a working (what I'm running on right
now) 2.6.14 .config and just copied it up to the 2.6.15 directory, did a make
xconfig, saved and did a make.
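
Since the sticking point with a stock kernel is whether ReiserFS is built in rather than a module, here is a minimal sketch for checking a kernel config file. The function name config_builtin is made up for illustration; CONFIG_REISERFS_FS and CONFIG_E1000 are the usual kernel config symbols for ReiserFS and the e1000 driver, and the file path in the usage note is an assumption about where the distribution installs its config.

```shell
#!/bin/bash
# Succeed (exit status 0) if a kernel config symbol is built in (=y),
# fail if it is a module (=m), disabled, or absent.
# config_builtin is a hypothetical helper name.
# usage: config_builtin SYMBOL [config-file]   (SYMBOL without the CONFIG_ prefix)
config_builtin() {
    local symbol=$1 file=${2:-.config}
    grep -q "^CONFIG_${symbol}=y\$" "$file"
}
```

For example, `config_builtin REISERFS_FS /boot/config-$(uname -r) && echo built-in` would report on the running kernel, assuming the distribution ships its config under /boot.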
Comment 4 Michael Anderson 2006-03-06 14:17:21 EST
I have the same problem with the 2.6.15 kernels, and I _am_ using the FC4 kernels
(kernel-2.6.15-1.1833_FC4 is the latest).  The information below is from a successful
boot with kernel-2.6.14-1.1656_FC4.

lspci:
00:0b.0 Ethernet controller: Intel Corporation 82542 Gigabit Ethernet Controller
(rev 03)

dmesg:
Intel(R) PRO/1000 Network Driver - version 6.0.60-k2-NAPI
Copyright (c) 1999-2005 Intel Corporation.
ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 10
PCI: setting IRQ 10 as level-triggered
ACPI: PCI Interrupt 0000:00:0b.0[A] -> Link [LNKD] -> GSI 10 (level, low) -> IRQ  10
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
i2c-sis96x version 1.0.0
Comment 5 Brett Wyer 2006-04-08 19:42:21 EDT
What is the status on this issue?  An example of someone with a canned kernel
build was given and no further updates have been posted.
Comment 6 John W. Linville 2006-04-10 08:35:03 EDT
Hmmm...perhaps you should not have reassigned this bug back to the generic   
"kernel-maint" alias on 3/19...?   
   
Regarding the "canned kernel" that I asked you to try, you never provided any   
feedback about it.  Please do so.  If you want/need to rebuild the kernel with 
specific configuration changes, source is provided:   
   
   http://people.redhat.com/linville/kernels/fedora-netdev/4/rpms/SRPMS/  
  
Please try the provided kernel and post the results here...thanks! 
Comment 7 Brett Wyer 2006-04-13 20:43:32 EDT
Please see Comment #2 regarding trying a canned kernel.  Short of rebuilding a
production machine from scratch, I have no way to do so, unless there's a canned
kernel that supports ReiserFS on the root partition.

There was, however, an individual that is having the same issue I'm running into
with a canned kernel.  Please see comment #4.

Is this sufficient to work from?

Thank you.
Comment 8 John W. Linville 2006-05-18 15:37:30 EDT
Is this issue still occurring w/ current Fedora kernels?
Comment 9 Brett Wyer 2006-06-01 20:58:56 EDT
This is no longer an issue for me, as I have pulled the Pro/1000 Fiber card 
out and am using an on-board Broadcom copper Gig-E connection.  I believe 
there are additional people watching this bug that may be able to contribute.
Comment 11 John W. Linville 2006-06-02 10:45:10 EDT
FWIW, found this (singleton) thread, which appears to have a similar issue  
upstream:  
  
   http://www.ussg.iu.edu/hypermail/linux/net/0603.3/0013.html  
Comment 12 Randy Zagar 2006-06-02 12:49:14 EDT
Having similar problems with RHEL-4 2.6.9-34.EL

See https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=190262
Comment 13 Auke Kok 2006-06-02 14:12:48 EDT
The issue is still under investigation, but no resolution has been found yet.
Please also monitor http://e1000.sf.net/ (bugs section).

As a temporary workaround we advise people to turn off TSO. This seems to make
the problem go away, at some cost in performance.
Comment 14 John W. Linville 2006-06-05 12:24:10 EDT
Brett, could you try "ethtool -K eth0 tso off" (replace eth0 as appropriate)?  
Please do so and post the results here...thanks! 
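
A minimal sketch of the TSO test requested above, wrapped so the offload settings are shown before and after the change. The function name tso_off and the ETHTOOL override variable are made up for illustration (the override only exists so the function can be dry-run without the hardware); `ethtool -k` shows offload settings and `ethtool -K` changes them.

```shell
#!/bin/bash
# Disable TCP segmentation offload (TSO) on one interface.
# tso_off and ETHTOOL are hypothetical names: ETHTOOL lets the function be
# dry-run (e.g. ETHTOOL=echo) on a machine without the NIC.
# usage: tso_off [interface]   (defaults to eth0)
tso_off() {
    local iface=${1:-eth0}
    local ethtool=${ETHTOOL:-ethtool}
    "$ethtool" -k "$iface"          # show current offload settings
    "$ethtool" -K "$iface" tso off  # needs root; does not persist across reboots
    "$ethtool" -k "$iface"          # show settings again to confirm the change
}
```

Run it as root with `tso_off eth0`, then watch /var/log/messages for further "Detected Tx Unit Hang" reports.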
Comment 15 Jesse Brandeburg 2006-06-06 14:28:09 EDT
Unfortunately the e1000 interface 
15:01.0 Ethernet controller: Intel Corporation 82542 Gigabit Ethernet Controller
(rev 03)

is known to not work any more with the current drivers.  This specific adapter
is no longer officially supported by Intel.  Our position is if someone can
suggest a fix (and still has one of these ancient adapters) we would push to get
the fix into the kernel, but we are unable to prioritize spending time to fix
this issue for end-of-life hardware.

I think this bug can be closed, but the other issues mentioned are distinct and
different problems from the original issue reported in this bug.
Comment 16 John W. Linville 2006-06-08 13:53:46 EDT
Bug 194460 seems to have the same symptoms on an 82573.  Are the issues 
related?
Comment 17 Jesse Brandeburg 2006-06-08 14:25:21 EDT
These bugs are definitely not related.
Comment 18 John W. Linville 2006-07-11 10:25:13 EDT
I'm going to close this based on comment 15 and comment 17.
