Red Hat Bugzilla – Bug 249185
Non Xen Kernel generates 'Detected Tx Unit Hang' messages on e1000 driver while Xen enabled Kernels do not.
Last modified: 2008-01-09 12:11:29 EST
Description of problem:
Please see f5 bug 200656, ES bug 248787, and f6 bug 219496. I am working on
bug 249136 concerning USB problems. The f7 Xen kernel 2.6.20-2925.11.fc7xen has
a problem with the USB issue. When I rebooted to the new 18.104.22.168-27 kernel to
test the kernel as a fix for USB issues, my TX hang issues on the same hardware in
bug 200656 appeared. However, the 2.6.20-2925.11.fc7xen kernel is rock solid.
I am copying a bunch of wav files from my NFS server to a 400gig USB drive when
the error occurs. However, simple web surfing can also cause problems.
Version-Release number of selected component (if applicable):
Switching from the 22.214.171.124-27 kernel back to the 2.6.20-2925.11.fc7xen then
back to the 126.96.36.199-27 kernel will produce the error, fix the error, and then
produce the error again, respectively.
Steps to Reproduce:
1. Upgrade to the 188.8.131.52-27 kernel with an e1000 card.
Study performance under a heavy load.
Long delays in copying data or surfing the web.
I just closed the bug on FC5 yesterday thinking that the problem was gone.
Jul 22 02:35:26 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit
Jul 22 02:35:26 mowgli kernel: Tx Queue <0>
Jul 22 02:35:26 mowgli kernel: TDH <61>
Jul 22 02:35:26 mowgli kernel: TDT <61>
Jul 22 02:35:26 mowgli kernel: next_to_use <61>
Jul 22 02:35:26 mowgli kernel: next_to_clean <75>
Jul 22 02:35:26 mowgli kernel: buffer_info[next_to_clean]
Jul 22 02:35:26 mowgli kernel: time_stamp <1e8878f>
Jul 22 02:35:26 mowgli kernel: next_to_watch <75>
Jul 22 02:35:26 mowgli kernel: jiffies <1e8a5c0>
Jul 22 02:35:26 mowgli kernel: next_to_watch.status <0>
Jul 22 02:35:27 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 22 02:35:30 mowgli kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000
Mbps Full Duplex, Flow Control: RX/TX
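For anyone comparing several of these dumps, here is a minimal Python sketch that pulls the numeric fields out of a hang report and computes how far jiffies had advanced past the stuck descriptor's time_stamp. It assumes the values in angle brackets are hexadecimal, which is how they appear to be printed here. The names parse_hang_dump and stall_jiffies are hypothetical helpers, not part of the e1000 driver or any tool, and converting the result to seconds would need the kernel's HZ value, which this report does not state.

```python
import re

# Matches lines such as "... kernel: TDH <61>" and captures name and value.
# Assumption: the value inside <> is hexadecimal.
FIELD_RE = re.compile(
    r"kernel:\s+(?P<name>[\w.]+(?: [\w.]+)*)\s+<(?P<value>[0-9a-fA-F]+)>\s*$"
)


def parse_hang_dump(lines):
    """Return a dict of field name -> integer for one Tx hang dump."""
    fields = {}
    for line in lines:
        match = FIELD_RE.search(line)
        if match:
            fields[match.group("name")] = int(match.group("value"), 16)
    return fields


def stall_jiffies(fields):
    """Jiffies elapsed between the buffer's time_stamp and the report."""
    return fields["jiffies"] - fields["time_stamp"]


# Lines copied from the dump above.
dump = """\
Jul 22 02:35:26 mowgli kernel: Tx Queue <0>
Jul 22 02:35:26 mowgli kernel: TDH <61>
Jul 22 02:35:26 mowgli kernel: TDT <61>
Jul 22 02:35:26 mowgli kernel: next_to_use <61>
Jul 22 02:35:26 mowgli kernel: next_to_clean <75>
Jul 22 02:35:26 mowgli kernel: buffer_info[next_to_clean]
Jul 22 02:35:26 mowgli kernel: time_stamp <1e8878f>
Jul 22 02:35:26 mowgli kernel: next_to_watch <75>
Jul 22 02:35:26 mowgli kernel: jiffies <1e8a5c0>
Jul 22 02:35:26 mowgli kernel: next_to_watch.status <0>
"""

fields = parse_hang_dump(dump.splitlines())
print("TDH == TDT:", fields["TDH"] == fields["TDT"])
print("stall in jiffies:", stall_jiffies(fields))
```

In this dump TDH and TDT are equal while next_to_clean differs from next_to_use, so the script only makes that comparison easy to repeat across reports; it does not diagnose the cause.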
The working driver is
Jul 21 17:31:03 mowgli kernel: input: PC Speaker as /class/input/input3
Jul 21 17:31:03 mowgli kernel: Intel(R) PRO/1000 Network Driver - version
Jul 21 17:31:03 mowgli kernel: Copyright (c) 1999-2006 Intel Corporation.
The TX hang driver is
Jul 21 17:22:59 mowgli kernel: Intel(R) PRO/1000 Network Driver - version
Jul 21 17:22:59 mowgli kernel: Copyright (c) 1999-2006 Intel Corporation.
18:05:04 up 15:28,...
with no TX hang errors on the 7.3.15 driver while copying 320G from an NFS Intel
gigabit enabled server, if that helps.
OK, so I revised the Summary and provided you with one of those handy Time-Life
charts. ;-) The real issue here is that the Xen kernels available in my grub
menu do not generate the TX hang error messages while the regular fc7 kernels
generate the TX hang messages. Trying to use gvim to create the table below
between reboots on an NFS mounted home directory was very unresponsive with the
TX hanging kernels. The Summary message was updated accordingly.
Driver          Kernel                                 TX Hang Issues
7.3.15-k2-NAPI  /boot/vmlinuz-2.6.20-2925.11.fc7xen    Rock Solid
7.3.15-k2-NAPI  /boot/vmlinuz-2.6.20-2925.9.fc7xen     Rock Solid
7.3.20-k2-NAPI  /boot/vmlinuz-2.6.21-1.3228.fc7        TX Issues Encountered
7.3.20-k2-NAPI  /boot/vmlinuz-184.108.40.206-27.fc7    TX Issues Encountered
One workaround to try is turning off TSO:
# ethtool -K eth0 tso off
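To confirm the setting actually took effect before retesting, something like the following Python sketch could read the offload state back from `ethtool -k` (lowercase k, which lists offload settings). The function names are hypothetical, and the parser assumes the relevant line is labelled either "tcp segmentation offload" or "tcp-segmentation-offload", since the label varies between ethtool versions.

```python
import subprocess


def parse_tso_state(ethtool_output):
    """Return True/False for TSO on/off, or None if no TSO line is found."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalize both label spellings to one form.
        label = key.strip().lower().replace("-", " ")
        if label == "tcp segmentation offload":
            # The value may carry a suffix such as "[fixed]".
            return value.split()[:1] == ["on"]
    return None


def tso_enabled(interface="eth0"):
    """Run `ethtool -k <interface>` and report the TSO state."""
    result = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_tso_state(result.stdout)
```

Calling tso_enabled("eth0") before and after the command above should show the value change from True to False if the driver accepted the setting.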
I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the fedora kernel.
I am CC'ing myself to this bug and will try and assist you in resolving it if I can.
There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel? Did Chuck's suggestion
work for you?
If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.
Ack. I'll check the machine tonight with the ethtool from comment #3. Note
that this is the same host mentioned in bug 200656. I did perform some of the
ethtool operations in that bug. However, I know that I am still using the Xen
kernels at this point with no problems at all. The Xen Kernel e1000
7.3.15-k2-NAPI driver is very similar to the 7.3.15tdh code that was provided me
in bug 200656 comment #10.
Any change using ethtool?
Chris, sorry for the many delays. The problem still exists with the ethtool
command. This problem also exists in f8 as I posted here in bug 398921. The
7.3.15-k2-NAPI Intel driver fixed the problem but the 7.3.20-k2-NAPI Intel
driver regressed? or added back a similar problem that creates the
Nov 25 18:19:03 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit
Nov 25 18:19:03 mowgli kernel: Tx Queue <0>
Nov 25 18:19:03 mowgli kernel: TDH <5a>
Nov 25 18:19:03 mowgli kernel: TDT <5a>
Nov 25 18:19:03 mowgli kernel: next_to_use <5a>
Nov 25 18:19:03 mowgli kernel: next_to_clean <6e>
Nov 25 18:19:03 mowgli kernel: buffer_info[next_to_clean]
Nov 25 18:19:03 mowgli kernel: time_stamp <37cf825>
Nov 25 18:19:03 mowgli kernel: next_to_watch <6e>
Nov 25 18:19:03 mowgli kernel: jiffies <37d1100>
Nov 25 18:19:03 mowgli kernel: next_to_watch.status <0>
Nov 25 18:19:05 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out
The problem occurs under load but can occur during web surfing.
Thanks for the update. I'm closing this as a dupe of bug 398921 then - thanks
for filing that one. 2.6.24 is just around the corner so we could see what that
brings ... :)
*** This bug has been marked as a duplicate of 398921 ***