Bug 249185 - Non Xen Kernel generates 'Detected Tx Unit Hang' messages on e1000 driver while Xen enabled Kernels do not.
Status: CLOSED DUPLICATE of bug 398921
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: All Linux
Priority: low
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Reported: 2007-07-22 06:12 EDT by Greg Morgan
Modified: 2008-01-09 12:11 EST

Doc Type: Bug Fix
Last Closed: 2008-01-09 12:11:29 EST
Description Greg Morgan 2007-07-22 06:12:36 EDT
Description of problem:

Please see f5 bug 200656, ES bug 248787, and f6 bug 219496.  I am working on
bug 249136 concerning USB problems.  The f7 Xen kernel 2.6.20-2925.11.fc7xen has
a problem with the USB issue.  When I rebooted to the new 2.6.22.1-27 kernel to
test it as a fix for the USB issues, the Tx hang issues from bug 200656
reappeared on the same hardware.  However, the 2.6.20-2925.11.fc7xen kernel is
rock solid.  I was copying a bunch of wav files from my NFS server to a 400 GB
USB drive when the error occurred.  However, simple web surfing can also
trigger the problem.

Version-Release number of selected component (if applicable):
2.6.22.1-27 kernel

How reproducible:
Switching from the 2.6.22.1-27 kernel to the 2.6.20-2925.11.fc7xen kernel and
then back to the 2.6.22.1-27 kernel will produce the error, fix the error, and
then produce the error again, respectively.

Steps to Reproduce:
1. Upgrade to the 2.6.22.1-27 kernel with an e1000 card.
  
Actual results:

Long delays in copying data or surfing the web.

Expected results:

Steady performance under heavy load.

Additional info:
I just closed the bug on FC5 yesterday thinking that the problem was gone.

Jul 22 02:35:26 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 22 02:35:26 mowgli kernel:   Tx Queue             <0>
Jul 22 02:35:26 mowgli kernel:   TDH                  <61>
Jul 22 02:35:26 mowgli kernel:   TDT                  <61>
Jul 22 02:35:26 mowgli kernel:   next_to_use          <61>
Jul 22 02:35:26 mowgli kernel:   next_to_clean        <75>
Jul 22 02:35:26 mowgli kernel: buffer_info[next_to_clean]
Jul 22 02:35:26 mowgli kernel:   time_stamp           <1e8878f>
Jul 22 02:35:26 mowgli kernel:   next_to_watch        <75>
Jul 22 02:35:26 mowgli kernel:   jiffies              <1e8a5c0>
Jul 22 02:35:26 mowgli kernel:   next_to_watch.status <0>
Jul 22 02:35:27 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 22 02:35:30 mowgli kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
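
For anyone comparing these dumps across kernels, here is a minimal Python sketch that pulls the bracketed values out of a "Detected Tx Unit Hang" dump like the one above and computes how many jiffies separate the buffer's time_stamp from the jiffies value at detection. It assumes the bracketed values are hexadecimal and ignores jiffies wraparound. The helper names parse_tx_hang and stalled_jiffies are made up for illustration and are not part of the e1000 driver or any existing tool.

```python
import re

# Matches syslog lines of the form "... kernel:   <name>   <hexvalue>".
FIELD_RE = re.compile(r"kernel:\s+(\S.*?)\s+<([0-9a-fA-F]+)>\s*$")

# Dump copied from the report above.
DUMP = """\
Jul 22 02:35:26 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 22 02:35:26 mowgli kernel:   Tx Queue             <0>
Jul 22 02:35:26 mowgli kernel:   TDH                  <61>
Jul 22 02:35:26 mowgli kernel:   TDT                  <61>
Jul 22 02:35:26 mowgli kernel:   next_to_use          <61>
Jul 22 02:35:26 mowgli kernel:   next_to_clean        <75>
Jul 22 02:35:26 mowgli kernel: buffer_info[next_to_clean]
Jul 22 02:35:26 mowgli kernel:   time_stamp           <1e8878f>
Jul 22 02:35:26 mowgli kernel:   next_to_watch        <75>
Jul 22 02:35:26 mowgli kernel:   jiffies              <1e8a5c0>
Jul 22 02:35:26 mowgli kernel:   next_to_watch.status <0>
"""


def parse_tx_hang(text):
    """Return {field name: int}, treating each bracketed value as hex (assumption)."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def stalled_jiffies(fields):
    """Jiffies between the buffer's time_stamp and detection (no wraparound handling)."""
    return fields["jiffies"] - fields["time_stamp"]


if __name__ == "__main__":
    parsed = parse_tx_hang(DUMP)
    print(parsed)
    print("stalled for", stalled_jiffies(parsed), "jiffies")
```

For this dump the difference is 0x1e8a5c0 - 0x1e8878f = 0x1e31, or 7729 jiffies. Converting that to seconds requires the kernel's HZ setting, which the report does not state.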
Comment 1 Greg Morgan 2007-07-22 21:07:51 EDT
The working driver is
Jul 21 17:31:03 mowgli kernel: input: PC Speaker as /class/input/input3
Jul 21 17:31:03 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.15-k2-NAPI
Jul 21 17:31:03 mowgli kernel: Copyright (c) 1999-2006 Intel Corporation.

The TX hang driver is
Jul 21 17:22:59 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Jul 21 17:22:59 mowgli kernel: Copyright (c) 1999-2006 Intel Corporation.

uptime
 18:05:04 up 15:28,...
with no TX hang errors on the 7.3.15 driver while copying 320 GB from an Intel
gigabit-enabled NFS server, if that helps.

Comment 2 Greg Morgan 2007-07-22 22:03:49 EDT
OK, so I revised the Summary and provided you with one of those handy Time-Life
charts.  ;-)  The real issue here is that the Xen kernels available in my grub
menu do not generate the TX hang error messages, while the regular fc7 kernels
do.  Using gvim to create the table below between reboots, on an NFS-mounted
home directory, was very unresponsive with the TX hanging kernels.  The Summary
message was updated accordingly.

Intel Driver     Kernel                                 TX Hang Issues
7.3.15-k2-NAPI   /boot/vmlinuz-2.6.20-2925.11.fc7xen    Rock Solid
7.3.15-k2-NAPI   /boot/vmlinuz-2.6.20-2925.9.fc7xen     Rock Solid
7.3.20-k2-NAPI   /boot/vmlinuz-2.6.21-1.3228.fc7        TX Issues Encountered
7.3.20-k2-NAPI   /boot/vmlinuz-2.6.22.1-27.fc7          TX Issues Encountered
Comment 3 Chuck Ebbert 2007-07-23 15:46:55 EDT
One workaround to try is turning off TSO:

    # ethtool -K eth0 tso off
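
To confirm whether the workaround actually took effect, one option is to read back the offload settings with "ethtool -k eth0" (lowercase k) and look for the TSO line. The Python sketch below parses that kind of output. The sample text is an assumed illustration of the format, not output captured from the reporter's machine, and the exact label may differ between ethtool versions, so the parser accepts both hyphenated and space-separated spellings. The function names are hypothetical.

```python
# Assumed example of "ethtool -k eth0" output; real output varies by ethtool version.
SAMPLE_OUTPUT = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
"""


def tso_enabled(ethtool_k_output):
    """Return True/False for the TSO setting, or None if no TSO line is found."""
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        # Normalize "tcp-segmentation-offload" and "tcp segmentation offload".
        key = name.strip().lower().replace("-", " ")
        words = value.split()
        if key == "tcp segmentation offload" and words:
            return words[0] == "on"
    return None


def tso_off_command(iface):
    """Build the argument list for the workaround from comment 3 (needs root to run)."""
    return ["ethtool", "-K", iface, "tso", "off"]


if __name__ == "__main__":
    print("TSO enabled:", tso_enabled(SAMPLE_OUTPUT))
    print("Workaround command:", " ".join(tso_off_command("eth0")))
```

The command list can be passed to subprocess.run on the affected host. The sketch only builds it so that nothing is changed by accident.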



Comment 4 Christopher Brown 2007-09-20 07:00:02 EDT
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel? Did Chuck's suggestion
work for you?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Cheers
Chris
Comment 5 Greg Morgan 2007-10-01 12:11:30 EDT
Chris,

Ack.  I'll check the machine tonight with the ethtool command from comment #3.
Note that this is the same host mentioned in bug 200656. I did perform some of
the ethtool operations in that bug.  However, I know that I am still using the
Xen kernels at this point with no problems at all. The Xen kernel e1000
7.3.15-k2-NAPI driver is very similar to the 7.3.15tdh code that was provided
to me in bug 200656 comment #10.

Regards,
Greg
Comment 6 Christopher Brown 2007-11-19 10:16:12 EST
Hi Greg,

Any change using ethtool?

Cheers
Chris
Comment 7 Greg Morgan 2007-11-25 20:27:52 EST
Chris, sorry for the many delays.  The problem still exists after running the
ethtool command.  It also exists in f8, as I posted in bug 398921.  The
7.3.15-k2-NAPI Intel driver fixed the problem, but the 7.3.20-k2-NAPI Intel
driver either regressed or added back a similar problem that creates the
Nov 25 18:19:03 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 25 18:19:03 mowgli kernel:   Tx Queue             <0>
Nov 25 18:19:03 mowgli kernel:   TDH                  <5a>
Nov 25 18:19:03 mowgli kernel:   TDT                  <5a>
Nov 25 18:19:03 mowgli kernel:   next_to_use          <5a>
Nov 25 18:19:03 mowgli kernel:   next_to_clean        <6e>
Nov 25 18:19:03 mowgli kernel: buffer_info[next_to_clean]
Nov 25 18:19:03 mowgli kernel:   time_stamp           <37cf825>
Nov 25 18:19:03 mowgli kernel:   next_to_watch        <6e>
Nov 25 18:19:03 mowgli kernel:   jiffies              <37d1100>
Nov 25 18:19:03 mowgli kernel:   next_to_watch.status <0>
Nov 25 18:19:05 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out
messages.

The problem occurs under load but can occur during web surfing.
Comment 8 Christopher Brown 2008-01-09 12:11:29 EST
Hi Greg,

Thanks for the update. I'm closing this as a dupe of bug 398921 then - thanks
for filing that one. 2.6.24 is just around the corner so we could see what that
brings ... :)

*** This bug has been marked as a duplicate of 398921 ***
