Description of problem:
Please see f5 bug 200656, ES bug 248787, and f6 bug 219496. I am working on bug 249136 concerning USB problems. The f7 Xen kernel 2.6.20-2925.11.fc7xen has the USB problem. When I rebooted into the new 2.6.22.1-27 kernel to test it as a fix for the USB issues, the Tx hang issues from bug 200656 reappeared on the same hardware. However, the 2.6.20-2925.11.fc7xen kernel is rock solid. I was copying a bunch of wav files from my NFS server to a 400 GB USB drive when the error occurred. However, simple web surfing can also cause problems.

Version-Release number of selected component (if applicable):
2.6.22.1-27 kernel

How reproducible:
Switching from the 2.6.22.1-27 kernel back to 2.6.20-2925.11.fc7xen and then back to 2.6.22.1-27 will produce the error, fix the error, and then produce the error again, respectively.

Steps to Reproduce:
1. Upgrade to the 2.6.22.1-27 kernel with an e1000 card.
2.
3.

Actual results:
Long delays in copying data or surfing the web.

Expected results:
Steady performance under a heavy load.

Additional info:
I just closed the bug on FC5 yesterday thinking that the problem was gone.

Jul 22 02:35:26 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Jul 22 02:35:26 mowgli kernel:   Tx Queue             <0>
Jul 22 02:35:26 mowgli kernel:   TDH                  <61>
Jul 22 02:35:26 mowgli kernel:   TDT                  <61>
Jul 22 02:35:26 mowgli kernel:   next_to_use          <61>
Jul 22 02:35:26 mowgli kernel:   next_to_clean        <75>
Jul 22 02:35:26 mowgli kernel: buffer_info[next_to_clean]
Jul 22 02:35:26 mowgli kernel:   time_stamp           <1e8878f>
Jul 22 02:35:26 mowgli kernel:   next_to_watch        <75>
Jul 22 02:35:26 mowgli kernel:   jiffies              <1e8a5c0>
Jul 22 02:35:26 mowgli kernel:   next_to_watch.status <0>
Jul 22 02:35:27 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul 22 02:35:30 mowgli kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
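For anyone reading the register dump above: the values in angle brackets are hexadecimal. A minimal sketch for turning two of them into a decimal difference follows. The helper name `hex_delta` is made up for illustration. My understanding, which I have not checked against this exact driver version, is that `time_stamp` is the jiffies value recorded when the buffer at `next_to_clean` was queued, so `jiffies - time_stamp` is how long that buffer had been waiting, in ticks.

```shell
# hex_delta NEWER OLDER
# Print NEWER - OLDER in decimal. Both arguments are hex strings
# without a leading 0x, exactly as they appear in the kernel log.
# (Hypothetical helper, not part of any existing tool.)
hex_delta() {
    echo $(( 0x$1 - 0x$2 ))
}

hex_delta 1e8a5c0 1e8878f   # jiffies - time_stamp from the dump above
hex_delta 75 61             # next_to_clean - TDH from the dump above
```

For the Jul 22 dump this prints 7729 and 20. Converting the first number to seconds depends on the kernel's HZ setting, which is not stated anywhere in this report.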
The working driver is

Jul 21 17:31:03 mowgli kernel: input: PC Speaker as /class/input/input3
Jul 21 17:31:03 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.15-k2-NAPI
Jul 21 17:31:03 mowgli kernel: Copyright (c) 1999-2006 Intel Corporation.

The TX hang driver is

Jul 21 17:22:59 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Jul 21 17:22:59 mowgli kernel: Copyright (c) 1999-2006 Intel Corporation.

uptime
 18:05:04 up 15:28,...

with no TX hang errors on the 7.3.15 driver while copying 320G from an NFS Intel gigabit enabled server, if that helps.
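To pull the driver version out of a log quickly, something like the following sketch should work. It assumes the banner line has exactly the format shown in the messages above; the function name `e1000_versions` is hypothetical.

```shell
# e1000_versions
# Read syslog text on stdin and print each distinct e1000 driver
# version announced in it, one per line, sorted.
# (Hypothetical helper; relies on the banner format quoted above.)
e1000_versions() {
    sed -n 's/.*Intel(R) PRO\/1000 Network Driver - version \([^ ]*\).*/\1/p' | sort -u
}

# Example use (the log path is an assumption and varies by system):
#   e1000_versions < /var/log/messages
```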
OK, so I revised the Summary and am providing you with one of those handy Time-Life charts. ;-)

The real issue here is that the Xen kernels available in my grub menu do not generate the TX hang error messages, while the regular fc7 kernels do. gvim was very unresponsive on the TX hanging kernels when I used it to create the table below, between reboots, on an NFS mounted home directory. The Summary message was updated accordingly.

Intel Driver      Kernel                                 TX Hang Issues
7.3.15-k2-NAPI    /boot/vmlinuz-2.6.20-2925.11.fc7xen    Rock Solid
7.3.15-k2-NAPI    /boot/vmlinuz-2.6.20-2925.9.fc7xen     Rock Solid
7.3.20-k2-NAPI    /boot/vmlinuz-2.6.21-1.3228.fc7        TX Issues Encountered
7.3.20-k2-NAPI    /boot/vmlinuz-2.6.22.1-27.fc7          TX Issues Encountered
One workaround to try is turning off TSO:

# ethtool -K eth0 tso off
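To confirm whether TSO is actually on or off before and after the change, `ethtool -k` (lowercase) lists the current offload settings. The sketch below filters that output down to the TSO state. The function name `tso_state` is hypothetical, and the pattern assumes the label is spelled either "tcp segmentation offload" or "tcp-segmentation-offload", since the spelling differs between ethtool versions.

```shell
# tso_state
# Read `ethtool -k <iface>` output on stdin and print the TSO
# setting ("on" or "off"). Prints nothing if no TSO line is found.
# (Hypothetical helper; accepts both label spellings.)
tso_state() {
    sed -n 's/^[[:space:]]*tcp[- ]segmentation[- ]offload:[[:space:]]*\([a-z]*\).*/\1/p' | head -n 1
}

# Example use (needs the ethtool package; eth0 as in this report):
#   ethtool -k eth0 | tso_state
```

Note that a setting changed with `ethtool -K` normally does not survive a reboot, so it is worth rechecking after each kernel switch.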
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel? Did Chuck's suggestion work for you?

If the problem no longer exists, then please close this bug, or I'll do so in a few days if no additional information is lodged.

Cheers
Chris
Chris,

Ack. I'll check the machine tonight with the ethtool command from comment #3. Note that this is the same host mentioned in bug 200656. I did perform some of the ethtool operations in that bug. However, I know that I am still using the Xen kernels at this point with no problems at all. The Xen kernel e1000 7.3.15-k2-NAPI driver is very similar to the 7.3.15tdh code that was provided to me in bug 200656 comment #10.

Regards,
Greg
Hi Greg, Any change using ethtool? Cheers Chris
Chris,

Sorry for the many delays. The problem still exists with the ethtool command. This problem also exists in f8, as I posted in bug 398921.

The 7.3.15-k2-NAPI Intel driver fixed the problem, but the 7.3.20-k2-NAPI Intel driver either regressed or added back a similar problem that creates the following messages:

Nov 25 18:19:03 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 25 18:19:03 mowgli kernel:   Tx Queue             <0>
Nov 25 18:19:03 mowgli kernel:   TDH                  <5a>
Nov 25 18:19:03 mowgli kernel:   TDT                  <5a>
Nov 25 18:19:03 mowgli kernel:   next_to_use          <5a>
Nov 25 18:19:03 mowgli kernel:   next_to_clean        <6e>
Nov 25 18:19:03 mowgli kernel: buffer_info[next_to_clean]
Nov 25 18:19:03 mowgli kernel:   time_stamp           <37cf825>
Nov 25 18:19:03 mowgli kernel:   next_to_watch        <6e>
Nov 25 18:19:03 mowgli kernel:   jiffies              <37d1100>
Nov 25 18:19:03 mowgli kernel:   next_to_watch.status <0>
Nov 25 18:19:05 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out

The problem occurs under load but can also occur during web surfing.
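When comparing kernels, it may help to count how many hang events each boot logged rather than eyeballing the log. A minimal sketch, assuming the message text is exactly "Detected Tx Unit Hang" as in the excerpt above; the function name `count_tx_hangs` is hypothetical:

```shell
# count_tx_hangs LOGFILE
# Print the number of lines in LOGFILE that report an e1000
# Tx Unit Hang. Prints 0 when there are none.
# (Hypothetical helper; matches the message text quoted above.)
count_tx_hangs() {
    grep -c 'Detected Tx Unit Hang' "$1"
}

# Example use (the log path is an assumption and varies by system):
#   count_tx_hangs /var/log/messages
```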
Hi Greg,

Thanks for the update. I'm closing this as a dupe of bug 398921 then - thanks for filing that one. 2.6.24 is just around the corner so we could see what that brings ... :)

*** This bug has been marked as a duplicate of 398921 ***