There are two Intel(R) PRO/1000 adapters in this machine. The problem has been present for a while, but I had more or less forgotten about it because I used the other NICs on this machine instead. The devices worked before; if I remember correctly, they developed this issue around the F8/F9 transition.

AMD Athlon(tm) Dual Core Processor 4850e
2.6.30.8-64.fc11.x86_64 #1 SMP

Somewhat similar to https://bugzilla.redhat.com/show_bug.cgi?id=504873

After a while the adapters stop transmitting, and /var/log/messages receives entries similar to this:

Oct 9 17:12:33 sketch kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 9 17:12:33 sketch kernel: Tx Queue <0>
Oct 9 17:12:33 sketch kernel: TDH <31>
Oct 9 17:12:33 sketch kernel: TDT <41>
Oct 9 17:12:33 sketch kernel: next_to_use <41>
Oct 9 17:12:33 sketch kernel: next_to_clean <2e>
Oct 9 17:12:33 sketch kernel: buffer_info[next_to_clean]
Oct 9 17:12:33 sketch kernel: time_stamp <100fc5024>
Oct 9 17:12:33 sketch kernel: next_to_watch <34>
Oct 9 17:12:33 sketch kernel: jiffies <100fc547d>
Oct 9 17:12:33 sketch kernel: next_to_watch.status <0>
Oct 9 17:12:35 sketch kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 9 17:12:35 sketch kernel: Tx Queue <0>
Oct 9 17:12:35 sketch kernel: TDH <31>
Oct 9 17:12:35 sketch kernel: TDT <41>
Oct 9 17:12:35 sketch kernel: next_to_use <41>
Oct 9 17:12:35 sketch kernel: next_to_clean <2e>
Oct 9 17:12:35 sketch kernel: buffer_info[next_to_clean]
Oct 9 17:12:35 sketch kernel: time_stamp <100fc5024>
Oct 9 17:12:35 sketch kernel: next_to_watch <34>
Oct 9 17:12:35 sketch kernel: jiffies <100fc5c4d>
Oct 9 17:12:35 sketch kernel: next_to_watch.status <0>
Oct 9 17:12:37 sketch kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 9 17:12:37 sketch kernel: Tx Queue <0>
Oct 9 17:12:37 sketch kernel: TDH <31>
Oct 9 17:12:37 sketch kernel: TDT <41>
Oct 9 17:12:37 sketch kernel: next_to_use <41>
Oct 9 17:12:37 sketch kernel: next_to_clean <2e>
Oct 9 17:12:37 sketch kernel: buffer_info[next_to_clean]
Oct 9 17:12:37 sketch kernel: time_stamp <100fc5024>
Oct 9 17:12:37 sketch kernel: next_to_watch <34>
Oct 9 17:12:37 sketch kernel: jiffies <100fc641d>
Oct 9 17:12:37 sketch kernel: next_to_watch.status <0>
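For anyone comparing dumps across machines, the hex fields in a "Detected Tx Unit Hang" report can be pulled out and compared mechanically. The following is only a rough sketch: the function names (parse_hang_dumps, stalled_jiffies) are made up for illustration, the sample text is the first dump from the log above, and the pending-descriptor count simply subtracts TDH from TDT without handling ring wraparound.

```python
import re

# Matches fields such as "TDH <31>" or "jiffies <100fc547d>".
# "next_to_watch.status <0>" and "buffer_info[next_to_clean]" do not match,
# because the name must be followed by whitespace and then "<hex>".
FIELD_RE = re.compile(
    r"\b(TDH|TDT|next_to_use|next_to_clean|next_to_watch|time_stamp|jiffies)"
    r"\s+<([0-9a-fA-F]+)>"
)

SAMPLE = """\
Oct 9 17:12:33 sketch kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
Oct 9 17:12:33 sketch kernel: Tx Queue <0>
Oct 9 17:12:33 sketch kernel: TDH <31>
Oct 9 17:12:33 sketch kernel: TDT <41>
Oct 9 17:12:33 sketch kernel: next_to_use <41>
Oct 9 17:12:33 sketch kernel: next_to_clean <2e>
Oct 9 17:12:33 sketch kernel: buffer_info[next_to_clean]
Oct 9 17:12:33 sketch kernel: time_stamp <100fc5024>
Oct 9 17:12:33 sketch kernel: next_to_watch <34>
Oct 9 17:12:33 sketch kernel: jiffies <100fc547d>
Oct 9 17:12:33 sketch kernel: next_to_watch.status <0>
"""


def parse_hang_dumps(log_text):
    """Return one dict per hang report, with hex fields converted to int."""
    dumps = []
    # Everything before the first marker is not part of any dump.
    for chunk in log_text.split("Detected Tx Unit Hang")[1:]:
        dumps.append({name: int(value, 16)
                      for name, value in FIELD_RE.findall(chunk)})
    return dumps


def stalled_jiffies(dump):
    """Jiffies elapsed since the oldest pending buffer was time-stamped."""
    return dump["jiffies"] - dump["time_stamp"]


if __name__ == "__main__":
    for dump in parse_hang_dumps(SAMPLE):
        pending = dump["TDT"] - dump["TDH"]  # ignores ring wraparound
        print("stalled for", stalled_jiffies(dump), "jiffies,",
              pending, "descriptors between TDH and TDT")
```

Applied to the three dumps above, the stall comes out as 1113, 3113 and 5113 jiffies while TDH and TDT stay at 0x31 and 0x41. The growth of 2000 jiffies per 2 seconds of wall-clock time would be consistent with HZ=1000 on this kernel, but that is an inference from the timestamps, not something the log states.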
Created attachment 364292 [details] lspci -tv
Created attachment 364293 [details] lspci -vvv -xxx
Created attachment 364294 [details] ethtool -i eth1
Created attachment 364295 [details] ethtool -e eth1
Created attachment 364296 [details] cat /proc/cpuinfo
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I experience this problem too, this time it's on Fedora 12, kernel kernel-2.6.32.11-99.fc12.x86_64

Log output:

May 11 23:15:38 thunderbolt kernel: e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
May 11 23:15:38 thunderbolt kernel: Tx Queue <0>
May 11 23:15:38 thunderbolt kernel: TDH <4b>
May 11 23:15:38 thunderbolt kernel: TDT <53>
May 11 23:15:38 thunderbolt kernel: next_to_use <53>
May 11 23:15:38 thunderbolt kernel: next_to_clean <44>
May 11 23:15:38 thunderbolt kernel: buffer_info[next_to_clean]
May 11 23:15:38 thunderbolt kernel: time_stamp <ffffa41a>
May 11 23:15:38 thunderbolt kernel: next_to_watch <4f>
May 11 23:15:38 thunderbolt kernel: jiffies <ffffacca>
May 11 23:15:38 thunderbolt kernel: next_to_watch.status <0>
May 11 23:15:40 thunderbolt kernel: e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
May 11 23:15:40 thunderbolt kernel: Tx Queue <0>
May 11 23:15:40 thunderbolt kernel: TDH <4b>
May 11 23:15:40 thunderbolt kernel: TDT <53>
May 11 23:15:40 thunderbolt kernel: next_to_use <53>
May 11 23:15:40 thunderbolt kernel: next_to_clean <44>
May 11 23:15:40 thunderbolt kernel: buffer_info[next_to_clean]
May 11 23:15:40 thunderbolt kernel: time_stamp <ffffa41a>
May 11 23:15:40 thunderbolt kernel: next_to_watch <4f>
May 11 23:15:40 thunderbolt kernel: jiffies <ffffb49a>
May 11 23:15:40 thunderbolt kernel: next_to_watch.status <0>
May 11 23:15:42 thunderbolt kernel: e1000: eth2: e1000_clean_tx_irq: Detected Tx Unit Hang
May 11 23:15:42 thunderbolt kernel: Tx Queue <0>
May 11 23:15:42 thunderbolt kernel: TDH <4b>
May 11 23:15:42 thunderbolt kernel: TDT <53>
May 11 23:15:42 thunderbolt kernel: next_to_use <53>
May 11 23:15:42 thunderbolt kernel: next_to_clean <44>
May 11 23:15:42 thunderbolt kernel: buffer_info[next_to_clean]
May 11 23:15:42 thunderbolt kernel: time_stamp <ffffa41a>
May 11 23:15:42 thunderbolt kernel: next_to_watch <4f>
May 11 23:15:42 thunderbolt kernel: jiffies <ffffbc6a>
May 11 23:15:42 thunderbolt kernel: next_to_watch.status <0>

Additionally:

May 11 23:15:44 thunderbolt kernel: ------------[ cut here ]------------
May 11 23:15:44 thunderbolt kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
May 11 23:15:44 thunderbolt kernel: Hardware name: P5K PRO
May 11 23:15:44 thunderbolt kernel: NETDEV WATCHDOG: eth2 (e1000): transmit queue 0 timed out
May 11 23:15:44 thunderbolt kernel: Modules linked in: ipt_MASQUERADE bridge stp llc nfsd nfs_acl auth_rpcgss exportfs autofs4 tun lockd sunrpc sit tunnel4 nf_conntrack_sane nf_conntrack_ftp nf_nat_sip nf_conntrack_sip xt_limit ipt_LOG iptable_nat nf_nat nf_conntrack_netbios_ns nf_conntrack_ipv6 ip6t_LOG ip6table_filter ip6_tables ipv6 nls_utf8 jfs kvm_intel kvm uinput tuner_simple tuner_types wm8775 tda9887 tda8290 tea5767 tuner cx25840 snd_hda_codec_realtek snd_hda_intel ivtv snd_hda_codec cx2341x v4l2_common videodev snd_hwdep v4l1_compat v4l2_compat_ioctl32 tveeprom snd_seq snd_seq_device snd_pcm shpchp snd_timer snd i2c_i801 iTCO_wdt e1000e iTCO_vendor_support soundcore e1000 snd_page_alloc asus_atk0110 sky2 serio_raw joydev raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 dm_multipath firewire_ohci firewire_core crc_itu_t nouveau ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: microcode]
May 11 23:15:44 thunderbolt kernel: Pid: 5685, comm: qemu-kvm Not tainted 2.6.32.11-99.fc12.x86_64 #1
May 11 23:15:44 thunderbolt kernel: Call Trace:
May 11 23:15:44 thunderbolt kernel: <IRQ> [<ffffffff81056330>] warn_slowpath_common+0x7c/0x94
May 11 23:15:44 thunderbolt kernel: [<ffffffff8105639f>] warn_slowpath_fmt+0x41/0x43
May 11 23:15:44 thunderbolt kernel: [<ffffffff813c660b>] ? netif_tx_lock+0x44/0x6d
May 11 23:15:44 thunderbolt kernel: [<ffffffff813c6775>] dev_watchdog+0xf3/0x164
May 11 23:15:44 thunderbolt kernel: [<ffffffff81046089>] ? task_tick_fair+0x2d/0x127
May 11 23:15:44 thunderbolt kernel: [<ffffffff8106509c>] run_timer_softirq+0x1c4/0x268
May 11 23:15:44 thunderbolt kernel: [<ffffffff81026b9e>] ? apic_write+0x16/0x18
May 11 23:15:44 thunderbolt kernel: [<ffffffff8105d94c>] __do_softirq+0xe5/0x1a9
May 11 23:15:44 thunderbolt kernel: [<ffffffff810805e2>] ? tick_program_event+0x2a/0x2c
May 11 23:15:44 thunderbolt kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30
May 11 23:15:44 thunderbolt kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86
May 11 23:15:44 thunderbolt kernel: [<ffffffff8105d78a>] irq_exit+0x3b/0x7d
May 11 23:15:44 thunderbolt kernel: [<ffffffff8145a47a>] smp_apic_timer_interrupt+0x86/0x94
May 11 23:15:44 thunderbolt kernel: [<ffffffff81012833>] apic_timer_interrupt+0x13/0x20
May 11 23:15:44 thunderbolt kernel: <EOI> [<ffffffff811238b4>] ? pipe_write+0x3fb/0x42e
May 11 23:15:44 thunderbolt kernel: [<ffffffff811238b4>] ? pipe_write+0x3fb/0x42e
May 11 23:15:44 thunderbolt kernel: [<ffffffff8111b388>] ? do_sync_write+0xe8/0x125
May 11 23:15:44 thunderbolt kernel: [<ffffffff8107488b>] ? autoremove_wake_function+0x0/0x39
May 11 23:15:44 thunderbolt kernel: [<ffffffff811ef3c9>] ? selinux_file_permission+0xa7/0xb3
May 11 23:15:44 thunderbolt kernel: [<ffffffff811e571d>] ? security_file_permission+0x16/0x18
May 11 23:15:44 thunderbolt kernel: [<ffffffff8111b94c>] ? vfs_write+0xae/0x10b
May 11 23:15:44 thunderbolt kernel: [<ffffffff8111ba69>] ? sys_write+0x4a/0x6e
May 11 23:15:44 thunderbolt kernel: [<ffffffff81011d32>] ? system_call_fastpath+0x16/0x1b
May 11 23:15:44 thunderbolt kernel: ---[ end trace d715ef3ca1fc0d76 ]---
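Since several reporters in this bug hit the same watchdog warning on different hosts, it may help to pull the interface, driver and queue out of the "NETDEV WATCHDOG" line when sifting through /var/log/messages. A small sketch, assuming the message format shown in the trace above; the function name find_watchdog_timeouts is made up for illustration.

```python
import re

# Format as it appears in the trace above:
#   NETDEV WATCHDOG: eth2 (e1000): transmit queue 0 timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (\S+) \(([^)]+)\): transmit queue (\d+) timed out"
)

SAMPLE_LINE = ("May 11 23:15:44 thunderbolt kernel: NETDEV WATCHDOG: "
               "eth2 (e1000): transmit queue 0 timed out")


def find_watchdog_timeouts(log_text):
    """Return (interface, driver, queue) for each watchdog timeout line."""
    return [(iface, driver, int(queue))
            for iface, driver, queue in WATCHDOG_RE.findall(log_text)]


if __name__ == "__main__":
    for iface, driver, queue in find_watchdog_timeouts(SAMPLE_LINE):
        print(f"{iface} ({driver}) queue {queue} timed out")
```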
Oh, and all I need to do to trigger it is generate some real load (copying a larger file using either SMB or SCP). The setup: behind the Ethernet card there is a Linksys wireless access point, and I'm working from my laptop over a Wireless G connection. The eth2 card supports up to 1000Mb/s but is connected at 100Mb/s.

Here's my ethtool output for eth2:

Settings for eth2:
  Supported ports: [ TP ]
  Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
  Supports auto-negotiation: Yes
  Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full
  Advertised auto-negotiation: Yes
  Speed: 100Mb/s
  Duplex: Full
  Port: Twisted Pair
  PHYAD: 0
  Transceiver: internal
  Auto-negotiation: on
  Supports Wake-on: umbg
  Wake-on: g
  Current message level: 0x00000007 (7)
  Link detected: yes

And here is the output of lspci -v|sed -n '/^0.*Ethernet/,/^$/p' (note that the third card is the one with the e1000 module):

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
  Subsystem: ASUSTeK Computer Inc. Device 81f8
  Flags: bus master, fast devsel, latency 0, IRQ 29
  Memory at fe9fc000 (64-bit, non-prefetchable) [size=16K]
  I/O ports at c800 [size=256]
  Expansion ROM at fe9c0000 [disabled] [size=128K]
  Capabilities: [48] Power Management version 3
  Capabilities: [50] Vital Product Data
  Capabilities: [5c] MSI: Enable+ Count=1/1 Maskable- 64bit+
  Capabilities: [e0] Express Legacy Endpoint, MSI 00
  Capabilities: [100] Advanced Error Reporting
  Kernel driver in use: sky2
  Kernel modules: sky2

03:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
  Subsystem: Intel Corporation PRO/1000 PT Desktop Adapter
  Flags: bus master, fast devsel, latency 0, IRQ 30
  Memory at feae0000 (32-bit, non-prefetchable) [size=128K]
  Memory at feac0000 (32-bit, non-prefetchable) [size=128K]
  I/O ports at dc00 [size=32]
  Expansion ROM at feaa0000 [disabled] [size=128K]
  Capabilities: [c8] Power Management version 2
  Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
  Capabilities: [e0] Express Endpoint, MSI 00
  Capabilities: [100] Advanced Error Reporting
  Capabilities: [140] Device Serial Number 00-15-17-ff-ff-16-90-3d
  Kernel driver in use: e1000e
  Kernel modules: e1000e

05:02.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
  Subsystem: Intel Corporation PRO/1000 GT Desktop Adapter
  Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 18
  Memory at febe0000 (32-bit, non-prefetchable) [size=128K]
  Memory at febc0000 (32-bit, non-prefetchable) [size=128K]
  I/O ports at ec00 [size=64]
  [virtual] Expansion ROM at feb00000 [disabled] [size=128K]
  Capabilities: [dc] Power Management version 2
  Capabilities: [e4] PCI-X non-bridge device
  Kernel driver in use: e1000
  Kernel modules: e1000
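With three different drivers (sky2, e1000e, e1000) in one box, it is easy to mix up which card is affected. The sketch below maps each PCI address in `lspci -v` output to its "Kernel driver in use" value. It is only an illustration: the function name drivers_by_device is invented, and the sample is an abbreviated copy of the listing above.

```python
import re

# A device header starts at column 0 with a bus:device.function address.
HEADER_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
DRIVER_RE = re.compile(r"Kernel driver in use:\s*(\S+)")

SAMPLE = """\
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8056 PCI-E Gigabit Ethernet Controller (rev 12)
  Kernel driver in use: sky2
  Kernel modules: sky2

03:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) (rev 06)
  Kernel driver in use: e1000e
  Kernel modules: e1000e

05:02.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)
  Kernel driver in use: e1000
  Kernel modules: e1000
"""


def drivers_by_device(lspci_text):
    """Map PCI address -> driver name for every device that reports one."""
    result = {}
    current = None
    for line in lspci_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        driver = DRIVER_RE.search(line)
        if driver and current is not None:
            result[current] = driver.group(1)
    return result


if __name__ == "__main__":
    for address, driver in drivers_by_device(SAMPLE).items():
        print(address, driver)
```

For the listing above this singles out 05:02.0 (the 82541PI, PRO/1000 GT) as the only device bound to e1000, which matches the eth2 hang reports.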
I'm having this problem too:

May 12 21:27:24 storage kernel: e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
May 12 21:27:24 storage kernel: Tx Queue <0>
May 12 21:27:24 storage kernel: TDH <3a>
May 12 21:27:24 storage kernel: TDT <6a>
May 12 21:27:24 storage kernel: next_to_use <6a>
May 12 21:27:24 storage kernel: next_to_clean <37>
May 12 21:27:24 storage kernel: buffer_info[next_to_clean]
May 12 21:27:24 storage kernel: time_stamp <11f34738c>
May 12 21:27:24 storage kernel: next_to_watch <3a>
May 12 21:27:24 storage kernel: jiffies <11f348941>
May 12 21:27:24 storage kernel: next_to_watch.status <0>

I have the e1000 hanging off a bridge (so that my VMs are on my LAN), with a static IP address, with IPv4 and IPv6 enabled.

It occurs _very_ frequently when loading the interface (like every 2 seconds to every 10 seconds), and a bit less frequently when not loading the interface (like every minute).

Thing is, I switched from the rtl8111 to e1000 since the rtl was giving me hard hangs... This one is hardly better. Really really annoying, esp. since this is my storage server :-(

I vote for a _much_ higher priority.

kernel 2.6.32.11-99.fc12.x86_64
btw I'm on F12
@ben: Do you still have this issue, or was it fixed for you? I updated the version in the bug to 12, as multiple users are reporting similar problems. If you're having this issue, we need to know the exact hardware you have, so please attach lspci -vvv output as well as dmidecode output. If we have your exact kernel version we might be able to build you a driver that can output some debug information.
Running kernel-2.6.32.11-99.fc12.x86_64. My dmidecode and lspci -vvv output will follow.
Created attachment 413584 [details] lspci output (thunderbolt)
Created attachment 413585 [details] dmidecode (thunderbolt)
got a stack trace now too, and will attach lspci and dmidecode.

May 13 09:17:30 storage kernel: ------------[ cut here ]------------
May 13 09:17:30 storage kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0xf3/0x164()
May 13 09:17:30 storage kernel: Hardware name: MS-7576
May 13 09:17:30 storage kernel: NETDEV WATCHDOG: eth1 (e1000): transmit queue 0 timed out
May 13 09:17:30 storage kernel: Modules linked in: fuse tun nfs fscache ipt_MASQUERADE iptable_nat nf_nat nfsd nfs_acl auth_rpcgss exportfs lockd sunrpc cpufreq_ondemand powernow_k8 freq_table bridge stp llc xt_physdev ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 microcode kvm_amd kvm uinput snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq joydev snd_seq_device snd_pcm snd_timer e1000 i2c_piix4 r8169 snd soundcore snd_page_alloc edac_core mii edac_mce_amd shpchp serio_raw dm_multipath ata_generic pata_acpi pata_atiixp arcmsr firewire_ohci firewire_core crc_itu_t radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
May 13 09:17:30 storage kernel: Pid: 0, comm: swapper Not tainted 2.6.32.11-99.fc12.x86_64 #1
May 13 09:17:30 storage kernel: Call Trace:
May 13 09:17:30 storage kernel: <IRQ> [<ffffffff81056330>] warn_slowpath_common+0x7c/0x94
May 13 09:17:30 storage kernel: [<ffffffff8105639f>] warn_slowpath_fmt+0x41/0x43
May 13 09:17:30 storage kernel: [<ffffffff813c660b>] ? netif_tx_lock+0x44/0x6d
May 13 09:17:30 storage kernel: [<ffffffff813c6775>] dev_watchdog+0xf3/0x164
May 13 09:17:30 storage kernel: [<ffffffff81070702>] ? __queue_work+0x3a/0x41
May 13 09:17:30 storage kernel: [<ffffffff8106509c>] run_timer_softirq+0x1c4/0x268
May 13 09:17:30 storage kernel: [<ffffffff8105d94c>] __do_softirq+0xe5/0x1a9
May 13 09:17:30 storage kernel: [<ffffffff810ace7d>] ? handle_IRQ_event+0x60/0x121
May 13 09:17:30 storage kernel: [<ffffffff81012e6c>] call_softirq+0x1c/0x30
May 13 09:17:30 storage kernel: [<ffffffff810143ea>] do_softirq+0x46/0x86
May 13 09:17:30 storage kernel: [<ffffffff8105d78a>] irq_exit+0x3b/0x7d
May 13 09:17:30 storage kernel: [<ffffffff8145a3dd>] do_IRQ+0xa5/0xbc
May 13 09:17:30 storage kernel: [<ffffffff81012693>] ret_from_intr+0x0/0x11
May 13 09:17:30 storage kernel: <EOI> [<ffffffff8103020d>] ? native_safe_halt+0xb/0xd
May 13 09:17:30 storage kernel: [<ffffffff81018f37>] ? default_idle+0x36/0x53
May 13 09:17:30 storage kernel: [<ffffffff8101902d>] ? c1e_idle+0xd9/0x102
May 13 09:17:30 storage kernel: [<ffffffff81010cdd>] ? cpu_idle+0xaa/0xe4
May 13 09:17:30 storage kernel: [<ffffffff8144e0ed>] ? start_secondary+0x1f2/0x233
May 13 09:17:30 storage kernel: ---[ end trace b045d4a237d2d973 ]---

kernel 2.6.32.11-99.fc12 (x86_64)
Created attachment 413656 [details] lspci -vvv
Created attachment 413657 [details] dmidecode
just saw this, I think it's related https://bugzilla.kernel.org/show_bug.cgi?id=15704
@Jesse I've since moved the NICs to an Intel server, without problems there. It seemed to be an issue with AMD CPUs, and still looks like it? So I'm not able to test this on F11/12 on a machine with AMD. Hopefully this can be resolved with the others.
For those of you on this bug with Athlon CPUs: can you please try the 8.0.16 e1000 driver from e1000.sourceforge.net, verify that the issue still occurs, and then reload the driver with the ignore_64bit_dma=1 module option set?
Do you mean AMD CPUs in general, or specifically only Athlons?
I am sorry for the confusion, I meant anyone with an AMD cpu.
In response to comment 19: I'm seeing these problems on an Intel machine, so it's premature (or even false) to diagnose this as AMD-only. Is there a major difference between AMD and Intel, seen from the point of view of the e1000 driver? What could I do to help further (running on an Intel machine, that is)?
@jesse Yesterday I switched to the 8.0.19 driver with the ignore_64bit_dma=1 module option set. I haven't been able to reproduce the Tx unit hang since, and everything is peachy so far. I'd love it if this option were made the default....
it has now been 4 days and I've not seen the problem anymore. setting the ignore option on the new driver fixes the problem for me
ping. Is anybody working on this? This bug is now over 8 months old and it appears that there is a solution! What's up with the delay in getting this fixed with an update??? I have not seen the problem since I upgraded to the new driver with the option set.
I had the same issue with e1000e, and it seems to be fixed by installing version 1.2.10-NAPI of the driver from the Intel Download Center: http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=15817

I also used dracut to add the compiled module to my initramfs:

# tar xzf e1000e-1.2.10.tar.gz
# make install
# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-old.img
# dracut

My Ethernet controller is an Intel PCI-E Gigabit 82574L.
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.