Description of problem:
Please see f5 bug 200656, ES bug 248787, f6 bug 219496, and f7 bug 249185. My tx hang issues on the same hardware in bug 200656 continue on the F8 kernel. I am copying a bunch of wav files from my NFS server to a 400gig USB drive when the error occurs. However, simple web surfing can also cause problems.

e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Version-Release number of selected component (if applicable):
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
2.6.23.1-42.fc8

How reproducible:
e1000 driver on selected hardware. The 7.3.15-k2-NAPI Intel kernel driver does not have this problem. I have not tried Xen on f8 yet. The Xen kernel ran the 7.3.15 driver and was a workaround for the problem.

History of the Problem on the ECS AMD Sempron Motherboard.

Intel Driver    Kernel                                  TX Hang Issues
7.3.15-k2-NAPI  Fedora 5 kernel ?                       Rock Solid
7.3.??-k2-NAPI  Fedora 6 kernel ?                       Not installed.
7.3.15-k2-NAPI  /boot/vmlinuz-2.6.20-2925.11.fc7xen     Rock Solid
7.3.15-k2-NAPI  /boot/vmlinuz-2.6.20-2925.9.fc7xen      Rock Solid
7.3.20-k2-NAPI  /boot/vmlinuz-2.6.21-1.3228.fc7         TX Issues Encountered
7.3.20-k2-NAPI  /boot/vmlinuz-2.6.22.1-27.fc7           TX Issues Encountered
7.3.20-k2-NAPI  /boot/vmlinuz-2.6.23.1-42.fc8           TX Issues Encountered

I tried the suggestion posted by Chuck Ebbert of Red Hat in bug 249185 Comment #3:

"One workaround to try is turning off TSO:
# ethtool -K eth0 tso off"

The problem still exists with the ethtool command modifications.

Also, in the f5 post the Intel adapter was evaluated for firmware fix problems. The Intel adapter did not have these problems either.

I think I picked up a no-name gigabit card that I can try as a workaround, or, if my wife is off her computer, I can try installing the Xen kernel to see if the 7.3.15-k2-NAPI driver is still available.

Please advise.
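Before judging the TSO workaround, it is worth confirming that the setting actually took effect. A minimal sketch, assuming `ethtool -k eth0` prints a line spelled either "tcp segmentation offload" or "tcp-segmentation-offload" depending on the ethtool version; the function name `tso_state` is made up for illustration.

```shell
# Read `ethtool -k <iface>` output on stdin and print the TSO state
# ("on" or "off"). Both the older spelling with spaces and the newer
# spelling with hyphens are accepted (assumed output formats).
tso_state() {
    sed -n 's/^[[:space:]]*tcp[- ]segmentation[- ]offload:[[:space:]]*\([a-z]*\).*/\1/p' | head -n 1
}
```

Typical use would be `ethtool -k eth0 | tso_state`, which should print off once `ethtool -K eth0 tso off` has been applied.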
Oh, here are the full messages in the log file:

Nov 25 18:19:03 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Nov 25 18:19:03 mowgli kernel: Tx Queue <0>
Nov 25 18:19:03 mowgli kernel: TDH <5a>
Nov 25 18:19:03 mowgli kernel: TDT <5a>
Nov 25 18:19:03 mowgli kernel: next_to_use <5a>
Nov 25 18:19:03 mowgli kernel: next_to_clean <6e>
Nov 25 18:19:03 mowgli kernel: buffer_info[next_to_clean]
Nov 25 18:19:03 mowgli kernel: time_stamp <37cf825>
Nov 25 18:19:03 mowgli kernel: next_to_watch <6e>
Nov 25 18:19:03 mowgli kernel: jiffies <37d1100>
Nov 25 18:19:03 mowgli kernel: next_to_watch.status <0>
Nov 25 18:19:05 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out

Bug 249185 was updated with the same information.
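The time_stamp and jiffies fields in a hang report show how many timer ticks the stalled descriptor had been pending when the hang was declared. A small sketch of that arithmetic; `jiffies_delta` is a hypothetical helper, and converting ticks to seconds depends on the kernel's HZ setting, which the log does not state.

```shell
# Ticks elapsed between a descriptor's time_stamp ($1) and the jiffies
# value ($2) from a Tx hang report. Both arguments are the hex digits
# printed between <...>, without a 0x prefix. The 32-bit mask keeps the
# result correct if the counter wrapped between the two readings.
jiffies_delta() {
    echo $(( (0x$2 - 0x$1) & 0xffffffff ))
}

# Values from the Nov 25 report: time_stamp <37cf825>, jiffies <37d1100>
jiffies_delta 37cf825 37d1100   # prints 6363
```

If this kernel was built with HZ=1000 (an assumption), 6363 ticks is roughly 6.4 seconds.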
Can you try adding this line to /etc/modprobe.conf and then rebooting?

options e1000 InterruptThrottleRate=0

Some other workarounds are also at:
http://sourceforge.net/tracker/index.php?func=detail&aid=1463045&group_id=42302&atid=447449
Also related to bug 400561
I'd like to point out that at least the Intel ESB2/Gilgal (82563EB) NIC (for instance, this NIC is used on Supermicro motherboards like this one: http://www.supermicro.com/products/motherboard/Xeon1333/5000V/X7DVL-E.cfm) requires driver version 7.6.5-NAPI or later. Although driver versions before 7.6.5-NAPI announce support for 0x8086:0x1096, the fact is that a system with such a NIC becomes unreachable via the network 5-10 minutes after boot.

I have also reported the same bug on the OpenVZ bugzilla: http://bugzilla.openvz.org/show_bug.cgi?id=530#c6
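Checking a loaded driver against the 7.6.5 minimum mentioned above can be scripted. This is only a sketch: `driver_version_ge` is a hypothetical helper, it assumes version strings of the form seen in this report (digits and dots followed by a suffix such as "-k2-NAPI"), and it relies on the `-V` option of GNU sort.

```shell
# Succeed when driver version $1 is at least version $2.
# Everything from the first "-" on (e.g. "-k2-NAPI") is ignored.
driver_version_ge() {
    have=${1%%-*}
    want=${2%%-*}
    # Version-sort the two numbers; if the required one sorts first
    # (or they are equal), the installed version is new enough.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

The installed version string can be obtained with `modinfo e1000 | grep ^version`, for example `driver_version_ge 7.3.20-k2-NAPI 7.6.5-NAPI` fails because 7.3.20 is older than 7.6.5.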
I updated bug 248787 Comment #12. Essentially, the network service died when trying to copy 346gig to a usb 2.0 drive on the client. I've never seen that before with this problem.

I wonder if the reason "# ethtool -K eth0 tso off" did not work for me is because I have an early generation chip.

lspci
00:09.0 Ethernet controller: Intel Corporation 82544GC Gigabit Ethernet Controller (Copper) (rev 02)

As per bug 200656 comment #3

1.) I still have a reduced number of interrupts configured in the bios.

2.) This modprobe line has worked in the past

options e1000 XsumRX=0 Speed=1000 Duplex=2 InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096 TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0

I will be happy to try Chuck Ebbert's modprobe.conf setting as per Comment #2 above. Note that I am the one posting the modprobe.conf configuration in the http://sourceforge.net/tracker/index.php?func=detail&aid=1463045&group_id=42302&atid=447449 link as dr_kludge. ;-)

3.) Swapping cards proved that there were no hardware problems with the Intel card. The card that I have swapped with before is a

00:0b.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)

As per bug 200656 comment #10, the test driver 7.3.15_tdhdump-NAPI solved the problem. My statement in bug 200656 Comment #12 was incorrect. I spoke too soon and closed the bug report.

Some additional information.
ethtool -e eth0
Offset  Values
------  ------
0x0000  00 02 b3 96 09 9b 20 02 ff ff ff ff ff ff ff ff
0x0010  29 a6 07 47 0b 66 12 11 86 80 0c 10 86 80 04 f2
0x0020  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0030  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040  ff db 11 00 11 37 ff ff ff ff ff ff ff ff ff ff
0x0050  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0060  fc 00 00 40 0f 10 ff ff ff ff ff ff ff ff ff ff
0x0070  ff ff ff ff ff ff ff ff ff ff ff ff ff ff 76 b9

lspci -vv
00:09.0 Ethernet controller: Intel Corporation 82544GC Gigabit Ethernet Controller (Copper) (rev 02)
    Subsystem: Intel Corporation PRO/1000 T Desktop Adapter
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
    Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
    Latency: 32 (63750ns min), Cache Line Size: 32 bytes
    Interrupt: pin A routed to IRQ 19
    Region 0: Memory at eb020000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at eb000000 (32-bit, non-prefetchable) [size=128K]
    Region 2: I/O ports at c000 [size=32]
    [virtual] Expansion ROM at 58000000 [disabled] [size=128K]
    Capabilities: [dc] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [e4] PCI-X non-bridge device
        Command: DPERE- ERO+ RBC=512 OST=1
        Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
    Capabilities: [f0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable-
        Address: 0000000000000000 Data: 0000
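One quick sanity check on an EEPROM dump like the one above is that its first six bytes match the adapter's MAC address; here they read 00 02 b3 96 09 9b, which agrees with the 00:02:b3:96:09:9b address the driver prints in its e1000_probe log line. A rough sketch, assuming the MAC lives at offset 0x0000 on this adapter and that the offset column may or may not carry a trailing colon depending on the ethtool version (`eeprom_mac` is a hypothetical name):

```shell
# Read `ethtool -e <iface>` output on stdin and print the first six
# EEPROM bytes as a colon-separated MAC address.
eeprom_mac() {
    awk '$1 == "0x0000" || $1 == "0x0000:" {
        printf "%s:%s:%s:%s:%s:%s\n", $2, $3, $4, $5, $6, $7
        exit
    }'
}
```

A mismatch between this value and the address shown by the driver would be one hint of a corrupted EEPROM, although a match alone does not rule out other EEPROM problems.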
Just because I posted a bunch of junk in comment #5 above, I'll follow these steps and report back as time permits.

1.) Try just the /etc/modprobe.conf

alias eth0 e1000
options e1000 InterruptThrottleRate=0

and reboot.

2.) Try the longer

options e1000 XsumRX=0 Speed=1000 Duplex=2 InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096 TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0

3.) Try the f8 Xen kernel. I have not installed it yet. However, in the prior Fedora series, the Xen kernel used a driver that fixed the problem while the non-Xen kernel produced the tx hang messages and performance issues.

4.) Try the f8 kernel-2.6.23.9-85.fc8 test kernel as noted in bug 400561 comment #23 via

su -c 'yum --enablerepo=updates-testing update kernel'
As per comment #6 I implemented

1.) Try just the /etc/modprobe.conf

alias eth0 e1000
options e1000 InterruptThrottleRate=0

and reboot.

I performed a yum update (there were three packages to install) and started the same copy as reported in this initial bug report. The only change was to go directly to an ide hard drive versus a usb 2.0 hard drive. The results are

Dec 12 23:43:33 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Dec 12 23:43:33 mowgli kernel: Tx Queue <0>
Dec 12 23:43:33 mowgli kernel: TDH <f>
Dec 12 23:43:33 mowgli kernel: TDT <f>
Dec 12 23:43:33 mowgli kernel: next_to_use <f>
Dec 12 23:43:33 mowgli kernel: next_to_clean <22>
Dec 12 23:43:33 mowgli kernel: buffer_info[next_to_clean]
Dec 12 23:43:33 mowgli kernel: time_stamp <62ee7>
Dec 12 23:43:33 mowgli kernel: next_to_watch <22>
Dec 12 23:43:33 mowgli kernel: jiffies <635d8>
Dec 12 23:43:33 mowgli kernel: next_to_watch.status <0>
As per comment #6 I implemented

2.) Try the longer

options e1000 XsumRX=0 Speed=1000 Duplex=2 InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096 TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0

I started the same copy as reported in this initial bug report. The only change was to go directly to an ide hard drive versus a usb 2.0 hard drive. The results are

Dec 12 23:57:49 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Dec 12 23:57:49 mowgli kernel: Tx Queue <0>
Dec 12 23:57:49 mowgli kernel: TDH <a25>
Dec 12 23:57:49 mowgli kernel: TDT <a25>
Dec 12 23:57:49 mowgli kernel: next_to_use <a25>
Dec 12 23:57:49 mowgli kernel: next_to_clean <a39>
Dec 12 23:57:49 mowgli kernel: buffer_info[next_to_clean]
Dec 12 23:57:49 mowgli kernel: time_stamp <ffffad8f>
Dec 12 23:57:49 mowgli kernel: next_to_watch <a39>
Dec 12 23:57:49 mowgli kernel: jiffies <ffffb888>
Dec 12 23:57:49 mowgli kernel: next_to_watch.status <0>

Just for grins, here's the log file showing all the modprobe options being implemented correctly.

Dec 12 23:54:22 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Dec 12 23:54:22 mowgli kernel: Copyright (c) 1999-2006 Intel Corporation.
Dec 12 23:54:22 mowgli kernel: ACPI: PCI Interrupt 0000:00:09.0[A] -> GSI 17 (level, low) -> IRQ 19
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_validate_option: Transmit Descriptors set to 4096
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_validate_option: Receive Descriptors set to 4096
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_validate_option: Checksum Offload Disabled
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_validate_option: Flow Control Enabled
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_validate_option: Transmit Interrupt Delay set to 0
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_validate_option: Receive Interrupt Delay set to 0
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_check_options: Interrupt Throttling Rate (ints/sec) turned off
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_check_copper_options: Using Autonegotiation at 1000 Mbps Full Duplex only
Dec 12 23:54:22 mowgli kernel: e1000: 0000:00:09.0: e1000_probe: (PCI:33MHz:32-bit) 00:02:b3:96:09:9b

Installing Xen kernels next...
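The option lines that the driver logs at module load are the most direct evidence that the modprobe.conf settings were honoured. A sketch that pulls just those lines out of a saved kernel log, assuming the message format shown above; `e1000_option_report` is a hypothetical name and the three function names in the pattern are simply the ones that appear in this log.

```shell
# Read a kernel log on stdin and print only the e1000 option messages,
# with the timestamp, host and function-name prefix stripped off.
e1000_option_report() {
    sed -n -E 's/.*e1000_(validate_option|check_options|check_copper_options): //p'
}
```

Run against the log above, this should print lines such as "Transmit Descriptors set to 4096" and "Interrupt Throttling Rate (ints/sec) turned off", one per option; an option missing from the output was presumably left at its default.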
This is a new development.

As per comment #6 I implemented

3.) Try the f8 Xen kernel. I have not installed it yet. However, in the prior Fedora series, the Xen kernel used a driver that fixed the problem while the non-Xen kernel produced the tx hang messages and performance issues.

by using

yum install xen-libs.i386 kernel-xen.i686 kernel-xen-2.6-doc.noarch kernel-xen-devel.i686 xen.i386 xen-devel.i386

Also note that the

options e1000 XsumRX=0 Speed=1000 Duplex=2 InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096 TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0

were in effect.

Now this is very interesting. The Xen kernel has been updated to the same Intel driver that the stock kernel is using, the one that produces the tx unit hangs there.

Dec 13 00:11:30 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI

****but**** where du -sh reports that on the stock kernel I only copied __58M__ and received a tx unit hang message, the Xen kernel has let me copy __5.2G__ without a single tx unit hang message.

How does the Xen eth0, peth0, virbr0 combination of drivers prevent the tx hang messages? Because now it looks like the driver is a problem in a stock kernel but Xen shields the problem away from the system in a Xen kernel!?

Let me try another test on this Xen kernel without the modprobe settings. I am now up to 8.9G during a runtime of 26 minutes before the reboot.
This is a test of comment #6 and comment #9 without the modprobe settings, as shown in the log file:

Dec 13 00:45:01 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Dec 13 00:45:01 mowgli kernel: Copyright (c) 1999-2006 Intel Corporation.
Dec 13 00:45:01 mowgli kernel: ACPI: PCI Interrupt 0000:00:09.0[A] -> GSI 17 (level, low) -> IRQ 19
Dec 13 00:45:01 mowgli kernel: e1000: 0000:00:09.0: e1000_probe: (PCI:33MHz:32-bit) 00:02:b3:96:09:9b
Dec 13 00:45:01 mowgli kernel: e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection

I am already 9 minutes into uptime with 2.2G copied and no tx hang message.

I'll try the "yum --enablerepo=updates-testing update kernel" in the next couple of days. For now I'll set grub to use the Xen kernel because it is working with or without the modprobe settings.
The 2.6.23.9-85.fc8 kernel was pushed to stable by the time I installed.

Linux version 2.6.23.9-85.fc8
...
Dec 22 23:28:50 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
...
Dec 22 23:35:23 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out
Dec 22 23:35:23 mowgli kernel: NETDEV WATCHDOG: eth0: transmit timed out
Dec 22 23:35:23 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Dec 22 23:35:23 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Dec 22 23:35:23 mowgli kernel: Tx Queue <0>
Dec 22 23:35:23 mowgli kernel: Tx Queue <0>
Dec 22 23:35:23 mowgli kernel: TDH <b>
Dec 22 23:35:23 mowgli kernel: TDH <b>
Dec 22 23:35:23 mowgli kernel: TDT <b>
Dec 22 23:35:23 mowgli kernel: TDT <b>
Dec 22 23:35:23 mowgli kernel: next_to_use <b>
Dec 22 23:35:23 mowgli kernel: next_to_use <b>
Dec 22 23:35:23 mowgli kernel: next_to_clean <1f>
Dec 22 23:35:23 mowgli kernel: next_to_clean <1f>
Dec 22 23:35:23 mowgli kernel: buffer_info[next_to_clean]
Dec 22 23:35:23 mowgli kernel: buffer_info[next_to_clean]
Dec 22 23:35:23 mowgli kernel: time_stamp <27c56>
Dec 22 23:35:23 mowgli kernel: time_stamp <27c56>
Dec 22 23:35:23 mowgli kernel: next_to_watch <1f>
Dec 22 23:35:23 mowgli kernel: next_to_watch <1f>
Dec 22 23:35:23 mowgli kernel: jiffies <29810>
Dec 22 23:35:23 mowgli kernel: jiffies <29810>
Dec 22 23:35:23 mowgli kernel: next_to_watch.status <0>
Dec 22 23:35:23 mowgli kernel: next_to_watch.status <0>
Dec 22 23:35:26 mowgli kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Dec 22 23:35:26 mowgli kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

I will try the Xen Kernel again....

Linux version 2.6.21-2952.fc8xen
...
Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
...

and there were no problems.

I also posted information in bug 400561 comment #26 and bug 400561 comment #27.
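Because every line in the log above is written twice, a plain count of "Detected Tx Unit Hang" would overstate the number of hangs when comparing kernels. A rough sketch that counts distinct hang lines instead; `count_tx_hangs` is a hypothetical helper, and two genuine hangs logged within the same second would be collapsed into one.

```shell
# Read a kernel log on stdin and print the number of distinct
# "Detected Tx Unit Hang" lines (identical repeats count once).
count_tx_hangs() {
    grep 'Detected Tx Unit Hang' | sort -u | wc -l | tr -d ' '
}
```

For the Dec 22 excerpt this prints 1 rather than 2. Something like `count_tx_hangs < /var/log/messages` would give a per-log total, assuming the kernel messages are written to that file on the system in question.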
*** Bug 249185 has been marked as a duplicate of this bug. ***
Hello, I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel. http://fedoraproject.org/wiki/KernelBugTriage I am CC'ing myself to this bug and will try and assist you in resolving it if I can. There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel?
In response to Christopher's question in comment #13, comment #9 was the illuminating development. With both the Xen Kernel and the normal kernel using the same version of the Intel e1000 driver, the Xen Kernel "somehow buffers" the e1000 and prevents the TX Unit Hang.

Moreover, if I perform updates with yum and forget to check whether the kernel changed, and thus the grub menu, then a non-Xen kernel will generate the TX Unit Hang message. Kernel 2.6.24.3-12.fc8 generated the messages below. My wife said a web page was freezing on her when I realized that a non-Xen kernel was being used with this hardware.

Mar 10 07:03:07 mowgli kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Mar 10 07:03:07 mowgli kernel: Tx Queue <0>
Mar 10 07:03:07 mowgli kernel: Tx Queue <0>
Mar 10 07:03:07 mowgli kernel: TDH <b0>
Mar 10 07:03:07 mowgli kernel: TDH <b0>
Mar 10 07:03:07 mowgli kernel: TDT <b0>
Mar 10 07:03:07 mowgli kernel: TDT <b0>
Mar 10 07:03:07 mowgli kernel: next_to_use <b0>
Mar 10 07:03:07 mowgli kernel: next_to_use <b0>
Mar 10 07:03:07 mowgli kernel: next_to_clean <6f>
Mar 10 07:03:07 mowgli kernel: next_to_clean <6f>
Mar 10 07:03:07 mowgli kernel: buffer_info[next_to_clean]
Mar 10 07:03:07 mowgli kernel: buffer_info[next_to_clean]
Mar 10 07:03:07 mowgli kernel: time_stamp <5e90259f>
Mar 10 07:03:07 mowgli kernel: time_stamp <5e90259f>
Mar 10 07:03:07 mowgli kernel: next_to_watch <6f>
Mar 10 07:03:07 mowgli kernel: next_to_watch <6f>
Mar 10 07:03:07 mowgli kernel: jiffies <5e904468>
Mar 10 07:03:07 mowgli kernel: jiffies <5e904468>
Mar 10 07:03:07 mowgli kernel: next_to_watch.status <0>
Mar 10 07:03:07 mowgli kernel: next_to_watch.status <0>
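Since a yum update can silently switch the default boot entry back to a non-Xen kernel, a quick check of the running kernel helps catch that before the hangs start. A minimal sketch, assuming Xen kernels can be recognised by a release string ending in "xen", as with 2.6.21-2952.fc8xen in this report; `is_xen_release` is a hypothetical name.

```shell
# Succeed when a kernel release string (as printed by `uname -r`)
# ends in "xen", the suffix carried by the Xen kernels in this report.
is_xen_release() {
    case "$1" in
        *xen) return 0 ;;
        *)    return 1 ;;
    esac
}
```

It could be used as `is_xen_release "$(uname -r)" || echo "warning: not running a Xen kernel"`, for instance from a login script.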
In response to Christopher's question in comment #13, here's some additional information on my experience with the Tx Unit Hang issue. I've posted several reports, but two that may be of interest are here

http://sourceforge.net/tracker/index.php?func=detail&aid=1463045&group_id=42302&atid=447449

and here

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=200656#c11

In the SF tracker a POC driver was posted as "e1000-7.3.15tdh.tar.gz". The "driver may help if TDH==TDT in your tx hang", and so from above comment #14 we have

Mar 10 07:03:07 mowgli kernel: TDH <b0>
Mar 10 07:03:07 mowgli kernel: TDH <b0>
Mar 10 07:03:07 mowgli kernel: TDT <b0>
Mar 10 07:03:07 mowgli kernel: TDT <b0>

The driver code is located here

http://sourceforge.net/tracker/download.php?group_id=42302&atid=447449&file_id=198849&aid=1463045

go_jessie went on to say,

"It is not our final version of the fix, and probably will only help people that have the signature in their traces of
TDH <cb>
TDT <cb>
where TDH equals TDT.

"if your TDH does not equal TDT then it is likely you are having a hardware problem for some reason or another.

"the TDHclean driver may well have some problems, as it has not been tested as thoroughly as our production drivers. It is more of a proof of concept. Unfortunately I haven't had time yet to figure out a way to integrate it into our production code. Your info is very useful however, as it does point to some problem in the TDH based clean up code.

"We still don't have any systems here to reproduce this error (i.e. it is fairly rare, and system dependent)"

I was puzzled at the thought that a buggy BIOS may be part of the problem, especially since, as I understand it, the kernel replaces almost all of the single-task BIOS code with multi-task kernel code of its own.
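The TDH==TDT signature that go_jessie describes can be checked mechanically on a saved hang report. A sketch, assuming the log format shown in this bug, where the field name and its bracketed value are the last two words on the line; `tdh_tdt_signature` is a hypothetical helper.

```shell
# Read one Tx hang report on stdin and print "equal" when the TDH and
# TDT values match, "different" when they do not, and "unknown" when
# either field is missing from the input.
tdh_tdt_signature() {
    awk '
        NF >= 2 && $(NF-1) == "TDH" { tdh = $NF }
        NF >= 2 && $(NF-1) == "TDT" { tdt = $NF }
        END {
            if (tdh == "" || tdt == "") print "unknown"
            else if (tdh == tdt)        print "equal"
            else                        print "different"
        }'
}
```

For the Mar 10 report, with TDH and TDT both at b0, this prints equal, which is the case the proof-of-concept driver was said to target.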
Once again, the thing that is most interesting is that since a reboot into the Xen kernel,

Mar 10 08:20:45 mowgli kernel: Linux version 2.6.21-2952.fc8xen (kojibuilder.redhat.com) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Mon Nov 19 07:06:55 EST 2007,

I have had no problems. Both the Xen and non-Xen kernels show the same Intel driver version, but for some reason the Xen kernel does not produce the Tx Unit Hang messages. That is to me the most illuminating new information.

Linux version 2.6.21-2952.fc8xen (kojibuilder.redhat.com) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33))
Mar 10 08:20:45 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI

Linux version 2.6.23.14-107.fc8 (mockbuild.phx.redhat.com) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)
Feb 13 18:50:22 mowgli kernel: Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
We have quite a bit more information about this bug than we did in the past.

We have been able to confirm that on 2.6.18 stock kernels on some AMD systems, the memory the driver allocates for tx_ring->desc using pci_alloc_consistent is not actually consistent, as the Linux kernel guarantees it to be. Using bus analyzers and extra driver debugging we can see that the driver updates the ->desc memory and then tells the adapter to fetch it. The adapter then does a DMA and sees the *previous* version of that memory location.

This has to be a misconfiguration of the memory controller inside the processor, or somehow a miscommunication where the Host Bridge does not send the snoop cycles to the memory controller to let it know there are DMA transactions going to main memory.

My theory at this point is that Xen is either setting up the processor with mtrr (see cat /proc/mtrr for both kernels) in a better way, or doing something else that is similar.

We are having a difficult time getting any technical documentation (and expertise) for suggesting a fix to this issue for non-Xen kernels.
I used find and a small bash script to create two files during a boot of the Xen and non-Xen kernels. While using gvim -d on the two files, I found it interesting that even the cpu MHz info was slightly different between the two kernels. I understand that some of these differences come from the drivers that will be loaded. However, I'd think that the iomem would report the same system memory. Also note the difference in the timer_stats versions.

Is there something more that you would like from these files besides the information below, or do you have another tool that you'd like me to use?

Xen kernel

./mtrr
reg00: base=0x00000000 ( 0MB), size=1024MB: write-back, count=0
reg01: base=0xe0000000 (3584MB), size= 128MB: write-combining, count=0
reg02: base=0xd0000000 (3328MB), size= 128MB: write-combining, count=1
./timer_stats
Timer Stats Version: v0.1
./buddyinfo
Node 0, zone DMA 1 433 582 474 387 142 63 37 24 1 98
Node 0, zone HighMem 0 1 0 1 0 1 1 0 0 0 0
./slabinfo
slabinfo - version: 2.1
./iomem
00000000-0009efff : System RAM
  00000000-00000000 : Crash kernel
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cefff : Video ROM
000d0000-000d3fff : pnp 00:00
000d8000-000d97ff : Adapter ROM
000d9800-000dbfff : pnp 00:00
000f0000-000fffff : System ROM
00100000-3ffeffff : System RAM
3fff0000-3fff2fff : ACPI Non-volatile Storage
3fff3000-3fffffff : ACPI Tables
50000000-53ffffff : PCI CardBus #02
54000000-57ffffff : PCI CardBus #02
58000000-5801ffff : 0000:00:09.0
d0000000-dfffffff : PCI Bus #01
  d0000000-dfffffff : 0000:01:00.0
e0000000-e7ffffff : 0000:00:00.0
e8000000-e9ffffff : PCI Bus #01
  e8000000-e8ffffff : 0000:01:00.0
  e9000000-e901ffff : 0000:01:00.0
eb000000-eb01ffff : 0000:00:09.0
  eb000000-eb01ffff : e1000
eb020000-eb03ffff : 0000:00:09.0
  eb020000-eb03ffff : e1000
eb040000-eb040fff : 0000:00:0b.0
  eb040000-eb040fff : yenta_socket
eb045000-eb0450ff : 0000:00:10.4
  eb045000-eb0450ff : ehci_hcd
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
ffff0000-ffffffff : reserved
./ioports

non-Xen

./mtrr
reg00: base=0x00000000 ( 0MB), size=1024MB: write-back, count=1
reg01: base=0xe0000000 (3584MB), size= 128MB: write-combining, count=1
reg02: base=0xd0000000 (3328MB), size= 128MB: write-combining, count=1
./timer_stats
Timer Stats Version: v0.2
./buddyinfo
Node 0, zone DMA 5 5 5 3 5 4 2 2 3 1 1
Node 0, zone Normal 2 15 446 395 351 121 51 12 9 5 129
Node 0, zone HighMem 1 1 1 1 1 1 0 0 0 0 0
./slabinfo
slabinfo - version: 2.1
./iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000cefff : Video ROM
000d0000-000d3fff : pnp 00:00
000d8000-000d97ff : Adapter ROM
000d9800-000dbfff : pnp 00:00
000f0000-000fffff : System ROM
00100000-3ffeffff : System RAM
  00400000-0062fa08 : Kernel code
  0062fa09-0074a723 : Kernel data
  00797000-0084ae77 : Kernel bss
3fff0000-3fff2fff : ACPI Non-volatile Storage
3fff3000-3fffffff : ACPI Tables
50000000-53ffffff : PCI CardBus #02
54000000-57ffffff : PCI CardBus #02
58000000-5801ffff : 0000:00:09.0
d0000000-dfffffff : PCI Bus #01
  d0000000-dfffffff : 0000:01:00.0
e0000000-e7ffffff : 0000:00:00.0
e8000000-e9ffffff : PCI Bus #01
  e8000000-e8ffffff : 0000:01:00.0
  e9000000-e901ffff : 0000:01:00.0
eb000000-eb01ffff : 0000:00:09.0
  eb000000-eb01ffff : e1000
eb020000-eb03ffff : 0000:00:09.0
  eb020000-eb03ffff : e1000
eb040000-eb040fff : 0000:00:0b.0
  eb040000-eb040fff : yenta_socket
eb045000-eb0450ff : 0000:00:10.4
  eb045000-eb0450ff : ehci_hcd
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
fff80000-fffeffff : pnp 00:00
ffff0000-ffffffff : reserved
./ioports

# Essentially, I did this.
cd ~/
touch xen.txt
pushd /proc
find . -maxdepth 1 -type f -exec ~/procinfo.sh '{}' ~/xen.txt \;
pushd
# reboot in normal kernel.
touch xen_non.txt
pushd
find . -maxdepth 1 -type f -exec ~/procinfo.sh '{}' ~/xen_non.txt \;

[root@mowgli ~]# more procinfo.sh
#!/bin/bash
case "$1" in
# skip big file.
    ./kcore)
        echo $1 >> $2
        ;;
# stock kernel issues
# Permission denied
    ./sys/kernel/sched_nr_migrate)
        echo $1 >> $2
        ;;
# Permission denied
    ./sys/net/ipv4/route/flush)
        echo $1 >> $2
        ;;
# Permission denied
    ./sys/net/ipv6/route/flush)
        echo $1 >> $2
        ;;
# Invalid argument
    ./sys/fs/binfmt_misc/register)
        echo $1 >> $2
        ;;
# Input/output error
    ./sysrq-trigger)
        echo $1 >> $2
        ;;
#xen issues
# Device or resource busy
    ./acpi/event)
        echo $1 >> $2
        ;;
# Invalid argument
    ./xen/privcmd)
        echo $1 >> $2
        ;;
    *)
        echo $1 >> $2
        cat $1 >> $2
        ;;
esac
[root@mowgli ~]#
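In the two ./mtrr listings above, the base, size and type of every register are identical between the Xen and non-Xen kernels; only the count values for reg00 and reg01 differ. A sketch of a comparison that ignores the count field, assuming each kernel's /proc/mtrr was saved to its own file; `mtrr_layout_diff` and the file names in the usage note are hypothetical.

```shell
# Compare two saved copies of /proc/mtrr ($1 and $2) while ignoring
# the trailing ", count=N" field, so that only differences in base,
# size or memory type are reported.
mtrr_layout_diff() {
    left_layout=$(sed 's/, count=[0-9]*$//' "$1")
    right_layout=$(sed 's/, count=[0-9]*$//' "$2")
    if [ "$left_layout" = "$right_layout" ]; then
        echo "same layout"
    else
        echo "layout differs"
    fi
}
```

With copies saved as, say, xen.mtrr and stock.mtrr, `mtrr_layout_diff xen.mtrr stock.mtrr` would print same layout for the data in this comment, which suggests that any difference in how the two kernels treat memory is not visible in the MTRR ranges themselves.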
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
It looks like this bug resurfaced :( My system was working perfectly for over a year. A couple of days ago I updated the kernel to the newest one available for Fedora 10 and rebooted. The first time I tried to transfer some larger files the system completely locked up and I had to power-cycle it. From this point on I have not been able to transfer any larger file across my network without a) getting “e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang” b) complete lock up I've tried to use the older kernel that used to work fine, but it has the same problem now. I've just updated to Fedora 11, but I still have the same problem. This is 100% reproducible, if I start to transfer some large file. I've also tried the module option “InterruptThrottleRate=0”, but it made no difference. Any ideas on how to fix this?
(In reply to comment #20)
> It looks like this bug resurfaced :(

Did you find any workaround yet? The same kind of bug hit me when I upgraded the RAM (3Gb to 12Gb) on a RHEL5.4 machine...
Ironically, I stumbled upon it as well. I am running Gentoo, however, so this seems to be a generic kernel thing.

The 3 -> 12Gb upgrade is interesting. I still have a 32bit system, so I'm using PAE at the moment. I used this install on a Xeon (64bit capable) with 3gb using PAE. I have since swapped the motherboard for an AMD Phenom 2 one, and also went to 8Gb. I stayed 32bit and only recompiled my kernel with the correct drivers/cpu architecture (still remaining 32bit). The nic is the same (pci e1000). And when transferring anything more than ... say 150mb worth, it chokes badly. So it appears to be a memory related thing, maybe? Hard to say, however.

For now I used ethtool to disable tx offloading ... I'll test again sometime to see if it helps.

ethtool -K eth0 tso off
Same problem on a fresh kernel and e1000 module on Debian Lenny:

# modinfo e1000|grep ^version
version: 7.3.21-k5-NAPI
# lspci
03:02.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
# uname -a
Linux hostname 2.6.32-bpo.5-amd64 #1 SMP Sat Sep 18 19:03:14 UTC 2010 x86_64 GNU/Linux

But the problem does not show up often, about once every 2-3 days. Will "ethtool -K eth0 tso off" solve the problem?