Description of problem:
With the e1000e driver built into this kernel, I get very unstable network access. DNS requests fail; even pinging my router drops ~25%.

Version-Release number of selected component (if applicable):
kernel-2.6.38.7-30.fc15.x86_64 (e1000e 1.2.20-k2)

How reproducible:
fully

Steps to Reproduce:
$ ping 192.168.0.1

Actual results:
lost packets.

Expected results:
0% packet loss (at least for local network).

Additional info:
I don't know what diagnostics would be helpful here, but please ask and I will try.

I tried e1000e 1.3.17-NAPI from e1000.sf.net, and so far with that the network is flawless. Perhaps a newer version could be patched into the F15 kernel?

Here's the exact device from lspci -vv:
> 00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
>   Subsystem: Intel Corporation Device 2003
>   Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>   Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>   Latency: 0
>   Interrupt: pin A routed to IRQ 52
>   Region 0: Memory at fe500000 (32-bit, non-prefetchable) [size=128K]
>   Region 1: Memory at fe528000 (32-bit, non-prefetchable) [size=4K]
>   Region 2: I/O ports at f080 [size=32]
>   Capabilities: [c8] Power Management version 2
>     Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>     Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
>   Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>     Address: 00000000fee0100c Data: 41e9
>   Capabilities: [e0] PCI Advanced Features
>     AFCap: TP+ FLR+
>     AFCtrl: FLR-
>     AFStatus: TP-
>   Kernel driver in use: e1000e
>   Kernel modules: e1000e
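For anyone who wants to put a number on the loss instead of eyeballing ping, here is a minimal sketch. The function name `loss_percent` is made up for illustration, and it assumes the usual iputils summary line ("N packets transmitted, M received, X% packet loss"); other ping implementations may word it differently.

```shell
#!/bin/sh
# Hypothetical helper: extract the packet-loss percentage from ping's
# summary line, read on stdin. Assumes the iputils summary format.
loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

Something like `ping -c 100 192.168.0.1 | loss_percent` would then print just the percentage, which makes it easier to compare one driver version against another.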
The problem continues with 2.6.38.8-35.fc15.x86_64. Is there anything I can provide to track this down? The network trouble is very obvious to me as soon as I boot up, because I control the machine via synergy-plus and the mouse is very laggy. As soon as I install the updated e1000e and reload the module, everything is smooth again.
The problem continues with kernel-2.6.40-4.fc15.x86_64 (e1000e 1.3.10-k2).

I don't know if it's the newer external version or NAPI that's really helping.

If anyone else is playing along, e1000e-1.4.4 needs a patch to build with this fake kernel 2.6.40 that's really kernel 3.0.0.

--- e1000e-1.4.4/src/kcompat.h.orig 2011-06-23 11:12:53.000000000 -0700
+++ e1000e-1.4.4/src/kcompat.h 2011-08-02 12:28:34.557189588 -0700
@@ -2638,7 +2638,7 @@ static inline int _kc_skb_checksum_start
 #endif /* < 2.6.39 */
 /*****************************************************************************/
-#if ( LINUX_VERSION_CODE < KERNEL_VERSION(3,0,0) )
+#if ( LINUX_VERSION_CODE < KERNEL_VERSION(2,6,40) )
 #ifdef ETHTOOL_GRXRINGS
 #ifndef FLOW_EXT
 #define FLOW_EXT 0x80000000
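To see why the unpatched check misfires, it may help to spell out the arithmetic. This is only a sketch: `kernel_version` is a made-up shell function that mirrors the classic `KERNEL_VERSION(a,b,c)` macro, and it assumes Fedora's renumbered kernel reports its version code as 2.6.40, which is what the patch above implies.

```shell
#!/bin/sh
# Mirrors the classic kernel macro:
#   KERNEL_VERSION(a,b,c) = (a << 16) + (b << 8) + c
kernel_version() {
    echo $(( ($1 << 16) + ($2 << 8) + $3 ))
}

fedora=$(kernel_version 2 6 40)   # assumed version code of Fedora's "2.6.40"
real=$(kernel_version 3 0 0)      # threshold used by the unpatched kcompat.h

echo "KERNEL_VERSION(2,6,40) = $fedora"
echo "KERNEL_VERSION(3,0,0)  = $real"
if [ "$fedora" -lt "$real" ]; then
    echo "2.6.40 compares as older than 3.0.0, so the compat block is compiled in"
fi
```

The two values come out as 132648 and 196608, so the `< KERNEL_VERSION(3,0,0)` test is true on this kernel even though the code underneath is really 3.0, and the compat definitions get pulled in when they shouldn't be.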
What do the ethtool -S stats for the device in question tell you about dropped frames? Ditto the network stats (cat /proc/net/snmp, netstat -s).
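The ethtool -S list is long, so a filter can help when looking for drops. A minimal sketch follows; the function name `nonzero_errors` and the set of name patterns are my own guesses at what counts as interesting, not anything the driver defines.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -S` output on stdin and print only
# the counters whose names look like errors or drops and whose value is
# non-zero. The name patterns are an assumption, adjust to taste.
nonzero_errors() {
    awk -F': *' '
        $1 ~ /err|drop|miss|fail|coll|timeout/ && $2 + 0 != 0 {
            gsub(/^[ \t]+/, "", $1)   # strip the leading indent
            print $1 ": " $2
        }'
}
```

Usage would be along the lines of `ethtool -S em1 | nonzero_errors`. No output means none of the matching counters has moved, which by itself doesn't prove that nothing is being lost.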
Created attachment 516411
network statistics on e1000e 1.3.10-k2

These stats were collected just after booting and attempting to interact via Synergy+ for a bit. It was definitely having issues, but nothing in these numbers jumps out at me.
You're right, nothing jumps out. In fact, no significant dropped packets are listed. This, coupled with the fact that changing your ethernet driver doesn't affect the problem, suggests:

1) The frames are getting lost in such a way that the Intel NIC can't recognize them as lost frames
2) It's not the card on this system dropping frames

Have you checked the above stats on the sending system to ensure that you're not losing the frames there? Also, have you disabled offloads (tso/lro/gro/etc) on both systems to ensure that such frame coalescing technology is not responsible for the lag?
(In reply to comment #5)
> You're right, nothing jumps out. in fact no significant dropped packets are
> listed. This, coupled with the fact that changing your ethernet driver doesn't
> affect the problem suggests:

Changing to the e1000e driver from e1000.sf.net does fix it though -- first 1.3.17-NAPI and now 1.4.4-NAPI both work fine, suggesting a local problem.

> Have you checked the above stats on the sending system to ensure that you're
> not losing the frames there?

It's not any particular target having issues -- pinging anything on my home network has about 25% packet loss. That could finger my router, I suppose, but it looks like dd-wrt doesn't have ethtool and its netstat doesn't have -s.

> Also, have you disabled offloads (tso/lro/gro/etc) on both systems to ensure
> that such frame coalescing technology is not responsible for the lag?

I don't know how, but I'm willing to try.
I don't know if NAPI would have anything to do with this, but since it turns out easy to disable, I tried building the external e1000e-1.4.4 without NAPI. It still appears to be fine.
ethtool -K ethX gro off lro off tso off gso off ufo off
I'm testing by watching ping losses to my router now. DD-WRT doesn't seem to have ethtool though, so I'm only able to tweak on the troubled machine itself.

Here's my pre-existing ethtool -k output, and there's no difference between the kernel builtin or external e1000e:

Offload parameters for em1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

I can't run your command exactly, as it says:
> Cannot set device udp large send offload settings: Operation not supported

I think that's for udp-fragmentation-offload, which is already off anyway. So taking that one out of the list, then ethtool -k says:

Offload parameters for em1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

No better. For kicks, I turned off all the remaining ones too, still no good.

On e1000.sf.net there's also a version 1.3.10a, which I thought in theory might be similar to the 1.3.10 in the kernel. That one also works just fine though. The diff is actually fairly big, but I guess I'll see if anything between the two looks suspicious.
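Since one unsupported feature (ufo here) got in the way of the combined command, a per-feature loop is one way around it. This is only a sketch: `disable_offloads` and the `ETHTOOL` override variable are hypothetical names, and the only ethtool syntax it relies on is the `-K DEV FEATURE off` form already used above.

```shell
#!/bin/sh
# Turn offloads off one feature at a time, so that a feature the driver
# does not support is reported and skipped instead of getting in the
# way of the rest.
# ETHTOOL is a hypothetical override so the loop can be dry-run.
ETHTOOL=${ETHTOOL:-ethtool}

disable_offloads() {
    dev=$1; shift
    for feature in "$@"; do
        if $ETHTOOL -K "$dev" "$feature" off; then
            echo "$feature: off"
        else
            echo "$feature: not changed" >&2
        fi
    done
}
```

Run as root, for example `disable_offloads em1 gro lro tso gso ufo`, then check the result with `ethtool -k em1` as above.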
There's a very similar Ubuntu bug #756485, with no resolution. They found as I did that the external driver fixes things. They also suggest throttling to 10 Mbps as a workaround, and I found that does work for me too.

Another thing I just noticed between drivers is a slight difference in dmesg:

> e1000e: Intel(R) PRO/1000 Network Driver - 1.3.10-k2
> ...
> e1000e 0000:00:19.0: eth0: MAC: 10, PHY: 11, PBA No: FFFFFF-0FF

vs.

> e1000e: Intel(R) PRO/1000 Network Driver - 1.4.4-NAPI
> ...
> e1000e 0000:00:19.0: eth0: MAC: 11, PHY: 11, PBA No: FFFFFF-0FF

That's MAC 10 vs. 11, printed by netdev.c:e1000_print_device_info(). But this looks like it's just 0 or 1-based values of enum e1000_mac_type in hw.h, giving different numbers for e1000_pch2lan. So long as everything in the driver references that by enum names and not raw values, it shouldn't matter.
I have tried the workarounds recommended by others but get the same issue.

00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
  Subsystem: Dell Device 047e
  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
  Latency: 0
  Interrupt: pin A routed to IRQ 44
  Region 0: Memory at e1500000 (32-bit, non-prefetchable) [size=128K]
  Region 1: Memory at e1580000 (32-bit, non-prefetchable) [size=4K]
  Region 2: I/O ports at 4040 [size=32]
  Capabilities: [c8] Power Management version 2
    Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
  Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Address: 00000000fee00338 Data: 0000
  Capabilities: [e0] PCI Advanced Features
    AFCap: TP+ FLR+
    AFCtrl: FLR-
    AFStatus: TP-
  Kernel driver in use: e1000e
  Kernel modules: e1000e

[root@beans ~]# ethtool -k em1
Offload parameters for em1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

I always get a steady 11-12% packet loss.
I have exactly the same problem: 10-12% dropped packets with an 82574L Gigabit network controller.

I can't do anything with my server right now because the drops are enough to corrupt the incoming MySQL traffic feeding the slave. Receiving data seems almost impossible at this point.

I've been trying every suggestion above. I'm running kernel 3.0.3 stable, which I believe comes with 1.3.10-k2. I've tried 1.5.1 and 1.4.4, I tried setting it to 10 Mbit, and I tried forcing it to 100 Mbit full duplex, but nothing works; it keeps dropping packets.

The problem is that the host doesn't see any drops at all on the switch. I can only see them on the Linux server itself when I run ifconfig. About 10% is dropped at a constant rate.

I'm going to try 1.3.17 now. It's the only thing I haven't done yet, but I don't think it will work.
Ok so 1.3.17 doesn't even install on 3.0.3. I'm gonna try a late 2.6 kernel now.

Makefile:179: ***
*** Aborting the build.
*** This driver is not supported on kernel versions older than 2.4.0. Stop.
The external driver works perfectly for me. My only suggestion is to double-check that the loaded module is really the version that you mean to try, as told in "/sys/module/e1000e/version". Other than that, your model is slightly different, so it could be some different issue for you.
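A quick way to do that double-check is sketched below. The helper name `loaded_version` is made up; the sysfs path is the one named above, and the optional argument exists only so the function can be pointed at another file.

```shell
#!/bin/sh
# Hypothetical helper: print the version of the e1000e module that is
# loaded right now, as reported by sysfs.
loaded_version() {
    path=${1:-/sys/module/e1000e/version}
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "not loaded (no $path)" >&2
        return 1
    fi
}
```

Comparing `loaded_version` with `modinfo -F version e1000e` (the version of the module file on disk) should show whether the freshly installed driver is actually the one in use. If the two differ, the old module is probably still loaded, or an older copy is being picked up at boot.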
"/sys/module/e1000e/version" says 1.5.1-NAPI so does ethtool -i ethX. driver: e1000e version: 1.5.1-NAPI firmware-version: 1.9-0 bus-info: 0000:05:00.0 I have another dedicated server with the same NIC running CentOS 5.6 and that one is fine. It even runs an older kernel. 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux driver: e1000e version: 1.0.2-k3 firmware-version: 1.9-0 bus-info: 0000:02:00.0 That same OS was on my now 'problem' server and still same issue of packet drops so it might not be the driver alone but maybe a combination of things.
OMG I solved it for me :D

Using this driver:

driver: e1000e
version: 1.0.2-k2
firmware-version: 1.9-0
bus-info: 0000:05:00.0

Which is shipped with Kernel 2.6.33.1.

I tried 2.6.39.4, 3.0.3 and this finally works! I'm a happy camper right now :)
I should note that I've also run into this problem on my Mageia Linux 1 system, reached this page by a web search, and the workaround of replacing the built-in driver with the one in http://sourceforge.net/projects/e1000/ worked.

Here is the Linux-IL thread with the problem I ran into (lagging ssh), its investigation and its resolution:

http://www.mail-archive.com/linux-il@cs.huji.ac.il/msg61605.html

Perhaps it is a global problem with the Linus-released kernel.

Regards,

-- Shlomi Fish
This is also a problem on RHEL 6. I had to use dkms to build and install the latest e1000e module, currently 1.6.2.

# lspci | grep Ethernet
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)

It is now working:

# ethtool -i em1
driver: e1000e
version: 1.6.2-NAPI
firmware-version: 0.13-4
bus-info: 0000:00:19.0
Yay! I just updated to Fedora 16, and kernel-3.1.0-0.rc8.git0.1.fc16.x86_64 with the included e1000e 1.4.4-k is working fine so far.

There are 23 kernel commits to drivers/net/e1000e/ since v3.0. These two in particular stand out as probably related to this bug:

0ed013e e1000e: workaround for packet drop on 82579 at 100Mbps
1d2101a e1000e: Spurious interrupts & dropped packets with 82577/8/9 in half-duplex
I can confirm that many Intel 82579 series NICs (including 82579V and 82579LM) have a hardware fault at 100Mbps that results in packet loss that is worked around by the patch identified above:

http://patchwork.ozlabs.org/patch/109926/
Upstream commit: 0ed013e28fe853244f4972cf18d8e2bd62eeb8fc

It would be useful for this patch to be cherry picked for the distribution kernel in current supported releases of RHEL/Fedora as this fault exists on a large number of recent devices.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/870127/

Thanks,
Terry
Just added this patch to f15. It'll show up in the next build. It's already in f16.
Thanks.
kernel-2.6.40.7-0.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.40.7-0.fc15
Package kernel-2.6.40.7-0.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.40.7-0.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-14513
then log in and leave karma (feedback).
kernel-2.6.40.7-3.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.40.7-3.fc15
kernel-2.6.40.8-2.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.40.8-2.fc15
I upgraded to F16 and installed the latest kernel today, just testing the connectivity now, will report back if the issue still exists or is now resolved.
2789 packets transmitted, 2488 received, 10% packet loss, time 2789816ms
rtt min/avg/max/mdev = 0.633/2.477/444.845/10.815 ms

Linux beans.is.ham.uk.betfair 3.1.0-5.fc16.x86_64 #1 SMP Thu Oct 27 03:46:50 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Unfortunately still seeing packet loss. Host was rebooted after the patches were applied.
kernel-2.6.40.8-4.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.40.8-4.fc15
On ALTLinux with kernel 3.0.1 (3.1.0-un-def-alt1) I have exactly the same problem. The drivers from http://sourceforge.net/projects/e1000/ do not fix the problem.

ethtool -i eth1
driver: e1000e
version: 1.6.3-NAPI
firmware-version: 0.13-4
bus-info: 0000:00:19.0

ethtool -S eth1
NIC statistics:
     rx_packets: 345
     tx_packets: 379
     rx_bytes: 50365
     tx_bytes: 40413
     rx_broadcast: 0
     tx_broadcast: 1
     rx_multicast: 0
     tx_multicast: 6
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 0
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 50365
     rx_csum_offload_good: 0
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0
     rx_dma_failed: 0
     tx_dma_failed: 0
Switching to a gigabit network did not help. There is some loss on the PPPoE link as well.
kernel-2.6.41.1-1.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.41.1-1.fc15
kernel-2.6.41.1-1.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.
I find I am still getting this problem on a 2.6.43.5-2.fc15.x86_64 kernel. Connection is flapping about.

>$ dmesg|tail
>[10450.496974] e1000e 0000:00:19.0: em1: 10/100 speed: disabling TSO
>[10451.468391] e1000e: em1 NIC Link is Down
>[10453.259366] e1000e: em1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
>[10453.259374] e1000e 0000:00:19.0: em1: 10/100 speed: disabling TSO
>[10457.286911] e1000e: em1 NIC Link is Down
>[10459.095868] e1000e: em1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
>[10459.095875] e1000e 0000:00:19.0: em1: 10/100 speed: disabling TSO
>[10459.607726] e1000e: em1 NIC Link is Down
>[10461.507611] e1000e: em1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
>[10461.507618] e1000e 0000:00:19.0: em1: 10/100 speed: disabling TSO

>$ ethtool -i em1
>driver: e1000e
>version: 1.5.1-k
>firmware-version: 0.13-4
>bus-info: 0000:00:19.0
>supports-statistics: yes
>supports-test: yes
>supports-eeprom-access: yes
>supports-register-dump: yes

>$ lspci -v |grep Ethernet -A 8
>00:19.0 Ethernet controller: Intel Corporation 82579V Gigabit Network Connection (rev 05)
> Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard
> Flags: bus master, fast devsel, latency 0, IRQ 61
> Memory at fe700000 (32-bit, non-prefetchable) [size=128K]
> Memory at fe728000 (32-bit, non-prefetchable) [size=4K]
> I/O ports at f040 [size=32]
> Capabilities: <access denied>
> Kernel driver in use: e1000e
> Kernel modules: e1000e

I'm just going to look at building the driver from sourceforge to see if I can get the driver version higher.
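For comparing how badly the link flaps before and after a driver change, counting the "Link is Down" lines is a rough but serviceable measure. A minimal sketch, assuming the e1000e message format shown in the dmesg output above; `count_link_down` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: count "Link is Down" events for one interface in
# kernel log text read from stdin. Relies on the e1000e message format
# seen in the dmesg output above.
count_link_down() {
    grep -c "e1000e: $1 NIC Link is Down"
}
```

For example, `dmesg | count_link_down em1` prints the number of link drops still in the kernel ring buffer.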
Ok, upgraded to the latest driver from the sourceforge page (http://sourceforge.net/projects/e1000/files/e1000e%20stable/2.0.0/) and it works sweet.

>$ ethtool -i em1
>driver: e1000e
>version: 2.0.0-NAPI
>firmware-version: 0.13-4
>bus-info: 0000:00:19.0

>[11721.226105] e1000e: em1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx/Tx
>[11721.226112] e1000e 0000:00:19.0: em1: 10/100 speed: disabling TSO
>[11721.226860] ADDRCONF(NETDEV_CHANGE): em1: link becomes ready
>[11731.749329] em1: no IPv6 routers present

By doing:

sudo yum install kernel-devel kernel-headers
cd /usr/tmp
wget -O e1000e-2.0.0.tar.gz http://sourceforge.net/projects/e1000/files/e1000e%20stable/2.0.0/e1000e-2.0.0.tar.gz/download
tar -xzvf e1000e-2.0.0.tar.gz
cd e1000e-2.0.0/src
make
sudo make install
sudo rmmod e1000e
sudo insmod /lib/modules/`uname -r`/kernel/drivers/net/e1000e/e1000e.ko

And Bob's your uncle... But is there a package in any repo that will install the 2.0.0-NAPI driver?