Bug 517107 - Ethernet link briefly lost when reading or writing at high speed
Summary: Ethernet link briefly lost when reading or writing at high speed
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 11
Hardware: i686
OS: Linux
low
medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-08-12 16:03 UTC by John Duffell
Modified: 2009-12-02 16:50 UTC (History)
4 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2009-12-02 16:50:09 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)

Description John Duffell 2009-08-12 16:03:38 UTC
Description of problem:
I have an IBM thinkpad T60p (2008-93G with 3GB ram) running Fedora 11.  When I copy a large file to or from my machine using scp the copy starts fine, but during or after the copy network manager reports that the link has been lost and reconnects to the network.  This usually repeats again a minute or two later, even if the transfer has already completed.  The link light on the network interface actually goes off briefly (<1s) and the following messages appear in /var/log/messages
Aug  3 14:45:00 localhost NetworkManager: <info>  (eth0): carrier now OFF (device state 8)
Aug  3 14:45:00 localhost NetworkManager: <info>  (eth0): device state change: 8 -> 2 (reason 40)
Aug  3 14:45:00 localhost NetworkManager: <info>  (eth0): deactivating device (reason: 40).
Aug  3 14:45:00 localhost NetworkManager: <info>  eth0: canceled DHCP transaction, dhcp client pid 405
Aug  3 14:45:00 localhost NetworkManager: <WARN>  check_one_route(): (eth0) error -34 returned from rtnl_route_del(): Sucess#012
Aug  3 14:45:00 localhost dnsmasq[2344]: no servers found in /etc/resolv.conf, will retry
Aug  3 14:45:01 localhost kernel: e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: None
Aug  3 14:45:01 localhost kernel: 0000:02:00.0: eth0: 10/100 speed: disabling TSO
Aug  3 14:45:01 localhost NetworkManager: <info>  (eth0): carrier now ON (device state 2)
Aug  3 14:45:01 localhost NetworkManager: <info>  (eth0): device state change: 2 -> 3 (reason 40)
Aug  3 14:45:01 localhost NetworkManager: <info>  Activation (eth0) starting connection 'ethernet dhcp'


things I tried that didn't fix it:
- connecting the network cable to the laptop directly when on battery power
- downgrading my kernel to the F11 release version
- connecting directly to my desktop machine rather than via the switch (although it did seem to improve matters and it used 1Gb/s rather than 100Mb/s)

Version-Release number of selected component (if applicable):
both of:
kernel-2.6.29.4-167.fc11.i586
kernel-2.6.29.6-217.2.3.fc11.i586

How reproducible:
By copying a 100MB file, every time.

Steps to Reproduce:
1. sftp remotemachine (via ethernet)
2. get largefile (arrives at around 8MByte/sec for me
3. keep an eye on /var/log/messages and throughput
  
Actual results:
After less than a minute, the above messages are printed, the network link light goes off briefly, network manager says the network connection was disconnected and begins reconnecting, and the transfer stalls until the interface is configured again.  This usually repeats itself a minute or two later.

Expected results:
There are no messages printed in the log, the network link stays up and the file is transferred.

Additional info:
This wasn't a problem when I was on fedora 10, and I upgraded to fedora 11 within a week of its release.  However I don't do large file transfers that often, but I have only noticed it in the last month or two.  So there is a small chance that it's a hardware failure but I can't find any more detailed logging to enable.

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

[root@john-laptop ~]# ethtool eth0
Settings for eth0:
	Supported ports: [ TP ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	Advertised auto-negotiation: Yes
	Speed: 100Mb/s
	Duplex: Full
	Port: Twisted Pair
	PHYAD: 1
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbag
	Wake-on: g
	Current message level: 0x00000001 (1)
	Link detected: yes

eth0      Link encap:Ethernet  HWaddr 00:16:41:A7:69:46
           inet addr:9.20.209.223  Bcast:9.20.209.255  Mask:255.255.255.0
           inet6 addr: fe80::216:41ff:fea7:6946/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:235175 errors:0 dropped:0 overruns:0 frame:0
           TX packets:287639 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:100
           RX bytes:49798262 (47.4 MiB)  TX bytes:193502419 (184.5 MiB)
           Memory:ee000000-ee020000

Comment 1 Mike C 2009-09-16 09:12:00 UTC
I am seeing something similar with F11 (fully up to date) on my machine - with smolt info:
http://www.smolts.org/client/show/pub_34ba7c2a-3891-44a5-b011-98990af7b916

Sep 16 10:05:21 physics30 kernel: b44: eth0: powering down PHY
Sep 16 10:05:21 physics30 NetworkManager: <info>  (eth0): carrier now OFF (device state 8)
Sep 16 10:05:21 physics30 kernel: b44: eth0: Link is down.
Sep 16 10:05:21 physics30 NetworkManager: <info>  (eth0): device state change: 8 -> 2 (reason 40)
Sep 16 10:05:21 physics30 NetworkManager: <info>  (eth0): deactivating device (reason: 40).
Sep 16 10:05:21 physics30 NetworkManager: <info>  eth0: canceled DHCP transaction, dhcp client pid 2317
Sep 16 10:05:21 physics30 NetworkManager: <WARN>  check_one_route(): (eth0) error -34 returned from rtnl_route_del(): Sucess#012
Sep 16 10:05:21 physics30 avahi-daemon[1117]: Withdrawing address record for 144.32.35.20 on eth0.
Sep 16 10:05:21 physics30 avahi-daemon[1117]: Leaving mDNS multicast group on interface eth0.IPv4 with address 144.32.35.20.
Sep 16 10:05:21 physics30 avahi-daemon[1117]: Interface eth0.IPv4 no longer relevant for mDNS.
Sep 16 10:05:22 physics30 ntpd[1467]: Deleting interface #6 eth0, 144.32.35.20#123, interface stats: received=24, sent=24, dropped=0, active_time=787 secs
Sep 16 10:05:24 physics30 NetworkManager: <info>  (eth0): carrier now ON (device state 2)
Sep 16 10:05:24 physics30 NetworkManager: <info>  (eth0): device state change: 2 -> 3 (reason 40)
Sep 16 10:05:24 physics30 kernel: b44: eth0: Link is up at 100 Mbps, full duplex.
Sep 16 10:05:24 physics30 kernel: b44: eth0: Flow control is off for TX and off for RX.
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) starting connection 'Auto Ethernet'
Sep 16 10:05:24 physics30 NetworkManager: <info>  (eth0): device state change: 3 -> 4 (reason 0)
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) scheduled...
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) started...
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) scheduled...
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 1 of 5 (Device Prepare) complete.
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) starting...
Sep 16 10:05:24 physics30 NetworkManager: <info>  (eth0): device state change: 4 -> 5 (reason 0)
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) successful.
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) scheduled.
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 2 of 5 (Device Configure) complete.
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) started...
Sep 16 10:05:24 physics30 NetworkManager: <info>  (eth0): device state change: 5 -> 7 (reason 0)
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Beginning DHCP transaction.
Sep 16 10:05:24 physics30 NetworkManager: <info>  dhclient started with pid 2753
Sep 16 10:05:24 physics30 NetworkManager: <info>  Activation (eth0) Stage 3 of 5 (IP Configure Start) complete.
Sep 16 10:05:24 physics30 dhclient: Internet Systems Consortium DHCP Client 4.1.0p1
Sep 16 10:05:24 physics30 dhclient: Copyright 2004-2009 Internet Systems Consortium.
Sep 16 10:05:24 physics30 dhclient: All rights reserved.
Sep 16 10:05:24 physics30 dhclient: For info, please visit http://www.isc.org/sw/dhcp/
Sep 16 10:05:24 physics30 dhclient: 
Sep 16 10:05:24 physics30 NetworkManager: <info>  DHCP: device eth0 state changed normal exit -> preinit
Sep 16 10:05:24 physics30 dhclient: Listening on LPF/eth0/00:02:b3:a6:04:42
Sep 16 10:05:24 physics30 dhclient: Sending on   LPF/eth0/00:02:b3:a6:04:42
Sep 16 10:05:24 physics30 dhclient: Sending on   Socket/fallback
Sep 16 10:05:26 physics30 dhclient: DHCPREQUEST on eth0 to 255.255.255.255 port 67
Sep 16 10:05:26 physics30 dhclient: DHCPACK from 144.32.128.17
Sep 16 10:05:26 physics30 NetworkManager: <info>  DHCP: device eth0 state changed preinit -> reboot
Sep 16 10:05:26 physics30 NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) scheduled...
Sep 16 10:05:26 physics30 NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) started...
Sep 16 10:05:26 physics30 NetworkManager: <info>    address 144.32.35.20
Sep 16 10:05:26 physics30 NetworkManager: <info>    prefix 23 (255.255.254.0)
Sep 16 10:05:26 physics30 NetworkManager: <info>    gateway 144.32.34.253
Sep 16 10:05:26 physics30 NetworkManager: <info>    hostname 'physics30'
Sep 16 10:05:26 physics30 NetworkManager: <info>    nameserver '144.32.128.242'
Sep 16 10:05:26 physics30 NetworkManager: <info>    nameserver '144.32.128.243'
Sep 16 10:05:26 physics30 NetworkManager: <info>    domain name 'csrv.ad.york.ac.uk'
Sep 16 10:05:26 physics30 NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) scheduled...
Sep 16 10:05:26 physics30 NetworkManager: <info>  Activation (eth0) Stage 4 of 5 (IP Configure Get) complete.
Sep 16 10:05:26 physics30 NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) started...
Sep 16 10:05:26 physics30 avahi-daemon[1117]: Joining mDNS multicast group on interface eth0.IPv4 with address 144.32.35.20.
Sep 16 10:05:26 physics30 avahi-daemon[1117]: New relevant interface eth0.IPv4 for mDNS.
Sep 16 10:05:26 physics30 avahi-daemon[1117]: Registering new address record for 144.32.35.20 on eth0.IPv4.
Sep 16 10:05:26 physics30 dhclient: bound to 144.32.35.20 -- renewal in 6661 seconds.
Sep 16 10:05:27 physics30 NetworkManager: <info>  (eth0): device state change: 7 -> 8 (reason 0)
Sep 16 10:05:27 physics30 NetworkManager: <info>  Policy set 'Auto Ethernet' (eth0) as default for routing and DNS.
Sep 16 10:05:27 physics30 NetworkManager: <info>  Activation (eth0) successful, device activated.
Sep 16 10:05:27 physics30 NetworkManager: <info>  Activation (eth0) Stage 5 of 5 (IP Configure Commit) complete.

The disconnect/reconnect happens at frequent intervals when the network is heavily loaded.

Version of kernel and NM are:
2.6.30.5-43.fc11.i686.PAE
NetworkManager-0.7.1-8.git20090708.fc11.i586

Comment 2 Mike C 2009-09-28 10:44:54 UTC
Ping!  Any news on this bug - it still plagues me!

Comment 3 Mike C 2009-09-28 10:50:47 UTC
If it is any help my NIC is:
03:08.0 Ethernet controller: Intel Corporation 82801G (ICH7 Family) LAN Controller (rev 01)
	Subsystem: Dell Device 01ab
	Flags: bus master, medium devsel, latency 64, IRQ 20
	Memory at ebeff000 (32-bit, non-prefetchable) [size=4K]
	I/O ports at ccc0 [size=64]
	Capabilities: <access denied>
	Kernel driver in use: e100
	Kernel modules: e100

Comment 4 Mike C 2009-09-28 11:01:02 UTC
The previous info was the ethernet controller - physical NIC is 
                description: Ethernet interface
                product: BCM4401 100Base-T
                vendor: Broadcom Corporation
                physical id: 9
                bus info: pci@0000:01:09.0
                logical name: eth0
                version: 01

However I don't know if this is something that is NM related, kernel related or NIC specific?

Comment 5 Mike C 2009-09-29 19:15:27 UTC
I have updated the kernel to the latest in F11 and this problem persists - I wonder if the component may be set wrong and it is possible the problem is a bug in NetworkManager?

Comment 6 John Duffell 2009-12-02 16:50:09 UTC
this seems to work fine on the same hardware on fedora 12.  Switching back to 11 shows the bug again.  Presumably this has been fixed along the way so I'm closing this.

I suspect there was some problem caused after the network hardware moved from some low power mode to a higher power mode used when traffic was high.   Because there were always 2 disconnects, and one would always happen soon after commencing traffic and one soon after stopping it, no matter how long the transfer lasted.


Note You need to log in before you can comment on or make changes to this bug.