603284 – Randomly deactivates device, triggering network reconnection

Bug 603284 - Randomly deactivates device, triggering network reconnection

Summary: Randomly deactivates device, triggering network reconnection

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	NetworkManager
Sub Component:
Version:	13
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Dan Williams
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2010-06-12 07:20 UTC by Tore Anderson
Modified:	2011-06-27 18:03 UTC (History)
CC List:	1 user (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-06-27 18:03:53 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
/var/log/messages (83.10 KB, text/plain) 2010-06-12 07:20 UTC, Tore Anderson	no flags	Details
/var/log/messages (with --log-level=debug) (68.80 KB, text/plain) 2010-06-15 21:00 UTC, Tore Anderson	no flags	Details
Random disconnection happening on wireless LAN (84.07 KB, application/x-gzip) 2010-07-08 07:15 UTC, Tore Anderson	no flags	Details
Show Obsolete (1) View All

Description Tore Anderson 2010-06-12 07:20:51 UTC

Created attachment 423453 [details]
/var/log/messages

Description of problem:

Sometimes, NetworkManager randomly deactivates the device and starts to connect to the network from the beginning again.  It seems to happen quite often right after a reboot, but I've seen it happen randomly after the connection has been working fine for hours.  Usually it's stable though.  It might be connected to IPv6 being enabled.

Version-Release number of selected component (if applicable):

NetworkManager-0.8.1-0.1.git20100510.fc13.x86_64

How reproducible:

Not 100%, but happens often enough.

Steps to Reproduce:
1. Reboot or connect to the network
  
Actual results:

NM connects fine, marking the connection as up.  A few seconds later it disconnects, and tries again.  Usually it stays up after attempt #2.

Expected results:

That the first connection that was made stayed up.

Additional info:

Attaching /var/log/messages from a complete boot where it happened.  I haven't edited it, but the fun part seems to be around 09:01:40 - 09:02:00, where you can see the device activation happening twice:

Jun 12 09:01:40 wrath NetworkManager[1462]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) started...
Jun 12 09:01:52 wrath NetworkManager[1462]: <warn> Activation (eth0) failed.
Jun 12 09:01:56 wrath NetworkManager[1462]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) started...

Comment 1 Dan Williams 2010-06-15 18:24:09 UTC

Can you try to trigger the problem again, but grab logs like so:

service NetworkManager stop
/usr/sbin/NetworkManager --no-daemon --log-level=debug

and we'll see what's going on there? Thanks!

Comment 2 Tore Anderson 2010-06-15 21:00:56 UTC

Created attachment 424296 [details]
/var/log/messages (with --log-level=debug)

I added --log-level=debug to the init script instead, hope that's no problem.  I'm attaching the content of /var/log/messages starting when NM was started and continuing until the connection was stable.

Tore

Comment 3 Dan Williams 2010-06-28 20:20:03 UTC

Jun 15 22:53:24 wrath NetworkManager[4102]: <debug> [1276635204.717822] [nm-system.c:223] sync_addresses(): (eth0): syncing addresses (family 10)
Jun 15 22:53:24 wrath NetworkManager[4102]: <debug> [1276635204.717858] [nm-system.c:276] sync_addresses(): (eth0): removing address '2001:16d8:ee47:0:230:1bff:febc:7f23/64'
Jun 15 22:53:24 wrath NetworkManager[4102]: <debug> [1276635204.717905] [nm-system.c:259] sync_addresses(): (eth0): ignoring IPv6 link-local address
Jun 15 22:53:24 wrath NetworkManager[4102]: <debug> [1276635204.717921] [nm-system.c:312] sync_addresses(): (eth0): adding address '2001:16d8:ee47::bbb6/0'
Jun 15 22:53:24 wrath avahi-daemon[1295]: Withdrawing address record for 2001:16d8:ee47:0:230:1bff:febc:7f23 on eth0.
Jun 15 22:53:24 wrath avahi-daemon[1295]: Registering new address record for fe80::230:1bff:febc:7f23 on eth0.*.
Jun 15 22:53:25 wrath NetworkManager[4102]: <info> Policy set 'System eth0' (eth0) as default for IPv4 routing and DNS.
Jun 15 22:53:25 wrath NetworkManager[4102]: <info> Policy set 'System eth0' (eth0) as default for IPv6 routing and DNS.
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.748807] [nm-ip6-manager.c:756] netlink_notification(): netlink notificate type 21
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.748826] [nm-ip6-manager.c:472] process_addr(): processing netlink new/del address message
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.748945] [nm-ip6-manager.c:784] netlink_notification(): (eth0): syncing device with netlink changes
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.748958] [nm-ip6-manager.c:349] nm_ip6_device_sync_from_netlink(): (eth0): syncing with netlink (ra_flags 0x800000F0) (state/target 'got-address'/'got-address')
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.749018] [nm-ip6-manager.c:368] nm_ip6_device_sync_from_netlink(): (eth0): netlink address: fe80::230:1bff:febc:7f23
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.749070] [nm-ip6-manager.c:390] nm_ip6_device_sync_from_netlink(): (eth0): addresses synced (state got-address)
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.749582] [nm-ip6-manager.c:401] nm_ip6_device_sync_from_netlink(): router advertisement deferred to DHCPv6
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.749595] [nm-ip6-manager.c:442] nm_ip6_device_sync_from_netlink(): (eth0): RA-provided address no longer valid
Jun 15 22:53:25 wrath NetworkManager[4102]: <info> (eth0): device state change: 8 -> 9 (reason 5)
Jun 15 22:53:25 wrath NetworkManager[4102]: <warn> Activation (eth0) failed.

So this looks like the address '2001:16d8:ee47::bbb6/0' gets added to the interface, which is supposed to be the address from DHCP.  I'm a bit worried about the '0' there, but on a closer look that may be harmless.

But then just a second later that address is not present on the interface; during nm_ip6_device_sync_from_netlink() all of the interface's IP addresses are printed out and that address isn't there...  Need some more investigation.

Comment 4 Tore Anderson 2010-07-08 07:15:05 UTC

Created attachment 430256 [details]
Random disconnection happening on wireless LAN

I've got this problem happening on my laptop, too.  (Or perhaps it could be a separate issue?)  Attaching the debug log (it's quite large as it took some time before it happened.  Looking at it, I see it happening here:


Jul  8 09:05:17 envy NetworkManager[4545]: <debug> [1278572717.1868] [nm-ip6-manager.c:252] rdnss_expired(): (wlan0): IPv6 RDNSS information expired
Jul  8 09:05:17 envy NetworkManager[4545]: <debug> [1278572717.1932] [nm-ip6-manager.c:286] set_rdnss_timeout(): (wlan0): removing expired RA-provided nameserver 2001:16d8:ee47::
Jul  8 09:05:17 envy NetworkManager[4545]: <info> (wlan0): device state change: 8 -> 9 (reason 5)

Perhaps packet loss on the WLAN (I was running a yum upgrade at the time, so the bandwidth was saturated) made the ICMPv6 RA to become lost, and that there's a bug that makes NM tear down the entire connection (including IPv4 connectivity) if the RDNSS server expire?

Tore

Comment 5 Dan Williams 2010-07-20 20:14:07 UTC

We could make the timer for RDNSS longer, but if you don't have usable IPv6 nameservers, you don't really have full IPv6 connectivity...  and since we're supposed to honor the RDNSS timeout, I'm not entirely sure what to do here.  There's obviously still the issue of the IPv6 and IPv4 being tied together to a degree, but when the RDNSS info expires then we're really expected to clear that out.

Now, since an admin changing your DNS servers is a lot less likely than WLAN packet loss, perhaps we should increase the timeout before we really consider the RDNSS servers expired...

Comment 6 Tore Anderson 2010-07-26 14:59:33 UTC

I think the RDNSS lifetime is supposed to be independent of the prefix lifetime.  By default radvd will announce a router lifetime of 1800s, a prefix lifetime of 86400 (preferred 14400), and RDNSS lifetime 600.  I think you can also add on several other RDNSSes and prefixes with completely different lifetimes.

IMHO, if the RDNSS expire it should be removed from resolv.conf but it should not interrupt other things like the configured IPv6 (or IPv4) addresses or routers.

However, in the case that a RDNSS expiry would bring the number of servers in resolv.conf down to zero, it could be wise to delay removing it until a new server is available (regardless of how it's been learnt).

Tore

Comment 7 Bug Zapper 2011-06-02 11:09:10 UTC

This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Bug Zapper 2011-06-27 18:03:53 UTC

Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.