Bug 603284 - Randomly deactivates device, triggering network reconnection
Randomly deactivates device, triggering network reconnection
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: NetworkManager (Show other bugs)
13
All Linux
low Severity medium
: ---
: ---
Assigned To: Dan Williams
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2010-06-12 03:20 EDT by Tore Anderson
Modified: 2011-06-27 14:03 EDT (History)
1 user (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-06-27 14:03:53 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
/var/log/messages (83.10 KB, text/plain)
2010-06-12 03:20 EDT, Tore Anderson
no flags Details
/var/log/messages (with --log-level=debug) (68.80 KB, text/plain)
2010-06-15 17:00 EDT, Tore Anderson
no flags Details
Random disconnection happening on wireless LAN (84.07 KB, application/x-gzip)
2010-07-08 03:15 EDT, Tore Anderson
no flags Details

  None (edit)
Description Tore Anderson 2010-06-12 03:20:51 EDT
Created attachment 423453 [details]
/var/log/messages

Description of problem:

Sometimes, NetworkManager randomly deactivates the device and starts to connect to the network from the beginning again.  It seems to happen quite often right after a reboot, but I've seen it happen randomly after the connection has been working fine for hours.  Usually it's stable though.  It might be connected to IPv6 being enabled.

Version-Release number of selected component (if applicable):

NetworkManager-0.8.1-0.1.git20100510.fc13.x86_64

How reproducible:

Not 100%, but happens often enough.

Steps to Reproduce:
1. Reboot or connect to the network
  
Actual results:

NM connects fine, marking the connection as up.  A few seconds later it disconnects, and tries again.  Usually it stays up after attempt #2.

Expected results:

That the first connection that was made stayed up.

Additional info:

Attaching /var/log/messages from a complete boot where it happened.  I haven't edited it, but the fun part seems to be around 09:01:40 - 09:02:00, where you can see the device activation happening twice:

Jun 12 09:01:40 wrath NetworkManager[1462]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) started...
Jun 12 09:01:52 wrath NetworkManager[1462]: <warn> Activation (eth0) failed.
Jun 12 09:01:56 wrath NetworkManager[1462]: <info> Activation (eth0) Stage 3 of 5 (IP Configure Start) started...
Comment 1 Dan Williams 2010-06-15 14:24:09 EDT
Can you try to trigger the problem again, but grab logs like so:

service NetworkManager stop
/usr/sbin/NetworkManager --no-daemon --log-level=debug

and we'll see what's going on there? Thanks!
Comment 2 Tore Anderson 2010-06-15 17:00:56 EDT
Created attachment 424296 [details]
/var/log/messages (with --log-level=debug)

I added --log-level=debug to the init script instead, hope that's no problem.  I'm attaching the content of /var/log/messages starting when NM was started and continuing until the connection was stable.

Tore
Comment 3 Dan Williams 2010-06-28 16:20:03 EDT
Jun 15 22:53:24 wrath NetworkManager[4102]: <debug> [1276635204.717822] [nm-system.c:223] sync_addresses(): (eth0): syncing addresses (family 10)
Jun 15 22:53:24 wrath NetworkManager[4102]: <debug> [1276635204.717858] [nm-system.c:276] sync_addresses(): (eth0): removing address '2001:16d8:ee47:0:230:1bff:febc:7f23/64'
Jun 15 22:53:24 wrath NetworkManager[4102]: <debug> [1276635204.717905] [nm-system.c:259] sync_addresses(): (eth0): ignoring IPv6 link-local address
Jun 15 22:53:24 wrath NetworkManager[4102]: <debug> [1276635204.717921] [nm-system.c:312] sync_addresses(): (eth0): adding address '2001:16d8:ee47::bbb6/0'
Jun 15 22:53:24 wrath avahi-daemon[1295]: Withdrawing address record for 2001:16d8:ee47:0:230:1bff:febc:7f23 on eth0.
Jun 15 22:53:24 wrath avahi-daemon[1295]: Registering new address record for fe80::230:1bff:febc:7f23 on eth0.*.
Jun 15 22:53:25 wrath NetworkManager[4102]: <info> Policy set 'System eth0' (eth0) as default for IPv4 routing and DNS.
Jun 15 22:53:25 wrath NetworkManager[4102]: <info> Policy set 'System eth0' (eth0) as default for IPv6 routing and DNS.
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.748807] [nm-ip6-manager.c:756] netlink_notification(): netlink notificate type 21
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.748826] [nm-ip6-manager.c:472] process_addr(): processing netlink new/del address message
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.748945] [nm-ip6-manager.c:784] netlink_notification(): (eth0): syncing device with netlink changes
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.748958] [nm-ip6-manager.c:349] nm_ip6_device_sync_from_netlink(): (eth0): syncing with netlink (ra_flags 0x800000F0) (state/target 'got-address'/'got-address')
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.749018] [nm-ip6-manager.c:368] nm_ip6_device_sync_from_netlink(): (eth0): netlink address: fe80::230:1bff:febc:7f23
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.749070] [nm-ip6-manager.c:390] nm_ip6_device_sync_from_netlink(): (eth0): addresses synced (state got-address)
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.749582] [nm-ip6-manager.c:401] nm_ip6_device_sync_from_netlink(): router advertisement deferred to DHCPv6
Jun 15 22:53:25 wrath NetworkManager[4102]: <debug> [1276635205.749595] [nm-ip6-manager.c:442] nm_ip6_device_sync_from_netlink(): (eth0): RA-provided address no longer valid
Jun 15 22:53:25 wrath NetworkManager[4102]: <info> (eth0): device state change: 8 -> 9 (reason 5)
Jun 15 22:53:25 wrath NetworkManager[4102]: <warn> Activation (eth0) failed.

So this looks like the address '2001:16d8:ee47::bbb6/0' gets added to the interface, which is supposed to be the address from DHCP.  I'm a bit worried about the '0' there, but on a closer look that may be harmless.

But then just a second later that address is not present on the interface; during nm_ip6_device_sync_from_netlink() all of the interface's IP addresses are printed out and that address isn't there...  Need some more investigation.
Comment 4 Tore Anderson 2010-07-08 03:15:05 EDT
Created attachment 430256 [details]
Random disconnection happening on wireless LAN

I've got this problem happening on my laptop, too.  (Or perhaps it could be a separate issue?)  Attaching the debug log (it's quite large as it took some time before it happened.  Looking at it, I see it happening here:


Jul  8 09:05:17 envy NetworkManager[4545]: <debug> [1278572717.1868] [nm-ip6-manager.c:252] rdnss_expired(): (wlan0): IPv6 RDNSS information expired
Jul  8 09:05:17 envy NetworkManager[4545]: <debug> [1278572717.1932] [nm-ip6-manager.c:286] set_rdnss_timeout(): (wlan0): removing expired RA-provided nameserver 2001:16d8:ee47::
Jul  8 09:05:17 envy NetworkManager[4545]: <info> (wlan0): device state change: 8 -> 9 (reason 5)

Perhaps packet loss on the WLAN (I was running a yum upgrade at the time, so the bandwidth was saturated) made the ICMPv6 RA to become lost, and that there's a bug that makes NM tear down the entire connection (including IPv4 connectivity) if the RDNSS server expire?

Tore
Comment 5 Dan Williams 2010-07-20 16:14:07 EDT
We could make the timer for RDNSS longer, but if you don't have usable IPv6 nameservers, you don't really have full IPv6 connectivity...  and since we're supposed to honor the RDNSS timeout, I'm not entirely sure what to do here.  There's obviously still the issue of the IPv6 and IPv4 being tied together to a degree, but when the RDNSS info expires then we're really expected to clear that out.

Now, since an admin changing your DNS servers is a lot less likely than WLAN packet loss, perhaps we should increase the timeout before we really consider the RDNSS servers expired...
Comment 6 Tore Anderson 2010-07-26 10:59:33 EDT
I think the RDNSS lifetime is supposed to be independent of the prefix lifetime.  By default radvd will announce a router lifetime of 1800s, a prefix lifetime of 86400 (preferred 14400), and RDNSS lifetime 600.  I think you can also add on several other RDNSSes and prefixes with completely different lifetimes.

IMHO, if the RDNSS expire it should be removed from resolv.conf but it should not interrupt other things like the configured IPv6 (or IPv4) addresses or routers.

However, in the case that a RDNSS expiry would bring the number of servers in resolv.conf down to zero, it could be wise to delay removing it until a new server is available (regardless of how it's been learnt).

Tore
Comment 7 Bug Zapper 2011-06-02 07:09:10 EDT
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 8 Bug Zapper 2011-06-27 14:03:53 EDT
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.