Bug 675226 - NetworkManager times out too early when waiting for IPv6 address autoconfiguration on wireless networks
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 19
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-02-04 16:46 UTC by Neil Horman
Modified: 2015-02-17 13:36 UTC

Last Closed: 2015-02-17 13:36:47 UTC

Links
System: GNOME Bugzilla
ID: 669302
Private: 0
Priority: Normal
Status: RESOLVED
Summary: NetworkManager times out too early when waiting for IPv6 address autoconfiguration on wireless networks
Last Updated: 2020-02-12 07:38:09 UTC

Description Neil Horman 2011-02-04 16:46:11 UTC
Description of problem:
I rolled out IPv6 the other day at my house and found it worked great on all my wired systems, but my wireless IPv6-enabled systems all of a sudden started getting disconnected from the wireless network. I did some digging and found that NetworkManager seems to have 2 shortcomings in how it handles wireless IPv6 autoconfiguration:

1) The timeout that it sets for address autoconfiguration is hard coded to 20 seconds.  This may be acceptable for some situations, but it ignores the fact that router advertisement daemons have a configurable advertisement period.  If the advert period on radvd is greater than 20 seconds, we risk getting erroneously disconnected from the network simply because the host system hasn't seen an RA frame yet.  Granted we nominally should send out an RS on link up, but if that gets lost or is otherwise not responded to, this breaks.  That timeout should be configurable so that it can be set in accordance with the local router advert daemon.

2) The timeout in question doesn't take DAD time into account.  It seems NM moves its addrconf state for an interface to a completed state only after it passes duplicate address detection.  DAD can last for a configurable length of time, and on wireless networks, because of section 4.1 here:
http://tools.ietf.org/html/draft-daniel-ipv6-over-wifi-01
DAD _must_ run for the maximum possible time on that interface, which, in conjunction with the above, can easily surpass the NM hard coded addr autoconf timeout. Once we have an address, and DAD has begun (this can be determined by interrogating the tentative flag in rtnetlink for an address), NM should wait indefinitely, until the address is removed (the DAD failed case), or the tentative flag is cleared (the DAD succeeded case).
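
To illustrate the wait logic proposed above, here is a minimal sketch. It is not NM code; the function names are made up for illustration. The flag values are the ones defined in the kernel header linux/if_addr.h, and passing None for a removed address is an assumption of this sketch.

```python
# Address flag values as defined in the Linux header linux/if_addr.h.
IFA_F_DADFAILED = 0x08
IFA_F_TENTATIVE = 0x40


def dad_state(flags):
    """Classify an address from its rtnetlink flags.

    `flags` is the flag bitmask of the address, or None if the address
    has been removed from the interface (hypothetical convention).
    """
    if flags is None or flags & IFA_F_DADFAILED:
        return "failed"        # address removed or marked as a duplicate
    if flags & IFA_F_TENTATIVE:
        return "in-progress"   # DAD still running, address not usable yet
    return "succeeded"         # tentative flag cleared


def should_keep_waiting(flags):
    """True while DAD is running, i.e. while no timeout should fire."""
    return dad_state(flags) == "in-progress"
```

With this, a caller would only give up on the address when dad_state() reports "failed", and would treat "succeeded" as the end of autoconfiguration.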


Version-Release number of selected component (if applicable):


How reproducible:
frequently

Steps to Reproduce:
1. configure a local radvd daemon to advertise a global prefix, and configure its minimum advertisement interval to be > 20 seconds

2. configure a wireless client to enable DAD on the wireless interface via echo 1 > /proc/sys/net/ipv6/conf/<ifc>/accept_dad

3. Make sure NM is configured to attach to a wireless AP that can forward RAs from the router over the wireless link, and make sure that that network is set to automatic in the IPv6 tab in NM.

4. Bring up that wireless link in NM
  
Actual results:
NM will indicate autoconfiguration has begun, and will time out.


Expected results:
NM will wait at least as long as the router's RA min interval before timing out, and as long as necessary after DAD has begun before deciding address autoconfig has failed

Additional info:
Setting a wireless network's IPv6 config to "Ignore" in NM is a decent workaround in the short term, but it creates the converse problem: people who want their link to go down if IPv6 connectivity fails can't do that.

Comment 1 joshua 2011-05-09 21:17:40 UTC
This is an issue even without NetworkManager.  IPv6 router advertisements should cause the IPv6 stack on the interface to assign itself a non-link-local address... which doesn't happen.  Gentoo and Ubuntu, without NM, both do... something not necessarily NM related is broken in Fedora 14 here.

Comment 2 Tomasz Torcz 2011-05-10 11:36:50 UTC
Joshua, the cause could be something else. RAs are not taken into account when sysctl net.ipv6.conf.all.forwarding is set to "1". It often happens when running some virtualisation.  For RAs to be obeyed with .forwarding=1, you need net.ipv6.conf.INTERFACE.accept_ra set to "2". Please check if it is the case.
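
The rule above can be written down as a small check. This is only a sketch of the documented accept_ra semantics (0 = never accept, 1 = accept only when forwarding is off, 2 = accept even when forwarding is on); the function name is made up, and reading the actual sysctl values is left to the caller.

```python
def ra_accepted(forwarding, accept_ra):
    """Return True if router advertisements would be honoured.

    forwarding: value of the forwarding sysctl (0 or 1)
    accept_ra:  value of net.ipv6.conf.INTERFACE.accept_ra (0, 1 or 2)
    """
    if accept_ra == 0:
        return False           # RAs are ignored unconditionally
    if accept_ra == 2:
        return True            # overrides the forwarding setting
    return forwarding == 0     # accept_ra == 1: only when not forwarding
```

For example, ra_accepted(1, 1) is False, which is the situation described above, while ra_accepted(1, 2) is True.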

Comment 3 joshua 2011-05-16 19:58:26 UTC
Sorry, IPV6_AUTOCONF wasn't set to "1" in /etc/sysconfig/network ... my fault

Comment 4 Neil Horman 2011-12-20 19:21:49 UTC
Dan, what's going on with this? It seems like it should be pretty easy to fix.

Comment 5 Neil Horman 2012-02-03 12:02:19 UTC
This has been a problem for several releases now, moving this to rawhide.  I'll file an upstream bug too.

Comment 6 Pavel Šimerda (pavlix) 2012-03-19 08:58:45 UTC
I believe many things in NetworkManager IPv6 support should be reconsidered, especially timeouts and disconnections because of various non-fatal timeouts.

Comment 7 Dan Winship 2012-05-03 18:31:20 UTC
(In reply to comment #0)
> Granted we nominally should send out an RS on link up,
> but if that gets lost or is otherwise not responded to, this breaks.  That
> timeout should be configurable so that it can be set in accordance with the
> local router advert daemon.

Ew. This is *auto*-configuration. Having to fine-tune it for the exact details of your router kinda ruins the point.

The kernel does send out an RS on link up. And if it doesn't get back an answer, it sends out more. So in theory, if you actually do have usable network connectivity, you should get an RA back within a few seconds.
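
As a rough sketch of why a few seconds should be enough: RFC 4861 gives host defaults of an initial delay of up to 1 second, 4 seconds between solicitations, and 3 solicitations in total. The function below is illustrative only (the name is made up), assumes those defaults, and uses the maximum initial delay rather than a random one; the actual kernel values are configurable.

```python
def rs_send_times(max_delay=1.0, interval=4.0, count=3):
    """Latest times, in seconds after link up, at which each RS is sent.

    Defaults follow the RFC 4861 host constants
    MAX_RTR_SOLICITATION_DELAY, RTR_SOLICITATION_INTERVAL and
    MAX_RTR_SOLICITATIONS.
    """
    return [max_delay + i * interval for i in range(count)]
```

With the defaults, the last solicitation goes out no later than 9 seconds after link up, which is inside the 20 second timeout from comment #0.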

> 2) The timeout in question doesn't take DAD time into account.  It seems NM
> moves its addrconf state for an interface to a completed state only after it
> passes duplicate address detection.

Yes. The kernel only announces the existence of the address after DAD has completed. (Though we could infer that DAD has started when we get an RTM_NEWPREFIX.)

> DAD can last for a configurable length of time

Well, the recommended default time is 1 second, so that's probably not a big issue. But yes, we probably should stop the timer before DAD anyway; the timer is so that we don't wait forever on a network that *doesn't* have IPv6; once we get an RA, we know that's not the case.
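
For a rough idea of how long DAD takes, here is a back-of-the-envelope sketch. It assumes the RFC 4862 / RFC 4861 defaults (one probe, 1000 ms retransmit timer), treats the duration as simply probes times retransmit time, and ignores the random delay before the first probe. The function name is made up for illustration.

```python
def dad_duration_seconds(dad_transmits=1, retrans_time_ms=1000):
    """Approximate DAD duration: number of probes times the retransmit timer."""
    return dad_transmits * retrans_time_ms / 1000.0
```

With the defaults this gives 1 second. It only approaches a 20 second timeout if someone configures a much larger number of probes or a much longer retransmit timer.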

> 1. configure a local radvd daemon to advertise a global prefix, and configure
> its minimum advertisement interval to be > 20 seconds

radvd's default value for MinRtrAdvInterval is 198 (0.33 * 600), and I have no problem getting an IPv6 address over wifi with that value.
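
The arithmetic behind that number, as a small sketch. The 0.33 factor and the 600 second maximum are the values quoted above, and the 20 second timeout is the one reported in comment #0; the function names are made up for illustration.

```python
def default_min_rtr_adv_interval(max_rtr_adv_interval=600):
    """Default MinRtrAdvInterval, taken as 0.33 * MaxRtrAdvInterval."""
    return 0.33 * max_rtr_adv_interval


def unsolicited_ra_may_miss_timeout(min_interval, timeout=20):
    """True if waiting for an unsolicited RA alone could exceed the timeout."""
    return min_interval > timeout
```

So with default settings an unsolicited RA could easily arrive after the timeout, and an address is still obtained only because the RA is solicited.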


I suspect this is a dup of either bug 785772 (which is now fixed in F17) or bug 753482 (which might be fixed by packages in koji linked from there).

Comment 8 Neil Horman 2012-05-03 18:56:52 UTC
Please read this more closely, this isn't a dup of 785772.  I'm not having problems with the kernel trying to add default routes.  Nor am I having problems with periodic loss of IPv6.  I'm having an issue with NetworkManager taking down an interface because DAD does not complete within the hard coded limit that NetworkManager enforces on it.  If a different solicitation interval is defined on a host, or if more DAD probes are configured, it's possible for DAD to outlast what NM considers "too" long, and the interface is taken down.  See the upstream bug for details.

Comment 9 Dan Williams 2012-05-03 22:35:47 UTC
Part of the timer's purpose is to bound the entire process of IPv6 autoconfiguration so that it doesn't continue on forever for some reason.  There are more interactions in the process with IPv6 than with IPv4, so we can't just have a timeout on DHCP and call it a day.

The problem with stopping the timer at RTM_NEWPREFIX is that there's an arbitrary amount of time between RTM_NEWPREFIX and the address actually showing up on the interface.  If for some reason this fails to happen we do need to fail the activation.  If the DAD cycle's length is completely silly then I'm not particularly inclined to accommodate that.

But all that said the current value of 25 seconds is probably too short.  That value was chosen before we had the parallel configuration that landed in 0.9.4.  So even if we do bump the value up, as long as you have IPv4 connectivity you won't notice a thing if IPv6 takes longer to fail because you don't have IPv6 enabled on the network.  That (as danw suggests) was the main reason.  And if you have IPv6 only and it takes 45 seconds, well, perhaps that's your problem :)

I suggest a value of perhaps 45 seconds, including DAD.  If it takes longer than that, then we need a really, really good reason to up the timeout, and "I configured more DAD probes" is not a really good reason.  Neither is "my router is configured to send RAs at most once every 60 seconds."

Comment 10 Fedora End Of Life 2013-04-03 14:21:30 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 11 Pavel Šimerda (pavlix) 2013-05-07 15:37:42 UTC
(In reply to comment #9)
> I suggest a value of perhaps 45 seconds, including DAD.  If it takes longer
> than that, then we need a really, really good reason to up the timeout, and
> "I configured more DAD probes" is not a really good reason.  Neither is "my
> router is configured to send RAs at most once every 60 seconds."

Has this been fixed?

Comment 12 Dan Williams 2013-07-23 13:46:14 UTC
Since 0.9.8 NM has sent Router Solicitations to ensure that the router sends an RA while NM is performing addressing.  So at least #1 should be fixed already.

Comment 13 Fedora End Of Life 2015-01-09 16:32:50 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Fedora End Of Life 2015-02-17 13:36:47 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

