Red Hat Bugzilla – Bug 720188
Managed DHCPv6 should not short-circuit SLAAC (causes a loop of device activation attempts/failures)
Last modified: 2012-08-07 12:02:15 EDT
Created attachment 512107 (NetworkManager debug output)
Description of problem:
When I attempt to connect to my network, which uses SLAAC and stateful DHCPv6 simultaneously for configuration, NetworkManager activates the device, only to fail it immediately (logging the reason 'ip-config-unavailable') and start over. Given enough attempts (I've seen it take more than 50), it finally connects and stays connected.
Version-Release number of selected component (if applicable):
How reproducible:
More than 9 times out of 10, I'd say.
Steps to Reproduce:
1. Attempt to connect to the network with the IPv6 method set to «Automatic».
Actual results:
The activation fails right after the systray applet has reported that it was successful; the activation process then restarts again and again until it finally succeeds.
Expected results:
The network connection should be activated reliably on the first attempt.
Additional info:
I'm attaching a syslog containing debug output from NetworkManager while reproducing the problem (it took 38 attempts to finally connect). I disabled IPv4 to reduce the noise in the logs, but the problem also occurs when the IPv4 mode is set to «Automatic» (the network is dual-stacked with DHCPv4 service). I'm also attaching a tcpdump containing all ICMPv6 RS/RA and DHCPv6 packets seen on the wire during the same period.
One log message, which appears right before the device fails, looks particularly suspicious to me, considering that the Valid Lifetime of the RA-provided address is exactly 30 days (so the address cannot have expired):
_ip6_device_sync_from_netlink(): (eth0): RA-provided address no longer valid
The router on the network is a ZyXEL P-2812HNU-F3. I also tried connecting a Windows 7 machine to the same network and saw no such problems.
I'll be happy to provide any further information if necessary.
Created attachment 512108 (ICMPv6 RS/RA and DHCPv6 packet dump)
FYI, I still get this behaviour with NetworkManager-0.8.9997-6.git20110721.fc15.x86_64.
I've been trying to figure out what's going on here, and I see that when activation fails, NetworkManager first removes the kernel-configured SLAAC address from the device. From the first attempt in the attached syslog:
<debug> [nm-system.c:222] sync_addresses(): (eth0): syncing addresses (family 10)
<debug> [nm-system.c:275] sync_addresses(): (eth0): removing address 2001:840:3033:10:230:1bff:febc:7f23/64'
Later, nm_ip6_device_sync_from_netlink() is called. It has a loop with the comment «Look for any IPv6 addresses the kernel may have set for the device» that walks the list of addresses on the device:
<debug> [nm-ip6-manager.c:417] nm_ip6_device_sync_from_netlink(): (eth0): syncing with netlink (ra_flags 0x80000070) (state/target 'got-address'/'got-address')
<debug> [nm-ip6-manager.c:436] nm_ip6_device_sync_from_netlink(): (eth0): netlink address: fe80::230:1bff:febc:7f23
<debug> [nm-ip6-manager.c:458] nm_ip6_device_sync_from_netlink(): (eth0): addresses synced (state got-address)
Since the SLAAC-assigned address was removed a bit earlier, the loop doesn't run across it, and therefore never sets the «found_other» boolean to TRUE.
However, a bit further down in the function, the «found_other» boolean is checked, and if it isn't set, NM concludes that the RA-provided address has «disappeared for some reason» and therefore fails the connection:
/* If for some reason an RA-provided address disappeared, we need
 * to make sure we fail the connection as it's no longer valid.
 */
<debug> [nm-ip6-manager.c:510] nm_ip6_device_sync_from_netlink(): (eth0): RA-provided address no longer valid
The RA-provided address disappeared because NM itself removed it moments earlier, so the failure makes little sense.
The question is *why* NM removed the RA-provided address in the first place. I haven't figured that out yet, but I will keep looking the next time I have time to debug further.
When the connection finally succeeds, NM does not remove the RA-provided address. I don't know what makes that activation different, but I suspect some kind of race condition.
I just posted a patch to the networkmanager mailing list, with the following description (will also attach the patch here for reference):
NetworkManager currently operates on the assumption that Managed
(Stateful) DHCPv6 preempts SLAAC. This is not the case; Managed DHCPv6
and SLAAC are completely orthogonal. My consumer-grade xDSL CPE (a ZyXEL
P-2812HNU-F3) does both at the same time by default, which is a
necessity to trigger the following bug:
Currently NetworkManager will abandon SLAAC activation if it sees that
Managed DHCPv6 is requested by the RA. As far as I have been able to
understand, this makes NetworkManager overlook the kernel-configured
SLAAC address, which in turn makes sync_addresses() remove it again at
a later stage, as it is considered an "unwanted alien" of some sort.
However, right after the device activation has finished,
nm_ip6_device_sync_from_netlink() is run, which notices that the SLAAC
address has vanished, and figures (incorrectly) that it must have been
because the Valid Lifetime has reached zero and that the kernel has
therefore removed it. In response, nm_ip6_device_sync_from_netlink()
deactivates the entire interface, and the activation process starts over
again. Given enough attempts (usually more than a dozen, and sometimes
more than fifty), NM will eventually manage
to permanently activate the interface, though I don't know exactly what
conditions are necessary for the activation to be a lasting success.
This patch fixes the problem completely for me; the device is now
successfully activated on the first attempt every single time. It simply
removes the flawed assumption that Managed DHCPv6 short-circuits SLAAC,
and makes NM complete the SLAAC process regardless of Managed DHCPv6
being requested or not.
Created attachment 517703 (Don't let managed DHCPv6 preempt SLAAC)
Comment on attachment 517703:
The patch turned out to be no good, as it breaks Managed DHCPv6 operation when there's no SLAAC at all.
I'm still pretty sure I've correctly identified the root cause of SLAAC+DHCPv6 operation being so unreliable, though.
This message is a notice that Fedora 15 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 15. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. At this time, all open bugs with a Fedora 'version'
of '15' have been closed as WONTFIX.
(Please note: Our normal process is to give advance warning of this
occurring, but we forgot to do that. A thousand apologies.)
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, feel free to reopen
this bug and simply change the 'version' to a later Fedora version.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we were unable to fix it before Fedora 15 reached end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to click on
"Clone This Bug" (top right of this page) and open it against that
version of Fedora.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here: