Created attachment 838675 [details] ipv6-solicit-loop.txt NM can get into a state where it continuously solicits the router for an advertisement, which gets delivered, which causes another solicitation, repeat forever.
Tore, can you attach your radvd.conf, or otherwise inform me what the expected DNS server lifetime is from a wireshark capture? There may be something wrong with NM's handling of the RDNSS and DNSSL lifetimes and timeouts that is causing the continuous solicitation.
I can confirm seeing this, and that removing RDNSS and DNSSL settings stops it. The radvd.conf was: interface br0 { AdvSendAdvert on; prefix 2001:8b0:bd:1::/64 { AdvOnLink on; AdvAutonomous on; }; RDNSS 2001:8b0:bd:1:204:e2ff:fe22:eba9 2001:8b0:bd:1:2b0:c2ff:fe02:4cf3 2001:8b0:bd:1:be5f:f4ff:fe45:81fd { }; DNSSL compton.nu { }; }; Wireshark reports 600 as the advertised lifetime of the DNS details.
Created attachment 838983 [details] /var/log/messages with NM debugging Attached is /var/log/messages with NM debugging output enabled from when I've reproduced the bug. I'm using NetworkManager-0.9.9.0-20.git20131003.fc20.x86_64 on wired ethernet. The log starts as I'm enabling networking at 10:06:30. It successfully connects and activates IPv6 using SLAAC and information-only DHCPv6, and for a while it is smooth sailing. There are some errors during that time, however: [rdisc/nm-lndp-rdisc.c:194] send_rs(): (enp2s0): cannot send router solicitation: -99. [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family :: inet6 dev 2 scope nowhere created boot-time+ <99>t<D8>C^? last-updated boot-time+ <99>t<D8>C^? I don't know if these have any relevance on this bug, just thought it prudent to point them out. At 11:12:01 the real fun starts. At this point it starts sending RSes and receives RAs in a close loop, making NM hog the CPU and fill the logs. It continues like this until I manually disabled networking again at 11:21:18. In those 10 minutes of madness NM produced about 1GiB with debugging output, so make sure you have sufficient space where you uncompress the file. Tore
Created attachment 838984 [details] PCAP of RS/RA/DHCPv6 messages This file contains a PCAP of all the RS/RA/DHCPv6 activity seen on the host triggering the bug. It's all smooth sailing until frame 16 at 11:21:01, beginning with an RS is being sent by NM, and the looping begins. Note that there are some unrelated RS/RA messages before the bug was triggered (e.g. frames 5 and 6), these were caused by another host on the LAN and can probably be ignored. The router in question is an AVM FritzBox 7390, so I cannot get the radvd.conf (or whatever it's using) config file. However, to respond specifically to comment #1, the PCAP shows that the RDNSS lifetime is consistently being advertised at 1200. Tore
Fixes pushed to upstream branch dcbw/rdisc-fixes. See upstream bug report (noted above) for review progress.
Created attachment 839155 [details] rh1044757-ipv6-solicit-infinity.patch
Tore (or Tom!), could you try the attached patch? It should apply successfully to the current NetworkManager Fedora 20 SRPM.
(In reply to Dan Williams from comment #7) > Tore (or Tom!), could you try the attached patch? It should apply > successfully to the current NetworkManager Fedora 20 SRPM. It's been stable for two hours now on both my laptop (using WiFi) and my desktop (wired). When I reproduced it earlier the looping began about 65 minutes after activating the connection, so if the bug indeed is based on the timers expiring due to not being reset, I guess it should have been triggered after the same amount of time. Which means that the fact that it has not yet is a good sign. On the other hand, that it was the RDNSS timer expiring that triggered it doesn't make fully sense to me either. My router advertises a RDNSS lifetime of 1200 seconds, but it took over an hour for the bug to trigger. If it was the RDNSS lifetime, shouldn't the madness have started after only 20 minutes, instead? Tore
To ensure stability of the connection, NetworkManager enforces a minimum RDNSS and DNSSL lifetime of 7200 seconds. That does seem a bit long, so we could reduce that substantially. But this is why you didn't see the issue after 20 minutes. This *is* why you saw the issue after 60+ minutes, because... NetworkManager sends the Router Solicitation at half the lifetime of the DNSSL or RDNSS option, and this is why you only saw the solicit-loop after a significant amount of time had passed. The intention of this minimum time is to counter a problem we had earlier this year with routers that had significantly shorter RDNSS lifetimes than address lifetimes. If the router did not send advertisements often, or was misconfigured to send them less often than the RDNSS timeout, NetworkManager would tear down the connection because the DNS servers expired. NM should probably stop doing this, and just let users suffer without DNS rather than terminating the connection entirely. Now that we have implemented router solicitations when the DNSSL and RDNSS options are near their expiration (much like DHCP rebinds before the lease has expired), this should be much less of an issue.
I should note that NM *should* correctly handle RDNSS/DNSSL lifetimes of '0', meaning "stop using this [server | domain]". Only if the lifetime is > 0 is it padded to 7200.
dcbw/rdisc-fixes merged to master upstream after reviews.
NetworkManager-0.9.9.0-22.git20131003.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-22.git20131003.fc20
Package NetworkManager-0.9.9.0-22.git20131003.fc20: * should fix your issue, * was pushed to the Fedora 20 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.9.0-22.git20131003.fc20' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-23712/NetworkManager-0.9.9.0-22.git20131003.fc20 then log in and leave karma (feedback).
NetworkManager-0.9.9.0-22.git20131003.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
*** Bug 1045860 has been marked as a duplicate of this bug. ***
This is still happening with NetworkManager-0.9.9.0-22.git20131003.fc20.x86_64. I'm using a DLink 825 configured to use a Hurricane Electric IPv6 tunnel, for what it's worth.
On Fedora 20 with also IPv6 via Hurricane Electric tunnel this is also happening could this be more related to HE ipv6? Version: NetworkManager.x86_64 1:0.9.9.0-22.git20131003.fc20 Regards Robert
Can anyone with a Hurricane tunnel get a wireshark capture of the Router Advertisement for me? Also, can you grab log messages from the journal from before the problem starts so I can see what's going on?
In my case this IPv6 router solicitation loop *started* with NetworkManager-0.9.9.0-22.git20131003.fc20.x86_64 (0.9.9.0-21 is OK). I can see <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) scheduled... <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) started... <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) complete. repeating every 3s. Strangely I don't see any RS/RA with my ll address in wireshark.
(In reply to Jiri Popelka from comment #19) > In my case this IPv6 router solicitation loop *started* with > NetworkManager-0.9.9.0-22.git20131003.fc20.x86_64 (0.9.9.0-21 is OK). > > I can see > <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) scheduled... > <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) started... > <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) complete. > repeating every 3s. > > Strangely I don't see any RS/RA with my ll address in wireshark. This would be happening when the kernel makes a change to the routing table, this is NetworkManager reading the kernel configuration and updating its internal structures. These messages should likely be suppressed for external changes.
I also see this kind of message, every 5-10s in my case. NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) scheduled... NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) started... NetworkManager[605]: <error> [1390480120.831917] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family NetworkManager[605]: :: inet6 dev 2 scope nowhere NetworkManager[605]: valid-lifetime forever preferred-lifetime forever NetworkManager[605]: [54B blob data] NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) complete. rpm -q NetworkManager NetworkManager-0.9.9.0-26.git20131003.fc20.x86_64
(In reply to hansvon from comment #21) > I also see this kind of message, every 5-10s in my case. > > NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) > scheduled... > NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) > started... > NetworkManager[605]: <error> [1390480120.831917] > [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid > address for specified address family > NetworkManager[605]: :: inet6 dev 2 scope nowhere > NetworkManager[605]: valid-lifetime forever preferred-lifetime forever > NetworkManager[605]: [54B blob data] > NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) > complete. > > > rpm -q NetworkManager > NetworkManager-0.9.9.0-26.git20131003.fc20.x86_64 This should be covered by bug 1048046 and is fixed upstream.
Thank you, I confirm, the patch fixes the "Netlink" error. For now, I also commented the "nm_log_info" calls to get rid of the three "Stage 5 of 5 (IPv6 Commit)" messages.
(In reply to hansvon from comment #23) > Thank you, I confirm, the patch fixes the "Netlink" error. For now, I also > commented the "nm_log_info" calls to get rid of the three "Stage 5 of 5 > (IPv6 Commit)" messages. Those "Stage 5 of 5" messages are also now suppressed upstream and should make it to an F20 build at some point here.
NetworkManager-0.9.9.0-27.git20131003.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-27.git20131003.fc20
NetworkManager-0.9.9.0-27.git20131003.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
(In reply to Dan Williams from comment #24) > Those "Stage 5 of 5" messages are also now suppressed upstream and should > make it to an F20 build at some point here. I can confirm this with NetworkManager-0.9.9.0-27.git20131003.fc20.x86_64 Thanks.