Bug 1044757 - Continuous IPv6 router solicitation loop
Summary: Continuous IPv6 router solicitation loop
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 20
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 1045860 (view as bug list)
Depends On:
Blocks: 1207730
TreeView+ depends on / blocked
 
Reported: 2013-12-18 22:40 UTC by Dan Williams
Modified: 2015-03-31 15:06 UTC (History)
13 users (show)

Fixed In Version: NetworkManager-0.9.9.0-27.git20131003.fc20
Clone Of:
: 1207730 (view as bug list)
Environment:
Last Closed: 2013-12-22 05:36:13 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
ipv6-solicit-loop.txt (5.58 KB, text/plain)
2013-12-18 22:40 UTC, Dan Williams
no flags Details
/var/log/messages with NM debugging (16.93 MB, application/x-xz)
2013-12-19 13:24 UTC, Tore Anderson
no flags Details
PCAP of RS/RA/DHCPv6 messages (983.82 KB, application/x-xz)
2013-12-19 13:33 UTC, Tore Anderson
no flags Details
rh1044757-ipv6-solicit-infinity.patch (13.92 KB, patch)
2013-12-19 18:32 UTC, Dan Williams
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
GNOME Bugzilla 720760 0 None None None Never

Description Dan Williams 2013-12-18 22:40:33 UTC
Created attachment 838675 [details]
ipv6-solicit-loop.txt

NM can get into a state where it continuously solicits the router for an advertisement, which gets delivered, which causes another solicitation, repeat forever.

Comment 1 Dan Williams 2013-12-18 22:46:12 UTC
Tore, can you attach your radvd.conf, or otherwise inform me what the expected DNS server lifetime is from a wireshark capture?  There may be something wrong with NM's handling of the RDNSS and DNSSL lifetimes and timeouts that is causing the continuous solicitation.

Comment 2 Tom Hughes 2013-12-19 00:54:50 UTC
I can confirm seeing this, and that removing RDNSS and DNSSL settings stops it. The radvd.conf was:

interface br0
{
	AdvSendAdvert on;

	prefix 2001:8b0:bd:1::/64
	{
		AdvOnLink on;
		AdvAutonomous on;
	};

	RDNSS 2001:8b0:bd:1:204:e2ff:fe22:eba9 2001:8b0:bd:1:2b0:c2ff:fe02:4cf3 2001:8b0:bd:1:be5f:f4ff:fe45:81fd {
	};

	DNSSL compton.nu {
	};
};

Wireshark reports 600 as the advertised lifetime of the DNS details.

Comment 3 Tore Anderson 2013-12-19 13:24:13 UTC
Created attachment 838983 [details]
/var/log/messages with NM debugging

Attached is /var/log/messages with NM debugging output enabled from when I've reproduced the bug. I'm using NetworkManager-0.9.9.0-20.git20131003.fc20.x86_64 on wired ethernet.

The log starts as I'm enabling networking at 10:06:30. It successfully connects and activates IPv6 using SLAAC and information-only DHCPv6, and for a while it is smooth sailing. There are some errors during that time, however:

[rdisc/nm-lndp-rdisc.c:194] send_rs(): (enp2s0): cannot send router solicitation: -99.
[platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family
:: inet6 dev 2 scope nowhere
created boot-time+ <99>t<D8>C^? last-updated boot-time+ <99>t<D8>C^?

I don't know if these have any relevance on this bug, just thought it prudent to point them out.

At 11:12:01 the real fun starts. At this point it starts sending RSes and receives RAs in a close loop, making NM hog the CPU and fill the logs. It continues like this until I manually disabled networking again at 11:21:18. In those 10 minutes of madness NM produced about 1GiB with debugging output, so make sure you have sufficient space where you uncompress the file.

Tore

Comment 4 Tore Anderson 2013-12-19 13:33:17 UTC
Created attachment 838984 [details]
PCAP of RS/RA/DHCPv6 messages

This file contains a PCAP of all the RS/RA/DHCPv6 activity seen on the host triggering the bug. It's all smooth sailing until frame 16 at 11:21:01, beginning with an RS is being sent by NM, and the looping begins. Note that there are some unrelated RS/RA messages before the bug was triggered (e.g. frames 5 and 6), these were caused by another host on the LAN and can probably be ignored.

The router in question is an AVM FritzBox 7390, so I cannot get the radvd.conf (or whatever it's using) config file. However, to respond specifically to comment #1, the PCAP shows that the RDNSS lifetime is consistently being advertised at 1200.

Tore

Comment 5 Dan Williams 2013-12-19 18:15:36 UTC
Fixes pushed to upstream branch dcbw/rdisc-fixes.  See upstream bug report (noted above) for review progress.

Comment 6 Dan Williams 2013-12-19 18:32:29 UTC
Created attachment 839155 [details]
rh1044757-ipv6-solicit-infinity.patch

Comment 7 Dan Williams 2013-12-19 18:33:21 UTC
Tore (or Tom!), could you try the attached patch?  It should apply successfully to the current NetworkManager Fedora 20 SRPM.

Comment 8 Tore Anderson 2013-12-19 22:38:26 UTC
(In reply to Dan Williams from comment #7)
> Tore (or Tom!), could you try the attached patch?  It should apply
> successfully to the current NetworkManager Fedora 20 SRPM.

It's been stable for two hours now on both my laptop (using WiFi) and my desktop (wired). When I reproduced it earlier the looping began about 65 minutes after activating the connection, so if the bug indeed is based on the timers expiring due to not being reset, I guess it should have been triggered after the same amount of time. Which means that the fact that it has not yet is a good sign.

On the other hand, that it was the RDNSS timer expiring that triggered it doesn't make fully sense to me either. My router advertises a RDNSS lifetime of 1200 seconds, but it took over an hour for the bug to trigger. If it was the RDNSS lifetime, shouldn't the madness have started after only 20 minutes, instead?

Tore

Comment 9 Dan Williams 2013-12-20 00:00:33 UTC
To ensure stability of the connection, NetworkManager enforces a minimum RDNSS and DNSSL lifetime of 7200 seconds.  That does seem a bit long, so we could reduce that substantially.  But this is why you didn't see the issue after 20 minutes.  This *is* why you saw the issue after 60+ minutes, because...

NetworkManager sends the Router Solicitation at half the lifetime of the DNSSL or RDNSS option, and this is why you only saw the solicit-loop after a significant amount of time had passed.

The intention of this minimum time is to counter a problem we had earlier this year with routers that had significantly shorter RDNSS lifetimes than address lifetimes.  If the router did not send advertisements often, or was misconfigured to send them less often than the RDNSS timeout, NetworkManager would tear down the connection because the DNS servers expired.

NM should probably stop doing this, and just let users suffer without DNS rather than terminating the connection entirely.  Now that we have implemented router solicitations when the DNSSL and RDNSS options are near their expiration (much like DHCP rebinds before the lease has expired), this should be much less of an issue.

Comment 10 Dan Williams 2013-12-20 00:01:41 UTC
I should note that NM *should* correctly handle RDNSS/DNSSL lifetimes of '0', meaning "stop using this [server | domain]".  Only if the lifetime is > 0 is it padded to 7200.

Comment 11 Dan Williams 2013-12-20 00:03:55 UTC
dcbw/rdisc-fixes  merged to master upstream after reviews.

Comment 12 Fedora Update System 2013-12-20 03:03:06 UTC
NetworkManager-0.9.9.0-22.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-22.git20131003.fc20

Comment 13 Fedora Update System 2013-12-21 02:15:36 UTC
Package NetworkManager-0.9.9.0-22.git20131003.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.9.0-22.git20131003.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-23712/NetworkManager-0.9.9.0-22.git20131003.fc20
then log in and leave karma (feedback).

Comment 14 Fedora Update System 2013-12-22 05:36:13 UTC
NetworkManager-0.9.9.0-22.git20131003.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 15 Seb L. 2014-01-04 18:08:36 UTC
*** Bug 1045860 has been marked as a duplicate of this bug. ***

Comment 16 Jonathan Abbey 2014-01-05 03:31:07 UTC
This is still happening with NetworkManager-0.9.9.0-22.git20131003.fc20.x86_64.  I'm using a DLink 825 configured to use a Hurricane Electric IPv6 tunnel, for what it's worth.

Comment 17 robert.blacquiere 2014-01-05 17:31:02 UTC
On Fedora 20 with also IPv6 via Hurricane Electric tunnel this is also happening could this be more related to HE ipv6?  Version: NetworkManager.x86_64           1:0.9.9.0-22.git20131003.fc20

Regards

Robert

Comment 18 Dan Williams 2014-01-08 21:45:42 UTC
Can anyone with a Hurricane tunnel get a wireshark capture of the Router Advertisement for me?  Also, can you grab log messages from the journal from before the problem starts so I can see what's going on?

Comment 19 Jiri Popelka 2014-01-20 14:25:31 UTC
In my case this IPv6 router solicitation loop *started* with NetworkManager-0.9.9.0-22.git20131003.fc20.x86_64 (0.9.9.0-21 is OK).

I can see
<info> Activation (em1) Stage 5 of 5 (IPv6 Commit) scheduled...
<info> Activation (em1) Stage 5 of 5 (IPv6 Commit) started...
<info> Activation (em1) Stage 5 of 5 (IPv6 Commit) complete.
repeating every 3s.

Strangely I don't see any RS/RA with my ll address in wireshark.

Comment 20 Dan Williams 2014-01-20 18:43:02 UTC
(In reply to Jiri Popelka from comment #19)
> In my case this IPv6 router solicitation loop *started* with
> NetworkManager-0.9.9.0-22.git20131003.fc20.x86_64 (0.9.9.0-21 is OK).
> 
> I can see
> <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) scheduled...
> <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) started...
> <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) complete.
> repeating every 3s.
> 
> Strangely I don't see any RS/RA with my ll address in wireshark.

This would be happening when the kernel makes a change to the routing table, this is NetworkManager reading the kernel configuration and updating its internal structures.  These messages should likely be suppressed for external changes.

Comment 21 hansvon 2014-01-23 13:00:37 UTC
I also see this kind of message, every 5-10s in my case.

NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) scheduled...
NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) started...
NetworkManager[605]: <error> [1390480120.831917] [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid address for specified address family
NetworkManager[605]: :: inet6 dev 2 scope nowhere
NetworkManager[605]: valid-lifetime forever preferred-lifetime forever
NetworkManager[605]: [54B blob data]
NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit) complete.


rpm -q NetworkManager
NetworkManager-0.9.9.0-26.git20131003.fc20.x86_64

Comment 22 Dan Williams 2014-01-24 19:39:27 UTC
(In reply to hansvon from comment #21)
> I also see this kind of message, every 5-10s in my case.
> 
> NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit)
> scheduled...
> NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit)
> started...
> NetworkManager[605]: <error> [1390480120.831917]
> [platform/nm-linux-platform.c:1127] add_object(): Netlink error: Invalid
> address for specified address family
> NetworkManager[605]: :: inet6 dev 2 scope nowhere
> NetworkManager[605]: valid-lifetime forever preferred-lifetime forever
> NetworkManager[605]: [54B blob data]
> NetworkManager[605]: <info> Activation (em1) Stage 5 of 5 (IPv6 Commit)
> complete.
> 
> 
> rpm -q NetworkManager
> NetworkManager-0.9.9.0-26.git20131003.fc20.x86_64

This should be covered by bug 1048046 and is fixed upstream.

Comment 23 hansvon 2014-01-27 09:36:29 UTC
Thank you, I confirm, the patch fixes the "Netlink" error. For now, I also commented the "nm_log_info" calls to get rid of the three "Stage 5 of 5 (IPv6 Commit)" messages.

Comment 24 Dan Williams 2014-01-27 21:28:47 UTC
(In reply to hansvon from comment #23)
> Thank you, I confirm, the patch fixes the "Netlink" error. For now, I also
> commented the "nm_log_info" calls to get rid of the three "Stage 5 of 5
> (IPv6 Commit)" messages.

Those "Stage 5 of 5" messages are also now suppressed upstream and should make it to an F20 build at some point here.

Comment 25 Fedora Update System 2014-01-28 15:23:43 UTC
NetworkManager-0.9.9.0-27.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-27.git20131003.fc20

Comment 26 Fedora Update System 2014-01-30 03:41:45 UTC
NetworkManager-0.9.9.0-27.git20131003.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 27 Jiri Popelka 2014-01-30 08:48:40 UTC
(In reply to Dan Williams from comment #24)
> Those "Stage 5 of 5" messages are also now suppressed upstream and should
> make it to an F20 build at some point here.

I can confirm this with NetworkManager-0.9.9.0-27.git20131003.fc20.x86_64
Thanks.


Note You need to log in before you can comment on or make changes to this bug.