Bug 588529 - IPv6 (SLAAC+DHCPv6) interface activation is needlessly "wobbly"
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 14
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Dan Williams
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 538499
Reported: 2010-05-03 21:27 UTC by Tore Anderson
Modified: 2012-03-15 14:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-15 14:49:11 UTC
Type: ---
Embargoed:


Attachments
/var/log/messages (11.27 KB, text/plain)
2010-05-03 21:27 UTC, Tore Anderson
/var/log/messages with debug output from NM (39.56 KB, text/plain)
2011-04-03 11:11 UTC, Tore Anderson
Traffic on the wire (3.03 KB, application/x-pcap)
2011-04-03 11:13 UTC, Tore Anderson
Networking config changes second by second (4.82 KB, text/plain)
2011-04-03 11:16 UTC, Tore Anderson
radvd config file (195 bytes, text/plain)
2011-04-03 11:18 UTC, Tore Anderson
dhcp server config file (344 bytes, text/plain)
2011-04-03 11:18 UTC, Tore Anderson
/var/log/messages from device activation that bounces the device (69.60 KB, text/plain)
2011-04-11 17:34 UTC, Tore Anderson
"while true; do date; ip a show dev wlan0; done" trimmed to only show output at state changes (2.80 KB, text/plain)
2011-07-19 05:11 UTC, Scott Schmit
NetworkManager-specific /var/log/message entries (11.99 KB, text/plain)
2011-07-19 05:13 UTC, Scott Schmit

Description Tore Anderson 2010-05-03 21:27:28 UTC
Created attachment 411134
/var/log/messages

(NetworkManager-0.8.0-11.git20100503.fc12.x86_64)

When activating an interface with IPv6 service, both SLAAC and DHCPv6, the activation is somewhat "wobbly" - addresses and nameservers get added, removed, then re-added.  It ends up in a stable state eventually, but it seems to be done in a rather awkward way.  The router is set up with the following:

[root@lust ~]# cat /etc/radvd.conf
interface eth0 {
	AdvSendAdvert on;
	AdvOtherConfigFlag on;
	AdvManagedFlag on;
	MaxRtrAdvInterval 30;
	prefix 2001:16d8:ee47::/64 {
		AdvAutonomous on;
	};
	RDNSS 2001:16d8:ee47::5:1aac {};
};
[root@lust ~]# cat /etc/dhcp/dhcpd6.conf
default-lease-time 2592000;
preferred-lifetime 604800;
option dhcp-renewal-time 3600;
option dhcp-rebinding-time 7200;
allow leasequery;
option dhcp6.name-servers  2001:16d8:ee47::d4cb;
option dhcp6.domain-search "fud.no";
option dhcp6.info-refresh-time 21600;
subnet6 2001:16d8:ee47::/64 {
	range6 2001:16d8:ee47::aaaa 2001:16d8:ee47::bbbb;
}

On the host, I ran tcpdump, looking for RS/RA, and also ran "ip -6 a l dev eth0" and "cat /etc/resolv.conf" every second, saving the result.  This is how the activation happened, in chronological order (also see the attached /var/log/messages):

Initial state:
- Networking disabled using nm-applet menu, IPv4 and IPv6 state "automatic"
- No IPv6 addresses on eth0
- No nameservers in /etc/resolv.conf

21:09:58:
- Networking enabled using nm-applet menu.

21:09:59:
- fe80::230:1bff:febc:7f23/64 is added to eth0

21:10:01:
- ICMPv6 Router Solicitation & Advertisement
- 2001:16d8:ee47:0:230:1bff:febc:7f23/64 is added to eth0

21:10:02:
- 2001:16d8:ee47:0:230:1bff:febc:7f23/64 is removed from eth0
- IPv4 nameserver (10.0.0.2) is added to resolv.conf

21:10:03:
- 2001:16d8:ee47::bbba/64 is added to eth0

21:10:05:
- 2001:16d8:ee47::5:1aac and 2001:16d8:ee47::d4cb are added to resolv.conf

21:10:17:
- ICMPv6 Router Advertisement
- 2001:16d8:ee47:0:230:1bff:febc:7f23/64 is added to eth0

21:10:19:
- 2001:16d8:ee47::5:1aac and 2001:16d8:ee47::d4cb are removed from resolv.conf

21:10:20:
- 2001:16d8:ee47::5:1aac and 2001:16d8:ee47::d4cb are added to resolv.conf

What I don't think is right here is the removal of the SLAAC address at 21:10:02 and the IPv6 name servers at 21:10:19.

This is not a very important bug, as everything ends up OK in the end (both IPv6 name servers are found in /etc/resolv.conf, and both IPv6 addresses are configured on eth0), but it still seems like it's not working quite the way it should.  It seems that the SLAAC address isn't re-added until a periodic RA arrives. On a normal network those can be many minutes apart, which would cause the network activation to take much longer than necessary to complete 100%.

Tore
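
The two removals called out above can be picked out of the timeline mechanically. The following is a minimal sketch, not part of the original report: the event list is transcribed from the timeline in the description, while the `find_flaps` and `to_seconds` helper names are made up for illustration.

```python
# Timeline transcribed from the report above: (time, action, item).
EVENTS = [
    ("21:09:59", "add", "fe80::230:1bff:febc:7f23/64"),
    ("21:10:01", "add", "2001:16d8:ee47:0:230:1bff:febc:7f23/64"),
    ("21:10:02", "remove", "2001:16d8:ee47:0:230:1bff:febc:7f23/64"),
    ("21:10:02", "add", "nameserver 10.0.0.2"),
    ("21:10:03", "add", "2001:16d8:ee47::bbba/64"),
    ("21:10:05", "add", "nameserver 2001:16d8:ee47::5:1aac"),
    ("21:10:05", "add", "nameserver 2001:16d8:ee47::d4cb"),
    ("21:10:17", "add", "2001:16d8:ee47:0:230:1bff:febc:7f23/64"),
    ("21:10:19", "remove", "nameserver 2001:16d8:ee47::5:1aac"),
    ("21:10:19", "remove", "nameserver 2001:16d8:ee47::d4cb"),
    ("21:10:20", "add", "nameserver 2001:16d8:ee47::5:1aac"),
    ("21:10:20", "add", "nameserver 2001:16d8:ee47::d4cb"),
]


def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_flaps(events):
    """Return (item, removed_at, readded_at, gap_seconds) for each item
    that was removed and later added back."""
    removed_at = {}
    flaps = []
    for when, action, item in events:
        if action == "remove":
            removed_at[item] = when
        elif action == "add" and item in removed_at:
            gone = removed_at.pop(item)
            gap = to_seconds(when) - to_seconds(gone)
            flaps.append((item, gone, when, gap))
    return flaps


if __name__ == "__main__":
    for item, gone, back, gap in find_flaps(EVENTS):
        print(f"{item}: removed {gone}, re-added {back} ({gap}s gap)")
```

Run against this timeline, the sketch reports the SLAAC address (gone for 15 seconds, until the next periodic RA) and both IPv6 nameservers (gone for 1 second).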

Comment 1 Dan Williams 2010-05-04 18:38:57 UTC
Yeah, we shouldn't be bouncing addresses if they are valid.  At the moment (like you say) I don't consider this an F-13 blocker so it'll probably get pushed out in a post-freeze update.  Thanks for the great debugging info.

Comment 2 Bug Zapper 2010-11-03 15:43:50 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Bug Zapper 2010-12-03 15:07:59 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 4 Tore Anderson 2010-12-04 20:18:46 UTC
This bug is still present in Fedora 14 / NetworkManager-0.8.1-10.git20100831.fc14.x86_64.  Reopening.

Tore

Comment 5 Scott Schmit 2011-03-20 19:14:59 UTC
Ok, this is annoying. My radvd.conf currently sets the unsolicited RA time to be every 10 minutes (the default), so if I enable dhcpv6 (but don't hand out addresses that way), I don't get a non-link-local IPv6 address for about 10 minutes. Sure, I can turn up the RA rate, but only because I manage this network. What about on a network I don't control?

By the way, Bug 676957 sounds related.

Comment 6 Dan Williams 2011-04-02 14:25:53 UTC
NM should be bouncing the link and that causes the kernel to send out a router solicitation request, which the router should reply to.  Any chance you can wireshark the ethernet interface and check whether this is actually happening?  There are also tweaks in the radvd config to control how often the router solicitation requests are answered, and of course if the solicitations are only answered every 2 minutes and the unsolicited messages are sent every 10, that is clearly not going to work well.  That may not be the problem, but it's an option.  But the wireshark traces would help us figure out if NM is not doing what's required to ensure that the solicitation requests go out.

Comment 7 Tore Anderson 2011-04-03 11:09:20 UTC
I tried to reproduce this bug now, and it appears to work much better. The SLAAC-learned IPv6 address doesn't get removed like it did when I originally submitted the bug.

However, I did notice that the IPv6 nameservers disappear from /etc/resolv.conf for a second (at 12:18:08 in the debug info I'll send in a second).

Tore

Comment 8 Tore Anderson 2011-04-03 11:11:41 UTC
Created attachment 489637
/var/log/messages with debug output from NM

Contains the debug output NM printed to /var/log/syslog when the bug was being reproduced. I started out in a state with networking disabled.

Comment 9 Tore Anderson 2011-04-03 11:13:33 UTC
Created attachment 489638
Traffic on the wire

This is the relevant traffic seen on the wire (from the client machine running NM's point of view).

Comment 10 Tore Anderson 2011-04-03 11:16:43 UTC
Created attachment 489639
Networking config changes second by second

This shows how the output from "ip a l dev eth0" and "cat /etc/resolv.conf" changed second by second, going from an initial state (from before NM was started), until it stabilised in the end.

Especially note the change between 12:18:07 and 12:18:08 - at this point, both IPv6 name servers were removed from /etc/resolv.conf, which has to be a bug.
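
Reducing second-by-second snapshots to just the moments where something changed can be sketched as below. This is an illustration only: the `state_changes` helper is a made-up name, and the sample data is not taken from the attachment. It merely mimics the 12:18:07 to 12:18:08 change described above, reusing the nameserver addresses from the original description.

```python
def state_changes(snapshots):
    """Yield (timestamp, added, removed) whenever the set of lines changes.

    `snapshots` is an iterable of (timestamp, lines) pairs, for example the
    lines of /etc/resolv.conf captured once per second.
    """
    previous = set()
    for timestamp, lines in snapshots:
        current = set(lines)
        added = sorted(current - previous)
        removed = sorted(previous - current)
        if added or removed:
            yield timestamp, added, removed
        previous = current


# Illustrative data only (an assumption, not the attachment's contents).
V4 = "nameserver 10.0.0.2"
V6_A = "nameserver 2001:16d8:ee47::5:1aac"
V6_B = "nameserver 2001:16d8:ee47::d4cb"
SAMPLE = [
    ("12:18:07", [V4, V6_A, V6_B]),
    ("12:18:08", [V4]),
    ("12:18:09", [V4, V6_A, V6_B]),
    ("12:18:10", [V4, V6_A, V6_B]),
]

if __name__ == "__main__":
    for timestamp, added, removed in state_changes(SAMPLE):
        print(timestamp, "added:", added, "removed:", removed)
```

Snapshots identical to the previous one produce no output, so a long capture shrinks to the handful of seconds where the configuration actually moved.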

Comment 11 Tore Anderson 2011-04-03 11:18:21 UTC
Created attachment 489640
radvd config file

This is the radvd.conf in use on the router at the time

Comment 12 Tore Anderson 2011-04-03 11:18:55 UTC
Created attachment 489641
dhcp server config file

This is the dhcpd6.conf in use on the router at the time

Comment 13 Dan Williams 2011-04-05 16:50:04 UTC
I think I see what the problem is, and it's going to take a bit to fix up; might not make 0.9/F15 final but it'll certainly get fixed.  The code internally does a remove-then-add, and resolv.conf gets written out both when addresses are removed and when they are added back, so there's a brief window of time where we could have no IPv6 nameserver addresses in resolv.conf.
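
The window described above can be illustrated with a small model. This is a hedged sketch and not NetworkManager's code: `ResolvConf`, `update_remove_then_add` and `update_single_write` are hypothetical names, and the single-write variant is only one possible way to avoid the intermediate state.

```python
class ResolvConf:
    """Stand-in for /etc/resolv.conf that records every version written."""

    def __init__(self):
        self.history = []

    def write(self, nameservers):
        self.history.append(list(nameservers))


def update_remove_then_add(conf, current, stale, fresh):
    """Write once after the removal and again after the re-add.

    This mirrors the pattern described in the comment above: the first
    write exposes a file without the removed nameservers.
    """
    after_remove = [ns for ns in current if ns not in stale]
    conf.write(after_remove)
    after_add = after_remove + [ns for ns in fresh if ns not in after_remove]
    conf.write(after_add)
    return after_add


def update_single_write(conf, current, stale, fresh):
    """Compute the final list first and write only if it actually changed."""
    final = [ns for ns in current if ns not in stale]
    final += [ns for ns in fresh if ns not in final]
    if final != current:
        conf.write(final)
    return final


if __name__ == "__main__":
    v6 = ["2001:16d8:ee47::5:1aac", "2001:16d8:ee47::d4cb"]
    current = ["10.0.0.2"] + v6

    bouncy, steady = ResolvConf(), ResolvConf()
    update_remove_then_add(bouncy, current, stale=v6, fresh=v6)
    update_single_write(steady, current, stale=v6, fresh=v6)
    print("remove-then-add wrote:", bouncy.history)
    print("single write wrote:   ", steady.history)
```

When the same nameservers are removed and re-added, the first variant writes a file containing only the IPv4 nameserver before restoring the full list, while the second variant writes nothing at all.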

Comment 14 Tore Anderson 2011-04-11 17:34:11 UTC
Created attachment 491287
/var/log/messages from device activation that bounces the device

I've noticed another case of "wobblyness" when activating a device. The activation appears to finish, but then the device is brought down again and re-activated again. I'm attaching a log of the device being bumped once before it settles, but I have seen it happening twice in a row too.

The bumping happens at 18:59:07. It does log "RA-provided address no longer valid" then, but I cannot see how that can be right. The lowest lifetime found in the RA is 180 seconds, but the bumping happens only nine seconds after the device was enabled in the first place. tcpdump of the RAs on the network in question:

IP6 (hlim 255, next-header ICMPv6 (58) payload length: 80) fe80::208:a1ff:fec9:2381 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 80
	hop limit 64, Flags [managed, other stateful], pref medium, router lifetime 540s, reachable time 0ms, retrans time 0ms
	  prefix info option (3), length 32 (4): 2001:16d8:ee47::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
	  rdnss option (25), length 24 (3):  lifetime 180s, addr: 2001:16d8:ee47::5:1aac
	  source link-address option (1), length 8 (1): 00:08:a1:c9:23:81

Tore
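
The lifetime argument above is simple arithmetic and can be written down as a check. A minimal sketch, assuming the RA was received at roughly the moment the device was enabled; the lifetimes are copied from the tcpdump output above, and the `expired` and `shortest` helper names are made up for illustration.

```python
# Lifetimes (in seconds) from the router advertisement shown above.
RA_LIFETIMES = {
    "router lifetime": 540,
    "prefix valid time": 86400,
    "prefix preferred time": 14400,
    "rdnss lifetime": 180,
}

# The device was bounced nine seconds after it was enabled.
ELAPSED_SECONDS = 9


def expired(lifetimes, elapsed):
    """Return the names of all lifetimes that have run out after `elapsed` seconds."""
    return sorted(name for name, seconds in lifetimes.items() if seconds <= elapsed)


def shortest(lifetimes):
    """Return (name, seconds) for the lifetime that runs out first."""
    name = min(lifetimes, key=lifetimes.get)
    return name, lifetimes[name]


if __name__ == "__main__":
    print("expired after 9s:", expired(RA_LIFETIMES, ELAPSED_SECONDS))
    print("first to expire: ", shortest(RA_LIFETIMES))
```

Nothing in the RA has expired after nine seconds; the earliest expiry is the RDNSS lifetime at 180 seconds, which is why the "no longer valid" log message looks wrong.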

Comment 15 Scott Schmit 2011-07-15 05:17:25 UTC
I currently have my network configured like so: RA with 2 prefixes (1 ULA, 1 public) and O & M flag set, as well as RDNSS. The advertisement interval is 600 seconds. DHCPv6 hands out a single ULA address & DNS info.

Recently, it seems like about 50% of the time, I get all of my addresses (2 ULA, 1 public), and 50% of the time, I get just the 1 DHCPv6-assigned ULA until my router gets around to announcing prefixes again. (Statistics are gut-feel.)

Looking through my package updates, I can't see anything that pops out as an explanation for the change in behavior.

I'm running with F14, NetworkManager-0.8.4-1.fc14.x86_64. This is a laptop with wireless.

Comment 16 Dan Williams 2011-07-19 03:14:16 UTC
If you're running RA, and the O & M flags are set, nothing on your network should care about the RA because you've told it to get everything from DHCP via the M flag.  Is that what you intend?

Comment 17 Scott Schmit 2011-07-19 05:09:32 UTC
No, it's entirely possible to get addresses from both DHCPv6 and RA SLAAC. All the M flag says is that the host should use stateful DHCPv6 to request addresses. Hosts are also supposed to do SLAAC for each RA prefix that has the autonomous flag set.

That does work, but something NetworkManager is doing is (sometimes, especially from a fresh start) causing the SLAAC addresses to get dropped when the DHCPv6 provided address is added. The interface picks up the advertised addresses again at the next router advertisement, but there's no reason for NetworkManager to drop the addresses acquired after router solicitation on the floor.
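
The flag semantics described above can be sketched as a small decision function. This is an illustration of the rules in the IPv6 Neighbor Discovery and SLAAC specifications, not of any NetworkManager code: the `Prefix` and `RouterAdvertisement` types and the `address_sources` function are hypothetical, and the example prefix is borrowed from the RA in comment 14.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prefix:
    prefix: str
    autonomous: bool  # the "A" (auto) flag of the Prefix Information option


@dataclass
class RouterAdvertisement:
    managed: bool  # the "M" flag: addresses are available via DHCPv6
    other: bool    # the "O" flag: other configuration is available via DHCPv6
    prefixes: List[Prefix]


def address_sources(ra):
    """List the configuration mechanisms a host should use for this RA.

    SLAAC is decided per prefix by the autonomous flag and is independent
    of the M and O flags, so both mechanisms can apply at the same time.
    """
    sources = []
    for prefix in ra.prefixes:
        if prefix.autonomous:
            sources.append(f"SLAAC address from {prefix.prefix}")
    if ra.managed:
        sources.append("DHCPv6 (stateful): addresses and other configuration")
    elif ra.other:
        sources.append("DHCPv6 (stateless): other configuration only")
    return sources


if __name__ == "__main__":
    ra = RouterAdvertisement(
        managed=True,
        other=True,
        prefixes=[Prefix("2001:16d8:ee47::/64", autonomous=True)],
    )
    for source in address_sources(ra):
        print(source)
```

For an RA with M, O and an autonomous prefix, the result contains both a SLAAC entry and a stateful DHCPv6 entry, which matches the expectation that the host ends up with addresses from both.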

Comment 18 Scott Schmit 2011-07-19 05:11:32 UTC
Created attachment 513708
"while true; do date; ip a show dev wlan0; done" trimmed to only show output at state changes

This shows how the addresses configured change over time.

Comment 19 Scott Schmit 2011-07-19 05:13:34 UTC
Created attachment 513709
NetworkManager-specific /var/log/message entries

This goes along with the previous attachment.

Comment 20 Tore Anderson 2011-07-23 09:13:35 UTC
(In reply to comment #14)

> I've noticed another case of "wobblyness" when activating a device. The
> activation appears to finish, but then the device is brought down again and
> re-activated again. I'm attaching a log of the device being bumped once before
> it settles, but I have seen it happening twice in a row too.

I've created a separate bug (bug #720188) for this issue.

I've not seen the original issue (as described in comment #0) for quite some time now, so I think that it might have been fixed.

Tore

Comment 21 Dan Williams 2012-03-15 14:49:11 UTC
Yeah, it should be in any 0.9.3+ snapshot for F17.

