1074171 – IPv6 connections broken in recent kernels.

Summary: IPv6 connections broken in recent kernels.

Keywords:
Status:	CLOSED DUPLICATE of bug 1068632
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	20
Hardware:	i686
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Neil Horman
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-03-08 17:05 UTC by Reilly Hall
Modified:	2014-03-16 11:49 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-03-16 11:49:59 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments
IPv6 RS/RA exchange (17.98 MB, application/vnd.tcpdump.pcap) 2014-03-11 01:59 UTC, Reilly Hall	no flags
sosreport (7.00 MB, application/x-xz) 2014-03-11 02:23 UTC, Reilly Hall	no flags
First half of failed IPv6 SCP transfer (19.07 MB, application/octet-stream) 2014-03-11 02:31 UTC, Reilly Hall	no flags
Second half of failed IPv6 SCP transfer (4.85 MB, application/octet-stream) 2014-03-11 02:33 UTC, Reilly Hall	no flags
Changed MAC address still fails. (33.64 KB, application/vnd.tcpdump.pcap) 2014-03-12 03:48 UTC, Reilly Hall	no flags
Short SCP transfer under kernel-3.11.10-301.fc20.i686 (13.41 KB, application/vnd.tcpdump.pcap) 2014-03-12 03:58 UTC, Reilly Hall	no flags

Description Reilly Hall 2014-03-08 17:05:33 UTC

Description of problem:
Recently updated to the kernel-3.13.5-202 kernel and realized that on my laptop's wireless connection I cannot use anything IPv6-based: the connection starts and then dies within a minute. SSH shell sessions stop responding to keystrokes and within a couple of minutes time out with a "Timeout, server pe860 not responding." error message. NFS over IPv6 mounts, but becomes unresponsive almost immediately and brings the machine to its knees with "Mar 2 21:32:44 ThinkPad-T43 kernel: [ 5536.801176] nfs: server pe860 not responding, still trying"

Version-Release number of selected component (if applicable):
Definitely noticed with the latest kernel-3.13.5-202.fc20.i686+PAE
Also tested with 3.13.5-200 as well as 3.13.4-200
Of the older kernels, I was only able to install 3.11.10-301, where the problem does not occur.

How reproducible:
Boot into a recent kernel on an IPv6-enabled LAN or WLAN and attempt to use any connection over IPv6 (SSH, SCP, NFS, iperf -V, anything), especially to machines on the local LAN that are not reached through a router. Connections to another Fedora 20 3.13.5-202 machine die, as do connections to a FreeBSD 10 machine (both are 64-bit and on wired LAN connections). The same connections over IPv4 run flawlessly. Boot into the much older 3.11.10-301 kernel and IPv6 runs fine again.

A tcpdump only shows a sequence of successful packet exchanges, and then sudden TCP retries with no other indication as to why.

Steps to Reproduce:
1. Boot recent 3.13.x kernel and test live or long IPv6 based transfers.
2. Watch as your live session dies or transfer stalls.
3. Connection lost, but new connections can be instantly made (no actual loss of connectivity).

Actual results:
IPv6 connections on my 32bit laptop on recent kernels are nearly useless.

Expected results:
IPv6 connections on an otherwise very stable LAN, without even going through a router, should run flawlessly for hours and not die within minutes with no real error message.

Additional info:
I have not yet pinned down between which two kernel versions the issue started, and I cannot yet explain either why the connections fail (no obvious errors seen in tcpdump/Wireshark) or why my desktop, running the same kernel, appears unaffected; the only obvious differences are that it's 64-bit and wired.

Comment 1 Reilly Hall 2014-03-08 17:16:45 UTC

I can supply tcpdumps from both the client (laptop) and the server (FreeBSD) perspectives to help diagnose this issue.

Comment 2 Reilly Hall 2014-03-08 17:23:09 UTC

I'm also now noticing that an IPv6 SSH session to my OpenBSD router is not dying with a timeout. It runs OpenBSD 5.4 32-bit. The only other difference is that it's the device offering up the RAs to the LAN. I'm unsure whether that could somehow be affecting things, as the other devices are merely IPv6 clients on the LAN.

Comment 3 Reilly Hall 2014-03-08 17:38:43 UTC

OK, now it gets even more bizarre: IPv6 connections over link-local addresses don't seem to be dying. I don't understand why, though.

Comment 4 Michele Baldessari 2014-03-09 20:48:18 UTC

Does this happen with 3.12.X as well or does it work in this case?

Comment 5 Reilly Hall 2014-03-10 02:06:49 UTC

OK, I finally found where I could download older kernels (Koji?) and found out that many of the 3.12.x kernels ARE affected, but 3.12.8-300.fc20.i686+PAE appears NOT to be. I was able to download a full 800MB+ file via SCP that passed its checksum. Booting back into 3.12.9-301.fc20.i686+PAE, I confirmed that I can't get the download much past about 47MB (the exact point varies and it has never reached 100MB, so the failure comes fairly quickly, but it is consistent).

So the 3.12.x line is affected, but the problem appears to have started somewhere between 3.12.8 and 3.12.9. Hope this helps! One final detail: this only seems to affect access to machines on the local LAN, all of which get public IPv6 addresses through SLAAC from Router Advertisements. Accessing the same machines via their IPv6 link-local addresses does NOT fail. Even more curious, accessing the router via its public IPv6 address, and accessing Internet addresses (which require traversing my router), also do NOT fail. This only appears to affect connections to other servers on the local LAN, which might explain why I hadn't noticed it sooner.

Comment 6 Neil Horman 2014-03-10 17:35:20 UTC

Please send in a tcpdump that includes an RS/RA exchange on your subnet and a failed transfer. A sosreport taken before the failure would be good too.

Comment 7 Reilly Hall 2014-03-11 01:59:09 UTC

Created attachment 872930
IPv6 RS/RA exchange

This is the tcpdump of when I bring the interface up under kernel-PAE-3.12.9-301.fc20.i686.  It shows the IPv6 initialization of the interface including some quick interchanges as some background programs detect a live Internet connection.

Comment 8 Reilly Hall 2014-03-11 02:12:20 UTC

This is disappointing. I got lucky in that the transfer failed earlier than usual this time (at 22MB rather than the 47MB I remember from last time), but I was just told that I cannot upload files larger than 20MB, so it is 2MB too large. Am I allowed to upload it to one of those file hosting sites, and if so, any suggestions? (I have never needed to use them before.)

Also, what is a sosreport, so that I can try to take one?

Comment 9 Reilly Hall 2014-03-11 02:23:41 UTC

Created attachment 872933
sosreport

Never mind, I found out what sosreport is, and ran the command.  Here is the file that it generated.

Comment 10 Reilly Hall 2014-03-11 02:31:28 UTC

Created attachment 872934
First half of failed IPv6 SCP transfer

I just decided to split the file. Hope this works.  This is the first half of the file.

Comment 11 Reilly Hall 2014-03-11 02:33:50 UTC

Created attachment 872935
Second half of failed IPv6 SCP transfer

Just concatenate the files together.

Comment 12 Neil Horman 2014-03-11 13:47:02 UTC

Hmm, interesting. I'm not sure your system is at fault on this issue. Looking at your tcpdump I see the following points of interest:

1) Early on in the connection (example: frame 18), we get lots of redirect messages indicating that the router at 00:1b:21:4e:d4:a2 shouldn't be used, and 00:15:c5:f5:0d:b4 should be used in its place. It takes a few responses, but your client eventually starts using the right router.

2) Everything looks OK until frame 1471. This frame is sent from the remote system fef5:db4 to the local system fe89:dcf3. It is a valid frame, but its sequence number indicates that some prior data was lost in flight. The local system responds to the remote system with a duplicate ACK of the last sequence number it received (this is correct behavior).

3) Normally, a duplicate ACK is sufficient to recover this occasional lost data, but the subsequent frame 1473 is another frame from the peer fef5:db4 with yet another skipped sequence number, causing another duplicate ACK.

4) Frame 1475 finally shows the receipt of a frame that partially overlaps the successfully received sequence numbers.

This entire pattern continues for several hundred frames until frame 1900, where the duplicate ACK messages finally get through and the missing segment is transmitted. Things then get back to normal for a period, during which the redirect to the server needs to be re-established, and transmission resumes normally.

The same pattern occurs again at frame 2923, and several other points in the tcpdump until transmission is finally terminated when the client gives up.
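The duplicate-ACK pattern in points 2) to 4) follows from TCP's cumulative acknowledgments: a receiver can only acknowledge the highest byte it holds in order, so every segment that arrives past a gap repeats the same ACK number. A toy model of that receiver behavior, assuming byte sequence numbers starting at 0, no SACK and no window limits (the function name is hypothetical; this is not the Linux TCP implementation):

```python
from typing import Dict, Iterable, List, Tuple

def cumulative_acks(segments: Iterable[Tuple[int, int]], start: int = 0) -> List[int]:
    """Return the ACK number a receiver sends after each arriving segment.

    Each segment is (seq, length). Data beyond a gap is buffered, and the
    receiver keeps re-ACKing the last in-order byte (a duplicate ACK).
    """
    expected = start                 # next in-order byte we are waiting for
    buffered: Dict[int, int] = {}    # out-of-order segments: seq -> end
    acks = []
    for seq, length in segments:
        end = seq + length
        if seq <= expected < end:
            # This segment fills (or overlaps) the gap: advance past it.
            expected = end
            # Absorb any buffered segments that are now contiguous.
            while True:
                ends = [e for s, e in buffered.items() if s <= expected < e]
                if not ends:
                    break
                expected = max(ends)
            buffered = {s: e for s, e in buffered.items() if e > expected}
        elif seq > expected:
            # A gap precedes this segment: hold it, repeat the old ACK.
            buffered[seq] = max(end, buffered.get(seq, end))
        # Old duplicates (end <= expected) change nothing.
        acks.append(expected)
    return acks
```

For example, `cumulative_acks([(0, 1000), (2000, 1000), (3000, 1000), (1000, 1000)])` yields `[1000, 1000, 1000, 4000]`: repeated ACKs while the gap persists, then one cumulative jump once the missing segment arrives, comparable to the recovery around frame 1900.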

The overall pattern here is that it's not so much IPv6 that's failing, but something at your wireless router (or whatever routing device it is that has the MAC address 00:15:c5:f5:0d:b4). Based on the above notes, frames are getting through, just not consistently, and for long periods of time they are being discarded. As to why IPv6 is affected more than IPv4, I'm not sure, but the fact that you're losing some of the received data while getting the rest suggests it's not your client system at fault here.

I would look at your router with the mac address 00:15:c5:f5:0d:b4. Look at its drop stats, and possibly try swapping the device out to see if you get better wireless reception.

Comment 13 Reilly Hall 2014-03-11 15:00:44 UTC

Thank you for clarifying what I too was seeing in the tcpdumps. However, I'm afraid you misunderstood the setup.

The machine with the MAC address of 00:13:ce:89:dc:f3 is the affected client machine running Fedora 20 kernel 3.12.8-300.fc20.i686+PAE, yes it is a wireless connection.

The machine with MAC address of 00:15:c5:f5:0d:b4 is the FreeBSD server on the same LAN segment from which I attempt the SCP transfer that fails (or any other data transfer).

The machine with MAC address of 00:1b:21:4e:d4:a2 is my OpenBSD router that acts as both the IPv4 NAT gateway and IPv6 router for my LAN to the public Internet. Accessing it for some reason from the client via both the public IPv6 and the link-local addresses does not fail.

The machine with MAC address 00:30:1b:bc:c6:8e is my Fedora 20 desktop on a wired connection running kernel 3.13.5-202.fc20.x86_64. SCP transfers or SSH shell sessions from the machine with the MAC ending in dc:f3 to this machine, on the same LAN segment, fail in exactly the same manner. But curiously, again, not when using link-local addresses, only the autoconfigured, publicly routable IPv6 addresses.

I do not have any other machines on my LAN that are IPv6 aware other than my smart phones and my wife's Windows 8.1 laptop. They are unaffected trying to access any systems via IPv6 over the same wireless that the affected client machine uses.

Again, what I find most curious: if this were a simple case of packet loss due to the nature of wireless, it should be consistent no matter which kernel version I boot, and no matter whether I am using global or link-local addresses, IPv4 or IPv6. It should also just cause delays or slowdowns in the transfer, not complete failure.

And remember: booted into the affected kernel, I can transfer the same 800MB+ file at full line (or wireless) speed with no problems over IPv4, or over IPv6 link-local. Something about the way traffic to the global address is handled is breaking it, even though the destination is on the same LAN segment and no router needs to be traversed.

I will analyze the tcpdumps further in a bit to see if I missed something last night.

Comment 14 Reilly Hall 2014-03-11 15:18:44 UTC

Ok, I see what it's doing.

For some reason, the on-link nature of the addresses assigned via stateless autoconfiguration is broken after kernel 3.12.8.

Refer to the packet dump broken_ipv6-2.pcap

And look starting at frame 13, where the client attempts a SYN to the server on the same local subnet.

I now see what it's doing wrong, and why IPv4 and link-local IPv6 addresses aren't affected.

Frame 13 shows the connection from the client:
MAC 00:13:ce:89:dc:f3
IPv6 2601:3:600:18b:213:ceff:fe89:dcf3

to server:
MAC 00:1b:21:4e:d4:a2
IPv6 2601:3:600:18b:215:c5ff:fef5:db4

The MAC address of the destination is wrong, should be 00:15:c5:f5:0d:b4.
Also, there's no record of a neighbor solicitation by which the client could learn where the destination is; it assumes the destination must be reached through the router, which confuses the router, as its internal routing table says the destination address is on the same port/subnet as the source.
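The mismatch in frame 13 can be checked by hand, because SLAAC addresses formed with the modified EUI-64 format (RFC 4291, Appendix A) embed the interface's MAC address. A minimal sketch with hypothetical helper names, assuming the addresses are EUI-64 based rather than RFC 4941 privacy addresses (which matches the addresses in this report):

```python
import ipaddress

def eui64_interface_id(mac: str) -> int:
    """Build the 64-bit modified EUI-64 interface identifier from a MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    octets[0] ^= 0x02                                 # invert the universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]      # insert ff:fe in the middle
    return int.from_bytes(bytes(eui), "big")

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with the MAC-derived interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("EUI-64 SLAAC addresses need a /64 prefix")
    return net[eui64_interface_id(mac)]

def mac_from_slaac(addr: str) -> str:
    """Recover the MAC embedded in an EUI-64 based IPv6 address."""
    iid = int(ipaddress.IPv6Address(addr)) & ((1 << 64) - 1)   # low 64 bits
    b = bytearray(iid.to_bytes(8, "big"))
    if b[3:5] != b"\xff\xfe":
        raise ValueError(f"{addr} does not look EUI-64 derived")
    b[0] ^= 0x02                                      # undo the U/L inversion
    return ":".join(f"{x:02x}" for x in b[:3] + b[5:])
```

`mac_from_slaac("2601:3:600:18b:215:c5ff:fef5:db4")` returns `00:15:c5:f5:0d:b4`, the FreeBSD server, whereas frame 13 was addressed to 00:1b:21:4e:d4:a2, the router.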

Please confirm you see that behavior as well as I do.

Comment 15 Reilly Hall 2014-03-11 15:21:12 UTC

The MAC of 00:1b:21:4e:d4:a2 is the default gateway, but that should not be used if the two machines are on the same /64 LAN segment, or am I wrong?
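Whether two addresses share a /64 can be checked with Python's standard `ipaddress` module (the helper name below is hypothetical). Note that this only tests prefix membership: per RFC 4861, a host treats a prefix as on-link when it learns so, typically from the on-link (L) flag of the Prefix Information option in a Router Advertisement, which results in a route for that prefix on the interface.

```python
import ipaddress

def same_prefix(a: str, b: str, prefixlen: int = 64) -> bool:
    """True if both IPv6 addresses fall inside the same /prefixlen network."""
    # strict=False lets us build the network from a host address.
    net_a = ipaddress.IPv6Network(f"{a}/{prefixlen}", strict=False)
    return ipaddress.IPv6Address(b) in net_a
```

The laptop (…fe89:dcf3) and the FreeBSD server (…fef5:db4) both sit in 2601:3:600:18b::/64, so `same_prefix` returns `True` for them.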

Comment 16 Neil Horman 2014-03-11 17:29:48 UTC

I disagree that this is a SLAAC problem, given that there's no evidence for it in the tcpdump. That said, there does seem to be some odd behavior in the Linux stack, based on your description above of the hosts in play here.

First, to answer your question: typically you are correct that the default gateway should not be hit for local addresses. As of 3.12-rc3, however, that changed (see commit 550bab42f83308c9d6ab04a980cc4333cef1c8fa). It seems that since then we always look up the next hop and rely on ICMPv6 redirects to steer us to the appropriate host on the local subnet. That seems silly to me, and I'll look into why we're doing that, though it shouldn't in and of itself cause this problem.
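The "send via the gateway, then follow ICMPv6 redirects" behavior described above can be pictured with a toy model. This is a hypothetical sketch, not the kernel's implementation: the class name, the fixed `lifetime`, and the expiry policy are assumptions chosen only to show how per-destination redirect entries (compare the `cache` routes in the `ip -6 route show` output later in this report) override the default next hop until they lapse.

```python
from typing import Dict, Tuple

class RedirectCache:
    """Toy model: use the default gateway until a redirect says otherwise.

    Each redirect installs a per-destination next hop that expires after
    `lifetime` seconds; afterwards traffic falls back to the gateway.
    """

    def __init__(self, gateway: str, lifetime: float) -> None:
        self.gateway = gateway
        self.lifetime = lifetime
        self._entries: Dict[str, Tuple[str, float]] = {}  # dest -> (target, expires_at)

    def on_redirect(self, dest: str, target: str, now: float) -> None:
        """Record a redirect telling us to reach `dest` via `target`."""
        self._entries[dest] = (target, now + self.lifetime)

    def next_hop(self, dest: str, now: float) -> str:
        """Return the redirect target if still valid, else the gateway."""
        entry = self._entries.get(dest)
        if entry is not None and now < entry[1]:
            return entry[0]
        self._entries.pop(dest, None)  # expired or absent: forget it
        return self.gateway
```

In this model, if an entry lapses while packets are still flowing, traffic goes back through the router until the next redirect arrives, which is one way to picture a host reverting to the router's MAC mid-transfer.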

Add to that the tcpdump, specifically frames 1630, 1631, and 1632. When you mentioned no neighbor solicitations, I looked myself and did notice something. There is a solicitation for fe89:dcf3 from 2601:3:600:18b::1 (the default gateway, based on the MAC address). The solicitation is answered properly by the host with an NA, but immediately after that, in frame 1632, the fe89:dcf3 host starts using the default gateway's MAC address again. What's really weird is that it doesn't precede or follow the lost-frame behavior, so I can't really draw a clear relationship between the two events.

Is it possible that the fe89:dcf3 host (with the MAC address 00:13:ce:89:dc:f3) has a doppelganger on the network? That doesn't clearly explain why IPv4 is unaffected (unless the host using the same MAC is an IPv6-only device), but the behavior in the tcpdump specifically would be explained by a duplicate MAC confusing the MAC address tables on the switches between the two hosts.

If you could, while I look into the reasoning behind the commit above and the odd neighbor lookup behavior after the neighbor solicitation above, please try something:

1) Assign an LAA to the fe89:dcf3 host. It will change the system's SLAAC address, of course, but you can either add the global address by hand or update your testing to accommodate the new address. It will be interesting to see whether that clears up the problem (I am assuming here that transfers to/from other hosts on the wireless network operate properly). If the problem stops manifesting, we can be confident that you have a duplicate MAC somewhere.
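A MAC address is locally administered when the universal/local bit (0x02 in the first octet) is set, and the multicast bit (0x01) must stay clear for a unicast address. A minimal sketch for checking or generating one (hypothetical helper names). By this definition 00:11:22:33:44:55 has the bit clear, so strictly it is not an LAA, although any address not otherwise in use on the LAN still serves a duplicate-MAC test.

```python
import secrets

def is_locally_administered(mac: str) -> bool:
    """The U/L bit (0x02 of the first octet) marks a locally administered MAC."""
    return bool(int(mac.split(":")[0], 16) & 0x02)

def random_laa() -> str:
    """Return a random unicast, locally administered MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    octets[0] = (octets[0] | 0x02) & 0xFE   # set U/L bit, clear multicast bit
    return ":".join(f"{b:02x}" for b in octets)
```

The result could then be applied with `ip link set dev wlp4s2 address <mac>`; some drivers may require the link to be down first.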

Comment 17 Reilly Hall 2014-03-11 17:57:57 UTC

OK, I see what you're saying. I did not mean to say SLAAC is the problem or that there is a problem with the SLAAC implementation. But yes, the need to go to the next hop for every address seems stupid at first glance, unless there's a good reason for it that I just cannot think of. In any case, if ICMPv6 redirects were properly supported by every OS implementing IPv6, this shouldn't be a problem. But looking at the tcpdump, either OpenBSD doesn't anticipate this behavior in its IPv6 stack or the client machine isn't respecting the redirect for several packets (the redirects seem to be ignored for quite a while, and then apparently out of nowhere the client respects them and starts communicating directly with the server without going through the router). Still, I think it's inefficient, and I much prefer the IPv4-like behavior of systems on the same subnet communicating directly without first hitting the default gateway (in my opinion that just puts unnecessary extra load on the router, even if negligible).

I would like to perform the test you're asking for, but I apologize, I'm not sure what you meant by assigning an 'LAA'. What is that? I'm not familiar with that particular acronym.

Also, you don't have to assume: I have already confirmed that other machines on the wireless can perform IPv6 transfers without any hiccups (Android smartphones and a Windows 8.1 laptop). I also have to disagree about the duplicate MAC theory: if that were the case, wouldn't IPv4 and link-local IPv6 addresses be affected too? These have been confirmed unaffected. I don't have that many machines on my network and I highly doubt a duplicate MAC issue, but I am willing to test as soon as I find out what you mean by assigning an LAA.

Thanks!

Comment 18 Reilly Hall 2014-03-11 18:17:51 UTC

Oh, I see what you mean: a "Locally Administered Address", i.e. changing the MAC address to see if it's a duplicate MAC issue. I'll try to run that test when I get home, once I figure out how to change the MAC.

Comment 19 Neil Horman 2014-03-11 20:11:59 UTC

Yes, a locally administered MAC is what I meant, sorry. I understand your disbelief regarding this possibility, but it's an easy thing to test (you can use the ip link command to do so), it seems to fit the tcpdump at least in part, and it might give us another data point.

The latency in the recognition of the redirect I think is expected, due to the fact that by the time the redirect is processed several frames have already been queued to the hardware.

I'll keep digging as to why we do the silly nexthop thing.

Comment 20 Reilly Hall 2014-03-12 03:48:50 UTC

Created attachment 873281
Changed MAC address still fails.

I changed the MAC address to an obviously bogus 00:11:22:33:44:55. This time, rather than doing an SCP, I just opened an SSH session and ran top within it. I also did it to my Fedora desktop this time rather than my FreeBSD server.

Still failed within a minute or two.

Comment 21 Reilly Hall 2014-03-12 03:58:06 UTC

Created attachment 873283
Short SCP transfer under kernel-3.11.10-301.fc20.i686

This is a short SCP transfer under kernel-3.11.10-301.fc20.i686 where all my on-link sessions live indefinitely and I can complete whole SCP transfers.

The only thing I see immediately different are the use of neighbor solicitations to connect directly to on-link (same subnet) hosts rather than using nexthop by default and relying on ICMPv6 redirects.

The previous test of changing the MAC address showed it was not a duplicate MAC. And since booting back into kernel-3.11.10 consistently fixes the issue, I strongly believe it cannot be a duplicate MAC or packet loss on my wireless, unless booting into the newer kernel somehow causes packets to be dropped at the interface.

Comment 22 Neil Horman 2014-03-12 10:57:27 UTC

I agree, not a duplicate MAC, and by your description, somehow closely related to the fact that on later kernels we redirect from the router address to local subnet hosts.  The question of course is, why?  It still seems like a really dumb thing to do for local subnet hosts, but I'd like to have a root cause for why it causes this lost connection behavior on your system before posting a fix.

In this latest trace it appears that around frame 101, the Linux host just goes back to using the router's MAC address, creating the need for further redirects, and never recognizes the need to update the MAC address it uses; this happens shortly before the connection starts to fail. At least the behavior is in order here.

I'll look into this more today.

Comment 23 Neil Horman 2014-03-12 11:17:15 UTC

Can you please do me a favor and post the results of this command:

ip -6 neigh show

from your fe33:4455 system (the one initiating the SCP transfer), after the failure occurs?

Comment 24 Neil Horman 2014-03-12 18:05:09 UTC

Additionally, you may want to take a look at bug 1068632. I was discussing this problem with a co-worker today and he noted that he saw similar behavior, and apparently NetworkManager has fixed this by adding an explicit subnet route when an address is assigned to an interface. I don't really agree that this is correct, but it seems like it should work. You may want to update your NetworkManager package to the one in the testing repository so you can try the fix from that bug.

Comment 25 Reilly Hall 2014-03-12 19:39:22 UTC

Oh...reading that other bug I just looked and now I see something.  I don't know if this matters, but remember how I mentioned that it was very puzzling to me that my laptop appeared to be the only affected system and that my desktop running the exact same version of Fedora software was not experiencing this issue.  Originally I thought maybe there was a bug in the i686 builds as my laptop is not 64bit capable and my desktop runs x86_64.

But reading bug 1068632, I decided to look at both my computers and realized something: my desktop's NetworkManager config for IPv6 is set to Ignore, while my laptop's is set to Automatic. I did not know that. Could that somehow be the issue? But then why would booting kernel 3.12.8 fix the issue, while any kernel from 3.12.9 onward causes the bad behavior?

I would like your insight on that. Once I get home I will get the output of ip -6 neigh show on both systems to compare, as well as ip -6 route show.

Comment 26 Neil Horman 2014-03-12 20:11:25 UTC

I've never been sure of the difference between NM's Ignore and Automatic settings, as SLAAC is controlled mostly by the kernel anyway, making those two settings largely synonymous. I'm guessing a kernel update changed a kernel behavior that NetworkManager simply ignores. Either way, I think it's prudent to show the neighbor table, and to try upgrading NM on your affected systems to the version in the testing repository.

Comment 27 Reilly Hall 2014-03-13 01:40:07 UTC

Just after the link is up:
% ip -6 neigh show
fe80::16d6:4dff:fe27:4ffa dev wlp4s2 lladdr 14:d6:4d:27:4f:fa router STALE
fe80::21b:21ff:fe4e:d4a2 dev wlp4s2 lladdr 00:1b:21:4e:d4:a2 router STALE

Just after initiating a connection to my Fedora desktop
% ip -6 neigh show
2601:3:600:18b:230:1bff:febc:c68e dev wlp4s2 lladdr 00:30:1b:bc:c6:8e REACHABLE
fe80::16d6:4dff:fe27:4ffa dev wlp4s2 lladdr 14:d6:4d:27:4f:fa router STALE
fe80::21b:21ff:fe4e:d4a2 dev wlp4s2 lladdr 00:1b:21:4e:d4:a2 router REACHABLE
2601:3:600:18b::1 dev wlp4s2 lladdr 00:1b:21:4e:d4:a2 DELAY
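To compare neighbor tables across kernels or settings, `ip -6 neigh show` output can be parsed into a dictionary. This parser is a sketch that assumes the line format shown above (address, `dev`, optional `lladdr`, optional `router` flag, and a final state word); it is not a complete parser for every iproute2 version, and the names are hypothetical.

```python
from typing import Dict, NamedTuple, Optional

class Neighbor(NamedTuple):
    dev: str
    lladdr: Optional[str]   # None for entries without a link-layer address
    router: bool            # True if the entry carries the "router" flag
    state: str              # e.g. REACHABLE, STALE, DELAY, FAILED

def parse_neigh(text: str) -> Dict[str, Neighbor]:
    """Parse `ip -6 neigh show` output into {address: Neighbor}."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        addr, rest = fields[0], fields[1:]
        dev = rest[rest.index("dev") + 1] if "dev" in rest else ""
        lladdr = rest[rest.index("lladdr") + 1] if "lladdr" in rest else None
        table[addr] = Neighbor(dev, lladdr, "router" in rest, rest[-1])
    return table
```

For instance, `[a for a, n in parse_neigh(out).items() if n.lladdr == "00:1b:21:4e:d4:a2"]` lists every address currently resolved to the router's MAC.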

Route just after the link is up:
% ip -6 route show
fe80::/64 dev wlp4s2  proto kernel  metric 256 
default via fe80::21b:21ff:fe4e:d4a2 dev wlp4s2  proto static  metric 1024 

Shortly after initiating the session:
% ip -6 route show
2601:3:600:18b:230:1bff:febc:c68e dev wlp4s2  metric 0 
    cache 
fe80::/64 dev wlp4s2  proto kernel  metric 256 
default via fe80::21b:21ff:fe4e:d4a2 dev wlp4s2  proto static  metric 1024 

I do not see the route that should be there for the local subnet.
On my desktop that is curiously not exhibiting this behavior I can see the route:
% ip -6 route show
2601:3:600:18b:213:ceff:fe89:dcf3 dev p17p1  metric 0 
    cache 
2601:3:600:18b::/64 dev p17p1  proto kernel  metric 256  expires 2591686sec
fe80::/64 dev p17p1  proto kernel  metric 256 
default via fe80::21b:21ff:fe4e:d4a2 dev p17p1  proto ra  metric 1024  expires 1486sec
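The two route tables above differ in one entry: the laptop has no 2601:3:600:18b::/64 route, so for a LAN peer the most specific match is the default route and packets go to the router, while the desktop has the /64 as directly reachable on its interface. A simplified longest-prefix-match sketch over those routes (hypothetical function name; it ignores metrics, expiry, and the per-destination `cache` entries; a gateway of `None` means the destination is treated as on-link and resolved directly via neighbor discovery):

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, Optional[str]]   # (prefix or "default", gateway or None)

def next_hop(dest: str, routes: List[Route]) -> Tuple[str, Optional[str]]:
    """Pick the most specific route covering dest (longest prefix match)."""
    addr = ipaddress.IPv6Address(dest)
    best = None
    for prefix, gateway in routes:
        net = ipaddress.IPv6Network("::/0" if prefix == "default" else prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway)
    if best is None:
        raise LookupError(f"no route to {dest}")
    return str(best[0]), best[1]

gw = "fe80::21b:21ff:fe4e:d4a2"
laptop = [("fe80::/64", None), ("default", gw)]
desktop = [("2601:3:600:18b::/64", None), ("fe80::/64", None), ("default", gw)]
```

With these tables, `next_hop("2601:3:600:18b:215:c5ff:fef5:db4", laptop)` picks the default route via the router, while the same lookup on `desktop` picks the on-link /64.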

Comment 28 Neil Horman 2014-03-13 10:54:23 UTC

Then you definitely need to try upgrading your NetworkManager package. The fix was in NetworkManager, and it was to add that missing route.

Comment 29 Reilly Hall 2014-03-15 01:13:31 UTC

Ok I'm sorry I didn't get back to you yesterday.

I decided to set my laptop's wireless profile for my network at home the same as my desktop with the IPv6 setting set to "Ignored".

Suddenly the problem is gone, observe: 

% ip -6 neigh show
fe80::21b:21ff:fe4e:d4a2 dev wlp4s2 lladdr 00:1b:21:4e:d4:a2 router STALE

% ip -6 route show
2601:3:600:18b::/64 dev wlp4s2  proto kernel  metric 256  expires 2147455sec
fe80::/64 dev wlp4s2  proto kernel  metric 256 
default via fe80::21b:21ff:fe4e:d4a2 dev wlp4s2  proto ra  metric 1024  expires 1772sec

% uname -a
Linux ThinkPad-T43 3.13.6-200.fc20.i686+PAE #1 SMP Fri Mar 7 17:17:53 UTC 2014 i686 i686 i386 GNU/Linux

After initiating a connection with my FreeBSD server again...this time it's not timing out either.

% ip -6 neigh show                               
fe80::16d6:4dff:fe27:4ffa dev wlp4s2 lladdr 14:d6:4d:27:4f:fa router STALE
fe80::21b:21ff:fe4e:d4a2 dev wlp4s2 lladdr 00:1b:21:4e:d4:a2 router STALE
2601:3:600:18b:215:c5ff:fef5:db4 dev wlp4s2 lladdr 00:15:c5:f5:0d:b4 REACHABLE

% ip -6 route show
2601:3:600:18b:215:c5ff:fef5:db4 dev wlp4s2  metric 0 
    cache 
2601:3:600:18b::/64 dev wlp4s2  proto kernel  metric 256  expires 2147334sec
fe80::/64 dev wlp4s2  proto kernel  metric 256 
default via fe80::21b:21ff:fe4e:d4a2 dev wlp4s2  proto ra  metric 1024  expires 1651sec

Keep in mind I have NOT upgraded my NetworkManager to the beta version mentioned in bug 1068632.  I wanted to first test the result of just setting it to Ignore vs Automatic to see what would happen.  And I am pleasantly surprised to find that was sufficient to fix my issue.  Bizarre but I don't care.

What's still odd, is that all I did was set the setting to "Ignore".  So I'm assuming NetworkManager does nothing with regards to the setting up of the IPv6 addresses and routes now.  Even the output of the route show command leads me to believe the kernel is handling everything as the route says "proto kernel" and "proto ra" for the default route rather than "proto static".

So I must not be understanding the relationship between the kernel version and the NetworkManager version. Why did having it set to Automatic work under earlier kernels, and, if the change was in the kernel, why does the problem go away now that NetworkManager is disabled for IPv6 and the kernel does all the work of setting up the IPv6 address and routes? Presumably the kernel is responsible for setting up the address and routes from an RA, and the behavior changed with 3.12.9, so why does it look like NetworkManager is now at fault for not respecting the settings in the RA?

I mean, as long as setting NM to Ignore fixes it and the kernel still respects the RA settings exactly as specified, I don't care. I'm just curious how I should set up future systems, and whether leaving it on Ignore will break things in the future if "Automatic" one day becomes required to set up IPv6 at all.

Comment 30 Neil Horman 2014-03-15 14:38:06 UTC

The thing to do is check /proc/sys/net/ipv6/conf/<ifname>/* before and after switching the interface between Ignore and Automatic. See whether NM changes any of the settings there or leaves them all untouched. That will give you a clue as to what the intended behavior is.
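One way to do that comparison is to snapshot every file in the interface's sysctl directory and diff the two snapshots. A sketch with hypothetical helper names, assuming the standard layout of one value per file under `/proc/sys/net/ipv6/conf/<ifname>/`; `wlp4s2` is the interface name from the outputs above.

```python
from pathlib import Path
from typing import Dict, Tuple

def snapshot(conf_dir: str) -> Dict[str, str]:
    """Read every readable file in a sysctl directory into {name: value}."""
    values = {}
    for entry in sorted(Path(conf_dir).iterdir()):
        if entry.is_file():
            try:
                values[entry.name] = entry.read_text().strip()
            except OSError:
                continue  # some entries may be unreadable or unset
    return values

def diff(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {name: (old, new)} for every setting whose value changed."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k, ""), after.get(k, "")) for k in sorted(keys)
            if before.get(k) != after.get(k)}
```

Usage: take `before = snapshot("/proc/sys/net/ipv6/conf/wlp4s2")`, switch the NetworkManager setting and reconnect, take `after` the same way, and print `diff(before, after)`; an empty result means NM left the kernel's per-interface settings untouched.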

I'd still recommend upgrading NetworkManager, as you should be able to set the IPv6 setting to Automatic and have it work.

Comment 31 Reilly Hall 2014-03-15 19:28:24 UTC

OK, I just installed the NetworkManager build from bug 1068632 and set the IPv6 config to Automatic again.

It's working, and it should be noted that I believe NetworkManager is now the one handling RA acceptance, as the routes from ip -6 route show now indicate proto static for both the local subnet and the default route, rather than proto kernel and proto ra respectively.

This works for me. I suppose there isn't really a problem in the kernel's handling of RAs and plumbing of IPv6 addresses and routes when NM is not involved, so I'm guessing we can close this bug out as not actually a bug. I will wait for your opinion before doing so, in case you would like some further testing.

Comment 32 Neil Horman 2014-03-16 11:49:59 UTC

I agree that the plumbing path for handling RAs shouldn't have to rely on a particular user-space program, but it is what it is. I'll close this as a dup of bug 1068632 and look into making the kernel path do the right thing on my own time. Thanks!

*** This bug has been marked as a duplicate of bug 1068632 ***
