Bug 1068632 - nm doesn't set up ipv6 routing properly
Summary: nm doesn't set up ipv6 routing properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: NetworkManager
Version: 20
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Thomas Haller
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 1069421 1074171
Depends On:
Blocks:
Reported: 2014-02-21 14:45 UTC by Jeff Layton
Modified: 2014-06-18 07:43 UTC
CC List: 11 users

Fixed In Version: NetworkManager-0.9.9.0-32.git20131003.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-03-21 09:35:08 UTC
Type: Bug
Embargoed:


Attachments
nm debug log (498.08 KB, text/plain)
2014-02-21 15:12 UTC, Jeff Layton
journalctl --since today (1.71 MB, text/plain)
2014-02-21 15:19 UTC, Jeff Layton
pcap file of ipv6 RA (206 bytes, application/cap)
2014-03-03 12:12 UTC, Jeff Layton
journalctl --since 20:30 (476.48 KB, text/plain)
2014-03-04 01:39 UTC, Jeff Layton

Description Jeff Layton 2014-02-21 14:45:45 UTC
NetworkManager-0.9.9.0-30.git20131003.fc20 seems to have broken ipv6 routing. When the machine boots, it gets an address via SLAAC:

[jlayton@tlielax ~]$ ip -6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 fe80::3a60:77ff:fe93:a95d/64 scope link 
       valid_lft forever preferred_lft forever
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 
    inet6 2001:470:8:d63:3a60:77ff:fe93:a95d/64 scope global dynamic 
       valid_lft 86356sec preferred_lft 86356sec
    inet6 fe80::18f9:77ff:fea4:a6fd/64 scope link 
       valid_lft forever preferred_lft forever

...but then doesn't set up the routing properly:

[jlayton@tlielax ~]$ ip -6 route show
fe80::/64 dev br0  proto kernel  metric 256 
fe80::/64 dev em1  proto kernel  metric 256 
default via fe80::224:a5ff:fed7:8dfa dev br0  proto static  metric 1024 

...note that everything is going to the router, even things that are on the local subnet. The upshot is that I can reach things that need to cross the router, but the router throws back "port unreachable" errors when I try to access anything on the local network segment.
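To make the failure mode concrete, here is a minimal Python sketch of longest-prefix-match route selection over the table above. It is a simplification (it ignores metrics, expiry and everything else the kernel FIB does), and 2001:470:8:d63::1 is a hypothetical host on the local segment. Without the 2001:470:8:d63::/64 on-link route, the only match for a local neighbor is ::/0, so the packet is handed to the router.

```python
import ipaddress

# Simplified routing tables as (prefix, device, gateway); gateway None = on-link.
# BROKEN mirrors the table above: no route for the global /64.
BROKEN = [
    ("fe80::/64", "br0", None),
    ("::/0", "br0", "fe80::224:a5ff:fed7:8dfa"),
]
# FIXED adds the on-link prefix route that older NM versions installed.
FIXED = BROKEN + [("2001:470:8:d63::/64", "br0", None)]

def lookup(table, dst):
    """Return the (prefix, dev, gateway) entry with the longest matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in table if addr in ipaddress.ip_network(r[0])]
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen, default=None)

neighbor = "2001:470:8:d63::1"  # hypothetical host on the same segment
print(lookup(BROKEN, neighbor))  # ('::/0', 'br0', 'fe80::224:a5ff:fed7:8dfa')
print(lookup(FIXED, neighbor))   # ('2001:470:8:d63::/64', 'br0', None)
```

Off-link destinations still fall through to the default route in both tables; only on-link traffic changes.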

Comment 1 Jeff Layton 2014-02-21 14:48:51 UTC
I should note that downgrading to libnl3-3.2.21-2.fc20.x86_64 and NetworkManager-0.9.9.0-28.git20131003.fc20.x86_64 fixes the issue.

Comment 2 Jeff Layton 2014-02-21 15:12:56 UTC
Created attachment 866051
nm debug log

Here's the debug log that Thomas requested (grabbed via journalctl --since 10:00).

Comment 3 Jeff Layton 2014-02-21 15:19:11 UTC
Created attachment 866053
journalctl --since today

Comment 4 Thomas Haller 2014-02-24 18:49:14 UTC
In the logfile it can be seen that you get a route to 2001:470:8:d63::/64 from RA.
Unfortunately, it does not get logged whether the route has a gateway or not.


According to your report, you would expect that the prefix 2001:470:8:d63::/64 has no gateway and is directly reachable on the link. Could you please run radvdump (or tcpdump) and post the RA, so that we see what actually gets advertised?

Because *iff* the RA advertises "2001:470:8:d63::/64 via fe80::224:a5ff:fed7:8dfa", then the behaviour would be correct and as expected: NM does not configure a route if there is already a more general route (in your case ::/0 via fe80::224:a5ff:fed7:8dfa).

In that case, the only interesting question would be why it works for you with the older version (which it shouldn't).

Comment 5 Thomas Haller 2014-02-24 18:52:17 UTC
(In reply to Thomas Haller from comment #4)

By the way, you don't have to do anything with NM; just get the output of radvdump (provided that your router still announces the same routes).

Comment 6 Jeff Layton 2014-02-25 00:54:51 UTC
Unfortunately, I'm travelling this week and won't be able to test this until I return. I'll be back next week though, so I can test it then.

Comment 7 Jeff Layton 2014-02-25 01:06:14 UTC
(In reply to Thomas Haller from comment #4)

> Because *iff* RA advertises "2001:470:8:d63::/64 via
> fe80::224:a5ff:fed7:8dfa", then the behaviour would be correct and as
> expected, because NM does not configure a route, if there is already a more
> general route (in your case (::/0 via fe80::224:a5ff:fed7:8dfa)
> 
> In that case, it would be only interesting, why it works for you with the
> older version (which it shouldn't).

Well, I'm pretty sure the RAs are going out from fe80::224:a5ff:fed7:8dfa. That's the router's address.

...and now that I have reread this, I have to disagree with what you're saying.

This behavior is not correct. The default route via fe80::224:a5ff:fed7:8dfa is not a usable route for hosts on the same network segment as the source host.

We don't want to bounce traffic off the router when we can just contact the host directly. Even if we wanted to, the router doesn't allow that anyway (I assume due to some sort of rpfiltering).

There should be a route something like this being set:

    2001:470:8:d63::/64 dev br0  proto kernel  metric 256  expires 86139sec

...and indeed, older versions of NM do that properly.

Comment 8 Thomas Haller 2014-02-25 12:22:38 UTC
(In reply to Jeff Layton from comment #7)
> (In reply to Thomas Haller from comment #4)
> 
> > Because *iff* RA advertises "2001:470:8:d63::/64 via
> > fe80::224:a5ff:fed7:8dfa", then the behaviour would be correct and as
> > expected, because NM does not configure a route, if there is already a more
> > general route (in your case (::/0 via fe80::224:a5ff:fed7:8dfa)
> > 
> > In that case, it would be only interesting, why it works for you with the
> > older version (which it shouldn't).
> 
> Well, I'm pretty sure the RA's are going out from fe80::224:a5ff:fed7:8dfa.
> That's the router's address.
> 
> ...and now that I have reread this, I have to disagree with what you're
> saying.
> 
> This behavior is not correct. The default route via fe80::224:a5ff:fed7:8dfa
> is not a usable route for hosts on the same network segment as the source
> host.
> 
> We don't want to bounce traffic off the router when we can just contact the
> host directly. Even if we wanted to, the router doesn't allow that anyway (I
> assume due to some sort of rpfiltering).

Normally yes, but router advertisements can have the property "on-link=false", which means that the prefix is not reachable directly on the interface.
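For reference, the on-link ("L") and autonomous ("A") bits live in the Prefix Information option of the RA (RFC 4861, section 4.6.2). Below is a minimal Python sketch of decoding that option, assuming the 32 option bytes have already been extracted from the RA. The sample option is built by hand from the values radvdump reports in comment 10; it is not copied from the capture in comment 9. Note that radvdump shows the prefix as 2001:470:8:d63::ffff/64, and RFC 4861 says receivers ignore bits beyond the prefix length, hence strict=False.

```python
import ipaddress
import struct

ND_OPT_PREFIX_INFORMATION = 3  # option type for Prefix Information
FLAG_ON_LINK = 0x80            # "L" bit: prefix is reachable directly on the link
FLAG_AUTONOMOUS = 0x40         # "A" bit: prefix may be used for SLAAC

def parse_prefix_info(opt: bytes) -> dict:
    """Decode one 32-byte Prefix Information option."""
    if len(opt) != 32:
        raise ValueError("Prefix Information option must be 32 bytes")
    # type, length (8-octet units), prefix length, flags,
    # valid lifetime, preferred lifetime, reserved
    opt_type, length, plen, flags, valid, preferred, _ = struct.unpack("!BBBBIII", opt[:16])
    if opt_type != ND_OPT_PREFIX_INFORMATION or length != 4:
        raise ValueError("not a Prefix Information option")
    prefix = ipaddress.IPv6Address(opt[16:32])
    return {
        # Bits past the prefix length are ignored, so do not insist they are zero.
        "prefix": ipaddress.IPv6Network((prefix, plen), strict=False),
        "on_link": bool(flags & FLAG_ON_LINK),
        "autonomous": bool(flags & FLAG_AUTONOMOUS),
        "valid_lifetime": valid,
        "preferred_lifetime": preferred,
    }

# Sample option rebuilt from the radvdump values (not the raw capture).
sample = struct.pack("!BBBBIII", ND_OPT_PREFIX_INFORMATION, 4, 64,
                     FLAG_ON_LINK | FLAG_AUTONOMOUS, 86400, 14400, 0)
sample += ipaddress.IPv6Address("2001:470:8:d63::ffff").packed
info = parse_prefix_info(sample)
print(info["prefix"], info["on_link"], info["autonomous"])  # 2001:470:8:d63::/64 True True
```

With "L" set, the host is expected to treat the whole /64 as on-link, which is exactly the route that went missing.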

Comment 9 Jeff Layton 2014-03-03 12:12:32 UTC
Created attachment 869906
pcap file of ipv6 RA

Here is a pcap file that contains an RA. Looks like the on-link flag is true...

Comment 10 Jeff Layton 2014-03-03 12:17:55 UTC
...and here is the radvdump output:

$ sudo radvdump 
#
# radvd configuration generated by radvdump 1.9.2
# based on Router Advertisement from fe80::224:a5ff:fed7:8dfa
# received by interface br0
#

interface br0
{
	AdvSendAdvert on;
	# Note: {Min,Max}RtrAdvInterval cannot be obtained with radvdump
	AdvManagedFlag off;
	AdvOtherConfigFlag off;
	AdvReachableTime 0;
	AdvRetransTimer 0;
	AdvCurHopLimit 64;
	AdvDefaultLifetime 1800;
	AdvHomeAgentFlag off;
	AdvDefaultPreference medium;
	AdvSourceLLAddress on;

	prefix 2001:470:8:d63::ffff/64
	{
		AdvValidLifetime 86400;
		AdvPreferredLifetime 14400;
		AdvOnLink on;
		AdvAutonomous on;
		AdvRouterAddr off;
	}; # End of prefix definition


	RDNSS fe80::224:a5ff:fed7:8dfa
	{
		AdvRDNSSLifetime 600;
	}; # End of RDNSS definition


	DNSSL poochiereds.net
	{
		AdvDNSSLLifetime 600;
	}; # End of DNSSL definition

}; # End of interface definition
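Because the prefix is advertised with AdvAutonomous on, hosts form their global address with SLAAC. A minimal Python sketch of the classic modified EUI-64 derivation (RFC 4291, appendix A), assuming a MAC-based interface identifier rather than a privacy or stable-privacy address. Using br0's MAC 38:60:77:93:a9:5d from comment 16, it reproduces the 2001:470:8:d63:3a60:77ff:fe93:a95d address shown in the description.

```python
import ipaddress

def eui64_interface_id(mac: str) -> bytes:
    """Modified EUI-64 interface identifier for a 48-bit MAC address."""
    octets = bytes(int(b, 16) for b in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    # Flip the universal/local bit of the first octet and insert ff:fe in the middle.
    return bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Interface:
    """Combine an advertised /64 prefix with the EUI-64 interface identifier."""
    net = ipaddress.IPv6Network(prefix, strict=False)  # RA prefix may carry host bits
    if net.prefixlen != 64:
        raise ValueError("EUI-64 SLAAC needs a /64 prefix")
    upper = net.network_address.packed[:8]
    return ipaddress.IPv6Interface((ipaddress.IPv6Address(upper + eui64_interface_id(mac)), 64))

print(slaac_address("2001:470:8:d63::ffff/64", "38:60:77:93:a9:5d"))
# 2001:470:8:d63:3a60:77ff:fe93:a95d/64
```

The same function applied to the rawhide VM's MAC 52:54:00:9b:39:76 yields the 5054:ff:fe9b:3976 interface identifier seen in comment 16.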

Comment 11 Dan Williams 2014-03-03 18:24:51 UTC
I believe that NM should be adding the prefix route "2001:470:8:d63::/64 via br0" here; the default route is not part of the decision to skip adding a route if a more general one already exists.  So the question is why isn't the route getting added, when NM is clearly parsing it from the RA.

(also odd is the DNSSL of only ".net" which should be "poochiereds.net", right?)
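One way to express the rule described above, as a hypothetical Python sketch. This is not NetworkManager's actual code; the Route type and the is_redundant name are invented for illustration. A candidate route is skipped only if an existing non-default route already covers it and reaches it the same way (same gateway, or both on-link), so ::/0 can never suppress the on-link /64.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Route:  # hypothetical type, for illustration only
    network: ipaddress.IPv6Network
    gateway: Optional[ipaddress.IPv6Address] = None  # None means on-link

def is_redundant(candidate: Route, existing: Sequence[Route]) -> bool:
    """True if an existing route already makes `candidate` unnecessary."""
    for route in existing:
        if route.network.prefixlen == 0:
            continue  # the default route never counts as "more general" here
        if route.gateway == candidate.gateway and candidate.network.subnet_of(route.network):
            return True
    return False

gw = ipaddress.IPv6Address("fe80::224:a5ff:fed7:8dfa")
table = [
    Route(ipaddress.IPv6Network("fe80::/64")),
    Route(ipaddress.IPv6Network("::/0"), gw),
]
ra_route = Route(ipaddress.IPv6Network("2001:470:8:d63::/64"))  # on-link per the RA
print(is_redundant(ra_route, table))  # False: the /64 route must be added
```

Requiring the same gateway is a design assumption of this sketch: a covering route via the router says nothing about whether the prefix is reachable directly on the link.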

Comment 12 Jeff Layton 2014-03-03 18:39:14 UTC
(In reply to Dan Williams from comment #11)
> I believe that NM should be adding the prefix route "2001:470:8:d63::/64 via
> br0" here; the default route is not part of the decision to skip adding a
> route if a more general one already exists.  So the question is why isn't
> the route getting added, when NM is clearly parsing it from the RA.
> 

Agreed. That sounds correct to me.

> (also odd is the DNSSL of only ".net" which should be "poochiereds.net",
> right?)

I don't see this latter bit. The capture and the radvdump output seem to show a DNSSL of "poochiereds.net".

Comment 13 Thomas Haller 2014-03-03 19:43:31 UTC
Indeed the route should be added, but it is unclear why it isn't. I was unable to reproduce this (and nobody else has complained about it, AFAIK).

@Jeff, can you still reproduce the issue?

It is very likely unrelated to libnl. As a first step, could you please update to the newest libnl and check that everything still works?

Then (if it still works), could you try the newest F20 NetworkManager again, with debug logging enabled? I assume that the router advertisements are always the same (right)?

Sorry, and thanks in advance for your help :)

Comment 14 Jeff Layton 2014-03-04 01:06:00 UTC
I updated libnl3. At that point, I ended up running into bug 1063290, so the bridge didn't start at boot time. I restarted NetworkManager and it came back. The routes and addresses seem to be OK:

[root@tlielax ~]# ip -6 route show
2001:470:8:d63:b8b3:feff:fec8:5468 dev br0  proto kernel  metric 256  expires 86219sec
2001:470:8:d63::/64 dev br0  proto static  metric 20 
fe80::/64 dev br0  proto kernel  metric 256 
fe80::/64 dev em1  proto kernel  metric 256 
default via fe80::224:a5ff:fed7:8dfa dev br0  proto static  metric 1024 

[root@tlielax ~]# ip -6 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 fe80::3a60:77ff:fe93:a95d/64 scope link 
       valid_lft forever preferred_lft forever
3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 
    inet6 2001:470:8:d63:b8b3:feff:fec8:5468/128 scope global dynamic 
       valid_lft 86208sec preferred_lft 86208sec
    inet6 fe80::b8b3:feff:fec8:5468/64 scope link 
       valid_lft forever preferred_lft forever

...though the /128 netmask on 2001:470:8:d63:b8b3:feff:fec8:5468 doesn't look quite right -- seems like that should be /64. What version of NM do you want me to try?

Just taking a quick glance at the changelog for the -31 version, I don't see any reason to believe that this would be substantially different from the one I tried on the 21st, and I've already collected debug logs for you for that version.

Is there some reason to believe that this is fixed in the new version, or that that version would provide better debug logs that would help resolve this?

Comment 15 Jeff Layton 2014-03-04 01:39:27 UTC
Created attachment 870202
journalctl --since 20:30

Ok, updated to latest NM and libnl3 again. Bug is still there so I gathered another debug log. Here it is.

Looks like I'll have to go back and re-downgrade by hand, since the Fedora repos seem to have purged the older version now. Hope this helps you track down the bug!

Comment 16 Jeff Layton 2014-03-04 12:10:07 UTC
Interestingly, on my rawhide VM running on the same machine and using the bridge, the prefixlen is being set to /64:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:9b:39:76 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.22/24 brd 192.168.1.255 scope global dynamic eth0
       valid_lft 42179sec preferred_lft 42179sec
    inet6 2001:470:8:d63:5054:ff:fe9b:3976/64 scope global dynamic 
       valid_lft 86248sec preferred_lft 14248sec
    inet6 fe80::5054:ff:fe9b:3976/64 scope link 
       valid_lft forever preferred_lft forever

...however, on the main host the bridge gets a prefixlen of /128 (even with the older packages):

3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 38:60:77:93:a9:5d brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 brd 192.168.1.255 scope global dynamic br0
       valid_lft 39781sec preferred_lft 39781sec
    inet 10.10.49.191/32 brd 10.10.49.191 scope global br0
       valid_lft forever preferred_lft forever
    inet6 2001:470:8:d63:3a60:77ff:fe93:a95d/128 scope global dynamic 
       valid_lft 86314sec preferred_lft 86314sec
    inet6 fe80::38b3:f4ff:feaf:97cf/64 scope link 
       valid_lft forever preferred_lft forever

...I have to wonder if the problem is related to that incorrect prefixlen.

Comment 17 Thomas Haller 2014-03-04 13:39:52 UTC
(In reply to Jeff Layton from comment #14)
> I updated libnl3. At that point, I ended up running into bug 1063290, so the
> bridge didn't start at boot time. I restarted NetworkManager and it came
> back. The routes and addresses seem to be OK:
> 
> [root@tlielax ~]# ip -6 route show
> 2001:470:8:d63:b8b3:feff:fec8:5468 dev br0  proto kernel  metric 256 
> expires 86219sec
> 2001:470:8:d63::/64 dev br0  proto static  metric 20 
> fe80::/64 dev br0  proto kernel  metric 256 
> fe80::/64 dev em1  proto kernel  metric 256 
> default via fe80::224:a5ff:fed7:8dfa dev br0  proto static  metric 1024 
> 
> [root@tlielax ~]# ip -6 addr show
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 
>     inet6 ::1/128 scope host 
>        valid_lft forever preferred_lft forever
> 2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
>     inet6 fe80::3a60:77ff:fe93:a95d/64 scope link 
>        valid_lft forever preferred_lft forever
> 3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 
>     inet6 2001:470:8:d63:b8b3:feff:fec8:5468/128 scope global dynamic 
>        valid_lft 86208sec preferred_lft 86208sec
>     inet6 fe80::b8b3:feff:fec8:5468/64 scope link 
>        valid_lft forever preferred_lft forever
> 
> ...though the /128 netmask on 2001:470:8:d63:b8b3:feff:fec8:5468 doesn't
> look quite right -- seems like that should be /64. What version of NM are
> you wanting me to try.
> 
> Just taking a quick glance at the changelog for the -31 version, I don't see
> any reason to believe that this would be substantially different from the
> one I tried on the 21st, and I've already collected debug logs for you for
> that version.
> 
> Is there some reason to believe that this is fixed in the new version, or
> that that version would provide better debug logs that would help resolve
> this?


I think I found the problem. Could you please verify that scratch build http://koji.fedoraproject.org/koji/taskinfo?taskID=6594618 fixes your problem?

Thank you.

Regarding /128 vs. /64:

NM adds autoconf addresses as /64 only when running at least:
  - NetworkManager-0.9.9.0-29.git20140131.fc20
  - libnl3-3.2.24-1.fc20
  - kernel-3.12.9-301.fc20
See bug 1045118. Without these versions it adds autoconf addresses as /128. However, even then IPv6 should still work(!) (as it apparently did for you, seeing that you ran 0.9.9.0-28 -- which adds autoconf addresses as /128 too).
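A small Python illustration of why the prefix length matters for this bug, assuming our reading of the route tables in comments 7 and 14 is right: for an address configured as /64 the kernel normally installs a matching /64 "proto kernel" prefix route itself, whereas for a /128 address that route only covers the address (the /128 "proto kernel" entry in comment 14). With /128 autoconf addresses, the /64 on-link route therefore has to come from NM. implied_prefix_route is a hypothetical helper and 2001:470:8:d63::1 a hypothetical neighbor.

```python
import ipaddress

def implied_prefix_route(addr_with_plen: str) -> ipaddress.IPv6Network:
    """Hypothetical helper: the network covered by an address/prefix-length pair."""
    return ipaddress.IPv6Interface(addr_with_plen).network

neighbor = ipaddress.IPv6Address("2001:470:8:d63::1")  # hypothetical on-link host
for plen in (128, 64):
    net = implied_prefix_route(f"2001:470:8:d63:3a60:77ff:fe93:a95d/{plen}")
    print(net, neighbor in net)
# 2001:470:8:d63:3a60:77ff:fe93:a95d/128 False
# 2001:470:8:d63::/64 True
```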

Your test with 0.9.9.0-31.git20131003.fc20 also sets the plen to /64, as does your test in rawhide. With debug logging you will see a line:

  print_support_extended_ifa_flags(): kernel and libnl support extended IFA_FLAGS (needed by NM for IPv6 private addresses)

So, that part is as expected and fine.

Comment 18 Jeff Layton 2014-03-04 14:03:18 UTC
Seems to work correctly with that version. Thanks! FWIW, the new version seems to set the prefix to /64 as well:

3: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 
    inet6 2001:470:8:d63:3a60:77ff:fe93:a95d/64 scope global dynamic 
       valid_lft 86318sec preferred_lft 86318sec
    inet6 fe80::3:f5ff:fe51:5f47/64 scope link 
       valid_lft forever preferred_lft forever

...and routing table now looks correct:

$ ip -6 route
2001:470:8:d63::/64 dev br0  proto static  metric 20 
2607:f8b0:4004:803::1006 via fe80::224:a5ff:fed7:8dfa dev br0  metric 0 
    cache 
fe80::224:a5ff:fed7:8dfa dev br0  metric 0 
    cache 
fe80::/64 dev br0  proto kernel  metric 256 
fe80::/64 dev em1  proto kernel  metric 256 
default via fe80::224:a5ff:fed7:8dfa dev br0  proto static  metric 1024

Comment 19 Thomas Haller 2014-03-04 14:13:41 UTC
(In reply to Jeff Layton from comment #18)

Great. Thanks for testing.

By the way, NetworkManager-0.9.9.0-31_1.git20131003.fc20 should work just fine for you until a new version is ready, so there is no need to downgrade (unless you have other reasons to do so). And yes, /64 is expected with this version too.

Branch "th/rh1068632_ipv6_device_route" contains a fix for review.

Comment 20 Thomas Haller 2014-03-04 14:19:54 UTC
Opened bug 1072410 for rhel-7, for this very same issue.

Comment 21 Dan Williams 2014-03-04 22:29:57 UTC
Branch looks good to me.

Comment 22 Jirka Klimes 2014-03-05 09:40:52 UTC
> core: fix adding gateway routes within the own subnet
Indentation of &&-conditions

Otherwise it seems fine.

Comment 23 Thomas Haller 2014-03-05 10:07:01 UTC
Pushed to master as

6f6cce core: fix adding gateway route for IPv6
8cd0de2 tivial/core: move common #defines to header file
4f7b1ca core: fix adding gateway routes within the own subnet
bd93117 trivial/core: remove duplicate #include

Comment 24 Neil Horman 2014-03-16 11:49:59 UTC
*** Bug 1074171 has been marked as a duplicate of this bug. ***

Comment 25 Bjoern Buerger 2014-03-17 17:20:43 UTC
Thomas, 

please be so kind as to push this scratch build as an official update.
It fixed the issue for me as well, and an update would have saved
hours and hours of debugging :o)

Comment 26 Fedora Update System 2014-03-17 19:29:13 UTC
NetworkManager-0.9.9.0-32.git20131003.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.9.0-32.git20131003.fc20

Comment 27 Reilly Hall 2014-03-18 14:13:03 UTC
I too can confirm that, starting with some kernel version, NetworkManager took over provisioning that the kernel had handled automatically for many years. I don't necessarily mind that, so long as the functionality still remains in the kernel and is merely disabled by NM so that NM can handle it itself, possibly letting the user customize behavior that the kernel previously applied with no way to customize or override it.

After installing this scratch build I have working IPv6 within my subnet again using newer kernels. Please push this out, as I too was affected by this bug. I respectfully request that the kernel's RS/RA handling not be removed, and that it be retained if for nothing else than as a fallback for systems where NetworkManager may not be desired or feasible to use.

Thanks.

Comment 28 Fedora Update System 2014-03-19 08:41:46 UTC
Package NetworkManager-0.9.9.0-32.git20131003.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing NetworkManager-0.9.9.0-32.git20131003.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-4039/NetworkManager-0.9.9.0-32.git20131003.fc20
then log in and leave karma (feedback).

Comment 29 Fedora Update System 2014-03-21 09:35:08 UTC
NetworkManager-0.9.9.0-32.git20131003.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 30 Thomas Haller 2014-04-03 07:45:30 UTC
*** Bug 1069421 has been marked as a duplicate of this bug. ***

