891784 – IPv6 RA router preference ignored

Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	NetworkManager
Version:	17
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Dan Williams
QA Contact:	Fedora Extras Quality Assurance
Duplicates (1):	892059

Reported:	2013-01-04 00:29 UTC by Michael L
Modified:	2017-08-21 15:56 UTC
CC List:	9 users
Last Closed:	2013-07-31 23:44:06 UTC
Type:	Bug

Attachments
radvdump of two routers on eth2. (1.34 KB, text/plain) 2013-01-04 00:29 UTC, Michael L
[PATCH] ipv6: Enforce RFC 4191 Default Gateway replacement (2.06 KB, patch) 2013-01-04 18:57 UTC, Neil Horman
Capture file showing adding second router with higher priority. (13.98 KB, application/vnd.tcpdump.pcap) 2013-01-06 00:49 UTC, Michael L

Description Michael L 2013-01-04 00:29:16 UTC

Created attachment 672267 [details]
radvdump of two routers on eth2.

Description of problem:  Two IPv6 routers are on the same network, one sending Router Advertisements (RAs) with high router preference, the other with low preference.  The router whose RA was received first is used as the default route, even when the other router has the higher preference.

See RFC 4191:  https://tools.ietf.org/html/rfc4191


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.  Configure two IPv6 routers with different RA router preferences.
2.  Power up or connect the lower priority router to the network first, wait for its RA, then connect the second router.

Actual results:  First router remains the default route even if a higher priority router is present.


Expected results:  Highest priority router should become default route.


Additional info:  Dump of RA using radvdump is attached.

Comment 1 Josh Boyer 2013-01-04 01:02:29 UTC

Neil, do you want to take a look at this one?

Comment 2 Michael L 2013-01-04 01:34:27 UTC

When default gateway is removed, second router is not used.  Waited for RA lifetime to expire, and still did not switch to higher priority router.

Comment 3 Neil Horman 2013-01-04 15:13:12 UTC

Josh, sure, I'm on it

Comment 4 Neil Horman 2013-01-04 18:57:52 UTC

Created attachment 672505 [details]
[PATCH] ipv6: Enforce RFC 4191 Default Gateway replacement


RFC 4191 introduces the notion of router preference bits in router
advertisements, which we appear to handle incorrectly.  When two adverts arrive
at a host, if the advert contains a higher priority than the current existing
default router, we currently simply update the pref bits in the existing default
route to reflect the higher preference.  This operates under the assumption that
the advert is for the same gateway as what we already have configured.  What we
should be doing is keeping the old default route, but adding a new route for the
arriving adv with the higher preference value, so that we route to the proper
preferred gateway, in the case where two independent routers exist on the same
subnet.

Signed-off-by: Neil Horman <nhorman>
CC: "David S. Miller" <davem>
CC: Hideaki YOSHIFUJI <yoshfuji>
---
 net/ipv6/ndisc.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
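
For readers unfamiliar with the preference bits mentioned above, here is a minimal sketch (not taken from the patch) of how the 2-bit Prf field defined by RFC 4191 can be decoded from the flags byte of a router advertisement. The function and table names are made up for illustration; the bit layout (M, O, H, then two Prf bits, then three reserved bits) is from RFC 4191.

```python
# Illustrative sketch only: decode the RFC 4191 Default Router Preference
# (Prf) from the RA flags byte. Byte layout, most significant bit first:
#   M | O | H | Prf (2 bits) | reserved (3 bits)
# Names below are hypothetical and not part of any kernel or NM API.

PRF_NAMES = {0b01: "high", 0b00: "medium", 0b11: "low"}


def decode_router_preference(flags_byte: int) -> str:
    """Return 'high', 'medium' or 'low' for an RA flags byte."""
    prf = (flags_byte >> 3) & 0b11
    # RFC 4191 says a receiver must treat the reserved value 10 as medium.
    return PRF_NAMES.get(prf, "medium")


if __name__ == "__main__":
    # 0x88 = managed flag set, Prf = 01 (high)
    print(decode_router_preference(0x88))
    # 0x98 = managed flag set, Prf = 11 (low)
    print(decode_router_preference(0x98))
```

A host that honors these bits should prefer a reachable router advertising "high" over one advertising "medium" or "low", independent of which RA arrived first.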

Comment 5 Neil Horman 2013-01-04 20:14:36 UTC

http://koji.fedoraproject.org/koji/taskinfo?taskID=4841350

There's a test build with the patch from comment 4 included.  Can you please test it and confirm that it fixes your problem?  Thanks!

Comment 6 Michael L 2013-01-05 01:12:49 UTC

Doesn't seem to fix the issue.  Just to make sure, I installed:

Jan 04 18:53:12 Updated: kernel-tools-libs-3.6.11-4.fc17.x86_64
Jan 04 18:53:15 Installed: kernel-3.6.11-4.fc17.x86_64
Jan 04 18:53:16 Installed: kernel-modules-extra-3.6.11-4.fc17.x86_64
Jan 04 18:53:16 Updated: kernel-tools-3.6.11-4.fc17.x86_64


After a reboot I did a traceroute6 with a medium priority router as the only router on the network:

traceroute to www.google.com (2607:f8b0:400c:c01::6a), 30 hops max, 80 byte packets
 1  2001:470:88ef:b::2 (2001:470:88ef:b::2)  0.762 ms  1.290 ms  1.522 ms
 2  2001:470:883f:c::1 (2001:470:883f:c::1)  1.210 ms  1.660 ms  2.085 ms
 3  2001:470:88ef:1::1 (2001:470:88ef:1::1)  2.528 ms  2.975 ms  3.421 ms
 4  41magnum-1.tunnel.tserv4.nyc4.ipv6.he.net (2001:470:1f06:3ad::1)  24.388 ms  33.945 ms  29.353 ms

Then I added a high priority router, verified receipt of a router advertisement with wireshark and did a traceroute6 again:

traceroute to www.google.com (2607:f8b0:400c:c01::6a), 30 hops max, 80 byte packets
 1  2001:470:88ef:b::2 (2001:470:88ef:b::2)  1.044 ms  1.294 ms  1.474 ms
 2  2001:470:883f:c::1 (2001:470:883f:c::1)  0.685 ms  0.864 ms  1.109 ms
 3  2001:470:88ef:1::1 (2001:470:88ef:1::1)  1.513 ms  1.739 ms  1.982 ms
 4  41magnum-1.tunnel.tserv4.nyc4.ipv6.he.net (2001:470:1f06:3ad::1)  33.957 ms  29.396 ms  24.841 ms

The high priority router is on both 2001:470:88ef:b::/64 and 2001:470:88ef:1::/64 so I shouldn't have seen the second hop in there.

Comment 7 Michael L 2013-01-05 14:00:12 UTC

Is it possible that there is a relationship with bug # 892059, in that the routing table is not being updated?

Comment 8 Neil Horman 2013-01-05 21:47:04 UTC

I don't know. Please send me copies of your routing table before and after adding the high priority router, as well as the tcpdump pcap files you captured.  I should be able to tell you.

Comment 9 Michael L 2013-01-06 00:46:31 UTC

Routing table with Medium priority router only:

route -n --inet6 | grep eth1
2001:470:88ef:b::/64           ::                         UAe  256 0     0 eth1
fd7f:853:da5d:2::/64           ::                         UAe  256 0     0 eth1
fe80::/64                      ::                         U    256 0     0 eth1
::/0                           fe80::1841:1               UGDAe 1024 0     0 eth1
ff00::/8                       ::                         U    256 0     0 eth1


Routing table after adding high priority router:

route -n --inet6 | grep eth1
2001:470:88ef:b::/64           ::                         UAe  256 0     0 eth1
fd7f:853:da5d:2::/64           ::                         UAe  256 0     0 eth1
fe80::/64                      ::                         U    256 0     0 eth1
::/0                           fe80::1841:1               UGDAe 1024 0     0 eth1
::/0                           fe80::3825                 UGDAe 1024 0     0 eth1
ff00::/8                       ::                         U    256 0     0 eth1

Comment 10 Michael L 2013-01-06 00:49:38 UTC

Created attachment 673170 [details]
Capture file showing adding second router with higher priority.

Comment 11 Neil Horman 2013-01-06 14:20:10 UTC

Thank you, I'm going to try to set up a reproducer here.

Comment 12 Neil Horman 2013-01-06 14:21:47 UTC

FWIW, I don't think this is related to your other bz, the routing table is clearly getting updated, as the new router shows up.  I'm wondering if perhaps the route scoring isn't working properly relative to the route priority bits in the rt6i_flags.
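
As a rough mental model of the scoring in question (a simplified sketch, not the kernel's actual code, which also takes neighbor reachability into account), default route selection can be pictured as "lowest metric first, then highest router preference". The type and function names here are hypothetical.

```python
# Simplified model of default-route selection, for illustration only.
# Assumption: lower metric always wins; among routes with the same
# metric, the higher RFC 4191 preference wins. The real kernel logic
# is more involved (e.g. it also considers neighbor reachability).
from typing import Iterable, NamedTuple


class Candidate(NamedTuple):
    gateway: str
    metric: int
    pref: int  # -1 = low, 0 = medium, 1 = high


def pick_default(candidates: Iterable[Candidate]) -> Candidate:
    """Pick the route with the lowest metric, breaking ties by preference."""
    return min(candidates, key=lambda c: (c.metric, -c.pref))


if __name__ == "__main__":
    routes = [
        Candidate("fe80::1841:1", 1024, 0),  # medium preference
        Candidate("fe80::3825", 1024, 1),    # high preference
    ]
    print(pick_default(routes).gateway)
```

Under this model, two RA-learned routes with equal metric are separated only by their preference, so anything that introduces a lower-metric route would mask the preference comparison entirely.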

Comment 13 Neil Horman 2013-01-08 21:31:59 UTC

Ok, so I've got the problem (sort of) reproduced.  I have 3 qemu guests set up, 2 routers and a client, using your attached radvd configs.  I can get two default routes recognized properly.  The thing is, it works properly.  If I boot the high priority router after the client and low priority router have exchanged router adverts, I get the second default route, and it gets used.  So I'm not quite sure what's going on. I do occasionally see a failure in which the low priority router is selected, but it seems to be a result of NetworkManager adding a static route to the routing table pointing to the first router it finds, which certainly seems wrong.

I'm still looking at it, but just to be on the safe side, could you please send me the unfiltered output of:
ip -6 route show

I'd like to make sure that you're not seeing the same issue.  If you are, then this is an NM problem, and I'll get those guys involved.

Thanks!

Comment 14 Michael L 2013-01-09 01:13:59 UTC

I have been running NM on all the systems I've checked this on, so I guess I can't rule that out.

This is with just the medium priority router (1841):
ip -6 route show
2001:470:88ef:1::1 via fe80::1841:1 dev eth1  metric 0 
    cache 
2001:470:88ef:1::1:0 via fe80::1841:1 dev eth1  metric 0 
    cache 
2001:470:88ef:b::/64 dev eth1  proto kernel  metric 256  expires 2591192sec
2001:41d0:1:335c::1 via fe80::1841:1 dev eth1  metric 0 
    cache 
2401:dd00:1::162 via fe80::1841:1 dev eth1  metric 0 
    cache 
2a00:13d0:101::7 via fe80::1841:1 dev eth1  metric 0 
    cache 
2a01:390::bbbb:3 via fe80::1841:1 dev eth1  metric 0 
    cache 
fd7f:853:da5d:2::/64 dev eth1  proto kernel  metric 256  expires 2591192sec
unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
fe80::/64 dev eth2  proto kernel  metric 256 
fe80::/64 dev eth1  proto kernel  metric 256 
default via fe80::1841:1 dev eth1  proto static  metric 1 
default via fe80::1841:1 dev eth1  proto ra  metric 1024  expires 1780sec


This is after adding the high priority router (3825):
ip -6 route show
2001:470:88ef:1::6 via fe80::1841:1 dev eth1  metric 0 
    cache 
2001:470:88ef:b::/64 dev eth1  proto kernel  metric 256  expires 2591172sec
2001:41d0:1:335c::1 via fe80::1841:1 dev eth1  metric 0 
    cache 
2607:f8b0:4004:801::1009 via fe80::1841:1 dev eth1  metric 0 
    cache 
2607:f8b0:400d:c01::65 via fe80::1841:1 dev eth1  metric 0 
    cache 
2607:f8b0:400d:c01::88 via fe80::1841:1 dev eth1  metric 0 
    cache 
fd7f:853:da5d:2::/64 dev eth1  proto kernel  metric 256  expires 2591172sec
unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
fe80::/64 dev eth2  proto kernel  metric 256 
fe80::/64 dev eth1  proto kernel  metric 256 
default via fe80::1841:1 dev eth1  proto static  metric 1 
default via fe80::1841:1 dev eth1  proto ra  metric 1024  expires 1685sec
default via fe80::3825 dev eth1  proto ra  metric 1024  expires 1759sec

Comment 15 Neil Horman 2013-01-09 19:59:01 UTC

yup, you appear to have the same symptom I did.  Namely, this route:
default via fe80::1841:1 dev eth1  proto static  metric 1 

It's a static route, added by NetworkManager, which overrides both of the dynamically added routes.  I'm not sure why NetworkManager does that, but it certainly seems wrong to me. I would suggest you do the following if possible as a test:

1) Disable the NetworkManager service on the client system, enable the network service

2) edit /etc/sysconfig/network-scripts/ifcfg-eth1.  Change NM_CONTROLLED="yes" to NM_CONTROLLED="no"

3) Reboot the client.

4) When it comes back up, you should see 2 default routes instead of 3 (the static route will be gone), and the high priority router will be used regardless of the order in which the RAs arrive.

If that fixes the problem for you, we can reassign this over to the NetworkManager component, and I can take a look at how to fix this in NM.
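
The symptom described above (a `proto static` default route with a lower metric than the `proto ra` ones) can be checked mechanically. The following is a minimal sketch that parses text in the format `ip -6 route show` printed in this report; the helper names are hypothetical, and the regular expression only assumes the field order seen in the output pasted in this bug.

```python
# Illustrative sketch: find a static default route that outranks the
# RA-learned default routes in `ip -6 route show` output.
# Assumes the "default via GW dev IF proto P metric N" field order seen
# in this report; other iproute2 versions may print fields differently.
import re
from typing import List, NamedTuple, Optional


class DefaultRoute(NamedTuple):
    gateway: str
    dev: str
    proto: str
    metric: int


_DEFAULT_RE = re.compile(
    r"^default via (?P<gw>\S+) dev (?P<dev>\S+)\s+"
    r"proto (?P<proto>\S+)\s+metric (?P<metric>\d+)"
)


def parse_default_routes(text: str) -> List[DefaultRoute]:
    """Extract all IPv6 default routes from route-table text."""
    routes = []
    for line in text.splitlines():
        m = _DEFAULT_RE.match(line.strip())
        if m:
            routes.append(
                DefaultRoute(m["gw"], m["dev"], m["proto"], int(m["metric"]))
            )
    return routes


def shadowing_static_route(routes: List[DefaultRoute]) -> Optional[DefaultRoute]:
    """Return a static default route whose metric is lower than that of
    every RA-learned default route, or None if there is no such route."""
    ra_metrics = [r.metric for r in routes if r.proto == "ra"]
    if not ra_metrics:
        return None
    for r in routes:
        if r.proto == "static" and r.metric < min(ra_metrics):
            return r
    return None


SAMPLE = """\
default via fe80::1841:1 dev eth1  proto static  metric 1
default via fe80::1841:1 dev eth1  proto ra  metric 1024  expires 1685sec
default via fe80::3825 dev eth1  proto ra  metric 1024  expires 1759sec
"""

if __name__ == "__main__":
    print(shadowing_static_route(parse_default_routes(SAMPLE)))
```

On a live system the text could come from running `ip -6 route show` and passing its output to `parse_default_routes`; a non-None result suggests the same NetworkManager-added route discussed here.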

Comment 16 Michael L 2013-01-10 02:40:10 UTC

You were right, it was a NM issue.

Med pri router only:
ip -6 route show 
2001:418:8405:4002::2 via fe80::1841:1 dev eth1  metric 0 
    cache 
2001:470:88ef:1::6 via fe80::1841:1 dev eth1  metric 0 
    cache 
2001:470:88ef:1::1:0 via fe80::1841:1 dev eth1  metric 0 
    cache 
2001:470:88ef:1:21b:21ff:fecd:95da via fe80::1841:1 dev eth1  metric 0 
    cache 
2001:470:88ef:b::/64 dev eth1  proto kernel  metric 256  expires 2591069sec
fd7f:853:da5d:2::/64 dev eth1  proto kernel  metric 256  expires 2591069sec
unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
fe80::/64 dev eth2  proto kernel  metric 256 
fe80::/64 dev eth1  proto kernel  metric 256 
default via fe80::1841:1 dev eth1  proto ra  metric 1024  expires 1657sec


Added High:
ip -6 route show 
2001:470:88ef:1::6 via fe80::3825 dev eth1  metric 0 
    cache 
2001:470:88ef:b::/64 dev eth1  proto kernel  metric 256  expires 2591189sec
fd7f:853:da5d:2::/64 dev eth1  proto kernel  metric 256  expires 2591189sec
unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
fe80::/64 dev eth2  proto kernel  metric 256 
fe80::/64 dev eth1  proto kernel  metric 256 
default via fe80::1841:1 dev eth1  proto ra  metric 1024  expires 1662sec
default via fe80::3825 dev eth1  proto ra  metric 1024  expires 1776sec

I do think that bug # 892059 is a duplicate of this due to the extra NM static route.  Neil, would you like to verify and mark as duplicate?

Comment 17 Neil Horman 2013-01-10 14:42:36 UTC

*** Bug 892059 has been marked as a duplicate of this bug. ***

Comment 18 Neil Horman 2013-01-10 14:45:09 UTC

Confirmed. If I set the lifetimes of the routers to very low values, I see the RA routes age out, but the static route remains.

Comment 19 Neil Horman 2013-01-10 15:10:41 UTC

Ok, reassigning this over to NetworkManager. I'll work with the maintainer to nail down a solution to this.

Dan, to summarize the issue: Michael is running an IPv6 network on which he has a subnet with 2 routers, one sending RAs with a high priority flag and the other with a low priority flag. He observes that, if the low priority router issues an RA first, clients will use that router as their default gateway, even after the high priority router advertises its prefix. The problem stems from an odd behavior of NetworkManager. Even though the kernel's default is to use router advertisements to generate default gateways, NetworkManager still adds a static route based on the first router advertisement that it gets a notification for via netlink. I can understand why it might want to do this if DHCPv6 is being used, but it's incorrect when SLAAC is being used. I think NetworkManager should either:

1) Not add default gateways at all when SLAAC is being used, as the kernel will do this automatically

2) Disable the kernel's ability to generate default routes based on router adverts. You can do this by:
echo 0 > /proc/sys/net/ipv6/conf/all/accept_ra_defrtr
echo 0 > /proc/sys/net/ipv6/conf/all/accept_ra_rtr_pref

I would think the first option would be the better approach, as pursuing the second would imply that NetworkManager would then have to understand and parse router preference bits in the RA frame to add default gateways properly, and the netlink interface doesn't currently export that information to the best of my recollection.
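
To see which of the two approaches a given system is currently set up for, the relevant sysctls can be read back. This is a small sketch under the assumption that the per-interface files live under /proc/sys/net/ipv6/conf/<iface>/; the function name and the `root` parameter are illustrative, and a missing file (for example on a kernel built without router preference support) is reported as None.

```python
# Illustrative sketch: read the RA-related IPv6 sysctls for one interface.
# The `root` parameter exists only so the function can be pointed at a
# different directory (e.g. for testing); it is not a kernel interface.
from pathlib import Path
from typing import Dict, Optional

RA_SYSCTLS = ("accept_ra", "accept_ra_defrtr", "accept_ra_rtr_pref")


def read_ra_sysctls(
    iface: str, root: str = "/proc/sys/net/ipv6/conf"
) -> Dict[str, Optional[int]]:
    """Return the integer value of each RA sysctl, or None if unreadable."""
    result: Dict[str, Optional[int]] = {}
    for name in RA_SYSCTLS:
        path = Path(root) / iface / name
        try:
            result[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            result[name] = None
    return result


if __name__ == "__main__":
    # "eth1" is the interface name used in this report.
    print(read_ra_sysctls("eth1"))
```

If `accept_ra_defrtr` reads 1 while a daemon also installs its own default route, both mechanisms are active at once, which is the situation described in this bug.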

Comment 20 Dan Williams 2013-01-10 16:18:51 UTC

One thing to check if anyone has access to an F18 box is whether this is still an issue in F18.  We did some work after 0.9.6.4 on the static route thing.  But the second issue with RA priority may not be fixed yet.

Comment 21 Neil Horman 2013-01-10 18:08:20 UTC

I have the whole reproducer set up via qemu guests.  I can easily install an F18 image and validate if the problem is gone or not.  I'll let you know tomorrow afternoon

Comment 22 Neil Horman 2013-01-11 15:24:53 UTC

Confirmed: with an F18 guest, NM does not add a default static route to the kernel routing table, and the router preference bits are honored correctly.  Not sure if you want to backport the NM changes to F17 or if you just want to close this as NEXTRELEASE.

Comment 23 Fedora End Of Life 2013-07-03 22:05:47 UTC

This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 24 Fedora End Of Life 2013-07-31 23:44:14 UTC

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 25 Deomid Ryabkov 2017-08-21 15:56:22 UTC

4 years later, i'm running into the same issue: static ipv6 route added by (presumably) NetworkManager, for a low-pref router.

these are the two advertisements:

16:45:43.942103 00:0d:60:ff:05:55 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: (flowlabel 0x4dd1b, hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::20d:60ff:feff:555 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 56
        hop limit 64, Flags [managed], pref high, router lifetime 1800s, reachable time 0s, retrans time 0s
          prefix info option (3), length 32 (4): 2001:470:1f08:bb6::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
            0x0000:  40c0 0001 5180 0000 3840 0000 0000 2001
            0x0010:  0470 1f08 0bb6 0000 0000 0000 0000
          source link-address option (1), length 8 (1): 00:0d:60:ff:05:55
            0x0000:  000d 60ff 0555

16:45:45.677722 60:e3:27:49:69:47 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::bc87:32ff:fe14:7678 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 24
        hop limit 64, Flags [managed], pref low, router lifetime 30s, reachable time 0s, retrans time 0s
          source link-address option (1), length 8 (1): 60:e3:27:49:69:47
            0x0000:  60e3 2749 6947

first is hi-pri, second is low-pri.
however, i end up with a default from the low-pri adv instead of hi-pri:

$ ip -6 route show
2001:470:1f08:bb6::/64 dev wlp3s0  proto kernel  metric 256  expires 86032sec pref medium
fe80::/64 dev vmnet1  proto kernel  metric 256  pref medium
fe80::/64 dev vmnet8  proto kernel  metric 256  pref medium
fe80::/64 dev docker0  proto kernel  metric 256 linkdown  pref medium
fe80::/64 dev wlp3s0  proto kernel  metric 256  pref medium
default via fe80::bc87:32ff:fe14:7678 dev wlp3s0  proto static  metric 600  pref medium


if i block ipv6-icmp from the low-pri router, everything is fine:

$ sudo ip6tables -A INPUT -p ipv6-icmp -m mac --mac-source 60:e3:27:49:69:47 -j DROP
$ nmcli c down Cesanta
Connection 'Cesanta' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)
$ nmcli c up Cesanta
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4)
$ ip -6 route show
2001:470:1f08:bb6::/64 dev wlp3s0  proto ra  metric 600  pref medium
fe80::/64 dev vmnet1  proto kernel  metric 256  pref medium
fe80::/64 dev vmnet8  proto kernel  metric 256  pref medium
fe80::/64 dev docker0  proto kernel  metric 256 linkdown  pref medium
fe80::/64 dev wlp3s0  proto kernel  metric 256  pref medium
default via fe80::20d:60ff:feff:555 dev wlp3s0  proto static  metric 600  pref medium


in nm's debug log i see this:

NetworkManager[25266]: <info>  [1503330839.7675] policy: set 'Cesanta' (wlp3s0) as default for IPv6 routing and DNS

that's fine, but the router it picks is the wrong one.
