Created attachment 672267 [details]
radvdump of two routers on eth2.

Description of problem:
Two IPv6 routers are on the same network, one configured to send Router Advertisements with a high priority, the second with a low priority. The first RA received was used as the default route instead of the higher priority router. See RFC 4191: https://tools.ietf.org/html/rfc4191

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Configure two IPv6 routers with different RA router priorities.
2. Power up or connect the lower priority router to the network first, wait for its RA, then connect the second.

Actual results:
The first router remains the default route even if a higher priority router is present.

Expected results:
The highest priority router should become the default route.

Additional info:
A dump of the RAs taken with radvdump is attached.
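For reference, the behaviour the report expects can be sketched in a few lines. This is a minimal illustration of the RFC 4191 preference field (Prf, bits 3-4 of the RA flags byte) and of choosing the most preferred router, not the kernel's implementation. The `Router` class, the function names and the flag bytes in the example are assumptions made for illustration, and the sketch ignores router reachability, which RFC 4191 also takes into account.

```python
from dataclasses import dataclass
from typing import List, Optional

# RFC 4191 encodes Prf as a 2-bit signed integer in the RA flags byte:
# 01 = high, 00 = medium, 11 = low, 10 = reserved (treated as medium).
PRF_BY_BITS = {0b01: 1, 0b00: 0, 0b11: -1}


def decode_prf(ra_flags: int, router_lifetime: int) -> int:
    """Return +1 (high), 0 (medium) or -1 (low) for one advertisement."""
    if router_lifetime == 0:
        # With a zero Router Lifetime the receiver must ignore the preference.
        return 0
    bits = (ra_flags >> 3) & 0b11
    return PRF_BY_BITS.get(bits, 0)


@dataclass
class Router:
    """Hypothetical record of one router learned from an RA."""
    address: str
    ra_flags: int
    lifetime: int


def pick_default_router(routers: List[Router]) -> Optional[Router]:
    """Pick the most preferred router, independent of RA arrival order."""
    usable = [r for r in routers if r.lifetime > 0]
    if not usable:
        return None
    # max() keeps the earliest entry on ties, so arrival order only
    # matters between routers of equal preference.
    return max(usable, key=lambda r: decode_prf(r.ra_flags, r.lifetime))


# Illustrative values only: a medium priority router heard first,
# then a high priority router.
seen = [
    Router("fe80::1841:1", 0x00, 1800),
    Router("fe80::3825", 0x08, 1800),
]
print(pick_default_router(seen).address)  # prints fe80::3825
```

The point of the sketch is that the choice depends on the preference value only, so the router whose RA arrives first should not win merely by being first.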
Neil, do you want to take a look at this one?
When default gateway is removed, second router is not used. Waited for RA lifetime to expire, and still did not switch to higher priority router.
Josh, sure, I'm on it
Created attachment 672505 [details]
[PATCH] ipv6: Enforce RFC 4191 Default Gateway replacement

RFC 4191 introduces the notion of router preference bits in router advertisements, which we appear to handle incorrectly. When two adverts arrive at a host, if the advert contains a higher priority than the current existing default router, we currently simply update the pref bits in the existing default route to reflect the higher preference. This operates under the assumption that the advert is for the same gateway as what we already have configured. What we should be doing is keeping the old default route, but adding a new route for the arriving adv with the higher preference value, so that we route to the proper preferred gateway, in the case where two independent routers exist on the same subnet.

Signed-off-by: Neil Horman <nhorman>
CC: "David S. Miller" <davem>
CC: Hideaki YOSHIFUJI <yoshfuji>
---
 net/ipv6/ndisc.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
http://koji.fedoraproject.org/koji/taskinfo?taskID=4841350

There's a test build with the patch from comment 4 included. Can you please test it and confirm that it fixes your problem? Thanks!
Doesn't seem to fix the issue. Just to make sure, I installed:

Jan 04 18:53:12 Updated: kernel-tools-libs-3.6.11-4.fc17.x86_64
Jan 04 18:53:15 Installed: kernel-3.6.11-4.fc17.x86_64
Jan 04 18:53:16 Installed: kernel-modules-extra-3.6.11-4.fc17.x86_64
Jan 04 18:53:16 Updated: kernel-tools-3.6.11-4.fc17.x86_64

After a reboot I did a traceroute6 with a medium priority router as the only router on the network:

traceroute to www.google.com (2607:f8b0:400c:c01::6a), 30 hops max, 80 byte packets
 1  2001:470:88ef:b::2 (2001:470:88ef:b::2)  0.762 ms  1.290 ms  1.522 ms
 2  2001:470:883f:c::1 (2001:470:883f:c::1)  1.210 ms  1.660 ms  2.085 ms
 3  2001:470:88ef:1::1 (2001:470:88ef:1::1)  2.528 ms  2.975 ms  3.421 ms
 4  41magnum-1.tunnel.tserv4.nyc4.ipv6.he.net (2001:470:1f06:3ad::1)  24.388 ms  33.945 ms  29.353 ms

Then I added a high priority router, verified receipt of a router advertisement with wireshark and did a traceroute6 again:

traceroute to www.google.com (2607:f8b0:400c:c01::6a), 30 hops max, 80 byte packets
 1  2001:470:88ef:b::2 (2001:470:88ef:b::2)  1.044 ms  1.294 ms  1.474 ms
 2  2001:470:883f:c::1 (2001:470:883f:c::1)  0.685 ms  0.864 ms  1.109 ms
 3  2001:470:88ef:1::1 (2001:470:88ef:1::1)  1.513 ms  1.739 ms  1.982 ms
 4  41magnum-1.tunnel.tserv4.nyc4.ipv6.he.net (2001:470:1f06:3ad::1)  33.957 ms  29.396 ms  24.841 ms

The high priority router is on both 2001:470:88ef:b::/64 and 2001:470:88ef:1::/64, so I shouldn't have seen the second hop in there.
Is it possible that there is a relationship with bug # 892059, in that the routing table is not being updated?
I don't know. Please send me copies of your routing table before and after adding the high priority router, as well as the tcpdump pcap files you captured. I should be able to tell you.
Routing table with medium priority router only:

route -n --inet6 | grep eth1
2001:470:88ef:b::/64    ::              UAe    256  0  0 eth1
fd7f:853:da5d:2::/64    ::              UAe    256  0  0 eth1
fe80::/64               ::              U      256  0  0 eth1
::/0                    fe80::1841:1    UGDAe  1024 0  0 eth1
ff00::/8                ::              U      256  0  0 eth1

Routing table after adding high priority router:

route -n --inet6 | grep eth1
2001:470:88ef:b::/64    ::              UAe    256  0  0 eth1
fd7f:853:da5d:2::/64    ::              UAe    256  0  0 eth1
fe80::/64               ::              U      256  0  0 eth1
::/0                    fe80::1841:1    UGDAe  1024 0  0 eth1
::/0                    fe80::3825      UGDAe  1024 0  0 eth1
ff00::/8                ::              U      256  0  0 eth1
Created attachment 673170 [details] Capture file showing adding second router with higher priority.
Thank you, I'm going to try set up a reproducer here
FWIW, I don't think this is related to your other bz, the routing table is clearly getting updated, as the new router shows up. I'm wondering if perhaps the route scoring isn't working properly relative to the route priority bits in the rt6i_flags.
Ok, so I've got the problem (sort of) reproduced. I have 3 qemu guests set up, 2 routers and a client, using your radvd configs attached. I can get two default routes recognized properly. The thing is, it works properly. If I boot the high priority router after the client and low priority router have exchanged router adverts, I get the second default route, and it gets used. So I'm not quite sure what's going on.

I do occasionally see a failure in which the low priority router is selected, but it seems to be a result of NetworkManager adding a static route to the routing table pointing to the first router it finds, which certainly seems wrong. I'm still looking at it, but just to be on the safe side, could you please send me the unfiltered output of:

ip -6 route show

I'd like to make sure that you're not seeing the same issue. If you are, then this is an NM problem, and I'll get those guys involved. Thanks!
I have been running NM on all the systems I've checked this on, so I guess I can't rule that out.

This is with just the medium priority router (1841):

ip -6 route show
2001:470:88ef:1::1 via fe80::1841:1 dev eth1 metric 0 cache
2001:470:88ef:1::1:0 via fe80::1841:1 dev eth1 metric 0 cache
2001:470:88ef:b::/64 dev eth1 proto kernel metric 256 expires 2591192sec
2001:41d0:1:335c::1 via fe80::1841:1 dev eth1 metric 0 cache
2401:dd00:1::162 via fe80::1841:1 dev eth1 metric 0 cache
2a00:13d0:101::7 via fe80::1841:1 dev eth1 metric 0 cache
2a01:390::bbbb:3 via fe80::1841:1 dev eth1 metric 0 cache
fd7f:853:da5d:2::/64 dev eth1 proto kernel metric 256 expires 2591192sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth2 proto kernel metric 256
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::1841:1 dev eth1 proto static metric 1
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1780sec

This is after adding the high priority router (3825):

ip -6 route show
2001:470:88ef:1::6 via fe80::1841:1 dev eth1 metric 0 cache
2001:470:88ef:b::/64 dev eth1 proto kernel metric 256 expires 2591172sec
2001:41d0:1:335c::1 via fe80::1841:1 dev eth1 metric 0 cache
2607:f8b0:4004:801::1009 via fe80::1841:1 dev eth1 metric 0 cache
2607:f8b0:400d:c01::65 via fe80::1841:1 dev eth1 metric 0 cache
2607:f8b0:400d:c01::88 via fe80::1841:1 dev eth1 metric 0 cache
fd7f:853:da5d:2::/64 dev eth1 proto kernel metric 256 expires 2591172sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth2 proto kernel metric 256
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::1841:1 dev eth1 proto static metric 1
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1685sec
default via fe80::3825 dev eth1 proto ra metric 1024 expires 1759sec
Yup, you appear to have the same symptom I did. Namely, this route:

default via fe80::1841:1 dev eth1 proto static metric 1

It's a static route, added by NetworkManager, and with metric 1 it overrides both of the dynamically added routes (metric 1024). I'm not sure why NetworkManager does that, but it certainly seems wrong to me. I would suggest you do the following if possible as a test:

1) Disable the NetworkManager service on the client system and enable the network service.
2) Edit /etc/sysconfig/network-scripts/ifcfg-eth1. Change NM_CONTROLLED="yes" to NM_CONTROLLED="no".
3) Reboot the client.
4) When it comes back up, you should see 2 default routes instead of 3 (the static route will be gone), and the high priority router will be used regardless of the order in which the RAs arrive.

If that fixes the problem for you, we can reassign this over to the NetworkManager component, and I can take a look at how to fix this in NM.
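The check described above can be sketched as a small script that scans `ip -6 route show` output for a static default route whose metric is lower than every RA-learned default route on the same device. The function names are hypothetical, and the parsing assumes the simple one-route-per-line layout seen in this report. The sample text is the three default routes reported earlier in this bug.

```python
from typing import Dict, List

SAMPLE = """\
default via fe80::1841:1 dev eth1 proto static metric 1
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1685sec
default via fe80::3825 dev eth1 proto ra metric 1024 expires 1759sec
"""


def parse_default_routes(text: str) -> List[Dict[str, object]]:
    """Extract via/dev/proto/metric from each 'default' line."""
    routes = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "default":
            continue
        fields: Dict[str, object] = {}
        for key in ("via", "dev", "proto", "metric"):
            if key in tokens:
                idx = tokens.index(key)
                if idx + 1 < len(tokens):
                    fields[key] = tokens[idx + 1]
        # Assumption: a missing metric is treated as 0 in this sketch.
        fields["metric"] = int(fields.get("metric", "0"))
        routes.append(fields)
    return routes


def shadowing_static_routes(routes: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Static default routes that outrank every RA default on the same device."""
    shadowing = []
    for route in routes:
        if route.get("proto") != "static":
            continue
        ra_routes = [
            other for other in routes
            if other.get("proto") == "ra" and other.get("dev") == route.get("dev")
        ]
        # A lower metric wins, so a static route below all RA metrics
        # hides the RA routes and their preference.
        if ra_routes and all(route["metric"] < other["metric"] for other in ra_routes):
            shadowing.append(route)
    return shadowing


for route in shadowing_static_routes(parse_default_routes(SAMPLE)):
    print("static default via", route["via"], "metric", route["metric"])
```

Run against the sample, it reports the metric 1 route via fe80::1841:1, which is the route singled out in the comment above.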
You were right, it was a NM issue.

Med pri router only:

ip -6 route show
2001:418:8405:4002::2 via fe80::1841:1 dev eth1 metric 0 cache
2001:470:88ef:1::6 via fe80::1841:1 dev eth1 metric 0 cache
2001:470:88ef:1::1:0 via fe80::1841:1 dev eth1 metric 0 cache
2001:470:88ef:1:21b:21ff:fecd:95da via fe80::1841:1 dev eth1 metric 0 cache
2001:470:88ef:b::/64 dev eth1 proto kernel metric 256 expires 2591069sec
fd7f:853:da5d:2::/64 dev eth1 proto kernel metric 256 expires 2591069sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth2 proto kernel metric 256
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1657sec

Added High:

ip -6 route show
2001:470:88ef:1::6 via fe80::3825 dev eth1 metric 0 cache
2001:470:88ef:b::/64 dev eth1 proto kernel metric 256 expires 2591189sec
fd7f:853:da5d:2::/64 dev eth1 proto kernel metric 256 expires 2591189sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth2 proto kernel metric 256
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1662sec
default via fe80::3825 dev eth1 proto ra metric 1024 expires 1776sec

I do think that bug # 892059 is a duplicate of this due to the extra NM static route. Neil, would you like to verify and mark as duplicate?
*** Bug 892059 has been marked as a duplicate of this bug. ***
Confirmed. If I set the lifetimes of the routers to very low values, I see them age out, but the static route remains.
Ok, reassigning this over to NetworkManager. I'll work with the maintainer to nail down a solution to this.

Dan,
In summary of this issue, Michael is running an IPv6 network on which he has a subnet with 2 routers, 1 sending RAs with a high priority flag, and the other with a low priority flag. He observes that, if the low priority router issues an RA first, clients will use that router as their default gateway, even after the high priority router advertises its prefix.

The problem stems from an odd behavior of NetworkManager. It seems that, even though the default for the kernel is to use the router advertisements to generate default gateways, NetworkManager still adds a static route based on the first router advertisement that it gets a notification for via netlink. I can understand why it might want to do this if dhcpv6 is being used, but it's incorrect to do when SLAAC is being used. NetworkManager, I think, should either:

1) Not add default gateways at all when SLAAC is being used, as the kernel will do this automatically.
2) Disable the kernel's ability to generate default routes based on router adverts. You can do this by:

echo 0 > /proc/sys/net/ipv6/conf/all/accept_ra_defrtr
echo 0 > /proc/sys/net/ipv6/conf/all/accept_ra_rtr_pref

I would think the first option would be the better approach, as pursuing the second would imply that NetworkManager would then have to understand and parse router preference bits in the RA frame to add default gateways properly, and the netlink interface doesn't currently export that information, to the best of my recollection.
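To see which of the two approaches a given system is currently set up for, the two sysctls named above can be read back rather than written. The following is a minimal read-only sketch. The function name and the `root` parameter are assumptions made so the lookup can be pointed at another directory, and a value of `None` only means the file could not be read (for example, on a kernel built without router preference support).

```python
from pathlib import Path
from typing import Dict, Optional

# The two per-interface sysctls mentioned in the comment above.
RA_SYSCTLS = ("accept_ra_defrtr", "accept_ra_rtr_pref")


def read_ra_sysctls(iface: str = "all", root: str = "/proc/sys") -> Dict[str, Optional[int]]:
    """Return the current value of each RA sysctl, or None if unreadable."""
    base = Path(root) / "net" / "ipv6" / "conf" / iface
    values: Dict[str, Optional[int]] = {}
    for name in RA_SYSCTLS:
        try:
            values[name] = int((base / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file, permission problem, or unexpected content.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_ra_sysctls().items():
        print(name, value)
```

A value of 0 for accept_ra_defrtr means the kernel will not install default routes from RAs, which corresponds to the second option above; a nonzero value means the kernel handles them itself, as in the first.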
One thing to check if anyone has access to an F18 box is whether this is still an issue in F18. We did some work after 0.9.6.4 on the static route thing. But the second issue with RA priority may not be fixed yet.
I have the whole reproducer set up via qemu guests. I can easily install an F18 image and validate if the problem is gone or not. I'll let you know tomorrow afternoon
Confirmed: with an F18 guest, NM does not add a default static route to the kernel routing table, and the router preference bits are honored correctly. Not sure if you want to backport the NM changes to F17 or if you just want to close this as NEXTRELEASE.
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
4 years later, I'm running into the same issue: a static IPv6 route added by (presumably) NetworkManager, for a low-pref router.

These are the two advertisements:

16:45:43.942103 00:0d:60:ff:05:55 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: (flowlabel 0x4dd1b, hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::20d:60ff:feff:555 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 56
    hop limit 64, Flags [managed], pref high, router lifetime 1800s, reachable time 0s, retrans time 0s
      prefix info option (3), length 32 (4): 2001:470:1f08:bb6::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
        0x0000:  40c0 0001 5180 0000 3840 0000 0000 2001
        0x0010:  0470 1f08 0bb6 0000 0000 0000 0000
      source link-address option (1), length 8 (1): 00:0d:60:ff:05:55
        0x0000:  000d 60ff 0555
16:45:45.677722 60:e3:27:49:69:47 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::bc87:32ff:fe14:7678 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 24
    hop limit 64, Flags [managed], pref low, router lifetime 30s, reachable time 0s, retrans time 0s
      source link-address option (1), length 8 (1): 60:e3:27:49:69:47
        0x0000:  60e3 2749 6947

The first is hi-pri, the second is low-pri.
However, I end up with a default from the low-pri adv instead of hi-pri:

$ ip -6 route show
2001:470:1f08:bb6::/64 dev wlp3s0 proto kernel metric 256 expires 86032sec pref medium
fe80::/64 dev vmnet1 proto kernel metric 256 pref medium
fe80::/64 dev vmnet8 proto kernel metric 256 pref medium
fe80::/64 dev docker0 proto kernel metric 256 linkdown pref medium
fe80::/64 dev wlp3s0 proto kernel metric 256 pref medium
default via fe80::bc87:32ff:fe14:7678 dev wlp3s0 proto static metric 600 pref medium

If I block ipv6-icmp from the low-pri router, everything is fine:

$ sudo ip6tables -A INPUT -p ipv6-icmp -m mac --mac-source 60:e3:27:49:69:47 -j DROP
$ nmcli c down Cesanta
Connection 'Cesanta' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)
$ nmcli c up Cesanta
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4)
$ ip -6 route show
2001:470:1f08:bb6::/64 dev wlp3s0 proto ra metric 600 pref medium
fe80::/64 dev vmnet1 proto kernel metric 256 pref medium
fe80::/64 dev vmnet8 proto kernel metric 256 pref medium
fe80::/64 dev docker0 proto kernel metric 256 linkdown pref medium
fe80::/64 dev wlp3s0 proto kernel metric 256 pref medium
default via fe80::20d:60ff:feff:555 dev wlp3s0 proto static metric 600 pref medium

In NM's debug log I see this:

NetworkManager[25266]: <info> [1503330839.7675] policy: set 'Cesanta' (wlp3s0) as default for IPv6 routing and DNS

That's fine, but the router it picks is the wrong one.