Bug 891784
| Summary: | IPv6 RA router preference ignored | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Michael L <mdl-mailing> | ||||||||
| Component: | NetworkManager | Assignee: | Dan Williams <dcbw> | ||||||||
| Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
| Severity: | medium | Docs Contact: | |||||||||
| Priority: | unspecified | ||||||||||
| Version: | 17 | CC: | dcbw, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, mdl-mailing, nhorman, rojer | ||||||||
| Target Milestone: | --- | ||||||||||
| Target Release: | --- | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Linux | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2013-07-31 23:44:06 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
Description
Michael L
2013-01-04 00:29:16 UTC
Neil, do you want to take a look at this one?
When default gateway is removed, second router is not used. Waited for RA lifetime to expire, and still did not switch to higher priority router.
Josh, sure, I'm on it.
Created attachment 672505 [details]
[PATCH] ipv6: Enforce RFC 4191 Default Gateway replacement
RFC 4191 introduces the notion of router preference bits in router
advertizements, which we appear to handle incorrectly. When two adverts arrive
at a host, if the advert contains a higher priority than the current existing
default router, we currently simply update the pref bits in the existing default
route to reflect the higher preference. This operates under the assumption that
the advert is for the same gateway as what we already have configured. What we
should be doing is keeping the old default route, but adding a new route for the
arriving adv with the higher preference value, so that we route to the proper
preferred gateway, in the case where two independent routers exist on the same
subnet.
Signed-off-by: Neil Horman <nhorman>
CC: "David S. Miller" <davem>
CC: Hideaki YOSHIFUJI <yoshfuji>
---
net/ipv6/ndisc.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
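The difference the commit message describes can be illustrated with a small model. This is only a sketch of the two behaviours as the message words them, not the code in net/ipv6/ndisc.c; the names `DefaultRoute`, `handle_ra_old`, `handle_ra_new` and the `PREF_RANK` values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Illustrative ranking only; these are not kernel constants.
PREF_RANK = {"low": -1, "medium": 0, "high": 1}


@dataclass
class DefaultRoute:
    gateway: str  # link-local address of the advertising router
    pref: str     # "low", "medium" or "high"


def handle_ra_old(routes, gateway, pref):
    """Behaviour the commit message calls incorrect: a higher preference
    is written into the existing default route, whoever sent the RA."""
    if not routes:
        routes.append(DefaultRoute(gateway, pref))
    elif PREF_RANK[pref] > PREF_RANK[routes[0].pref]:
        routes[0].pref = pref  # gateway is left unchanged


def handle_ra_new(routes, gateway, pref):
    """Behaviour the commit message asks for: one default route per
    advertising gateway, each carrying its own preference."""
    for route in routes:
        if route.gateway == gateway:
            route.pref = pref
            return
    routes.append(DefaultRoute(gateway, pref))


def preferred(routes):
    """Pick the default route with the highest preference."""
    return max(routes, key=lambda r: PREF_RANK[r.pref])


old, new = [], []
for gw, pref in [("fe80::1841:1", "medium"), ("fe80::3825", "high")]:
    handle_ra_old(old, gw, pref)
    handle_ra_new(new, gw, pref)

print(preferred(old).gateway)  # still the first router
print(preferred(new).gateway)  # the high-preference router
```

In the model, the old handling ends with a single route to fe80::1841:1 marked high, while the new handling keeps both routes and prefers fe80::3825.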
http://koji.fedoraproject.org/koji/taskinfo?taskID=4841350
There's a test build with the patch from comment 4 included. Can you please test it and confirm that it fixes your problem? Thanks!
Doesn't seem to fix the issue. Just to make sure, I installed:
Jan 04 18:53:12 Updated: kernel-tools-libs-3.6.11-4.fc17.x86_64
Jan 04 18:53:15 Installed: kernel-3.6.11-4.fc17.x86_64
Jan 04 18:53:16 Installed: kernel-modules-extra-3.6.11-4.fc17.x86_64
Jan 04 18:53:16 Updated: kernel-tools-3.6.11-4.fc17.x86_64
After a reboot I did a traceroute6 with a medium priority router as the only router on the network:
traceroute to www.google.com (2607:f8b0:400c:c01::6a), 30 hops max, 80 byte packets
1 2001:470:88ef:b::2 (2001:470:88ef:b::2) 0.762 ms 1.290 ms 1.522 ms
2 2001:470:883f:c::1 (2001:470:883f:c::1) 1.210 ms 1.660 ms 2.085 ms
3 2001:470:88ef:1::1 (2001:470:88ef:1::1) 2.528 ms 2.975 ms 3.421 ms
4 41magnum-1.tunnel.tserv4.nyc4.ipv6.he.net (2001:470:1f06:3ad::1) 24.388 ms 33.945 ms 29.353 ms
Then I added a high priority router, verified receipt of a router advertisement with wireshark and did a traceroute6 again:
traceroute to www.google.com (2607:f8b0:400c:c01::6a), 30 hops max, 80 byte packets
1 2001:470:88ef:b::2 (2001:470:88ef:b::2) 1.044 ms 1.294 ms 1.474 ms
2 2001:470:883f:c::1 (2001:470:883f:c::1) 0.685 ms 0.864 ms 1.109 ms
3 2001:470:88ef:1::1 (2001:470:88ef:1::1) 1.513 ms 1.739 ms 1.982 ms
4 41magnum-1.tunnel.tserv4.nyc4.ipv6.he.net (2001:470:1f06:3ad::1) 33.957 ms 29.396 ms 24.841 ms
The high priority router is on both 2001:470:88ef:b::/64 and 2001:470:88ef:1::/64 so I shouldn't have seen the second hop in there. Is it possible that there is a relationship with bug # 892059, in that the routing table is not being updated?
I don't know, please send me copies of your routing table before and after adding the high priority router, as well as the tcpdump pcap files you captured. I should be able to tell you.
Routing table with Medium priority router only:
route -n --inet6 | grep eth1
2001:470:88ef:b::/64 :: UAe 256 0 0 eth1
fd7f:853:da5d:2::/64 :: UAe 256 0 0 eth1
fe80::/64 :: U 256 0 0 eth1
::/0 fe80::1841:1 UGDAe 1024 0 0 eth1
ff00::/8 :: U 256 0 0 eth1
Routing table after adding high priority router:
route -n --inet6 | grep eth1
2001:470:88ef:b::/64 :: UAe 256 0 0 eth1
fd7f:853:da5d:2::/64 :: UAe 256 0 0 eth1
fe80::/64 :: U 256 0 0 eth1
::/0 fe80::1841:1 UGDAe 1024 0 0 eth1
::/0 fe80::3825 UGDAe 1024 0 0 eth1
ff00::/8 :: U 256 0 0 eth1
Created attachment 673170 [details]
Capture file showing adding second router with higher priority.
Thank you, I'm going to try to set up a reproducer here.
FWIW, I don't think this is related to your other bz, the routing table is clearly getting updated, as the new router shows up. I'm wondering if perhaps the route scoring isn't working properly relative to the route priority bits in the rt6i_flags.
Ok, so I've got the problem (sort of) reproduced. I have 3 qemu guests set up, 2 routers and a client, using your radvd configs attached. I can get two default routes recognized properly. The thing is, it works properly. If I boot the high priority router after the client and low priority router have exchanged router adverts, I get the second default route, and it gets used. So I'm not quite sure what's going on. I do occasionally see a failure in which the low priority router is selected, but it seems to be a result of NetworkManager adding a static route to the routing table pointing to the first router it finds, which certainly seems wrong. I'm still looking at it, but just to be on the safe side, could you please send me the unfiltered output of:
ip -6 route show
I'd like to make sure that you're not seeing the same issue. If you are, then this is an NM problem, and I'll get those guys involved. Thanks!
I have been running NM on all the systems I've checked this on, so I guess I can't rule that out.
This is with just the medium priority router (1841):
ip -6 route show
2001:470:88ef:1::1 via fe80::1841:1 dev eth1 metric 0
cache
2001:470:88ef:1::1:0 via fe80::1841:1 dev eth1 metric 0
cache
2001:470:88ef:b::/64 dev eth1 proto kernel metric 256 expires 2591192sec
2001:41d0:1:335c::1 via fe80::1841:1 dev eth1 metric 0
cache
2401:dd00:1::162 via fe80::1841:1 dev eth1 metric 0
cache
2a00:13d0:101::7 via fe80::1841:1 dev eth1 metric 0
cache
2a01:390::bbbb:3 via fe80::1841:1 dev eth1 metric 0
cache
fd7f:853:da5d:2::/64 dev eth1 proto kernel metric 256 expires 2591192sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth2 proto kernel metric 256
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::1841:1 dev eth1 proto static metric 1
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1780sec
This is after adding the high priority router (3825):
ip -6 route show
2001:470:88ef:1::6 via fe80::1841:1 dev eth1 metric 0
cache
2001:470:88ef:b::/64 dev eth1 proto kernel metric 256 expires 2591172sec
2001:41d0:1:335c::1 via fe80::1841:1 dev eth1 metric 0
cache
2607:f8b0:4004:801::1009 via fe80::1841:1 dev eth1 metric 0
cache
2607:f8b0:400d:c01::65 via fe80::1841:1 dev eth1 metric 0
cache
2607:f8b0:400d:c01::88 via fe80::1841:1 dev eth1 metric 0
cache
fd7f:853:da5d:2::/64 dev eth1 proto kernel metric 256 expires 2591172sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth2 proto kernel metric 256
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::1841:1 dev eth1 proto static metric 1
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1685sec
default via fe80::3825 dev eth1 proto ra metric 1024 expires 1759sec
Yup, you appear to have the same symptom I did. Namely, this route:
default via fe80::1841:1 dev eth1 proto static metric 1
It's a static route, added by NetworkManager, which overrides both of the dynamically added routes. I'm not sure why NetworkManager does that, but it certainly seems wrong to me. I would suggest you do the following if possible as a test:
1) Disable the NetworkManager service on the client system, enable the network service
2) Edit /etc/sysconfig/network-scripts/ifcfg-eth1. Change NM_CONTROLLED="yes" to NM_CONTROLLED="no"
3) Reboot the client.
4) When it comes back up, you should see 2 default routes, instead of 3 (the static route will be gone), and the high priority router will be used regardless of the order that their RAs arrive in.
If that fixes the problem for you, we can reassign this over to the NetworkManager component, and I can take a look at how to fix this in NM.
You were right, it was a NM issue.
Med pri router only:
ip -6 route show
2001:418:8405:4002::2 via fe80::1841:1 dev eth1 metric 0
cache
2001:470:88ef:1::6 via fe80::1841:1 dev eth1 metric 0
cache
2001:470:88ef:1::1:0 via fe80::1841:1 dev eth1 metric 0
cache
2001:470:88ef:1:21b:21ff:fecd:95da via fe80::1841:1 dev eth1 metric 0
cache
2001:470:88ef:b::/64 dev eth1 proto kernel metric 256 expires 2591069sec
fd7f:853:da5d:2::/64 dev eth1 proto kernel metric 256 expires 2591069sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth2 proto kernel metric 256
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1657sec
Added High:
ip -6 route show
2001:470:88ef:1::6 via fe80::3825 dev eth1 metric 0
cache
2001:470:88ef:b::/64 dev eth1 proto kernel metric 256 expires 2591189sec
fd7f:853:da5d:2::/64 dev eth1 proto kernel metric 256 expires 2591189sec
unreachable fe80::/64 dev lo proto kernel metric 256 error -101
fe80::/64 dev eth2 proto kernel metric 256
fe80::/64 dev eth1 proto kernel metric 256
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1662sec
default via fe80::3825 dev eth1 proto ra metric 1024 expires 1776sec
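Why the extra static route wins can be shown with a short sketch that parses the `default` lines from the tables above and keeps only the routes with the lowest metric. It assumes the usual Linux rule that a lower metric takes precedence and is a simplification of the kernel's route lookup; the helper names `parse_default_routes` and `lowest_metric_routes` are hypothetical.

```python
import re

# Default-route lines copied from the "ip -6 route show" output in this bug.
WITH_NM = """\
default via fe80::1841:1 dev eth1 proto static metric 1
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1685sec
default via fe80::3825 dev eth1 proto ra metric 1024 expires 1759sec
"""

WITHOUT_NM = """\
default via fe80::1841:1 dev eth1 proto ra metric 1024 expires 1662sec
default via fe80::3825 dev eth1 proto ra metric 1024 expires 1776sec
"""

DEFAULT_RE = re.compile(
    r"\s*default via (\S+) dev (\S+) proto (\S+) metric (\d+)"
)


def parse_default_routes(text):
    """Extract gateway, device, protocol and metric of each default route."""
    routes = []
    for line in text.splitlines():
        match = DEFAULT_RE.match(line)
        if match:
            via, dev, proto, metric = match.groups()
            routes.append(
                {"via": via, "dev": dev, "proto": proto, "metric": int(metric)}
            )
    return routes


def lowest_metric_routes(routes):
    """Keep only the routes that share the lowest metric (assumed to win)."""
    if not routes:
        return []
    best = min(r["metric"] for r in routes)
    return [r for r in routes if r["metric"] == best]


for label, text in (("with NM", WITH_NM), ("without NM", WITHOUT_NM)):
    candidates = lowest_metric_routes(parse_default_routes(text))
    print(label, [(r["via"], r["proto"]) for r in candidates])
```

With the static metric 1 route present, it is the only candidate, so router preference never comes into play. Without it, both RA routes share metric 1024 and the preference bits can decide between them.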
I do think that bug # 892059 is a duplicate of this due to the extra NM static route. Neil, would you like to verify and mark as duplicate?
*** Bug 892059 has been marked as a duplicate of this bug. ***
Confirmed. If I set the lifetimes of the routers to very low values, I see them age out, but the static route remains.
Ok, reassigning this over to NetworkManager. I'll work with the maintainer to nail down a solution to this.
Dan, in summary of this issue, Michael is running an IPv6 network on which he has a subnet with 2 routers, 1 sending RAs with a high priority flag, and the other with a low priority flag. He observes that, if the low priority router issues an RA first, clients will use that router as their default gateway, even after the high priority router advertises its prefix. The problem stems from an odd behavior of NetworkManager. It seems, even though the default for the kernel is to use the router advertisements to generate default gateways, NetworkManager still adds a static route based on the first router advertisement that it gets a notification for via netlink. I can understand why it might want to do this if dhcpv6 is being used, but it's incorrect to do when SLAAC is being used. NetworkManager I think should either:
1) Not add default gateways at all when SLAAC is being used, as the kernel will do this automatically
2) Disable the kernel's ability to generate default routes based on router adverts. You can do this by:
echo 0 > /proc/sys/net/ipv6/conf/all/accept_ra_defrtr
echo 0 > /proc/sys/net/ipv6/conf/all/accept_ra_rtr_pref
I would think the first option would be the better approach, as pursuing the second would imply that NetworkManager would then have to understand and parse router preference bits in the RA frame to add default gateways properly, and the netlink interface doesn't currently export that information to the best of my recollection.
One thing to check if anyone has access to an F18 box is whether this is still an issue in F18. We did some work after 0.9.6.4 on the static route thing. But the second issue with RA priority may not be fixed yet.
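Before changing the two sysctls named in option 2, their current values can be read back. The following is a minimal sketch, assuming the standard /proc/sys/net/ipv6/conf layout; the function name `read_ra_sysctls` and its `root` parameter (there so a different directory can be substituted for testing) are hypothetical.

```python
from pathlib import Path

# The two knobs mentioned in the comment above.
RA_SYSCTLS = ("accept_ra_defrtr", "accept_ra_rtr_pref")


def read_ra_sysctls(iface="all", root="/proc/sys/net/ipv6/conf"):
    """Return the current value of each RA-related sysctl for `iface`.

    A value of None means the file was missing or unreadable, for example
    on a system without IPv6 or for an unknown interface name.
    """
    values = {}
    for name in RA_SYSCTLS:
        path = Path(root) / iface / name
        try:
            values[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    # "all" matches the path used in the thread; an interface name such as
    # "eth1" can be passed instead.
    print(read_ra_sysctls("all"))
```

The sketch only reads the values; it does not write to /proc and needs no elevated privileges.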
I have the whole reproducer set up via qemu guests. I can easily install an F18 image and validate if the problem is gone or not. I'll let you know tomorrow afternoon.
Confirmed, with an F18 guest, NM does not add a default static route to the kernel routing table, and the router preference bits are honored correctly. Not sure if you want to backport the NM changes to F17 or if you just want to close this as NEXTRELEASE.
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.
4 years later, i'm running into the same issue: static ipv6 route added by (presumably) NetworkManager, for a low-pref router.
these are the two advertisements:
16:45:43.942103 00:0d:60:ff:05:55 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 110: (flowlabel 0x4dd1b, hlim 255, next-header ICMPv6 (58) payload length: 56) fe80::20d:60ff:feff:555 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 56
hop limit 64, Flags [managed], pref high, router lifetime 1800s, reachable time 0s, retrans time 0s
prefix info option (3), length 32 (4): 2001:470:1f08:bb6::/64, Flags [onlink, auto], valid time 86400s, pref. time 14400s
0x0000: 40c0 0001 5180 0000 3840 0000 0000 2001
0x0010: 0470 1f08 0bb6 0000 0000 0000 0000
source link-address option (1), length 8 (1): 00:0d:60:ff:05:55
0x0000: 000d 60ff 0555
16:45:45.677722 60:e3:27:49:69:47 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 78: (hlim 255, next-header ICMPv6 (58) payload length: 24) fe80::bc87:32ff:fe14:7678 > ff02::1: [icmp6 sum ok] ICMP6, router advertisement, length 24
hop limit 64, Flags [managed], pref low, router lifetime 30s, reachable time 0s, retrans time 0s
source link-address option (1), length 8 (1): 60:e3:27:49:69:47
0x0000: 60e3 2749 6947
first is hi-pri, second is low-pri.
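For reference, the "pref high" and "pref low" that tcpdump prints come from a two-bit Prf field in the RA flags byte, defined by RFC 4191. The sketch below decodes that byte; the raw values 0x88 and 0x98 are reconstructed from "Flags [managed], pref high" and "Flags [managed], pref low" on the assumption that no other flag bits were set, since the capture above does not show the byte itself. The function name `decode_ra_flags` is hypothetical.

```python
# RFC 4191 Prf encodings: 01 = high, 00 = medium, 11 = low, 10 = reserved.
PRF_NAMES = {0b01: "high", 0b00: "medium", 0b11: "low"}


def decode_ra_flags(flags):
    """Decode the flags byte of a Router Advertisement header.

    Bit layout, most significant bit first: M, O, H, Prf (2 bits), reserved.
    """
    prf = (flags >> 3) & 0b11
    return {
        "managed": bool(flags & 0x80),  # M flag
        "other": bool(flags & 0x40),    # O flag
        # RFC 4191 says a receiver treats the reserved value 10 in the
        # RA header as medium.
        "pref": PRF_NAMES.get(prf, "medium"),
    }


# Assumed byte values for the two advertisements in this comment.
print(decode_ra_flags(0x88))  # managed, pref high
print(decode_ra_flags(0x98))  # managed, pref low
```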
however, i end up with a default from the low-pri adv instead of hi-pri:
$ ip -6 route show
2001:470:1f08:bb6::/64 dev wlp3s0 proto kernel metric 256 expires 86032sec pref medium
fe80::/64 dev vmnet1 proto kernel metric 256 pref medium
fe80::/64 dev vmnet8 proto kernel metric 256 pref medium
fe80::/64 dev docker0 proto kernel metric 256 linkdown pref medium
fe80::/64 dev wlp3s0 proto kernel metric 256 pref medium
default via fe80::bc87:32ff:fe14:7678 dev wlp3s0 proto static metric 600 pref medium
if i block ipv6-icmp from the low-pri router, everything is fine:
$ sudo ip6tables -A INPUT -p ipv6-icmp -m mac --mac-source 60:e3:27:49:69:47 -j DROP
$ nmcli c down Cesanta
Connection 'Cesanta' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)
$ nmcli c up Cesanta
Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/4)
$ ip -6 route show
2001:470:1f08:bb6::/64 dev wlp3s0 proto ra metric 600 pref medium
fe80::/64 dev vmnet1 proto kernel metric 256 pref medium
fe80::/64 dev vmnet8 proto kernel metric 256 pref medium
fe80::/64 dev docker0 proto kernel metric 256 linkdown pref medium
fe80::/64 dev wlp3s0 proto kernel metric 256 pref medium
default via fe80::20d:60ff:feff:555 dev wlp3s0 proto static metric 600 pref medium
in nm's debug log i see this:
NetworkManager[25266]: <info> [1503330839.7675] policy: set 'Cesanta' (wlp3s0) as default for IPv6 routing and DNS
that's fine, but the router it picks is the wrong one.