Bug 2060684 - platform cache inconsistency with `ip route change` for IPv6 multipath routes
Summary: platform cache inconsistency with `ip route change` for IPv6 multipath routes
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: NetworkManager
Version: 8.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Ana Cabral
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-03-04 00:26 UTC by Thomas Haller
Modified: 2022-08-10 15:14 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
| --- | --- | --- | --- | --- | --- | --- |
| Red Hat Issue Tracker | RHELPLAN-114437 | 0 | None | None | None | 2022-03-04 00:34:16 UTC |
| freedesktop.org Gitlab | NetworkManager NetworkManager merge_requests 1210 | 0 | None | opened | Draft: platform cache inconsistency with `ip route change` for IPv6 multipath routes | 2022-05-05 10:01:56 UTC |

Description Thomas Haller 2022-03-04 00:26:03 UTC
Try this:

```
ip netns del x
ip netns add x
ip -netns x link add v type veth peer w
ip -netns x link set v up
ip -netns x link set w up
ip -netns x addr add 1:2:3:4::100/64 dev v

ip netns exec x ip monitor route &

ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::1 dev v
ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::2 dev v
ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::3 dev v

ip -netns x route change 5:1::1/128 nexthop via 1:2:3:4::5 dev v
```


First we add three routes, which the kernel merges into a single multipath route.

Then `ip route change` drops them all. The RTM_NEWROUTE message carries the `NLM_F_REPLACE` flag, but the kernel actually replaced the entire multipath route, not just one nexthop.

NetworkManager now splits such ECMP routes into multiple single-hop routes. Our current handling of `NLM_F_REPLACE` removes only the first route (in an ordered list), so the remaining routes leak in the platform cache:

`./src/core/platform/tests/monitor -p` will show:

```
 <debug> [1646353215.3356] platform: (v) signal: route   6   added: type unicast 1:2:3:4::/64 dev 12 metric 256 mss 0 rt-src rt-kernel
 <debug> [1646353215.3357] platform: (v) signal: address 6   added: 1:2:3:4::100/64 lft forever pref forever lifetime 7-0[4294967295,4294967295] dev 12 flags tentative,permanent src kernel
 <debug> [1646353215.3602] platform: (v) signal: route   6   added: type unicast 5:1::1/128 via 1:2:3:4::1 dev 12 metric 1024 mss 0 rt-src rt-boot
 <debug> [1646353215.3681] platform: (v) signal: route   6   added: type unicast 5:1::1/128 via 1:2:3:4::2 dev 12 metric 1024 mss 0 rt-src rt-boot
 <debug> [1646353215.3749] platform: (v) signal: route   6   added: type unicast 5:1::1/128 via 1:2:3:4::3 dev 12 metric 1024 mss 0 rt-src rt-boot
 <debug> [1646353215.3866] platform: (v) signal: route   6   added: type unicast 5:1::1/128 via 1:2:3:4::5 dev 12 metric 1024 mss 0 rt-src rt-boot
 <debug> [1646353215.3867] platform: (v) signal: route   6 removed: type unicast 5:1::1/128 via 1:2:3:4::1 dev 12 metric 1024 mss 0 rt-src rt-boot
```



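This is wrong: only the route via 1:2:3:4::1 is removed, so the routes via 1:2:3:4::2 and 1:2:3:4::3 stay in the platform cache although the kernel no longer has them.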
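
A minimal Python sketch of the problem, not NetworkManager's actual code: the cache is modelled as an ordered list of single-hop routes, and the names `Route`, `handle_new_route` and the `whole_multipath` switch are hypothetical. With the switch off, an `NLM_F_REPLACE` event removes only the first matching route, as in the log above; with it on, all matching routes are dropped, which is what the kernel did in this reproducer.

```python
from dataclasses import dataclass
from typing import List, Optional

DEST = "5:1::1/128"


@dataclass(frozen=True)
class Route:
    """Hypothetical single-hop route entry, as the cache would store it."""
    dest: str
    gateway: Optional[str]  # None would mean an on-link route
    metric: int = 1024


def handle_new_route(cache: List[Route], route: Route,
                     replace: bool, whole_multipath: bool) -> None:
    """Apply one RTM_NEWROUTE event to the cache model, in place.

    replace: the event carried NLM_F_REPLACE.
    whole_multipath: hypothetical switch; False mimics the handling
    described in this bug (drop only the first match), True drops
    every route with the same destination and metric.
    """
    if replace:
        matches = [r for r in cache
                   if (r.dest, r.metric) == (route.dest, route.metric)]
        victims = matches if whole_multipath else matches[:1]
        for r in victims:
            cache.remove(r)
    cache.append(route)


def demo(whole_multipath: bool) -> List[Optional[str]]:
    """Replay the reproducer: three appends, then one change."""
    cache: List[Route] = []
    for gw in ("1:2:3:4::1", "1:2:3:4::2", "1:2:3:4::3"):
        handle_new_route(cache, Route(DEST, gw),
                         replace=False, whole_multipath=whole_multipath)
    handle_new_route(cache, Route(DEST, "1:2:3:4::5"),
                     replace=True, whole_multipath=whole_multipath)
    return [r.gateway for r in cache]


buggy = demo(whole_multipath=False)  # routes via ::2 and ::3 are leaked
fixed = demo(whole_multipath=True)   # only the route via ::5 remains
```

The sketch treats every route with the same destination and metric as part of one multipath route, which is a simplification.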

Comment 1 Thomas Haller 2022-03-04 00:28:30 UTC
Also, we need to take care that routes to on-link hosts (without a gateway) are treated specially:

# ip -netns x route append 5:1::1/128 nexthop dev v
Error: Device only routes can not be added for IPv6 using the multipath API.


Such routes also don't get merged! So if the first route in the list of routes to be deleted is such a route, we need to delete only that route (not all ECMP routes).
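
The rule above could be sketched like this in Python. This is only an illustration under assumptions: `routes_to_drop` is a hypothetical helper, routes are plain `(destination, gateway)` tuples already filtered to one destination and metric, and all gateway routes in the list are assumed to be siblings of the same multipath route.

```python
from typing import List, Optional, Tuple

# (destination, gateway); gateway None means an on-link route
RouteT = Tuple[str, Optional[str]]


def routes_to_drop(ordered: List[RouteT]) -> List[RouteT]:
    """Pick the cached routes that an NLM_F_REPLACE event should remove.

    ordered: candidate routes for the replaced destination, in the
    order of the cache's list (assumed to mirror the kernel's order).
    """
    if not ordered:
        return []
    first = ordered[0]
    if first[1] is None:
        # An on-link route is never merged into a multipath route,
        # so only this single route was replaced.
        return [first]
    # The first route has a gateway: drop it together with all its
    # ECMP siblings, but keep on-link routes untouched.
    return [r for r in ordered if r[1] is not None]


DEST = "5:1::1/128"
onlink_first = [(DEST, None), (DEST, "1:2:3:4::1"), (DEST, "1:2:3:4::2")]
gateway_first = [(DEST, "1:2:3:4::1"), (DEST, "1:2:3:4::2"), (DEST, None)]

dropped_a = routes_to_drop(onlink_first)   # only the on-link route
dropped_b = routes_to_drop(gateway_first)  # both gateway routes
```

The two orderings are made up for the example; which route actually comes first depends on the kernel's ordering.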

Comment 2 Thomas Haller 2022-03-04 00:40:09 UTC
Try also:

```
ip netns del x
ip netns add x
ip -netns x link add v type veth peer w
ip -netns x link set v up
ip -netns x link set w up
ip -netns x addr add 1:2:3:4::100/64 dev v

ip netns exec x ip monitor route &

ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::1 dev v
ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::2 dev v
ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::3 dev v
ip -netns x route append 5:1::1/128 dev v

ip -netns x route change 5:1::1/128 nexthop via 1:2:3:4::5 dev v
```

See also:

```
ip netns del x
ip netns add x
ip -netns x link add v type veth peer w
ip -netns x link set v up
ip -netns x link set w up
ip -netns x addr add 1:2:3:4::100/64 dev v nodad

ip netns exec x ip monitor route &

sleep 3

ip -netns x route append 5:1::1/128 src 1:2:3:4::100 nexthop via 1:2:3:4::1 dev v
ip -netns x route append 5:1::1/128 src 1:2:3:4::100 nexthop via 1:2:3:4::2 dev v
ip -netns x route append 5:1::1/128 src 1:2:3:4::100 nexthop via 1:2:3:4::3 dev v

ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::11 dev v
ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::12 dev v
ip -netns x route append 5:1::1/128 nexthop via 1:2:3:4::13 dev v

ip -netns x route append 5:1::1/128 dev v

ip -netns x route change 5:1::1/128 nexthop via 1:2:3:4::5 dev v
```


*SIGH*

Comment 3 Till Maas 2022-03-04 07:02:26 UTC
Does this mean that https://bugzilla.redhat.com/show_bug.cgi?id=1837254 is actually not fully fixed?

Comment 4 Thomas Haller 2022-03-04 07:11:57 UTC
(In reply to Till Maas from comment #3)
> Does this mean that https://bugzilla.redhat.com/show_bug.cgi?id=1837254 is
> actually not fully fixed?

That depends on how you define the scope of bug 1837254. But no, that bug is fixed "for the most part".

Comment 5 Thomas Haller 2022-03-04 07:21:33 UTC
This issue is mostly about using `ip route change` or `ip route replace` (or rather, what those correspond to on netlink).

NetworkManager doesn't use that (it uses the equivalent of `ip route append`), so by using NetworkManager alone on an interface, you cannot confuse the cache this way.
(There are probably other, less understood ways to confuse the cache, if you try hard enough.)

Also, bug 1837254 has existed forever, and it has now become significantly harder to introduce a cache inconsistency. For that reason, as far as bug 1837254 is concerned, the bug is fixed.
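
For reference, a small Python table of what the `ip route` subcommands correspond to on netlink, as far as I know from iproute2 and `linux/netlink.h`. The helper `may_replace` is a made-up name for illustration; only commands that set `NLM_F_REPLACE` can trigger the issue described here.

```python
# Flag values as defined in linux/netlink.h
NLM_F_REPLACE = 0x100
NLM_F_EXCL = 0x200
NLM_F_CREATE = 0x400
NLM_F_APPEND = 0x800

# Netlink flags that iproute2 sets on RTM_NEWROUTE per subcommand
IP_ROUTE_FLAGS = {
    "add": NLM_F_CREATE | NLM_F_EXCL,
    "change": NLM_F_REPLACE,
    "replace": NLM_F_CREATE | NLM_F_REPLACE,
    "prepend": NLM_F_CREATE,
    "append": NLM_F_CREATE | NLM_F_APPEND,
}


def may_replace(subcommand: str) -> bool:
    """Return True if the subcommand asks the kernel to replace a route."""
    return bool(IP_ROUTE_FLAGS[subcommand] & NLM_F_REPLACE)


replacing = sorted(cmd for cmd in IP_ROUTE_FLAGS if may_replace(cmd))
```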

Comment 6 David Jaša 2022-03-30 10:06:36 UTC
(In reply to Thomas Haller from comment #5)
> NetworkManager doesn't use that (it uses the equivalent of `ip route
> append`), so by using NetworkManager alone on an interface, you cannot
> confuse the cache this  way.

Just to confirm: should the testing be based on observing whether NM reports routes correctly after an externally run 'ip route change'?
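
One possible way to check this, sketched in Python under assumptions: replay the `signal: route` lines printed by the platform monitor to get the cache's final view, then compare it with the gateways the kernel is expected to have. The function names are hypothetical, the log lines are copied from the description, and the expected kernel state (only the route via 1:2:3:4::5) is taken from the description's statement that `ip route change` drops the other nexthops.

```python
import re

# Matches lines such as "... signal: route   6   added: type unicast ..."
SIGNAL_RE = re.compile(r"signal: route\s+6\s+(added|removed): (.+)$")

LOG = """\
 <debug> [1646353215.3356] platform: (v) signal: route   6   added: type unicast 1:2:3:4::/64 dev 12 metric 256 mss 0 rt-src rt-kernel
 <debug> [1646353215.3602] platform: (v) signal: route   6   added: type unicast 5:1::1/128 via 1:2:3:4::1 dev 12 metric 1024 mss 0 rt-src rt-boot
 <debug> [1646353215.3681] platform: (v) signal: route   6   added: type unicast 5:1::1/128 via 1:2:3:4::2 dev 12 metric 1024 mss 0 rt-src rt-boot
 <debug> [1646353215.3749] platform: (v) signal: route   6   added: type unicast 5:1::1/128 via 1:2:3:4::3 dev 12 metric 1024 mss 0 rt-src rt-boot
 <debug> [1646353215.3866] platform: (v) signal: route   6   added: type unicast 5:1::1/128 via 1:2:3:4::5 dev 12 metric 1024 mss 0 rt-src rt-boot
 <debug> [1646353215.3867] platform: (v) signal: route   6 removed: type unicast 5:1::1/128 via 1:2:3:4::1 dev 12 metric 1024 mss 0 rt-src rt-boot
""".splitlines()


def replay(log_lines):
    """Return the set of route descriptions left after all signals."""
    cache = set()
    for line in log_lines:
        m = SIGNAL_RE.search(line.rstrip())
        if not m:
            continue  # not an IPv6 route signal
        action, route = m.groups()
        if action == "added":
            cache.add(route)
        else:
            cache.discard(route)
    return cache


def gateways(cache, dest):
    """Collect the gateways of all cached routes to one destination."""
    found = set()
    for route in cache:
        m = re.match(r"type unicast (\S+) via (\S+) ", route)
        if m and m.group(1) == dest:
            found.add(m.group(2))
    return found


cached = gateways(replay(LOG), "5:1::1/128")
expected_in_kernel = {"1:2:3:4::5"}  # assumption, see the lead-in
stale = cached - expected_in_kernel  # routes the cache should not have
```

In a real test, the expected set would come from `ip -6 route show` inside the namespace rather than being hard-coded.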

Comment 8 Ana Cabral 2022-08-10 13:30:03 UTC
It was agreed to target this for 8.8 in order to prioritize team commitments.

Comment 9 Ana Cabral 2022-08-10 14:44:23 UTC
We updated Jira but forgot to update here.

