Bug 2170513

Summary: Missing netlink event for removed route with src attribute
Product: Red Hat Enterprise Linux 9 Reporter: Thomas Haller <thaller>
Component: kernelAssignee: Hangbin Liu <haliu>
kernel sub component: Networking QA Contact: Jianlin Shi <jishi>
Status: CLOSED MIGRATED Docs Contact:
Severity: unspecified    
Priority: unspecified CC: jiji, kzhang, sukulkar
Version: 9.2Keywords: MigratedToJIRA
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-21 14:20:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Thomas Haller 2023-02-16 15:31:22 UTC
For the "src" attribute (RTA_SRC) of a route, kernel requires that such an address is actually configured. It doesn't need to be on the same interface though...

When the address gets removed, kernel will also remove the route with the corresponding "src" attribute. However, it fails to send RTM_DELROUTE notification:

>>>>
#!/bin/bash

set -ex

ip netns del x &>/dev/null || :
ip netns add x

ip -netns x link add net1 type dummy
ip -netns x link add net2 type dummy
ip -netns x link set net1 up
ip -netns x link set net2 up

ip -netns x addr add 192.168.5.5/24 dev net1
ip -netns x route append 7.7.7.0/24 dev net2 src 192.168.5.5

ip -netns x -4 addr
ip -netns x -4 route

ip -netns x monitor addr &
ip -netns x monitor route &

sleep 0.2

ip -netns x addr del 192.168.5.5/24 dev net1
ip -netns x -4 route

sleep 0.2

kill -- $(jobs -p)
wait
wait
<<<<


On rhel-9.2 (5.14.0-267.el9.x86_64) and Fedora 37 (6.1.8-200.fc37.x86_64) above script gives:

<<<<
+ ip netns del x
+ ip netns add x
+ ip -netns x link add net1 type dummy
+ ip -netns x link add net2 type dummy
+ ip -netns x link set net1 up
+ ip -netns x link set net2 up
+ ip -netns x addr add 192.168.5.5/24 dev net1
+ ip -netns x route append 7.7.7.0/24 dev net2 src 192.168.5.5
+ ip -netns x -4 addr
2: net1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 192.168.5.5/24 scope global net1
       valid_lft forever preferred_lft forever
+ ip -netns x -4 route
7.7.7.0/24 dev net2 scope link src 192.168.5.5 
192.168.5.0/24 dev net1 proto kernel scope link src 192.168.5.5 
+ ip -netns x monitor addr
+ sleep 0.2
+ ip -netns x monitor route
+ ip -netns x addr del 192.168.5.5/24 dev net1
Deleted 2: net1    inet 192.168.5.5/24 scope global net1
Deleted 192.168.5.0/24 dev net1 proto kernel scope link src 192.168.5.5 
       valid_lft forever preferred_lft forever
Deleted broadcast 192.168.5.255 dev net1 table local proto kernel scope link src 192.168.5.5 
Deleted local 192.168.5.5 dev net1 table local proto kernel scope host src 192.168.5.5 
+ ip -netns x -4 route
+ sleep 0.2
++ jobs -p
+ kill -- 18111 18112
+ wait
+ wait
>>>


Note that there is no notification about the route on net2 being removed, but it's not longer there.



Sending accurate events is important to NetworkManager, because it keeps a cache of the routes. Missing/Wrong events mean that the cache becomes inconsistent.

Comment 1 Thomas Haller 2023-02-17 08:22:43 UTC
for IPv6, the situation is slightly different.

Note that also for IPv6, the "src" address must be configured on an (any) interface. Otherwise, kernel will reject adding the route.




A note about this "validation" that kernel attempts here.
---------------------------------------------------------

This behavior pushes increadibly amount of complexity to applications, as kernel tries to reject "invalid" configuration. But in a case like this, a route (with a nonexisting "src") is deemed invalid, based on whether another entity (the address) exists. That means, it suddenly becomes important that route and addresses are added in a certain order and be present together. It would be so much better if kernel would allow configuring what the user asks, and possibly sets a flag that the route is temporarily unusable based on missing dependencies.

For example kernel will reject adding an IPv6 route with a "src", if the "src" address is still tentative (DAD) (bug 1457196). This means, you cannot just write a naive script:

  ip -netns x addr add 1:2:3:4::5/64 dev net1
  ip -netns x route append 7:7:7:0::1 dev net1 src 1:2:3:4::5

because  1:2:3:4::5/64 might still be tentative (below example avoids that problem by using "dummy" devices, and the address is immediately ready. You can reproduce that with the below script by replacing "dummy" with "veth"). Can you imagine the complexity that such constraints push to a user application? It has to recognize the non-obvious interdependencies between routes and address, recognize that it cannot add the route just yet, and retry at a suitable time later. This gets all the more complex, because the address in question might be on any interface (and because in NetworkManager you can activate/deactivate interfaces at any time).

---



OK, here an IPv6 reproducer:

>>>
#!/bin/bash

set -ex

ip netns del x &>/dev/null || :
ip netns add x

ip -netns x link add net1 type dummy
ip -netns x link add net2 type dummy
ip -netns x link set net1 up
ip -netns x link set net2 up

ip -netns x addr add 1:2:3:4::5/64 dev net1
ip -netns x route append 7:7:7:0::1 dev net1 src 1:2:3:4::5
ip -netns x route append 7:7:7:0::2 dev net2 src 1:2:3:4::5

ip -netns x -6 addr
ip -netns x -6 route

ip -netns x monitor addr &
ip -netns x monitor route &

sleep 0.2

ip -netns x addr del 1:2:3:4::5/64 dev net1

sleep 0.2

ip -netns x -6 route

kill -- $(jobs -p)
wait
wait
<<<


Output (with Fedora 37 kernel 6.1.8-200.fc37.x86_64):

>>>
+ ip netns del x
+ ip netns add x
+ ip -netns x link add net1 type dummy
+ ip -netns x link add net2 type dummy
+ ip -netns x link set net1 up
+ ip -netns x link set net2 up
+ ip -netns x addr add 1:2:3:4::5/64 dev net1
+ ip -netns x route append 7:7:7:0::1 dev net1 src 1:2:3:4::5
+ ip -netns x route append 7:7:7:0::2 dev net2 src 1:2:3:4::5
+ ip -netns x -6 addr
11: net1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 state UNKNOWN qlen 1000
    inet6 1:2:3:4::5/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::58b1:50ff:fe35:b101/64 scope link
       valid_lft forever preferred_lft forever
12: net2: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 state UNKNOWN qlen 1000
    inet6 fe80::1477:44ff:fe3f:8a54/64 scope link
       valid_lft forever preferred_lft forever
+ ip -netns x -6 route
1:2:3:4::/64 dev net1 proto kernel metric 256 pref medium
7:7:7::1 dev net1 src 1:2:3:4::5 metric 1024 pref medium
7:7:7::2 dev net2 src 1:2:3:4::5 metric 1024 pref medium
fe80::/64 dev net1 proto kernel metric 256 pref medium
fe80::/64 dev net2 proto kernel metric 256 pref medium
+ ip -netns x monitor addr
+ sleep 0.2
+ ip -netns x monitor route
+ ip -netns x addr del 1:2:3:4::5/64 dev net1
Deleted 11: net1    inet6 1:2:3:4::5/64 scope global
       valid_lft forever preferred_lft forever
Deleted local 1:2:3:4::5 dev net1 table local proto kernel metric 0 pref medium
Deleted 1:2:3:4::/64 dev net1 proto kernel metric 256 pref medium
+ sleep 0.2
+ ip -netns x -6 route
7:7:7::1 dev net1 metric 1024 pref medium
7:7:7::2 dev net2 src 1:2:3:4::5 metric 1024 pref medium
fe80::/64 dev net1 proto kernel metric 256 pref medium
fe80::/64 dev net2 proto kernel metric 256 pref medium
++ jobs -p
+ kill -- 3548896 3548897
+ wait
+ wait
<<<


Note:

- for the route "7:7:7::1 dev net2", the removal of the address did nothing. That seems inconsistent, because kernel didn't allow to add the route unless the "src" address exists, but then you can remove the address and kernel leaves the route there. So either kernel should not perform such cumbersome consistency checks at all (see above), or it should go all in (like for IPv4) and also remove the route.

- for the route "7:7:7::1 dev net1" the route was modified (the "src" attribute was cleared), but no netlink event was sent. Netlink events about changes are important and must be sent. But again, such an automatism (of automatically mangling user configuration when something changes) are very cumbersome to handle for an application. It's not clear to me how NetworkManager can meaningfully handle this.

Comment 2 Hangbin Liu 2023-07-07 08:39:39 UTC
(In reply to Thomas Haller from comment #0)
> For the "src" attribute (RTA_SRC) of a route, kernel requires that such an
> address is actually configured. It doesn't need to be on the same interface
> though...
> 
> When the address gets removed, kernel will also remove the route with the
> corresponding "src" attribute. However, it fails to send RTM_DELROUTE
> notification:

After deleting an interface address in fib_del_ifaddr(), the function 
scans the fib_info list for stray nexthop entries and calls fib_flush.
Then the stray entries will be deleted silently and no RTM_DELROUTE
notification will be sent.

- fib_del_ifaddr()
  - fib_sync_down_addr
    - fib_flush
      - fib_table_flush

Comment 3 RHEL Program Management 2023-09-21 12:51:28 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 4 RHEL Program Management 2023-09-21 14:20:00 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.