Bug 2165720 - kernel wrongly configures the same route twice with `ip route append`
Summary: kernel wrongly configures the same route twice with `ip route append`
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Hangbin Liu
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-01-30 20:49 UTC by Thomas Haller
Modified: 2023-08-02 07:21 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-146950 0 None None None 2023-01-30 20:51:00 UTC

Description Thomas Haller 2023-01-30 20:49:08 UTC
The following reproduces on both RHEL 9.2 (5.14.0-244.el9.x86_64) and Fedora 37 (6.1.7-200.fc37.x86_64):


Script:
```
#!/bin/bash

set -ex

ip netns del x &>/dev/null || :
ip netns add x

_ip() {
    ip -netns x "$@" || :
}

_ip link add net1 type dummy
_ip link set net1 up

_ip route append 7.7.7.0/24 dev net1
_ip -4 route append  local default dev net1 table 10223
_ip -4 route append  192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
_ip -4 route prepend default dev net1 proto kernel table 10223

_ip -d -4 route show table all

_ip -4 route append  192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1

echo ">>>>>>"
_ip -d -4 route show table all
```



Output:
```
+ ip netns del x
+ ip netns add x
+ _ip link add net1 type dummy
+ ip -netns x link add net1 type dummy
+ _ip link set net1 up
+ ip -netns x link set net1 up
+ _ip route append 7.7.7.0/24 dev net1
+ ip -netns x route append 7.7.7.0/24 dev net1
+ _ip -4 route append local default dev net1 table 10223
+ ip -netns x -4 route append local default dev net1 table 10223
+ _ip -4 route append 192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
+ ip -netns x -4 route append 192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
+ _ip -4 route prepend default dev net1 proto kernel table 10223
+ ip -netns x -4 route prepend default dev net1 proto kernel table 10223
+ _ip -d -4 route show table all
+ ip -netns x -d -4 route show table all
unicast default dev net1 table 10223 proto kernel scope link 
local default dev net1 table 10223 proto boot scope host 
unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global 
unicast 7.7.7.0/24 dev net1 table main proto boot scope link 
+ _ip -4 route append 192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
+ ip -netns x -4 route append 192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
+ echo '>>>>>>'
>>>>>>
+ _ip -d -4 route show table all
+ ip -netns x -d -4 route show table all
unicast default dev net1 table 10223 proto kernel scope link 
local default dev net1 table 10223 proto boot scope host 
unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global 
unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global 
unicast 7.7.7.0/24 dev net1 table main proto boot scope link 
```


Note that after the script runs, the route

  unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global

appears twice in the output.
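
A quick way to spot such duplicates in a saved dump is to look for repeated lines. This is only a minimal sketch: the helper name `find_duplicate_routes` is made up for illustration, and it compares nothing but the textual form printed by `ip`, which is all that user space gets to see here.

```shell
#!/bin/sh
# Hypothetical helper: print each route line that occurs more than once
# in `ip -d -4 route show table all` output read from stdin.
find_duplicate_routes() {
    # Strip trailing whitespace so otherwise identical lines compare equal.
    sed 's/[[:space:]]*$//' | sort | uniq -d
}

# Sample input: the second dump from the reproducer output above.
sample_dump='unicast default dev net1 table 10223 proto kernel scope link
local default dev net1 table 10223 proto boot scope host
unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global
unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global
unicast 7.7.7.0/24 dev net1 table main proto boot scope link'

printf '%s\n' "$sample_dump" | find_duplicate_routes
```

Against a live namespace the same helper could be fed with `ip -netns x -d -4 route show table all | find_duplicate_routes`.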


This causes a problem because NetworkManager keeps routes in a dictionary/cache and cannot cope with the same route being present twice. The cache can hold this route only once, so when we delete the route once with

  ip -netns x route delete unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global 

NetworkManager receives one RTM_DELROUTE event and removes the route from the cache, although the route still exists in the kernel (once).
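
The mismatch can be illustrated with a toy model. This is a sketch only and not NetworkManager code: the function names are hypothetical, the kernel table is modelled as a list that allows duplicates, and the cache as a set keyed on the attributes visible in the dump.

```shell
#!/bin/sh
# Toy model, not NetworkManager code.
ROUTE='unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global'
kernel_routes=''   # newline-separated, duplicates allowed
cache_routes=''    # newline-separated, one entry per distinct route

count_lines() {    # number of non-empty lines in $1
    printf '%s\n' "$1" | grep -c . || true
}

add_route() {      # models an append followed by RTM_NEWROUTE
    kernel_routes=$(printf '%s\n%s' "$kernel_routes" "$1")
    # The cache keeps one entry per distinct key.
    if ! printf '%s\n' "$cache_routes" | grep -Fqx -- "$1"; then
        cache_routes=$(printf '%s\n%s' "$cache_routes" "$1")
    fi
}

del_route() {      # models a delete followed by RTM_DELROUTE
    # The kernel removes only one matching instance.
    kernel_routes=$(printf '%s\n' "$kernel_routes" |
        awk -v r="$1" '!done && $0 == r { done = 1; next } { print }')
    # The cache drops its single entry for that key.
    cache_routes=$(printf '%s\n' "$cache_routes" | grep -Fvx -- "$1" || true)
}

add_route "$ROUTE"
add_route "$ROUTE"
del_route "$ROUTE"
echo "kernel: $(count_lines "$kernel_routes") route(s), cache: $(count_lines "$cache_routes") route(s)"
```

After two appends and one delete, the model's kernel list still holds one route while the cache holds none, which is the inconsistency described above.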

Comment 1 Hangbin Liu 2023-07-06 09:01:12 UTC
(In reply to Thomas Haller from comment #0)
> _ip route append 7.7.7.0/24 dev net1
> _ip -4 route append  local default dev net1 table 10223
> _ip -4 route append  192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1

This command adds a route whose struct fib_nh has fib_nh_scope 254 (RT_SCOPE_HOST, local address).

> _ip -4 route prepend default dev net1 proto kernel table 10223
> 
> _ip -d -4 route show table all
> 
> _ip -4 route append  192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1

This command adds a route whose struct fib_nh has fib_nh_scope 253 (RT_SCOPE_LINK, direct route).

The fib_nh_scope is set in the following call chain:

- fib_create_info
  - fib_check_nh -> the cfg->fc_scope is 0, which is RT_SCOPE_UNIVERSE/global
    - fib_check_nh_v4_gw
      - fib_table_lookup

Because a local default route exists, the gateway lookup yields scope 254 when the nexthop route is added the first time.
After the direct default route is prepended, the same lookup yields scope 253 the second time.
The scope is stored in struct fib_nh, as nh->fib_nh_scope.

However, when the route is dumped in fib_dump_info(), rtm->rtm_scope is taken from struct fib_info fi->fib_scope, which is 0 (RT_SCOPE_UNIVERSE/global).
So these two routes look the same to user space.
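
The effect described in this comment can be sketched with a toy model. This is shell, not kernel code; the record layout and the `dump_route` helper are hypothetical and only mirror the analysis above: each route carries a route-level fib_scope and a per-nexthop nh_scope, and the dump reports only the former.

```shell
#!/bin/sh
# Toy model of the analysis above; not kernel code.
# Record layout (hypothetical): prefix|gateway|dev|fib_scope|nh_scope
RT_SCOPE_UNIVERSE=0
RT_SCOPE_LINK=253
RT_SCOPE_HOST=254

# First append: the gateway lookup matched the local default route.
route1="192.168.4.0/24|7.7.7.1|net1|$RT_SCOPE_UNIVERSE|$RT_SCOPE_HOST"
# Second append: the lookup matched the prepended direct default route.
route2="192.168.4.0/24|7.7.7.1|net1|$RT_SCOPE_UNIVERSE|$RT_SCOPE_LINK"

# Mimics the dump path: only fib_scope is reported, nh_scope is dropped.
dump_route() {
    printf '%s\n' "$1" | cut -d'|' -f1-4
}

if [ "$route1" != "$route2" ] &&
   [ "$(dump_route "$route1")" = "$(dump_route "$route2")" ]; then
    echo "distinct internally, identical in the dump"
fi
```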

Comment 2 Hangbin Liu 2023-08-02 03:59:43 UTC
This issue is similar to bug 2162315. The nexthop's weight and scope are meaningless for a non-multipath route. So the kernel does not export this to userspace. Upstream does not seem to intend to fix or export them. They suggest using the new nexthop api or trying what libnl does to resolve the route cache issue.

The new nexthop api uses a unique nhid and does not have this scope issue, which only exists with the legacy nexthop api:

# ip nexthop add id 1 via 172.16.104.100 dev dummy1
# ip route add local default dev dummy1 table 200
# ip route add 172.16.107.0/24 table 200 nhid 1
# ip route prepend default dev dummy1 table 200
# ip route append 172.16.107.0/24 table 200 nhid 1     <- this fails because the same route already exists
RTNETLINK answers: File exists

# ip route show table 200
default dev dummy1 scope link
local default dev dummy1 scope host
172.16.107.0/24 nhid 1 via 172.16.104.100 dev dummy1

# ip nexthop add id 2 via 172.16.104.100 dev dummy1     <- to add the same nexthop again, you need to use a different nhid
# ip route append 172.16.107.0/24 table 200 nhid 2
# ip route show table 200
default dev dummy1 scope link
local default dev dummy1 scope host
172.16.107.0/24 nhid 1 via 172.16.104.100 dev dummy1
172.16.107.0/24 nhid 2 via 172.16.104.100 dev dummy1

Comment 3 Thomas Haller 2023-08-02 07:21:55 UTC
(In reply to Hangbin Liu from comment #2)
> The nexthop's weight and scope are meaningless for a non-multipath route.

The reproducer from comment 0 uses no weight/scope. 

Two identical `ip route append` commands result in two routes that look identical to user space. The bug is that the kernel internally gets confused about the scope.

> So the kernel does not export this to userspace.

Even if the kernel exported the difference to user space, the behavior would still be wrong: the same `ip route append` command results in different routes depending on the existence of another route.


> They suggest using
> the new nexthop api or trying what libnl does to resolve the route cache
> issue.

What exactly does libnl do there?

> For the new nexthop api

Seems unrelated. The problem is not that NetworkManager may do this. The problem is that any user can type the commands from comment 0 in their terminal and cause a wrong view of the world in NetworkManager. That holds regardless of whether NetworkManager uses nexthop objects or simply avoids getting into such a situation on its own.

