This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2165720 - kernel wrongly configures the same route twice with `ip route append`
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: kernel
Version: 9.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Hangbin Liu
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-01-30 20:49 UTC by Thomas Haller
Modified: 2023-09-21 12:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-21 12:56:49 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System                  ID               Private  Priority  Status    Summary  Last Updated
Red Hat Issue Tracker   RHEL-6043        0        None      Migrated  None     2023-09-21 12:56:44 UTC
Red Hat Issue Tracker   RHELPLAN-146950  0        None      None      None     2023-01-30 20:51:00 UTC

Description Thomas Haller 2023-01-30 20:49:08 UTC
The following reproduces on both RHEL 9.2 (5.14.0-244.el9.x86_64) and Fedora 37 (6.1.7-200.fc37.x86_64):


Script:
```
#!/bin/bash

set -ex

ip netns del x &>/dev/null || :
ip netns add x

_ip() {
    ip -netns x "$@" || :
}

_ip link add net1 type dummy
_ip link set net1 up

_ip route append 7.7.7.0/24 dev net1
_ip -4 route append  local default dev net1 table 10223
_ip -4 route append  192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
_ip -4 route prepend default dev net1 proto kernel table 10223

_ip -d -4 route show table all

_ip -4 route append  192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1

echo ">>>>>>"
_ip -d -4 route show table all
```



Output:
```
+ ip netns del x
+ ip netns add x
+ _ip link add net1 type dummy
+ ip -netns x link add net1 type dummy
+ _ip link set net1 up
+ ip -netns x link set net1 up
+ _ip route append 7.7.7.0/24 dev net1
+ ip -netns x route append 7.7.7.0/24 dev net1
+ _ip -4 route append local default dev net1 table 10223
+ ip -netns x -4 route append local default dev net1 table 10223
+ _ip -4 route append 192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
+ ip -netns x -4 route append 192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
+ _ip -4 route prepend default dev net1 proto kernel table 10223
+ ip -netns x -4 route prepend default dev net1 proto kernel table 10223
+ _ip -d -4 route show table all
+ ip -netns x -d -4 route show table all
unicast default dev net1 table 10223 proto kernel scope link 
local default dev net1 table 10223 proto boot scope host 
unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global 
unicast 7.7.7.0/24 dev net1 table main proto boot scope link 
+ _ip -4 route append 192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
+ ip -netns x -4 route append 192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1
+ echo '>>>>>>'
>>>>>>
+ _ip -d -4 route show table all
+ ip -netns x -d -4 route show table all
unicast default dev net1 table 10223 proto kernel scope link 
local default dev net1 table 10223 proto boot scope host 
unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global 
unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global 
unicast 7.7.7.0/24 dev net1 table main proto boot scope link 
```


Note that after the script runs, the route

  unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global

appears twice in the output.
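
A quick way to spot such duplicates is to filter the route listing for repeated lines. This is only a minimal sketch: the helper name `dup_routes` is made up for illustration, and it assumes one route per line, as printed by `ip -d route show`.

```bash
#!/bin/bash
# dup_routes: read route lines on stdin and print each line that occurs more than once.
# Trailing whitespace is stripped first, because the listing above ends lines with a space.
dup_routes() {
    sed 's/[[:space:]]*$//' | sort | uniq -d
}

# Example against the namespace from the reproducer (needs root):
#   ip -netns x -d -4 route show table all | dup_routes
```

An empty result means no two routes in the listing are textually identical.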


This causes a problem because NetworkManager keeps routes in a dictionary-like cache and cannot cope with the same route appearing twice. The cache can hold this route only once, so if we delete the route once with

  ip -netns x route delete unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global 

NetworkManager receives one RTM_DELROUTE event and removes the route from the cache, although the route still exists in the kernel (once).
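
The mismatch can be modeled without touching the kernel. The sketch below is a toy model only: routes are plain text lines, the helper names `delete_first` and `count` are invented here, and it assumes the cache behaves like a set keyed by the route's visible attributes, as described above.

```bash
#!/bin/bash
# Toy model only: no netlink involved, routes are plain text lines.
route='unicast 192.168.4.0/24 via 7.7.7.1 dev net1 table 10223 proto boot scope global'

# Kernel view: a list that may contain the same route twice.
kernel_routes=$(printf '%s\n' "$route" "$route")

# Cache view: a set keyed by the route text, so duplicates collapse into one entry.
cache_routes=$(printf '%s\n' "$kernel_routes" | sort -u)

# delete_first LINE: copy stdin to stdout, dropping only the first line equal to LINE.
delete_first() {
    awk -v key="$1" '!seen && $0 == key { seen = 1; next } { print }'
}

# count TEXT: number of non-empty lines in TEXT.
count() {
    printf '%s\n' "$1" | grep -c . || :
}

# One `ip route delete` removes one kernel entry and triggers one delete event,
# which removes the cache's only entry for that key.
kernel_routes=$(printf '%s\n' "$kernel_routes" | delete_first "$route")
cache_routes=$(printf '%s\n' "$cache_routes" | delete_first "$route")

echo "kernel: $(count "$kernel_routes") route(s), cache: $(count "$cache_routes") route(s)"
```

After the single delete, the modeled kernel list still holds one entry while the modeled cache is empty, which is the inconsistency described above.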

Comment 1 Hangbin Liu 2023-07-06 09:01:12 UTC
(In reply to Thomas Haller from comment #0)
> _ip route append 7.7.7.0/24 dev net1
> _ip -4 route append  local default dev net1 table 10223
> _ip -4 route append  192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1

This one added a route with struct fib_nh fi->fib_nh_scope 254 (RT_SCOPE_HOST/local address)

> _ip -4 route prepend default dev net1 proto kernel table 10223
> 
> _ip -d -4 route show table all
> 
> _ip -4 route append  192.168.4.0/24 table 10223 nexthop via 7.7.7.1 dev net1

This one added a route with struct fib_nh fi->fib_nh_scope 253 (RT_SCOPE_LINK/direct route)

The fib_nh_scope was updated in

- fib_create_info
  - fib_check_nh -> the cfg->fc_scope is 0, which is RT_SCOPE_UNIVERSE/global
    - fib_check_nh_v4_gw
      - fib_table_lookup

Because a local default route exists at that point, the first nexthop route gets scope 254.
After the direct default route is prepended, the second nexthop route gets scope 253.
The scope is stored in struct fib_nh nh->fib_nh_scope.

But when the route is dumped in fib_dump_info(), rtm->rtm_scope is taken from struct fib_info fi->fib_scope, which is 0 (RT_SCOPE_UNIVERSE/global).
So these two routes look the same.
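
The effect can be pictured with a small toy model. This is not kernel code: the record layout and the helper name `dumped` are hypothetical, and the scope values are the ones quoted in this analysis.

```bash
#!/bin/bash
# Toy model of the two 192.168.4.0/24 entries; values taken from the analysis above.
RT_SCOPE_UNIVERSE=0
RT_SCOPE_LINK=253
RT_SCOPE_HOST=254

# Hypothetical record layout: "prefix gateway dev fib_scope fib_nh_scope"
first="192.168.4.0/24 7.7.7.1 net1 $RT_SCOPE_UNIVERSE $RT_SCOPE_HOST"
second="192.168.4.0/24 7.7.7.1 net1 $RT_SCOPE_UNIVERSE $RT_SCOPE_LINK"

# dumped RECORD: what user space would see, i.e. the record without its last
# field (fib_nh_scope), mirroring that the dump reports fib_scope only.
dumped() {
    printf '%s\n' "${1% *}"
}

[ "$first" != "$second" ] && echo "kernel: two distinct entries"
[ "$(dumped "$first")" = "$(dumped "$second")" ] && echo "dump: entries look identical"
```

In this model the two records differ only in the field that is never dumped, so any consumer of the dump sees one route listed twice.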

Comment 2 Hangbin Liu 2023-08-02 03:59:43 UTC
This issue is similar to bug 2162315. The nexthop's weight and scope are meaningless for a non-multipath route, so the kernel does not export them to userspace. Upstream does not seem to intend to fix or export them. They suggest using the new nexthop API or trying what libnl does to resolve the route cache issue.

The new nexthop API uses a unique nhid and does not have this scope issue, which only exists with the legacy nexthop API:

# ip nexthop add id 1 via 172.16.104.100 dev dummy1
# ip route add local default dev dummy1 table 200
# ip route add 172.16.107.0/24 table 200 nhid 1
# ip route prepend default dev dummy1 table 200
# ip route append 172.16.107.0/24 table 200 nhid 1     <- this fails because the same route already exists
RTNETLINK answers: File exists

# ip route show table 200
default dev dummy1 scope link
local default dev dummy1 scope host
172.16.107.0/24 nhid 1 via 172.16.104.100 dev dummy1

# ip nexthop add id 2 via 172.16.104.100 dev dummy1     <- to add the same nexthop again, use a different nhid
# ip route append 172.16.107.0/24 table 200 nhid 2
# ip route show table 200
default dev dummy1 scope link
local default dev dummy1 scope host
172.16.107.0/24 nhid 1 via 172.16.104.100 dev dummy1
172.16.107.0/24 nhid 2 via 172.16.104.100 dev dummy1

Comment 3 Thomas Haller 2023-08-02 07:21:55 UTC
(In reply to Hangbin Liu from comment #2)
> The nexthop's weight and scope are meaningless for a non-multipath route.

The reproducer from comment 0 uses no weight/scope.

Two identical `ip route append` commands result in two routes that look identical to user space. That the kernel internally gets confused about the scope is the bug.

> So the kernel is not export this to userspace.

Even if the kernel exported the difference to user space, the effect already seems wrong: the same `ip route append` command results in different routes, depending on the existence of another route.


> They suggest using
> the new nexthop api or trying what libnl does to resolve the route cache
> issue.

What exactly does libnl do there?

> For the new nexthop api

Seems unrelated. The problem is not that NetworkManager may do this. The problem is that any user can type the commands from comment 0 in their terminal and cause a wrong view of the world in NetworkManager. That holds regardless of whether NetworkManager uses nexthop objects or simply avoids getting into such a situation on its own.

Comment 4 RHEL Program Management 2023-09-21 12:56:31 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 5 RHEL Program Management 2023-09-21 12:56:49 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.

