Bug 1991793 - ECMP routes with invalid next hops still result in OF groups getting programmed
Summary: ECMP routes with invalid next hops still result in OF groups getting programmed
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.9.0
Assignee: Tim Rozet
QA Contact: Ross Brattain
Depends On:
Reported: 2021-08-10 03:43 UTC by Dan Williams
Modified: 2021-10-18 17:45 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1978796
Last Closed: 2021-10-18 17:45:29 UTC
Target Upstream Version:

Links

System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ovn-kubernetes pull 652 | 0 | None | None | None | 2021-08-10 03:45:50 UTC
Red Hat Bugzilla | 1978796 | 1 | high | CLOSED | ECMP routes with invalid next hops still result in OF groups getting programmed | 2022-12-15 00:30:35 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:45:41 UTC

Description Dan Williams 2021-08-10 03:43:57 UTC
+++ This bug was initially created as a clone of Bug #1978796 +++

Description of problem:
When an ECMP route has a next hop that is not reachable through any local router port (a non-local next hop), northd warns that the route is invalid and does not program some of its flows; however, the path is still added to the OpenFlow group. Consider this example:

[root@master-1 ~]# ovn-nbctl lr-route-list GR_worker-148
IPv4 Routes
       src-ip ecmp ecmp-symmetric-reply
         src-ip ecmp ecmp-symmetric-reply

The gateway router (GR) in this case is on the 198.19.3.x network, so the other next hop, which is outside that network, is invalid. Northd warns about this:

2021-07-02T14:40:13Z|104189|ovn_northd|WARN|No path for static route; next hop
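The warning means northd found no router port whose network contains the next hop. The snippet below is a simplified sketch of that kind of lookup, not northd's actual code: the helper `find_output_port` and the port address `198.19.3.10/24` are hypothetical (the real addresses are elided in this report), and `203.0.113.1` is a documentation address standing in for the invalid next hop.

```python
import ipaddress
from typing import Dict, Optional


def find_output_port(nexthop: str, router_ports: Dict[str, str]) -> Optional[str]:
    """Hypothetical: return the first router port whose subnet contains nexthop, else None."""
    addr = ipaddress.ip_address(nexthop)
    for port, cidr in router_ports.items():
        # strict=False accepts the port's own address with its prefix length.
        if addr in ipaddress.ip_network(cidr, strict=False):
            return port
    return None


# Hypothetical port configuration; the real addresses are elided in this report.
GR_PORTS = {"rtoe-GR_worker-148": "198.19.3.10/24"}

for nh in ("198.19.3.1", "203.0.113.1"):
    port = find_output_port(nh, GR_PORTS)
    if port is None:
        # Mimics the wording of the northd warning above.
        print(f"No path for static route; next hop {nh}")
    else:
        print(f"{nh} -> {port}")
```

Northd also considers explicitly configured output ports and other route attributes, which this sketch ignores.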

However, the OpenFlow group still has two buckets, one per path:

[root@worker-148 ~]# ovs-ofctl dump-groups br-int 140
NXST_GROUP_DESC reply (xid=0x2):

If bucket 0 is chosen, the packet is routed through the 198.x next hop and forwarded correctly. If bucket 1 is chosen, the packet is dropped because there is no matching flow in table 19:

     -> using bucket 1
    bucket 1
        19. No match.
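To illustrate why only some connections break, here is a toy model of a select group feeding the ECMP lookup. It assumes bucket i maps to ECMP member i+1 (consistent with bucket 0 working and table 11 matching only member 1), and it uses `zlib.crc32` as a deterministic stand-in for the switch's selection hash; OVS's real selection method differs. The traffic tuples are hypothetical.

```python
import zlib
from typing import Iterable, List, Sequence, Set, Tuple

Flow = Tuple[str, str, int, int, int]  # (src IP, dst IP, protocol, src port, dst port)


def choose_bucket(flow: Flow, n_buckets: int) -> int:
    # Deterministic stand-in for the switch's selection hash; not the OVS algorithm.
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_buckets


def simulate(flows: Iterable[Flow], bucket_to_member: Sequence[int],
             ecmp_members: Set[int]) -> Tuple[int, int]:
    """Count (delivered, dropped) for a group and the members present in the ECMP table."""
    delivered = dropped = 0
    for flow in flows:
        member = bucket_to_member[choose_bucket(flow, len(bucket_to_member))]
        if member in ecmp_members:
            delivered += 1
        else:
            dropped += 1  # no flow for this member: "No match." in the trace
    return delivered, dropped


# Hypothetical traffic: 1000 TCP connections that differ only in source port.
FLOWS: List[Flow] = [("10.128.0.5", "192.0.2.10", 6, 40000 + i, 443) for i in range(1000)]

print("buggy group:", simulate(FLOWS, [1, 2], {1}))
print("fixed group:", simulate(FLOWS, [1], {1}))
```

Because selection is per flow, the failure looks intermittent: roughly half of the connections hash to the bucket with no matching ECMP flow.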

The logical flows (lflows) show that table 10 (lr_in_ip_routing) selects between two paths:
table=10(lr_in_ip_routing   ), priority=64   , match=(ip4.src ==, action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 3; reg8[16..31] = select(1, 2);)

However, table 11 (lr_in_ip_routing_ecmp) has a flow for only one of them, member 1:
 table=11(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 3 && reg8[16..31] == 1), action=(reg0 =; reg1 =; eth.src = 98:03:9b:8f:15:ac; outport = "rtoe-GR_worker-148"; next;)
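This mismatch can be detected mechanically. The sketch below is a hypothetical diagnostic, not an OVN tool: it parses lflow text (for example, saved from `ovn-sbctl lflow-list GR_worker-148`), collects the member IDs each ECMP group passes to `select()`, and reports IDs with no matching `lr_in_ip_routing_ecmp` flow. The regular expressions assume the flow formatting shown above; the sample lines are the two flows from this report, with their elided fields left as-is.

```python
import re
from typing import Dict, List, Set

SELECT_RE = re.compile(r"reg8\[0\.\.15\] = (\d+);.*?select\(([^)]*)\)")
ECMP_RE = re.compile(r"reg8\[0\.\.15\] == (\d+) && reg8\[16\.\.31\] == (\d+)")


def find_missing_members(lflows: List[str]) -> Dict[int, List[int]]:
    """Hypothetical: map ECMP group id -> member ids used in select() but never matched."""
    selected: Dict[int, Set[int]] = {}
    present: Dict[int, Set[int]] = {}
    for line in lflows:
        # Check the ECMP stage first: its name also contains "lr_in_ip_routing".
        if "lr_in_ip_routing_ecmp" in line:
            m = ECMP_RE.search(line)
            if m:
                present.setdefault(int(m.group(1)), set()).add(int(m.group(2)))
        elif "lr_in_ip_routing" in line:
            m = SELECT_RE.search(line)
            if m:
                ids = {int(x) for x in m.group(2).split(",")}
                selected.setdefault(int(m.group(1)), set()).update(ids)
    missing = {g: sorted(ids - present.get(g, set())) for g, ids in selected.items()}
    return {g: ids for g, ids in missing.items() if ids}


SAMPLE = [
    "table=10(lr_in_ip_routing   ), priority=64   , match=(ip4.src ==, action=(ip.ttl--; "
    "flags.loopback = 1; reg8[0..15] = 3; reg8[16..31] = select(1, 2);)",
    " table=11(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 3 && "
    "reg8[16..31] == 1), action=(reg0 =; reg1 =; eth.src = 98:03:9b:8f:15:ac; "
    'outport = "rtoe-GR_worker-148"; next;)',
]

print(find_missing_members(SAMPLE))  # group 3 is missing member 2
```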

Version-Release number of selected component (if applicable):

--- Additional comment from lorenzo bianconi on 2021-08-02 06:00:10 CDT ---

upstream fix: http://patchwork.ozlabs.org/project/ovn/patch/3b2efab5d394b629a6e922038ab075a78aca2d39.1627901201.git.lorenzo.bianconi@redhat.com/
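The general invariant the fix needs is that `select()` and the per-member ECMP flows come from the same filtered list of next hops. The sketch below illustrates that idea only; it is not the upstream patch (see the patchwork link for the actual change). `build_ecmp_flows` and `on_gr_subnet` are hypothetical names, and the string prefix check is a crude stand-in for a real route lookup.

```python
from typing import Callable, List, Optional, Tuple


def build_ecmp_flows(group_id: int, nexthops: List[str],
                     is_reachable: Callable[[str], bool]) -> Optional[Tuple[str, List[str]]]:
    """Hypothetical: derive the select() action and per-member matches from one filtered list."""
    valid = [nh for nh in nexthops if is_reachable(nh)]
    if not valid:
        return None  # nothing usable: emit no ECMP flows at all
    member_ids = list(range(1, len(valid) + 1))
    ids = ", ".join(map(str, member_ids))
    routing_action = f"reg8[0..15] = {group_id}; reg8[16..31] = select({ids});"
    ecmp_matches = [f"reg8[0..15] == {group_id} && reg8[16..31] == {i}" for i in member_ids]
    return routing_action, ecmp_matches


def on_gr_subnet(nh: str) -> bool:
    # Hypothetical reachability test standing in for northd's route lookup.
    return nh.startswith("198.19.3.")


print(build_ecmp_flows(3, ["198.19.3.1", "203.0.113.1"], on_gr_subnet))
```

When only one next hop survives, a real implementation might emit a plain (non-ECMP) route instead of `select(1)`; this sketch does not model that.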

Comment 1 Dan Williams 2021-08-17 13:39:02 UTC
Merged August 11th

Comment 4 Dan Williams 2021-09-20 15:48:06 UTC
Fix landed in ovn21.09-21.09.0-12 in mid-August. It should be included in 4.9.0-fc.0, from 2021-08-20.

Comment 5 Dan Williams 2021-09-20 15:51:00 UTC
Tim, any thoughts on Ross' question in comment 2?

Comment 9 errata-xmlrpc 2021-10-18 17:45:29 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

