Bug 1991793 - ECMP routes with invalid next hops still result in OF groups getting programmed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Tim Rozet
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-08-10 03:43 UTC by Dan Williams
Modified: 2021-10-18 17:45 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1978796
Environment:
Last Closed: 2021-10-18 17:45:29 UTC
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 652 0 None None None 2021-08-10 03:45:50 UTC
Red Hat Bugzilla 1978796 1 high VERIFIED ECMP routes with invalid next hops still result in OF groups getting programmed 2021-08-20 04:02:09 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:45:41 UTC

Description Dan Williams 2021-08-10 03:43:57 UTC
+++ This bug was initially created as a clone of Bug #1978796 +++

Description of problem:
When an ECMP route has a next hop that is not local to the router, northd reports the route as invalid and refuses to program some of the flows. However, the path is still added to the OpenFlow group. Consider this example:

[root@master-1 ~]# ovn-nbctl lr-route-list GR_worker-148
IPv4 Routes
         
            192.168.8.129              10.75.69.166 src-ip ecmp ecmp-symmetric-reply
            192.168.8.129                198.19.3.7 src-ip ecmp ecmp-symmetric-reply


The GR in this case is on the 198.19.3.x network, which makes 10.75.69.166 invalid as a next hop. Northd warns about this:

2021-07-02T14:40:13Z|104189|ovn_northd|WARN|No path for static route 192.168.8.129; next hop 10.75.69.166
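
For illustration, the reachability condition behind that warning can be sketched in Python. This is not northd's code. The /23 prefix is an assumption chosen only because it covers both 198.19.3.7 and the router address 198.19.2.61 that appears in the logical flows; the report does not state the actual prefix length.

```python
import ipaddress

# ASSUMPTION: hypothetical prefix for the GR's external port network.
# The report only says "198.19.3.x"; /23 is picked so that it also
# covers the router address 198.19.2.61 seen in the logical flows.
ROUTER_PORT_NETWORK = ipaddress.ip_network("198.19.2.0/23")

# Next hops of the two ECMP routes for 192.168.8.129 (from lr-route-list).
NEXT_HOPS = ["10.75.69.166", "198.19.3.7"]


def is_local(next_hop, network):
    """Return True if the next hop lies inside the router port's network."""
    return ipaddress.ip_address(next_hop) in network


unreachable = [nh for nh in NEXT_HOPS if not is_local(nh, ROUTER_PORT_NETWORK)]
print("next hops with no path:", unreachable)
```

Under that assumption, only 10.75.69.166 falls outside the router's network, which matches the next hop named in the warning.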

However, the OF group still has 2 paths:

[root@worker-148 ~]# ovs-ofctl dump-groups br-int 140
NXST_GROUP_DESC reply (xid=0x2):
 group_id=140,type=select,selection_method=dp_hash,bucket=bucket_id:0,weight:100,actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,19),bucket=bucket_id:1,weight:100,actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,19)

If bucket 0 is chosen, the ECMP route will go through the 198.x next hop and work. If bucket 1 is chosen, the packet is dropped because there is no matching flow in table 19:

    group:140
     -> using bucket 1
    bucket 1
            set_field:0x2000000000000/0xffff000000000000->xreg4
            resubmit(,19)
        19. No match.
            drop
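
As a rough aid for reading the two outputs together, the following Python sketch parses the group description quoted above and maps each bucket to the value it loads into OXM_OF_PKT_REG4[48..63]. The helper name and the regular expression are illustrative only and handle just the bucket format shown in this report.

```python
import re

# Group description copied from the ovs-ofctl dump-groups output above.
GROUP_DESC = (
    "group_id=140,type=select,selection_method=dp_hash,"
    "bucket=bucket_id:0,weight:100,"
    "actions=load:0x1->OXM_OF_PKT_REG4[48..63],resubmit(,19),"
    "bucket=bucket_id:1,weight:100,"
    "actions=load:0x2->OXM_OF_PKT_REG4[48..63],resubmit(,19)"
)

BUCKET_RE = re.compile(
    r"bucket_id:(\d+),weight:\d+,"
    r"actions=load:(0x[0-9a-f]+)->OXM_OF_PKT_REG4\[48\.\.63\]"
)


def bucket_values(group_desc):
    """Map bucket_id -> value loaded into bits 48..63 of xreg4."""
    return {int(b): int(v, 16) for b, v in BUCKET_RE.findall(group_desc)}


values = bucket_values(GROUP_DESC)
for bucket, value in sorted(values.items()):
    # Shifting by 48 gives the set_field value that ofproto/trace prints.
    print(f"bucket {bucket}: value {value}, xreg4 field {value << 48:#x}")
```

Bucket 1 loads 0x2, which shifted into bits 48..63 is the 0x2000000000000 seen in the trace, so the group still selects a second path even though table 19 has no flow for it.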

In the lflows we can see that there are 2 paths:
table=10(lr_in_ip_routing   ), priority=64   , match=(ip4.src == 192.168.8.129/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 3; reg8[16..31] = select(1, 2);)

However in table 11 there is only one path matching:
 table=11(lr_in_ip_routing_ecmp), priority=100  , match=(reg8[0..15] == 3 && reg8[16..31] == 1), action=(reg0 = 198.19.3.7; reg1 = 198.19.2.61; eth.src = 98:03:9b:8f:15:ac; outport = "rtoe-GR_worker-148"; next;)
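
The mismatch between the two tables can be checked mechanically. Below is a minimal Python sketch that compares the member IDs offered by select() in table 10 with the members that actually have a table 11 flow. The function names are hypothetical and the parsing only covers the flow text quoted in this report.

```python
import re

# Logical flows copied from the report (line breaks added for readability).
ROUTING_LFLOW = (
    "table=10(lr_in_ip_routing   ), priority=64   , "
    "match=(ip4.src == 192.168.8.129/32), "
    "action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 3; "
    "reg8[16..31] = select(1, 2);)"
)
ECMP_LFLOWS = [
    "table=11(lr_in_ip_routing_ecmp), priority=100  , "
    "match=(reg8[0..15] == 3 && reg8[16..31] == 1), "
    "action=(reg0 = 198.19.3.7; reg1 = 198.19.2.61; "
    'eth.src = 98:03:9b:8f:15:ac; outport = "rtoe-GR_worker-148"; next;)',
]


def selected_members(routing_lflow):
    """Return (ecmp_group_id, set of member ids) from the table 10 action."""
    group = int(re.search(r"reg8\[0\.\.15\] = (\d+)", routing_lflow).group(1))
    members = re.search(r"select\(([^)]*)\)", routing_lflow).group(1)
    return group, {int(m) for m in members.split(",")}


def programmed_members(ecmp_lflows, group):
    """Return the member ids of `group` that have a table 11 match."""
    pattern = re.compile(
        r"reg8\[0\.\.15\] == (\d+) && reg8\[16\.\.31\] == (\d+)"
    )
    found = set()
    for flow in ecmp_lflows:
        m = pattern.search(flow)
        if m and int(m.group(1)) == group:
            found.add(int(m.group(2)))
    return found


group, selected = selected_members(ROUTING_LFLOW)
dangling = selected - programmed_members(ECMP_LFLOWS, group)
print(f"ECMP group {group}: members without a table 11 flow: {sorted(dangling)}")
```

For the flows above this reports member 2 as dangling, which is the member whose packets are dropped.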



Version-Release number of selected component (if applicable):
ovn2.13-20.12.0-24.el8fdp.x86_64

--- Additional comment from lorenzo bianconi on 2021-08-02 06:00:10 CDT ---

upstream fix: http://patchwork.ozlabs.org/project/ovn/patch/3b2efab5d394b629a6e922038ab075a78aca2d39.1627901201.git.lorenzo.bianconi@redhat.com/

Comment 1 Dan Williams 2021-08-17 13:39:02 UTC
Merged August 11th

Comment 4 Dan Williams 2021-09-20 15:48:06 UTC
Fix landed in ovn21.09-21.09.0-12 in mid-August. It should be in 4.9.0-fc.0 from 2021-08-20

Comment 5 Dan Williams 2021-09-20 15:51:00 UTC
Tim, any thoughts on Ross' question in comment 2?

Comment 9 errata-xmlrpc 2021-10-18 17:45:29 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

