+++ This bug was initially created as a clone of Bug #1978796 +++
Description of problem:
When an ECMP route has a next hop that is not on any local network of the router, northd warns that the path is invalid and does not program the flows for it; however, the path is still added to the OpenFlow group. Consider this example:
[root@master-1 ~]# ovn-nbctl lr-route-list GR_worker-148
192.168.8.129 10.75.69.166 src-ip ecmp ecmp-symmetric-reply
192.168.8.129 198.19.3.7 src-ip ecmp ecmp-symmetric-reply
The GR in this case is on the 198.19.3.x network, so 10.75.69.166 is not a valid next hop. Northd warns about this:
2021-07-02T14:40:13Z|104189|ovn_northd|WARN|No path for static route 192.168.8.129; next hop 10.75.69.166
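The condition behind this warning can be approximated as an on-link check: a next hop is usable only if it falls inside a network configured on one of the router's ports. The following is a minimal sketch of that idea, not northd's actual code. The port address 198.19.2.61 is taken from reg1 in the table 11 flow in this report; the /23 prefix length is an assumption made for illustration, and `is_on_link` is a hypothetical helper name.

```python
import ipaddress

# Port address comes from reg1 in the table 11 flow of this report.
# ASSUMPTION: the /23 prefix length is hypothetical; the report only says
# the GR is on the 198.19.3.x network.
ROUTER_PORT = ipaddress.ip_interface("198.19.2.61/23")


def is_on_link(next_hop, port=ROUTER_PORT):
    """Return True if next_hop lies inside the network configured on port."""
    return ipaddress.ip_address(next_hop) in port.network


# Check both next hops from the lr-route-list output.
for next_hop in ("10.75.69.166", "198.19.3.7"):
    status = "reachable" if is_on_link(next_hop) else "no path"
    print(f"{next_hop}: {status}")
```

Under that assumed prefix, only 198.19.3.7 passes the check, which matches the single next hop northd accepts.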
However, the OF group still has 2 paths:
[root@worker-148 ~]# ovs-ofctl dump-groups br-int 140
NXST_GROUP_DESC reply (xid=0x2):
If bucket 0 is chosen, the ECMP route will go through the 198.x next hop and work. If bucket 1 is chosen, the packet is dropped because there is no matching flow in table 19:
-> using bucket 1
19. No match.
In the lflows we can see that there are 2 paths:
table=10(lr_in_ip_routing ), priority=64 , match=(ip4.src == 192.168.8.129/32), action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 3; reg8[16..31] = select(1, 2);)
However, in table 11 there is a matching flow for only one path:
table=11(lr_in_ip_routing_ecmp), priority=100 , match=(reg8[0..15] == 3 && reg8[16..31] == 1), action=(reg0 = 198.19.3.7; reg1 = 198.19.2.61; eth.src = 98:03:9b:8f:15:ac; outport = "rtoe-GR_worker-148"; next;)
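The mismatch between the two tables can be detected mechanically. The sketch below compares the member IDs listed in the `select(...)` action of the table 10 flow with the member IDs that have a matching table 11 flow. It parses only the two lflow lines quoted in this report; the regular expressions and the helper names (`selected_members`, `programmed_members`, `missing_members`) are illustrative assumptions, not part of any OVN tool.

```python
import re

# The two logical flows quoted in this report.
ROUTING_FLOW = (
    "table=10(lr_in_ip_routing ), priority=64 , "
    "match=(ip4.src == 192.168.8.129/32), "
    "action=(ip.ttl--; flags.loopback = 1; reg8[0..15] = 3; "
    "reg8[16..31] = select(1, 2);)"
)
ECMP_FLOWS = [
    "table=11(lr_in_ip_routing_ecmp), priority=100 , "
    "match=(reg8[0..15] == 3 && reg8[16..31] == 1), "
    "action=(reg0 = 198.19.3.7; reg1 = 198.19.2.61; "
    'eth.src = 98:03:9b:8f:15:ac; outport = "rtoe-GR_worker-148"; next;)'
]

# Group ID and member list as written by the table 10 action.
SELECT_RE = re.compile(
    r"reg8\[0\.\.15\] = (\d+); reg8\[16\.\.31\] = select\(([^)]*)\)"
)
# Group ID and member ID as matched by a table 11 flow.
MEMBER_RE = re.compile(
    r"reg8\[0\.\.15\] == (\d+) && reg8\[16\.\.31\] == (\d+)"
)


def selected_members(flow):
    """Return (group_id, set of member IDs) from a select() action, or None."""
    m = SELECT_RE.search(flow)
    if m is None:
        return None
    return int(m.group(1)), {int(x) for x in m.group(2).split(",")}


def programmed_members(flows, group_id):
    """Return the member IDs of group_id that have a table 11 flow."""
    found = set()
    for flow in flows:
        m = MEMBER_RE.search(flow)
        if m and int(m.group(1)) == group_id:
            found.add(int(m.group(2)))
    return found


def missing_members(routing_flow, ecmp_flows):
    """Return member IDs that can be selected but have no table 11 flow."""
    parsed = selected_members(routing_flow)
    if parsed is None:
        return set()
    group_id, members = parsed
    return members - programmed_members(ecmp_flows, group_id)


print(missing_members(ROUTING_FLOW, ECMP_FLOWS))
```

For the flows in this report the result is member 2 of ECMP group 3: it can be chosen by `select(1, 2)`, but no table 11 flow sets its next hop and output port.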
Version-Release number of selected component (if applicable):
--- Additional comment from lorenzo bianconi on 2021-08-02 06:00:10 CDT ---
upstream fix: http://firstname.lastname@example.org/
Merged August 11th
Fix landed in ovn21.09-21.09.0-12 in mid-August. It should be in 4.9.0-fc.0 from 2021-08-20
Tim, any thoughts on Ross' question in comment 2?
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.