The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.
Bug 1891516 - OVN incorrectly prefers static routes over directly connected routes
Summary: OVN incorrectly prefers static routes over directly connected routes
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: lorenzo bianconi
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-10-26 14:02 UTC by Tim Rozet
Modified: 2021-02-22 20:23 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-22 20:23:40 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Tim Rozet 2020-10-26 14:02:34 UTC
Description of problem:
Consider the following topology:

    100.64.0.1                                       10.244.0.6
ext--GR1 ------------OVN cluster router (DR)-----------pod1 
                     |
ext--GR2 -------------
    100.64.0.2

pod1 and GR 1 are on the same node. GR1 and GR2 both SNAT ingress traffic to them.

OVN DR has static routes for:

10.244.0.0/24 via 100.64.0.1 src-ip

Now if a packet comes from GR1 from external network (ext) it will be SNAT to 100.64.0.1 and send to pod1. pod 1 sends a reply packet and it goes back to 100.64.0.1 correctly.  However, if a packet ingresses at GR2, it gets SNAT to 100.64.0.2 and the packet is sent to pod 1. pod1 then replies and the packet goes to the DR. At this point OVN uses the static route and incorrectly sends the packet to GR1 (asymmetrical routing).

The correct behavior should have been to send the packet to GR2. This is because the destination is 100.64.0.2, and a router should always have a lower metric for directly connected routes. A workaround to this problem is to add a dummy static route with higher prefix match like:

100.64.0.1/32 via 100.64.0.1 dst-ip
100.64.0.2/32 via 100.64.0.2 dst-ip

^This is kind of silly, but it works.

Comment 1 Tim Rozet 2021-02-10 17:32:13 UTC
Here are the relevant lflows:
  table=10(lr_in_ip_routing   ), priority=65   , match=(ip4.dst == 100.64.0.2/32), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.2; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=65   , match=(ip4.dst == 100.64.0.3/32), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.3; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=65   , match=(ip4.dst == 100.64.0.4/32), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.4; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=49   , match=(ip4.dst == 10.244.0.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 10.244.0.1; eth.src = 0a:58:0a:f4:00:01; outport = "rtos-ovn-worker"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=49   , match=(ip4.dst == 10.244.1.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 10.244.1.1; eth.src = 0a:58:0a:f4:01:01; outport = "rtos-ovn-control-plane"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=49   , match=(ip4.dst == 10.244.2.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 10.244.2.1; eth.src = 0a:58:0a:f4:02:01; outport = "rtos-ovn-worker2"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=48   , match=(ip4.src == 10.244.0.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.4; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=48   , match=(ip4.src == 10.244.1.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.2; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=48   , match=(ip4.src == 10.244.2.0/24), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.3; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=41   , match=(ip4.dst == 169.254.0.0/20), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 169.254.0.2; eth.src = 0a:58:a9:fe:00:02; outport = "rtos-node_local_switch"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=33   , match=(ip4.dst == 100.64.0.0/16), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)

The priority 65 flows here are the outcome of our workaround routes:
  table=10(lr_in_ip_routing   ), priority=65   , match=(ip4.dst == 100.64.0.2/32), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.2; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=65   , match=(ip4.dst == 100.64.0.3/32), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.3; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)
  table=10(lr_in_ip_routing   ), priority=65   , match=(ip4.dst == 100.64.0.4/32), action=(ip.ttl--; reg8[0..15] = 0; reg0 = 100.64.0.4; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)

100.64.0.1/32 via 100.64.0.1 dst-ip
100.64.0.2/32 via 100.64.0.2 dst-ip

The directly connected route has a priority of 33 here:
  table=10(lr_in_ip_routing   ), priority=33   , match=(ip4.dst == 100.64.0.0/16), action=(ip.ttl--; reg8[0..15] = 0; reg0 = ip4.dst; reg1 = 100.64.0.1; eth.src = 0a:58:64:40:00:01; outport = "rtoj-ovn_cluster_router"; flags.loopback = 1; next;)

As Lorenzo explained, this is calculated by prefix * 2 + 1. However, this route should have the highest possible priority (67 should be safe). This is because this is a directly connected route which has a lower metric in routing than any other static route. On most real routing appliances a directly connected route has a metric of 0, and a user is not able to add a static route with a metric any lower than 1. In other words, directly connected routes *always* have higher priority and cannot be overidden with static routes. If a user wants to do something different than traditional routing, they may use policy routing (available in OVN) to override the route lookup determined.

Comment 2 lorenzo bianconi 2021-02-18 22:17:21 UTC
(In reply to Tim Rozet from comment #0)
> Description of problem:
> Consider the following topology:
> 
>     100.64.0.1                                       10.244.0.6
> ext--GR1 ------------OVN cluster router (DR)-----------pod1 
>                      |
> ext--GR2 -------------
>     100.64.0.2
> 
> pod1 and GR 1 are on the same node. GR1 and GR2 both SNAT ingress traffic to
> them.
> 
> OVN DR has static routes for:
> 
> 10.244.0.0/24 via 100.64.0.1 src-ip
> 
> Now if a packet comes from GR1 from external network (ext) it will be SNAT
> to 100.64.0.1 and send to pod1. pod 1 sends a reply packet and it goes back
> to 100.64.0.1 correctly.  However, if a packet ingresses at GR2, it gets
> SNAT to 100.64.0.2 and the packet is sent to pod 1. pod1 then replies and
> the packet goes to the DR. At this point OVN uses the static route and
> incorrectly sends the packet to GR1 (asymmetrical routing).
> 
> The correct behavior should have been to send the packet to GR2. This is
> because the destination is 100.64.0.2, and a router should always have a
> lower metric for directly connected routes. A workaround to this problem is
> to add a dummy static route with higher prefix match like:
> 
> 100.64.0.1/32 via 100.64.0.1 dst-ip
> 100.64.0.2/32 via 100.64.0.2 dst-ip
> 
> ^This is kind of silly, but it works.

Based on the upstream discussion [0], the issue can be fixed using the policy routing table implemented in OVN without removing the capability to overwrite automatic routes with static ones according to the "longest prefix match".
@Tim: do you agree to close the bz as "not a bug"?

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2021-February/380295.html

Comment 3 Tim Rozet 2021-02-22 20:05:15 UTC
Yeah, we can just keep the workaround for now. It doesn't hurt us to have it.


Note You need to log in before you can comment on or make changes to this bug.