Bug 1906455 - [OVN] Asymmetric traffic path between external host and VM with a FIP / inconsistent NAT?
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: FDP 20.B
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: OVN Team
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-12-10 14:51 UTC by Daniel Alvarez Sanchez
Modified: 2023-12-01 15:57 UTC
CC: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-01 15:57:05 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-978 0 None None None 2022-01-07 14:49:36 UTC

Description Daniel Alvarez Sanchez 2020-12-10 14:51:46 UTC
I'd like to describe a scenario where a private IP address in OVN is reached through an OVN router from an external host. The traffic is delivered from the gateway chassis to the destination chassis via the overlay, while the reverse path goes over the physical network.

Please find below the details, captures, and what I think the desired behavior would be.


Pinging from external destination to a private address through the external router:


e.g. ping from rack-2-host-1 to a VM on rack-2-host-2 (10.0.0.119) over an external network.


External host to gateway:
=========================

[vagrant@rack-2-host-1 ~]$ ip r get 10.0.0.119
10.0.0.119 via 172.24.4.221 dev eth1 src 172.24.4.99 uid 1000


[vagrant@rack-2-host-1 ~]$ ping 10.0.0.119 -c2
PING 10.0.0.119 (10.0.0.119) 56(84) bytes of data.
64 bytes from 10.0.0.119: icmp_seq=1 ttl=61 time=3.11 ms
64 bytes from 10.0.0.119: icmp_seq=2 ttl=61 time=3.21 ms

--- 10.0.0.119 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 3.106/3.155/3.205/0.074 ms


Gateway to destination chassis:
===============================

Since the gateway port (172.24.4.221) is bound on rack-1-host-2, traffic gets there and is sent to the destination chassis via the Geneve tunnel:


[vagrant@rack-1-host-2 ~]$ sudo tcpdump -i genev_sys_6081 icmp -vvnee -c2
dropped privs to tcpdump
tcpdump: listening on genev_sys_6081, link-type EN10MB (Ethernet), capture size 262144 bytes
14:36:27.755469 fa:16:3e:78:b4:97 > fa:16:3e:21:f5:06, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 59, id 37322, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.4.99 > 10.0.0.119: ICMP echo request, id 2999, seq 1, length 64
14:36:28.755051 fa:16:3e:78:b4:97 > fa:16:3e:21:f5:06, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 59, id 37688, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.4.99 > 10.0.0.119: ICMP echo request, id 2999, seq 2, length 64


The destination mac address is that of the gateway port:

...
logical_port        : cr-lrp-0f43995c-7c47-4798-a4b8-2a2c0f251c5e
mac                 : ["fa:16:3e:b4:82:b5 172.24.4.221/24 2001:db8::1/64"]
...
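The binding above can be retrieved directly from the OVN southbound database. A minimal sketch, using the chassisredirect port name from the capture (this must be run on a node with access to the southbound DB, so it is illustrative only):

```shell
# Sketch: look up the chassisredirect port binding whose MAC appears as the
# destination in the tunnel capture. The port name is taken from the output
# above; ovn-sbctl needs a reachable southbound database.
ovn-sbctl find Port_Binding \
    logical_port=cr-lrp-0f43995c-7c47-4798-a4b8-2a2c0f251c5e
```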


Delivery to the VM on the destination chassis:
==============================================


[root@rack-2-host-2 ~]# tcpdump -i tap64b0026e-d1 -vvnee icmp -c1
dropped privs to tcpdump
tcpdump: listening on tap64b0026e-d1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:38:32.553203 fa:16:3e:78:b4:97 > fa:16:3e:21:f5:06, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 59, id 29472, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.4.99 > 10.0.0.119: ICMP echo request, id 2999, seq 126, length 64



So far, so good. Now the thing is that on its way back, the reply traffic takes a different route, since the VM has a FIP attached (a dnat_and_snat NAT entry):


[vagrant@rack-1-host-1 ~]$ ovn-nbctl list nat
_uuid               : d5257074-a6d8-4d1d-aac1-7287d8b6dc01
external_ip         : "172.24.4.169"
external_mac        : "fa:16:3e:19:e4:4a"
external_port_range : ""
logical_ip          : "10.0.0.119"
logical_port        : "64b0026e-d13c-45fc-bb87-53e820b70f37"
options             : {}
type                : dnat_and_snat



Reply from the VM on its chassis:
=================================


[root@rack-2-host-2 ~]# tcpdump -i br-ex -vvnee icmp -c1
dropped privs to tcpdump
tcpdump: listening on br-ex, link-type EN10MB (Ethernet), capture size 262144 bytes
14:40:24.739054 fa:16:3e:19:e4:4a > 32:c0:7e:74:dc:4f, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 49718, offset 0, flags [none], proto ICMP (1), length 84)
    10.0.0.119 > 172.24.4.99: ICMP echo reply, id 2999, seq 238, length 64




What we see here is that OVN has:

1) changed the source MAC to that of the FIP  (fa:16:3e:19:e4:4a)
2) Left the source IP unchanged (10.0.0.119)
3) Sent the traffic via the localnet port and hence using the physical network



In my opinion, this behavior is not consistent. I'd expect:

* Either change both the source MAC and the source IP to those of the FIP, or leave both unchanged
* If we leave them unchanged (which would be good IMO, as the destination IP of the ICMP request is the logical port IP and not the FIP), then send the reply back through the tunnel
* If it goes out the localnet port, then the source IP should be that of the floating IP, the same way the MAC address is changed to it


If OVN has some mechanism to check whether there's a conntrack entry for 10.0.0.119, my preferred option would be to not apply the FIP and keep using the tunnel (i.e. to keep both directions symmetric).


Please note that if I remove the dnat_and_snat entry (i.e. remove the FIP), the traffic flows symmetrically: via the Geneve tunnel to the gateway node, and over the physical network between the gateway and the external host.
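For completeness, removing the FIP as described above maps to a single ovn-nbctl call. A sketch, where "lr0" is a placeholder router name (an assumption, not taken from this report); the external IP is the FIP from the NAT entry listed earlier:

```shell
# Sketch: remove the dnat_and_snat entry (i.e. detach the FIP) so that
# replies return through the Geneve tunnel. "lr0" is a hypothetical
# logical router name; substitute the real one from `ovn-nbctl lr-list`.
ovn-nbctl lr-nat-del lr0 dnat_and_snat 172.24.4.169

# Verify the entry is gone:
ovn-nbctl lr-nat-list lr0
```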

Comment 1 Dumitru Ceara 2020-12-15 15:55:48 UTC
(In reply to Daniel Alvarez Sanchez from comment #0)
> 
> 1) changed the source MAC to that of the FIP  (fa:16:3e:19:e4:4a)
> 2) Left the source IP unchanged (10.0.0.119)
> 3) Sent the traffic via the localnet port and hence using the physical
> network
> 
> 
> 
> In my opinion, this behavior is not consistent. I'd expect:
> 

This does look like a bug to me.  I don't think there should be any reason to
change the packet's SMAC to NAT.external_mac unless SNAT is also performed
using NAT.external_ip.

Comment 2 Daniel Alvarez Sanchez 2020-12-15 16:14:04 UTC
(In reply to Dumitru Ceara from comment #1)
> (In reply to Daniel Alvarez Sanchez from comment #0)
> > 
> > 1) changed the source MAC to that of the FIP  (fa:16:3e:19:e4:4a)
> > 2) Left the source IP unchanged (10.0.0.119)
> > 3) Sent the traffic via the localnet port and hence using the physical
> > network
> > 
> > 
> > 
> > In my opinion, this behavior is not consistent. I'd expect:
> > 
> 
> This does look like a bug to me.  I don't think there should be any reason to
> change the packet's SMAC to NAT.external_mac unless SNAT is also performed
> using NAT.external_ip.


Thanks Dumitru, that's my understanding as well.

From an OpenStack perspective I believe that, if possible, the traffic should return via the same path it entered. That is, if no NAT happened and the traffic came in through the tunnel to the port IP address, no NAT should happen on the reverse path, and the reply should also be sent through the tunnel.

Comment 3 lorenzo bianconi 2021-02-09 15:15:14 UTC
(In reply to Daniel Alvarez Sanchez from comment #2)
> (In reply to Dumitru Ceara from comment #1)
> > (In reply to Daniel Alvarez Sanchez from comment #0)
> > > 
> > > 1) changed the source MAC to that of the FIP  (fa:16:3e:19:e4:4a)
> > > 2) Left the source IP unchanged (10.0.0.119)
> > > 3) Sent the traffic via the localnet port and hence using the physical
> > > network
> > > 
> > > 
> > > 
> > > In my opinion, this behavior is not consistent. I'd expect:
> > > 
> > 
> > This does look like a bug to me.  I don't think there should be any reason to
> > change the packet's SMAC to NAT.external_mac unless SNAT is also performed
> > using NAT.external_ip.
> 
> 
> Thanks Dumitru, that's my understanding as well.
> 
> From an OpenStack perspective I believe that, if possible, the traffic
> should return via the same path it entered. ie. if no NAT happens and the
> traffic came in through the tunnel to the port IP address, no NAT should
> happen in the reverse path and it also should be sent through the tunnel.

I guess the root cause of the issue is the flows added to table=17 (lr_in_gw_redirect) when we have FIPs on the hypervisor, e.g.:

table=17(lr_in_gw_redirect  ) priority=100  , match=(ip4.src == 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("sw0-port1")), action=(eth.src = 30:54:00:00:00:03; reg1 = 172.16.0.110; next;)

This flow will overwrite the L2/L3 source addresses and force the packet to be sent out via the localnet port rather than the Geneve tunnel.
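To confirm this on an affected deployment, the generated logical flows can be dumped per stage. A sketch, again assuming a router named "lr0" (a placeholder, not from this report):

```shell
# Sketch: dump the logical router pipeline and isolate the
# lr_in_gw_redirect stage that rewrites eth.src for FIPs whose
# logical port is resident on this chassis.
ovn-sbctl lflow-list lr0 | grep lr_in_gw_redirect
```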

Comment 4 Mark Michelson 2023-07-28 19:46:30 UTC
Hi, is this issue still happening in OSP? I see that the priority and severity were bumped back in April, but no new comments were added. Lorenzo's diagnosis of the problem is multiple years old at this point, so I don't know if it's still relevant.

Comment 5 Mark Michelson 2023-12-01 15:57:05 UTC
No response since my comment in July. Closing.

