I'd like to describe a scenario where a private IP address in OVN is reached through an OVN router from an external host. The traffic is delivered from the gateway chassis to the destination chassis via the overlay, while the reverse path happens over the physical network. Please find below the details, the captures, and what I think the desired behavior would be.

Pinging from an external destination to a private address through the external router, e.g. ping from rack-2-host-1 to a VM on rack-2-host-2 (10.0.0.119) over an external network.

External host to gateway:
=========================

[vagrant@rack-2-host-1 ~]$ ip r get 10.0.0.119
10.0.0.119 via 172.24.4.221 dev eth1 src 172.24.4.99 uid 1000

[vagrant@rack-2-host-1 ~]$ ping 10.0.0.119 -c2
PING 10.0.0.119 (10.0.0.119) 56(84) bytes of data.
64 bytes from 10.0.0.119: icmp_seq=1 ttl=61 time=3.11 ms
64 bytes from 10.0.0.119: icmp_seq=2 ttl=61 time=3.21 ms

--- 10.0.0.119 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 3.106/3.155/3.205/0.074 ms

Gateway to destination chassis:
===============================

Since the gateway port (172.24.4.221) is bound on rack-1-host-2, traffic gets there and is sent to the destination chassis via the Geneve tunnel:

[vagrant@rack-1-host-2 ~]$ sudo tcpdump -i genev_sys_6081 icmp -vvnee -c2
dropped privs to tcpdump
tcpdump: listening on genev_sys_6081, link-type EN10MB (Ethernet), capture size 262144 bytes
14:36:27.755469 fa:16:3e:78:b4:97 > fa:16:3e:21:f5:06, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 59, id 37322, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.4.99 > 10.0.0.119: ICMP echo request, id 2999, seq 1, length 64
14:36:28.755051 fa:16:3e:78:b4:97 > fa:16:3e:21:f5:06, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 59, id 37688, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.4.99 > 10.0.0.119: ICMP echo request, id 2999, seq 2, length 64

The destination MAC address is that of the gateway port:

...
logical_port        : cr-lrp-0f43995c-7c47-4798-a4b8-2a2c0f251c5e
mac                 : ["fa:16:3e:b4:82:b5 172.24.4.221/24 2001:db8::1/64"]
...

Delivery to the VM on the destination chassis:
==============================================

[root@rack-2-host-2 ~]# tcpdump -i tap64b0026e-d1 -vvnee icmp -c1
dropped privs to tcpdump
tcpdump: listening on tap64b0026e-d1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:38:32.553203 fa:16:3e:78:b4:97 > fa:16:3e:21:f5:06, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 59, id 29472, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.4.99 > 10.0.0.119: ICMP echo request, id 2999, seq 126, length 64

So far, so good. Now the thing is that on its way back, the reply traffic takes a different route, since the VM has a FIP attached (a dnat_and_snat NAT entry):

[vagrant@rack-1-host-1 ~]$ ovn-nbctl list nat
_uuid               : d5257074-a6d8-4d1d-aac1-7287d8b6dc01
external_ip         : "172.24.4.169"
external_mac        : "fa:16:3e:19:e4:4a"
external_port_range : ""
logical_ip          : "10.0.0.119"
logical_port        : "64b0026e-d13c-45fc-bb87-53e820b70f37"
options             : {}
type                : dnat_and_snat

Reply from the VM on its chassis:
=================================

[root@rack-2-host-2 ~]# tcpdump -i br-ex -vvnee icmp -c1
dropped privs to tcpdump
tcpdump: listening on br-ex, link-type EN10MB (Ethernet), capture size 262144 bytes
14:40:24.739054 fa:16:3e:19:e4:4a > 32:c0:7e:74:dc:4f, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 63, id 49718, offset 0, flags [none], proto ICMP (1), length 84)
    10.0.0.119 > 172.24.4.99: ICMP echo reply, id 2999, seq 238, length 64

What we see here is that OVN has:

1) Changed the source MAC to that of the FIP (fa:16:3e:19:e4:4a)
2) Left the source IP unchanged (10.0.0.119)
3) Sent the traffic via the localnet port, hence using the physical network

In my opinion, this behavior is not consistent.
I'd expect:

* Either change both the source MAC and the source IP to those of the FIP, or leave both unchanged.
* If we leave them unchanged (which would be good, IMO, since the destination IP of the ICMP request was the logical port IP and not the FIP), then send the reply back through the tunnel.
* If the reply goes out the localnet port, then the source IP should be that of the floating IP, the same way the MAC address is changed to it.

If OVN has some mechanism to check whether there's a conntrack entry for 10.0.0.119, my preferred option would be to not apply the FIP and keep using the tunnel (i.e. to keep both directions symmetric).

Please note that if I remove the dnat_and_snat entry (i.e. remove the FIP), the traffic flows symmetrically: via the Geneve tunnel to the gateway node, and over the physical network between the gateway and the external host.
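For anyone reproducing the symmetric case mentioned above, a minimal sketch of removing and re-adding the FIP with ovn-nbctl (assuming the logical router is named lr0; the name and the values below come from this report's NAT entry, adjust for your deployment):

```shell
# Remove the dnat_and_snat entry; for this type, the entry is
# identified by its external (floating) IP.
ovn-nbctl lr-nat-del lr0 dnat_and_snat 172.24.4.169

# Ping 10.0.0.119 from the external host again; on the destination
# chassis the reply should now leave via the Geneve tunnel:
#   tcpdump -i genev_sys_6081 icmp -vvnee

# Re-add the FIP afterwards:
#   lr-nat-add ROUTER TYPE EXTERNAL_IP LOGICAL_IP [LOGICAL_PORT EXTERNAL_MAC]
ovn-nbctl lr-nat-add lr0 dnat_and_snat 172.24.4.169 10.0.0.119 \
    64b0026e-d13c-45fc-bb87-53e820b70f37 fa:16:3e:19:e4:4a
```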
(In reply to Daniel Alvarez Sanchez from comment #0)
>
> 1) changed the source MAC to that of the FIP (fa:16:3e:19:e4:4a)
> 2) Left the source IP unchanged (10.0.0.119)
> 3) Sent the traffic via the localnet port and hence using the physical
> network
>
> In my opinion, this behavior is not consistent. I'd expect:
>

This does look like a bug to me. I don't think there should be any reason to change the packet's SMAC to NAT.external_mac unless SNAT is also performed using NAT.external_ip.
(In reply to Dumitru Ceara from comment #1)
> (In reply to Daniel Alvarez Sanchez from comment #0)
> >
> > 1) changed the source MAC to that of the FIP (fa:16:3e:19:e4:4a)
> > 2) Left the source IP unchanged (10.0.0.119)
> > 3) Sent the traffic via the localnet port and hence using the physical
> > network
> >
> > In my opinion, this behavior is not consistent. I'd expect:
> >
>
> This does look like a bug to me. I don't think there should be any reason to
> change the packet's SMAC to NAT.external_mac unless SNAT is also performed
> using NAT.external_ip.

Thanks Dumitru, that's my understanding as well.

From an OpenStack perspective I believe that, if possible, the traffic should return via the same path it entered: i.e. if no NAT happened and the traffic came in through the tunnel to the port's IP address, then no NAT should happen on the reverse path either, and the reply should also be sent through the tunnel.
(In reply to Daniel Alvarez Sanchez from comment #2)
> Thanks Dumitru, that's my understanding as well.
>
> From an OpenStack perspective I believe that, if possible, the traffic
> should return via the same path it entered. ie. if no NAT happens and the
> traffic came in through the tunnel to the port IP address, no NAT should
> happen in the reverse path and it also should be sent through the tunnel.

I guess the root cause of the issue is the flows added to table=17 when we have FIPs on the hypervisor, e.g.:

table=17(lr_in_gw_redirect ), priority=100 , match=(ip4.src == 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("sw0-port1")), action=(eth.src = 30:54:00:00:00:03; reg1 = 172.16.0.110; next;)

This flow overwrites the L2/L3 source addresses and forces the packet to be sent out through the localnet port rather than the Geneve tunnel.
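For reference, the gateway-redirect flows above can be inspected on a running deployment with ovn-sbctl (a sketch, assuming the logical router is named lr0; `dump-flows` is a synonym of `lflow-list`):

```shell
# List the router's logical flows and filter the lr_in_gw_redirect stage;
# the priority-100 flows matching on is_chassis_resident(...) are the ones
# installed per dnat_and_snat (FIP) entry.
ovn-sbctl dump-flows lr0 | grep lr_in_gw_redirect
```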
Hi, is this issue still happening in OSP? I see that the priority and severity were bumped back in April, but no new comments were added. Lorenzo's diagnosis of the problem is multiple years old at this point, so I don't know if it's still relevant.
No response since my comment in July. Closing.