Description of problem:
During egress IP failover, traffic sent from a matching pod is redirected to a new node, where it should be SNATed on the gateway router (GR). It was observed that SNAT is not always applied when the connection was initiated on one node and the egress IP then moved to another one.

OVN-Kubernetes bug: https://issues.redhat.com/browse/OCPBUGS-283
Slack thread: https://coreos.slack.com/archives/C01G7T6SYSD/p1664296212811939

Version-Release number of selected component (if applicable):
OpenShift version: 4.12.0-0.ci-2022-09-28-013725
ovn-nbctl 22.06.1
Open vSwitch Library 2.17.90
DB Schema 6.3.0

See the Slack thread for the reproducer: https://coreos.slack.com/archives/C01G7T6SYSD/p1664379654907989?thread_ts=1664296212.811939&cid=C01G7T6SYSD

egress IP: 10.0.128.101
pod IP: 10.0.128.101
Egress IP node before failover: pdiak-09-28-2022-6ml7m-worker-c-rlgb4
Egress IP node after failover: pdiak-09-28-2022-6ml7m-worker-b-7l2x8

Please find the attached network must-gathers for the NB/SB databases.

Actual results:
After egress IP failover, packets belonging to existing connections are not always SNATed and are sent out with the pod IP as source.

Expected results:
After egress IP failover, packets that belong to an existing connection should be SNATed.
Checking the datapath flows after the egress IP moved, on the chassis that now owns the egress IP, for the FIN+ACK packet we see:

  recirc_id(0x40),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({}{}),flags(-df+csum+key)),in_port(2),ct_state(-new-rpl+trk),eth(),eth_type(0x0800),ipv4(src=10.244.0.6,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:ct(commit,nat(src=10.89.0.199)),recirc(0x41)

  recirc_id(0x41),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({}{}),flags(-df+csum+key)),in_port(2),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(src=d6:db:93:1b:af:5c,dst=ea:8c:57:97:14:b7),eth_type(0x0800),ipv4(dst=10.64.0.0/255.224.0.0,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:ct_clear,ct(commit,zone=64000,mark=0x1/0xffffffff),4

So we first try to SNAT:

  actions:ct(commit,nat(src=10.89.0.199))

But for some reason this doesn't happen: after recirculation the packets match ct_state +inv+trk in the second flow and are forwarded without the source being rewritten.

This is the case only if there was no data traffic on the session after the egress IP move happened, that is, when no conntrack entry exists for the session on this chassis. If we instead generate data traffic first, the conntrack session on the new chassis gets created and moves to ESTABLISHED before the FIN+ACK packet is processed. SNAT then happens fine for all packets.
I had missed a datapath flow above. For completeness:

  recirc_id(0),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({class=0x102,type=0x80,len=4,0x10003/0x7fffffff}),flags(-df+csum+key)),in_port(2),ct_state(-new-est-rel-rpl-inv-trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:01,dst=0a:58:64:40:00:03),eth_type(0x0800),ipv4(src=10.244.0.4/255.255.255.252,dst=10.89.0.1,ttl=63,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:set(eth(src=d6:db:93:1b:af:5c,dst=ea:8c:57:97:14:b7)),set(ipv4(ttl=62)),ct(zone=5,nat),recirc(0x40)

  recirc_id(0x40),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({}{}),flags(-df+csum+key)),in_port(2),ct_state(-new-rpl+trk),eth(),eth_type(0x0800),ipv4(src=10.244.0.6,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:ct(commit,nat(src=10.89.0.199)),recirc(0x41)

  recirc_id(0x41),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({}{}),flags(-df+csum+key)),in_port(2),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(src=d6:db:93:1b:af:5c,dst=ea:8c:57:97:14:b7),eth_type(0x0800),ipv4(dst=10.64.0.0/255.224.0.0,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:ct_clear,ct(commit,zone=64000,mark=0x1/0xffffffff),4
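
For anyone triaging similar dumps, the ct_state flags in the flows above can be checked mechanically. The following is only a rough Python sketch, not part of any OVS or OVN tooling: the helper names `parse_ct_state` and `is_invalid_tracked` are made up for illustration, the two sample flows are abbreviated copies of the ones quoted above, and the parsing assumes the flow text keeps the `ct_state(...)` layout shown in this bug.

```python
import re

# Matches the ct_state(...) match field of a datapath flow, e.g.
# "ct_state(-new-est-rel-rpl+inv+trk)". Assumes the text layout seen in this bug.
CT_STATE_RE = re.compile(r"ct_state\(([^)]*)\)")
# Each flag is a sign followed by a lowercase name, e.g. "+trk" or "-new".
FLAG_RE = re.compile(r"([+-])([a-z]+)")


def parse_ct_state(flow):
    """Return {flag_name: is_set} for the ct_state match of one flow line."""
    match = CT_STATE_RE.search(flow)
    if match is None:
        return {}
    return {name: sign == "+" for sign, name in FLAG_RE.findall(match.group(1))}


def is_invalid_tracked(flow):
    """True if the flow matches packets that are tracked but marked invalid."""
    state = parse_ct_state(flow)
    return state.get("trk", False) and state.get("inv", False)


# Abbreviated copies of the recirc_id(0x40) and recirc_id(0x41) flows above.
flows = [
    "recirc_id(0x40),ct_state(-new-rpl+trk),ipv4(src=10.244.0.6,frag=no), "
    "actions:ct(commit,nat(src=10.89.0.199)),recirc(0x41)",
    "recirc_id(0x41),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1), "
    "actions:ct_clear,ct(commit,zone=64000,mark=0x1/0xffffffff),4",
]

for flow in flows:
    if is_invalid_tracked(flow):
        # Print only the match part up to the first comma as a short label.
        print("matches +inv+trk:", flow.split(",", 1)[0])
```

Run against the two abbreviated flows, only the recirc_id(0x41) flow is reported, which is the one that forwards the FIN+ACK packets after the SNAT attempt.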
PR https://github.com/ovn-org/ovn-kubernetes/pull/3349 tries to minimize the race window in which packets can reach a gateway (after failover) before SNAT is configured in OpenFlow.

Closing this BZ for now; bug 2130939 tracks the not-SNAT-ed FIN and RST packets.
(In reply to Dumitru Ceara from comment #9)
> Closing this BZ for now; bug 2130939 tracks the not-SNAT-ed FIN and RST
> packets.

Bug 2160685 is actually the one tracking the not-SNAT-ed FIN and RST packets.