Bug 2130939
| Summary: | OVN-Kubernetes: SNAT not applied for existing connections during egress IP failover | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Patryk Diak <pdiak> |
| Component: | OVN | Assignee: | Dumitru Ceara <dceara> |
| Status: | CLOSED WONTFIX | QA Contact: | Jianlin Shi <jishi> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | FDP 22.L | CC: | ctrautma, dceara, jiji, mmichels, skanakal |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2023-01-13 10:34:39 UTC | Type: | Bug |
Description
Patryk Diak
2022-09-29 13:58:53 UTC
Checking the datapath flows after the egress IP moved, on the chassis that
now owns the egress IP, we see the following for the FIN+ACK packet:
recirc_id(0x40),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({}{}),flags(-df+csum+key)),in_port(2),ct_state(-new-rpl+trk),eth(),eth_type(0x0800),ipv4(src=10.244.0.6,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:ct(commit,nat(src=10.89.0.199)),recirc(0x41)
recirc_id(0x41),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({}{}),flags(-df+csum+key)),in_port(2),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(src=d6:db:93:1b:af:5c,dst=ea:8c:57:97:14:b7),eth_type(0x0800),ipv4(dst=10.64.0.0/255.224.0.0,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:ct_clear,ct(commit,zone=64000,mark=0x1/0xffffffff),4
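So we first try to SNAT: actions:ct(commit,nat(src=10.89.0.199))
But for some reason the SNAT is not applied. Note that after recirculation
(recirc_id 0x41) the packet matches ct_state +inv+trk, i.e. conntrack
considers it invalid.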
This is the case only if there was no data traffic on the session after
the egress IP move happened. That is, no conntrack entry exists for the
session on this chassis.
If instead we first generate data traffic, the conntrack session on the
new chassis gets created and moves to ESTABLISHED before the FIN+ACK
packet is processed. SNAT happens fine for all packets then.
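The two cases described above can be illustrated with a small toy model. This is only a sketch of the observed behaviour, not of the actual conntrack implementation: the class `ToyConntrack`, the flag sets `PICKUP_OK` and `NO_PICKUP`, and the rule that a data/ACK segment can create an entry mid-stream while a FIN+ACK or RST cannot are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical flag groups, used by this toy model only.
PICKUP_OK = {"ACK", "PSH+ACK"}   # assumed: data/ACK segments may create an entry
NO_PICKUP = {"FIN+ACK", "RST"}   # assumed: these cannot create an entry on their own


@dataclass
class ToyConntrack:
    """Toy per-chassis connection table keyed by (src, dst)."""
    snat_ip: str
    entries: dict = field(default_factory=dict)

    def process(self, src, dst, tcp_flags):
        """Return (ct_state, source IP the packet leaves the chassis with)."""
        key = (src, dst)
        if key in self.entries:
            # An entry already exists, so SNAT is applied to every packet.
            return "+est+trk", self.snat_ip
        if tcp_flags in PICKUP_OK:
            # Data traffic after the failover creates the entry.
            self.entries[key] = "ESTABLISHED"
            return "+new+trk", self.snat_ip
        # No entry and a segment that cannot create one: the packet is
        # marked invalid and leaves with its original source address.
        return "+inv+trk", src


# Case 1: FIN+ACK is the first packet seen on the new chassis.
fresh = ToyConntrack(snat_ip="10.89.0.199")
print(fresh.process("10.244.0.6", "10.89.0.1", "FIN+ACK"))

# Case 2: data traffic first, then FIN+ACK.
warmed = ToyConntrack(snat_ip="10.89.0.199")
warmed.process("10.244.0.6", "10.89.0.1", "ACK")
print(warmed.process("10.244.0.6", "10.89.0.1", "FIN+ACK"))
```

In the first case the FIN+ACK keeps the pod IP as its source; in the second it is SNAT-ed to the egress IP, which mirrors the two behaviours reported in this comment.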
I had missed a datapath flow above, for completeness:
recirc_id(0),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({class=0x102,type=0x80,len=4,0x10003/0x7fffffff}),flags(-df+csum+key)),in_port(2),ct_state(-new-est-rel-rpl-inv-trk),ct_mark(0/0x3),eth(src=0a:58:64:40:00:01,dst=0a:58:64:40:00:03),eth_type(0x0800),ipv4(src=10.244.0.4/255.255.255.252,dst=10.89.0.1,ttl=63,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:set(eth(src=d6:db:93:1b:af:5c,dst=ea:8c:57:97:14:b7)),set(ipv4(ttl=62)),ct(zone=5,nat),recirc(0x40)
recirc_id(0x40),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({}{}),flags(-df+csum+key)),in_port(2),ct_state(-new-rpl+trk),eth(),eth_type(0x0800),ipv4(src=10.244.0.6,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:ct(commit,nat(src=10.89.0.199)),recirc(0x41)
recirc_id(0x41),tunnel(tun_id=0x2,src=10.89.0.5,dst=10.89.0.6,geneve({}{}),flags(-df+csum+key)),in_port(2),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),eth(src=d6:db:93:1b:af:5c,dst=ea:8c:57:97:14:b7),eth_type(0x0800),ipv4(dst=10.64.0.0/255.224.0.0,frag=no), packets:4, bytes:264, used:0.899s, flags:F., actions:ct_clear,ct(commit,zone=64000,mark=0x1/0xffffffff),4
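When reading dumps like the one above, it can help to pair each flow that attempts SNAT with the flow the packet hits after recirculation. The following is a minimal sketch that assumes the flows are available as text lines in the format shown above; the helper names `parse_flow` and `snat_then_invalid` are hypothetical and not part of any OVS or OVN tooling.

```python
import re

CT_STATE_RE = re.compile(r"ct_state\(([^)]*)\)")
RECIRC_ID_RE = re.compile(r"recirc_id\((0x[0-9a-fA-F]+|\d+)\)")
RECIRC_ACTION_RE = re.compile(r"recirc\((0x[0-9a-fA-F]+|\d+)\)")
SNAT_RE = re.compile(r"nat\(src=([0-9.]+)\)")


def parse_flow(line):
    """Extract recirc_id, the set ct_state flags and the actions of one flow."""
    match_part, _, actions = line.partition(" actions:")
    recirc = RECIRC_ID_RE.search(match_part)
    ct_state = CT_STATE_RE.search(match_part)
    flags = re.findall(r"[+-][a-z]+", ct_state.group(1)) if ct_state else []
    return {
        "recirc_id": int(recirc.group(1), 0) if recirc else None,
        "set": {f[1:] for f in flags if f[0] == "+"},
        "actions": actions.strip(),
    }


def snat_then_invalid(lines):
    """Return (snat_ip, recirc_id) pairs where a flow requests SNAT and the
    flow matched after recirculation has ct_state +inv."""
    flows = [parse_flow(line) for line in lines]
    invalid_ids = {f["recirc_id"] for f in flows if "inv" in f["set"]}
    hits = []
    for flow in flows:
        snat = SNAT_RE.search(flow["actions"])
        target = RECIRC_ACTION_RE.search(flow["actions"])
        if snat and target and int(target.group(1), 0) in invalid_ids:
            hits.append((snat.group(1), int(target.group(1), 0)))
    return hits


if __name__ == "__main__":
    # Abbreviated versions of the recirc_id(0x40) and recirc_id(0x41) flows.
    sample = [
        "recirc_id(0x40),in_port(2),ct_state(-new-rpl+trk),ipv4(src=10.244.0.6,frag=no),"
        " packets:4, flags:F., actions:ct(commit,nat(src=10.89.0.199)),recirc(0x41)",
        "recirc_id(0x41),in_port(2),ct_state(-new-est-rel-rpl+inv+trk),ct_mark(0/0x1),"
        " packets:4, flags:F., actions:ct_clear,ct(commit,zone=64000,mark=0x1/0xffffffff),4",
    ]
    for snat_ip, recirc_id in snat_then_invalid(sample):
        print(f"SNAT to {snat_ip} requested, but recirc_id {recirc_id:#x} matches +inv")
```

The script only inspects the text of the flows; it does not confirm why conntrack marked the packet invalid.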
PR https://github.com/ovn-org/ovn-kubernetes/pull/3349 tries to minimize the race condition window in which packets can reach a gateway (after failover) before SNAT is configured in openflow.
Closing this BZ for now; bug 2130939 tracks the not-SNAT-ed FIN and RST packets.
(In reply to Dumitru Ceara from comment #9)
> Closing this BZ for now; bug 2130939 tracks the not-SNAT-ed FIN and RST
> packets.
Bug 2160685 is actually the one tracking the not-SNAT-ed FIN and RST packets.