Bug 2161281
| Summary: | "SNAT in separate zone from DNAT" test fails due to OVN issues | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | xsimonar |
| Component: | ovn23.06 | Assignee: | Ales Musil <amusil> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | high | | |
| Version: | FDP 22.H | CC: | amusil, ctrautma, dcbw, jiji, mmichels, ovn-bot |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovn23.06-23.06.0-beta.118.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-01-24 11:05:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
xsimonar
2023-01-16 13:24:55 UTC
After discussion with Xavier and poking around with the test, there indeed seem to be two issues.

1) The first issue is with the first traffic. Because of the ARP resolution we lose the ct_mark/label, which results in SNAT happening in the common zone. This has the following consequences:
- The original traffic arrives at the destination and a CT entry for SNAT is created in the common zone.
- The reply traffic goes through the LR pipeline and hits bug 2 (described below): the unSNAT happens in the common zone, but because of a conflict the unDNAT cannot be done properly (they are in the same zone).
- The traffic arrives with the wrong source address.

2) In the ingress router pipeline, the unSNAT does not have proper state matching to decide whether it should do the unSNAT in the separate or the common zone.
- There are logical flows that differentiate between the common and the separate SNAT zone:
table=4 (lr_in_unsnat ), priority=100 , match=(ip && ip4.dst == 172.16.0.101 && inport == "r1_public" && flags.loopback == 0 && is_chassis_resident("cr-r1_public")), action=(ct_snat_in_czone;)
table=4 (lr_in_unsnat ), priority=100 , match=(ip && ip4.dst == 172.16.0.101 && inport == "r1_public" && flags.loopback == 1 && flags.use_snat_zone == 1 && is_chassis_resident("cr-r1_public")), action=(ct_snat;)
- In order to do unSNAT in the separate zone we need loopback=1 and use_snat_zone=1; those two conditions are set only if the traffic is local, e.g. hairpin, and is sent back to the same port via "lr_out_egr_loop".
- This also has the consequence that once the MAC binding is learned and the CT entry from the common zone expires, all traffic is dropped because it is not properly unSNATted.

The outcome is that SNAT and LB done on distributed router ports suffer from this issue. One possible fix is to use the separate zone for SNAT every time; however, I'm not sure if that would have any other consequences.
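The two lr_in_unsnat flows quoted above can be read as a small decision function. The sketch below is only an illustration of that match logic, not OVN code: the `Packet` class and the `unsnat_action` helper are hypothetical names, and the model assumes only the two flows shown in this report exist in the table.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    # Hypothetical stand-in for the fields the two flows match on.
    ip4_dst: str
    inport: str
    loopback: bool = False        # flags.loopback
    use_snat_zone: bool = False   # flags.use_snat_zone


def unsnat_action(pkt, chassis_resident=True):
    """Return the action of the matching lr_in_unsnat flow, or None.

    Only the two priority-100 flows quoted in this report are modelled.
    """
    common = (pkt.ip4_dst == "172.16.0.101"
              and pkt.inport == "r1_public"
              and chassis_resident)
    if not common:
        return None
    if not pkt.loopback:
        # First flow: flags.loopback == 0, unSNAT in the common zone.
        return "ct_snat_in_czone"
    if pkt.use_snat_zone:
        # Second flow: loopback == 1 and use_snat_zone == 1, separate zone.
        return "ct_snat"
    return None


# A reply arriving from outside never has loopback set, so it is always
# sent to the common zone, even when the SNAT entry lives in the SNAT zone.
reply = Packet(ip4_dst="172.16.0.101", inport="r1_public")
hairpin = Packet(ip4_dst="172.16.0.101", inport="r1_public",
                 loopback=True, use_snat_zone=True)
print(unsnat_action(reply), unsnat_action(hairpin))
```

In this model, the separate-zone action is reachable only for looped-back traffic, which is the mismatch described in issue 2.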
The behavior after the CT entry related to the first ping (SNAT in the common zone) has expired has changed a few times recently:
Before commit "northd: Add logical flow to defrag ICMP traffic", there was a return packet with wrong src address (return packet was not undnatted)
Then, before commit "northd: Drop packets destined to router owned NAT IP for DGP", it ... worked (correct reply packet).
Then, as indicated above, after that commit, there is no reply packet anymore.
The reason why it "worked" (once the initial CT entry had been cleared) is the following:
- echo request is dnatted in dnat zone and snatted in snat zone.
- for reply packet, unsnat fails (as we try to unsnat in common/dnat zone, hitting rule w/ loopback == 0)
- dst of the reply packet remains 172.16.0.102
- packet is re-routed to the same router (r1), but this time with loopback bit set
- unsnat is done in correct zone (hitting rule w/ flags.loopback == 1)
- it would not hit the undnat rule as outport is wrong
table=1 (lr_out_undnat ), priority=120 , match=(ip4 && ((ip4.src == 172.16.0.102)) && outport == "r1_public" && is_chassis_resident("cr-r1_public")), action=(ct_dnat_in_czone;)
- but it hits
table=5 (lr_in_defrag ), priority=50 , match=(icmp || icmp6), action=(ct_dnat;)
To sum up what the solution should look like: have a config knob that allows the user to specify whether to always use separate zones for SNAT and DNAT, or to use the common zone when possible. The reason for the knob is that the common zone was needed for HWOL and we should still allow this behavior. Also, the default behavior should be the correct one, i.e. separate zones, allowing a user who needs HWOL to go back to the "old" behavior.

Patch posted: https://patchwork.ozlabs.org/project/ovn/patch/20230210092049.603012-1-amusil@redhat.com/

Accepted patchset is https://patchwork.ozlabs.org/project/ovn/list/?series=350439&archive=both&state=*

ovn23.06 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2203012

ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2203013

*** Bug 2203012 has been marked as a duplicate of this bug. ***

Use the reproducer in https://bugzilla.redhat.com/show_bug.cgi?id=2203013#c3.

Reproduced on ovn23.03-23.03.0-101.el8:

[root@kvm-02-guest29 bz2161281]# rpm -qa | grep -E "ovn23.03|openvswitch3.1"
openvswitch3.1-3.1.0-70.el8fdp.x86_64
ovn23.03-central-23.03.0-101.el8fdp.x86_64
ovn23.03-23.03.0-101.el8fdp.x86_64
ovn23.03-host-23.03.0-101.el8fdp.x86_64

[root@kvm-02-guest29 bz2161281]# ip netns exec vm1 ping 30.0.0.1 -c 1 -w 2
PING 30.0.0.1 (30.0.0.1) 56(84) bytes of data.
64 bytes from 172.16.0.102: icmp_seq=1 ttl=62 time=36.3 ms
--- 30.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 36.344/36.344/36.344/0.000 ms

[root@kvm-02-guest29 ~]# ip netns exec vm1 tcpdump -i vm1 -nnle -v
dropped privs to tcpdump
tcpdump: listening on vm1, link-type EN10MB (Ethernet), capture size 262144 bytes
22:33:26.769369 00:de:ad:01:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 173.0.1.1 tell 173.0.1.2, length 28
22:33:26.769631 00:de:ad:fe:00:01 > 00:de:ad:01:00:01, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 173.0.1.1 is-at 00:de:ad:fe:00:01, length 28
22:33:26.769638 00:de:ad:01:00:01 > 00:de:ad:fe:00:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 33918, offset 0, flags [DF], proto ICMP (1), length 84)
    173.0.1.2 > 30.0.0.1: ICMP echo request, id 18110, seq 1, length 64
22:33:26.805694 00:de:ad:fe:00:01 > 00:de:ad:01:00:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 31610, offset 0, flags [none], proto ICMP (1), length 84)
    172.16.0.102 > 173.0.1.2: ICMP echo reply, id 18110, seq 1, length 64 <=== src ip is not un-dnated

Verified on ovn23.06-23.06.1-60.el8:

[root@kvm-02-guest29 bz2161281]# ip netns exec vm1 ping 30.0.0.1 -c 1 -w 2
PING 30.0.0.1 (30.0.0.1) 56(84) bytes of data.
64 bytes from 30.0.0.1: icmp_seq=1 ttl=62 time=31.5 ms
--- 30.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 31.542/31.542/31.542/0.000 ms

[root@kvm-02-guest29 bz2161281]# rpm -qa | grep -E "ovn23.06|openvswitch3.1"
openvswitch3.1-3.1.0-70.el8fdp.x86_64
ovn23.06-23.06.1-60.el8fdp.x86_64
ovn23.06-central-23.06.1-60.el8fdp.x86_64
ovn23.06-host-23.06.1-60.el8fdp.x86_64

[root@kvm-02-guest29 ~]# ip netns exec vm1 tcpdump -i vm1 -nnle -v not ip6
dropped privs to tcpdump
tcpdump: listening on vm1, link-type EN10MB (Ethernet), capture size 262144 bytes
22:36:56.063825 00:de:ad:01:00:01 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 173.0.1.1 tell 173.0.1.2, length 28
22:36:56.064642 00:de:ad:fe:00:01 > 00:de:ad:01:00:01, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 173.0.1.1 is-at 00:de:ad:fe:00:01, length 28
22:36:56.064651 00:de:ad:01:00:01 > 00:de:ad:fe:00:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 36021, offset 0, flags [DF], proto ICMP (1), length 84)
    173.0.1.2 > 30.0.0.1: ICMP echo request, id 19434, seq 1, length 64
22:36:56.095345 00:de:ad:fe:00:01 > 00:de:ad:01:00:01, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 7443, offset 0, flags [none], proto ICMP (1), length 84)
    30.0.0.1 > 173.0.1.2: ICMP echo reply, id 19434, seq 1, length 64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn23.06 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0388 |
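The difference between the reproduced and the verified run above comes down to whether the echo reply carries the pinged address as its source. A minimal sketch of that check, assuming ping output in the format shown in this report, could look like the following; `reply_sources` and `replies_from_target` are hypothetical helper names and are not part of the referenced reproducer.

```python
import re


def reply_sources(ping_output):
    """Return the source addresses from 'N bytes from ADDR:' reply lines."""
    return re.findall(r"\d+ bytes from ([0-9.]+):", ping_output)


def replies_from_target(ping_output, target):
    """True if at least one reply was seen and every reply came from target."""
    sources = reply_sources(ping_output)
    return bool(sources) and all(src == target for src in sources)


# Reply lines copied from the two runs in this report.
broken = ("PING 30.0.0.1 (30.0.0.1) 56(84) bytes of data.\n"
          "64 bytes from 172.16.0.102: icmp_seq=1 ttl=62 time=36.3 ms\n")
fixed = ("PING 30.0.0.1 (30.0.0.1) 56(84) bytes of data.\n"
         "64 bytes from 30.0.0.1: icmp_seq=1 ttl=62 time=31.5 ms\n")

print(replies_from_target(broken, "30.0.0.1"))  # reply source was not un-DNATed
print(replies_from_target(fixed, "30.0.0.1"))
```

Checking the reply source rather than only the packet-loss figure matters here, because the broken run also reports 0% packet loss.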