Description of problem:

Currently, in an OVN-Kubernetes backed OpenShift cluster, packets are dropped in certain scenarios where Services and Network Policies are used together. Specifically, in the scenario discussed in BZ 1903651 we have:

1. A default-deny Network Policy
2. An allow-from-same-namespace Network Policy
3. A Service (SVC) backed by pods A, B, and C

When pod A sends a packet to the SVC and it is DNAT-ed (load balanced) back to pod A, the packet is hairpinned and SNAT-ed to one of the service VIPs. That VIP is not covered by the "allow-from-same-namespace" ACL, so the packet is dropped. The SNAT IP is usually the VIP of the Service; however, it can vary, since the SNAT IP is chosen automatically by OVN (see https://patchwork.ozlabs.org/project/openvswitch/patch/1580302197-14276-1-git-send-email-dceara@redhat.com/).

This is a difficult scenario to cover fully within OVN-K, since it would add a new dependency between NetworkPolicies and Services, and there are many corner cases. We (dceara and I) have decided that the best way forward would be to implement a fix that allows us to explicitly specify an SNAT IP for hairpinned traffic. This would simplify hairpin traffic management for OVN-K, and is similar to the way this was previously dealt with in OpenShift SDN. This bug serves to track the feature work.
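The drop described above can be illustrated with a toy model. This is only a sketch: the functions acl_allows and verdict are hypothetical helpers, the pod and VIP addresses are made up, and the "ACL" is reduced to a plain list of allowed source addresses rather than a real OVN Address_set.

```shell
#!/bin/sh
# Toy model only: all addresses below are hypothetical, not taken from the bug.

# acl_allows SRC ALLOWED...: succeed if SRC is one of the allowed addresses.
acl_allows() {
    src=$1
    shift
    for allowed in "$@"; do
        [ "$src" = "$allowed" ] && return 0
    done
    return 1
}

# verdict SRC ALLOWED...: print "allow" or "drop" for a packet from SRC.
verdict() {
    if acl_allows "$@"; then
        echo allow
    else
        echo drop
    fi
}

POD_IPS="10.128.0.5 10.128.0.6 10.128.0.7"   # pods A, B, C (same namespace)
SERVICE_VIP="172.30.0.10"                    # hypothetical service VIP

# Pod B reaches pod A through the service: the source stays a pod IP.
verdict 10.128.0.6 $POD_IPS
# Pod A is load balanced back to itself: the source is SNAT-ed to the VIP.
verdict "$SERVICE_VIP" $POD_IPS
# Once the hairpin SNAT address is known, it can be added to the allow list.
verdict "$SERVICE_VIP" $POD_IPS "$SERVICE_VIP"
```

The second call models the dropped hairpin packet; the third shows why a predictable SNAT address makes the policy side manageable.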
@Andrew, just to clarify the request, would it be acceptable if the hairpin SNAT IP would be defined as an option of the logical switch? E.g.:

ovn-nbctl set logical_switch node-ls options:hairpin_snat_ip=x.y.z.t

Thanks,
Dumitru
Well, that would still require us to add more than one IP to a given ingress allow rule's Address_set, since we could have 3 pods backing a service, all running on different nodes, and therefore SNAT-ing hairpin traffic to 3 different IPs (granted, I guess we could set the IP to be equal for all switches).

Could we define the SNAT IP for a given load balancer instead? This would be a bit easier for OVN-K to deal with, since then we only have to worry about 1 IP per service. If not, setting it for a logical switch would still make things better than they are currently.

Thanks,
Andrew
(In reply to Andrew Stoycos from comment #2)
[...]
> switches). Could we define the SNAT IP for a given loadbalancer instead?

I think this should be fine too, it will be the responsibility of the CMS to make sure that that SNAT IP makes sense for all logical switches the load balancer is applied on.

> This would be a bit easier for OVN-K to deal with, since then we only have
> to worry about 1 IP per service. If not, setting it for a logical switch
> would still make things better then they are currently.
>

Ok, thanks for the reply!

> Thanks,
> Andrew
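A minimal sketch of the per-load-balancer approach agreed on here, assuming the option ends up being named hairpin_snat_ip on the load_balancer table (the form used in the test script later in this bug). The helper hairpin_snat_cmd is hypothetical; it only prints the ovn-nbctl command so it can be reviewed before being run.

```shell
#!/bin/sh
# hairpin_snat_cmd LB IP: print (do not run) the ovn-nbctl command that pins
# the hairpin SNAT address of load balancer LB to IP.
# The helper name is hypothetical; the command form follows the test script
# in this bug.
hairpin_snat_cmd() {
    lb=$1
    ip=$2
    printf 'ovn-nbctl set load_balancer %s options:hairpin_snat_ip="%s"\n' "$lb" "$ip"
}

# One fixed SNAT address per load balancer, for IPv4 and IPv6.
hairpin_snat_cmd lb0-tcp4 8.8.8.7
hairpin_snat_cmd lb0-tcp6 8888::7
```

As noted in the reply, it remains up to the CMS to pick an address that makes sense on every logical switch the load balancer is attached to.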
Patch posted upstream for review: http://patchwork.ozlabs.org/project/ovn/list/?series=224666&state=*
Patch merged to master Feb 3rd.
tested with following script:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.175.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.175.25
systemctl restart ovn-controller

ip netns add server0
ip link add veth0_s0 netns server0 type veth peer name veth0_s0_p
ip netns exec server0 ip link set lo up
ip netns exec server0 ip link set veth0_s0 up
ip netns exec server0 ip link set veth0_s0 address 00:00:00:01:01:02
ip netns exec server0 ip addr add 192.168.1.1/24 dev veth0_s0
ip netns exec server0 ip -6 addr add 2001::1/64 dev veth0_s0
ip netns exec server0 ip route add default via 192.168.1.254 dev veth0_s0
ip netns exec server0 ip -6 route add default via 2001::a dev veth0_s0
ovs-vsctl add-port br-int veth0_s0_p
ip link set veth0_s0_p up
ovs-vsctl set interface veth0_s0_p external_ids:iface-id=ls1p1
ip netns exec server0 nc -l -k 1100 &

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.1.1 2001::1"
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1

ovn-nbctl lb-add lb0-tcp4 8.8.8.8:1234 192.168.1.1:1100 tcp
ovn-nbctl ls-lb-add ls1 lb0-tcp4
ovn-nbctl set load_balancer lb0-tcp4 options:hairpin_snat_ip="8.8.8.7"
ovn-nbctl lb-add lb0-tcp6 [8888::1]:1234 [2001::1]:1100 tcp
ovn-nbctl ls-lb-add ls1 lb0-tcp6
ovn-nbctl set load_balancer lb0-tcp6 options:hairpin_snat_ip="8888::7"

ovs-ofctl dump-flows br-int table=69
ip netns exec server0 tcpdump -i any -w server0.pcap &
ip netns exec server0 nc 8.8.8.8 1234 <<< h
ip netns exec server0 nc 8888::1 1234 <<< h
ovs-ofctl dump-flows br-int table=69

Verified on 20.12.0-17:

[root@wsfd-advnetlab21 bz1908540]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-82.el7fdp.x86_64
ovn2.13-20.12.0-17.el7fdp.x86_64
ovn2.13-host-20.12.0-17.el7fdp.x86_64
ovn2.13-central-20.12.0-17.el7fdp.x86_64

+ ovs-ofctl dump-flows br-int table=69
+ ip netns exec server0 nc 8.8.8.8 1234
+ ip netns exec server0 tcpdump -i any -w server0.pcap
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
h
+ ip netns exec server0 nc 8888::1 1234
h
+ ovs-ofctl dump-flows br-int table=69
 cookie=0xbabcbf0d, duration=0.065s, table=69, n_packets=5, n_bytes=350, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]   <=== 8.8.8.7
 cookie=0x289af82d, duration=0.021s, table=69, n_packets=4, n_bytes=364, tcp6,metadata=0x1,ipv6_src=2001::1,ipv6_dst=8888::7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]   <=== 8888::7

[root@wsfd-advnetlab21 bz1908540]# tcpdump -r server0.pcap dst host 192.168.1.1 -nnle
reading from file server0.pcap, link-type LINUX_SLL (Linux cooked)
21:27:31.652938 In 00:00:00:00:00:01 ethertype ARP (0x0806), length 44: Reply 192.168.1.254 is-at 00:00:00:00:00:01, length 28
21:27:31.653842 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 76: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [S], seq 3812429305, win 29200, options [mss 1460,sackOK,TS val 92969351 ecr 0,nop,wscale 7], length 0   <==== 8.8.8.7
21:27:31.654566 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 76: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [S.], seq 2031433868, ack 3812429306, win 28960, options [mss 1460,sackOK,TS val 92970355 ecr 92969351,nop,wscale 7], length 0
21:27:31.655539 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [F.], seq 3812429308, ack 2031433869, win 229, options [nop,nop,TS val 92970356 ecr 92970355], length 0
21:27:31.655603 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 70: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [P.], seq 4294967294:0, ack 1, win 229, options [nop,nop,TS val 92970356 ecr 92970355], length 2
21:27:31.655650 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 80: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [.], ack 1, win 227, options [nop,nop,TS val 92970357 ecr 92969351,nop,nop,sack 1 {3:4}], length 0
21:27:31.655688 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [.], ack 1, win 229, options [nop,nop,TS val 92970356 ecr 92970355], length 0
21:27:31.655698 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [.], ack 4, win 227, options [nop,nop,TS val 92970357 ecr 92970356], length 0
21:27:31.655767 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [F.], seq 1, ack 4, win 227, options [nop,nop,TS val 92970357 ecr 92970356], length 0
21:27:31.655833 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [.], ack 2, win 229, options [nop,nop,TS val 92970357 ecr 92970357], length 0
21:27:31.655879 In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [.], ack 4, win 227, options [nop,nop,TS val 92970357 ecr 92970356], length 0
tcpdump: pcap_loop: truncated dump file; tried to read 104 captured bytes, only got 26

[root@wsfd-advnetlab21 bz1908540]# tcpdump -r server0.pcap dst host 2001::1 -nnle
reading from file server0.pcap, link-type LINUX_SLL (Linux cooked)
21:27:31.696998 In 00:00:00:00:00:01 ethertype IPv6 (0x86dd), length 88: 2001::a > 2001::1: ICMP6, neighbor advertisement, tgt is 2001::a, length 32
21:27:31.698430 In 00:00:00:00:00:01 ethertype IPv6 (0x86dd), length 96: 8888::7.53990 > 2001::1.1100: Flags [S], seq 1513552003, win 28800, options [mss 1440,sackOK,TS val 92970397 ecr 0,nop,wscale 7], length 0   <=== 8888::7
21:27:31.699529 In 00:00:00:00:00:01 ethertype IPv6 (0x86dd), length 96: 8888::1.1234 > 2001::1.53990: Flags [S.], seq 414057409, ack 1513552004, win 28560, options [mss 1440,sackOK,TS val 92970400 ecr 92970397,nop,wscale 7], length 0
21:27:31.700164 In 00:00:00:00:00:01 ethertype IPv6 (0x86dd), length 88: 8888::7.53990 > 2001::1.1100: Flags [.], ack 414057410, win 225, options [nop,nop,TS val 92970401 ecr 92970400], length 0
tcpdump: pcap_loop: truncated dump file; tried to read 104 captured bytes, only got 26
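The manual check of the table=69 flows in this verification could be scripted roughly as follows. This is a sketch, not part of the original test: has_hairpin_flow is a hypothetical helper, and the sample flow lines are the two flows from the output in this comment with the cookie and counter fields left out.

```shell
#!/bin/sh
# has_hairpin_flow SNAT_IP: read `ovs-ofctl dump-flows` output on stdin and
# succeed if some flow matches SNAT_IP as its IPv4 or IPv6 destination.
# Hypothetical helper; it only does a fixed-string search.
has_hairpin_flow() {
    grep -F -q -e "nw_dst=$1," -e "ipv6_dst=$1,"
}

# Sample flows from the verification output (cookie and counters omitted).
FLOWS='table=69, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]
table=69, tcp6,metadata=0x1,ipv6_src=2001::1,ipv6_dst=8888::7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]'

for ip in 8.8.8.7 8888::7; do
    if printf '%s\n' "$FLOWS" | has_hairpin_flow "$ip"; then
        echo "$ip: hairpin flow present"
    else
        echo "$ip: hairpin flow missing"
    fi
done
```

On a live system the same function could read from ovs-ofctl dump-flows br-int table=69 instead of the sample text.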
also verified on rhel8:

+ ovs-ofctl dump-flows br-int table=69
 cookie=0x12269b6f, duration=5.144s, table=69, n_packets=6, n_bytes=428, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]
 cookie=0xae73feee, duration=0.033s, table=69, n_packets=5, n_bytes=462, tcp6,metadata=0x1,ipv6_src=2001::1,ipv6_dst=8888::7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]
+ pkill tcpdump
60 packets captured
60 packets received by filter
0 packets dropped by kernel

[root@dell-per740-12 bz1908540]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-host-20.12.0-20.el8fdp.x86_64
ovn2.13-20.12.0-20.el8fdp.x86_64
ovn2.13-central-20.12.0-20.el8fdp.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0836