Bug 1908540 - Explicitly specify the SNAT IP for Hairpinned traffic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: RHEL 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
: ---
Assignee: Dumitru Ceara
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-12-17 00:22 UTC by Andrew Stoycos
Modified: 2021-03-15 14:36 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-15 14:36:02 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:0836 0 None None None 2021-03-15 14:36:23 UTC

Description Andrew Stoycos 2020-12-17 00:22:26 UTC
Description of problem:

Currently, in an OVN-Kubernetes-backed OpenShift cluster, packets are dropped in certain scenarios where Services and Network Policies are used together.

Specifically, this happens in the scenario discussed in BZ 1903651, where we have:

1. A default-deny Network Policy
2. An allow-from-same-namespace Network Policy
3. A Service (SVC) backed by pods A, B, and C

When pod A sends a packet to the SVC and it is DNAT-ed (load balanced) back to pod A, the packet is hairpinned and SNAT-ed to one of the Service VIPs. That VIP is not covered by the "allow-from-same-namespace" ACL, so the packet is dropped.

The SNAT IP is usually the VIP of the Service; however, it can vary, since the SNAT IP is chosen automatically by OVN (see https://patchwork.ozlabs.org/project/openvswitch/patch/1580302197-14276-1-git-send-email-dceara@redhat.com/). This scenario is difficult to cover fully within OVN-K, since doing so would add a new dependency between NetworkPolicies and Services, and there are many corner cases.

We (dceara and I) have decided that the best way forward would be to implement a fix that allows us to explicitly specify an SNAT IP for hairpinned traffic. This would simplify hairpin traffic management for OVN-K, and is similar to the way this was previously handled in OpenShift SDN. This bug tracks the feature work.
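
The drop described above can be reproduced in a toy shell model. This is only a sketch of the reasoning, not OVN or OVN-K code: the pod addresses, the VIP, and the helper names `acl_allows` and `source_seen_by_backend` are all hypothetical, and the ACL is reduced to a membership test against the namespace's pod IPs.

```shell
# Toy model of the drop, not OVN code. All addresses are hypothetical.
# The "allow from same namespace" ACL is modelled as an address set
# holding only the pod IPs of the namespace.
NAMESPACE_PODS="10.128.0.5 10.128.0.6 10.128.0.7"   # pods A, B, C
SERVICE_VIP="172.30.0.10"

# Succeeds (exit 0) if the source IP is a member of the address set.
acl_allows() {
    src_ip="$1"
    for member in $NAMESPACE_PODS; do
        [ "$src_ip" = "$member" ] && return 0
    done
    return 1
}

# Source IP the backend sees once the load balancer has picked it:
# if client and backend are the same pod, the traffic is hairpinned
# and the source is rewritten to the SNAT IP (the VIP in this model).
source_seen_by_backend() {
    client_ip="$1"; backend_ip="$2"; snat_ip="$3"
    if [ "$client_ip" = "$backend_ip" ]; then
        echo "$snat_ip"
    else
        echo "$client_ip"
    fi
}

# Pod A load balanced to pod B: the source stays 10.128.0.5, which is allowed.
acl_allows "$(source_seen_by_backend 10.128.0.5 10.128.0.6 "$SERVICE_VIP")" && echo "A->B allowed"
# Pod A load balanced back to itself: the source becomes the VIP, which is dropped.
acl_allows "$(source_seen_by_backend 10.128.0.5 10.128.0.5 "$SERVICE_VIP")" || echo "A->A dropped"
```

Pinning the SNAT IP to one known address would let OVN-K add that single address to the allow rule instead of having to track whichever VIP OVN picked.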

Comment 1 Dumitru Ceara 2020-12-17 14:17:16 UTC
@Andrew, just to clarify the request, would it be acceptable if the hairpin
SNAT IP were defined as an option of the logical switch?  E.g.:

ovn-nbctl set logical_switch node-ls options:hairpin_snat_ip=x.y.z.t

Thanks,
Dumitru

Comment 2 Andrew Stoycos 2020-12-17 14:55:47 UTC
Well, that would still require us to add more than one IP to a given ingress allow rule's Address_set, since we could have 3 pods backing a service, all running on different nodes, and therefore SNAT-ing hairpin traffic to 3 different IPs (granted, I guess we could set the IP to be the same for all switches). Could we define the SNAT IP for a given load balancer instead? This would be a bit easier for OVN-K to deal with, since then we only have to worry about 1 IP per service. If not, setting it for a logical switch would still make things better than they are currently.

Thanks, 
Andrew

Comment 3 Dumitru Ceara 2020-12-17 15:12:21 UTC
(In reply to Andrew Stoycos from comment #2)

[...]

> switches). Could we define the SNAT IP for a given loadbalancer instead?

I think this should be fine too, it will be the responsibility of the CMS
to make sure that that SNAT IP makes sense for all logical switches the
load balancer is applied on.

> This would be a bit easier for OVN-K to deal with, since then we only have
> to worry about 1 IP per service. If not, setting it for a logical switch
> would still make things better than they are currently.
> 

Ok, thanks for the reply!

> Thanks, 
> Andrew
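
For reference, the per-load-balancer form discussed here might look like the sketch below. It assumes the option is exposed as `hairpin_snat_ip` in the `options` column of the `load_balancer` table, which is the form used by the verification script in comment 9. `hairpin_snat_cmd` is a hypothetical helper that only prints the command, so nothing is changed in the northbound database.

```shell
# Sketch only: print the ovn-nbctl command that pins the hairpin SNAT IP
# on one load balancer. Nothing is executed; pipe the output to sh to apply it.
# hairpin_snat_cmd is a hypothetical helper name, not part of OVN.
hairpin_snat_cmd() {
    lb_name="$1"   # load balancer name, e.g. lb0-tcp4
    snat_ip="$2"   # IPv4 or IPv6 address to use as the hairpin source
    if [ -z "$lb_name" ] || [ -z "$snat_ip" ]; then
        echo "usage: hairpin_snat_cmd <lb-name> <snat-ip>" >&2
        return 1
    fi
    printf 'ovn-nbctl set load_balancer %s options:hairpin_snat_ip="%s"\n' \
        "$lb_name" "$snat_ip"
}

# prints: ovn-nbctl set load_balancer lb0-tcp4 options:hairpin_snat_ip="8.8.8.7"
hairpin_snat_cmd lb0-tcp4 8.8.8.7
```

As noted in the reply above, it remains up to the CMS to pick an address that makes sense on every logical switch the load balancer is attached to.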

Comment 4 Dumitru Ceara 2021-01-15 18:27:22 UTC
Patch posted upstream for review: http://patchwork.ozlabs.org/project/ovn/list/?series=224666&state=*

Comment 5 Dan Williams 2021-02-09 14:07:30 UTC
Patch merged to master Feb 3rd.

Comment 9 Jianlin Shi 2021-02-19 02:31:33 UTC
Tested with the following script:

# Start OVS and OVN, and register this host as chassis hv1.
systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.175.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.175.25
systemctl restart ovn-controller

# Create namespace server0 with a veth pair; the peer end is bound to logical port ls1p1.
ip netns add server0
ip link add veth0_s0 netns server0 type veth peer name veth0_s0_p
ip netns exec server0 ip link set lo up
ip netns exec server0 ip link set veth0_s0 up
ip netns exec server0 ip link set veth0_s0 address 00:00:00:01:01:02
ip netns exec server0 ip addr add 192.168.1.1/24 dev veth0_s0
ip netns exec server0 ip -6 addr add 2001::1/64 dev veth0_s0
ip netns exec server0 ip route add default via 192.168.1.254 dev veth0_s0
ip netns exec server0 ip -6 route add default via 2001::a dev veth0_s0
ovs-vsctl add-port br-int veth0_s0_p
ip link set veth0_s0_p up
ovs-vsctl set interface veth0_s0_p external_ids:iface-id=ls1p1
# Backend listener on port 1100 inside server0.
ip netns exec server0 nc -l -k 1100 &

# Logical topology: switch ls1 with port ls1p1, attached to router lr1.
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.1.1 2001::1"
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1

# IPv4 and IPv6 load balancers whose only backend is server0, each with an explicit hairpin SNAT IP.
ovn-nbctl lb-add lb0-tcp4 8.8.8.8:1234 192.168.1.1:1100 tcp
ovn-nbctl ls-lb-add ls1 lb0-tcp4
ovn-nbctl set load_balancer lb0-tcp4 options:hairpin_snat_ip="8.8.8.7"
# The bracketed IPv6 endpoints are quoted so the shell cannot treat them as glob patterns.
ovn-nbctl lb-add lb0-tcp6 "[8888::1]:1234" "[2001::1]:1100" tcp
ovn-nbctl ls-lb-add ls1 lb0-tcp6
ovn-nbctl set load_balancer lb0-tcp6 options:hairpin_snat_ip="8888::7"

# Connect from the backend to its own VIPs (hairpin) and dump table 69 before and after.
ovs-ofctl dump-flows br-int table=69
ip netns exec server0 tcpdump -i any -w server0.pcap &
ip netns exec server0 nc 8.8.8.8 1234 <<< h
ip netns exec server0 nc 8888::1 1234 <<< h
ovs-ofctl dump-flows br-int table=69
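
The second `dump-flows` call in the script is inspected by eye in the results below. A minimal sketch of how that check could be automated follows; `has_hairpin_snat_flow` is a hypothetical helper, and table 69 is simply the table number used in this test run, which may differ in other OVN versions.

```shell
# Sketch: read `ovs-ofctl dump-flows br-int table=69` output on stdin and
# succeed if some flow matches the expected hairpin SNAT IP as destination.
# has_hairpin_snat_flow is a hypothetical helper name.
has_hairpin_snat_flow() {
    expected_ip="$1"
    # nw_dst= is the IPv4 match field, ipv6_dst= the IPv6 one. -F treats the
    # address as a fixed string so dots are not regex wildcards; the trailing
    # comma keeps 8.8.8.7 from also matching 8.8.8.70.
    grep -q -F -e "nw_dst=${expected_ip}," -e "ipv6_dst=${expected_ip},"
}
```

It could be used as `ovs-ofctl dump-flows br-int table=69 | has_hairpin_snat_flow 8.8.8.7 && echo ok`, and likewise with `8888::7` for the IPv6 load balancer.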


Verified on 20.12.0-17:

[root@wsfd-advnetlab21 bz1908540]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-82.el7fdp.x86_64
ovn2.13-20.12.0-17.el7fdp.x86_64
ovn2.13-host-20.12.0-17.el7fdp.x86_64
ovn2.13-central-20.12.0-17.el7fdp.x86_64

+ ovs-ofctl dump-flows br-int table=69 
+ ip netns exec server0 nc 8.8.8.8 1234
+ ip netns exec server0 tcpdump -i any -w server0.pcap
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
h                   
+ ip netns exec server0 nc 8888::1 1234                                    
h                            
+ ovs-ofctl dump-flows br-int table=69                                       
 cookie=0xbabcbf0d, duration=0.065s, table=69, n_packets=5, n_bytes=350, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]

<=== 8.8.8.7

 cookie=0x289af82d, duration=0.021s, table=69, n_packets=4, n_bytes=364, tcp6,metadata=0x1,ipv6_src=2001::1,ipv6_dst=8888::7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]
<=== 8888::7

[root@wsfd-advnetlab21 bz1908540]# tcpdump  -r server0.pcap dst host 192.168.1.1 -nnle
reading from file server0.pcap, link-type LINUX_SLL (Linux cooked)
21:27:31.652938  In 00:00:00:00:00:01 ethertype ARP (0x0806), length 44: Reply 192.168.1.254 is-at 00:00:00:00:00:01, length 28
21:27:31.653842  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 76: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [S], seq 3812429305, win 29200, options [mss 1460,sackOK,TS val 92969351 ecr 0,nop,wscale 7], length 0

<==== 8.8.8.7

21:27:31.654566  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 76: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [S.], seq 2031433868, ack 3812429306, win 28960, options [mss 1460,sackOK,TS val 92970355 ecr 92969351,nop,wscale 7], length 0
21:27:31.655539  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [F.], seq 3812429308, ack 2031433869, win 229, options [nop,nop,TS val 92970356 ecr 92970355], length 0
21:27:31.655603  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 70: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [P.], seq 4294967294:0, ack 1, win 229, options [nop,nop,TS val 92970356 ecr 92970355], length 2
21:27:31.655650  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 80: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [.], ack 1, win 227, options [nop,nop,TS val 92970357 ecr 92969351,nop,nop,sack 1 {3:4}], length 0
21:27:31.655688  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [.], ack 1, win 229, options [nop,nop,TS val 92970356 ecr 92970355], length 0
21:27:31.655698  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [.], ack 4, win 227, options [nop,nop,TS val 92970357 ecr 92970356], length 0
21:27:31.655767  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [F.], seq 1, ack 4, win 227, options [nop,nop,TS val 92970357 ecr 92970356], length 0
21:27:31.655833  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.7.33990 > 192.168.1.1.1100: Flags [.], ack 2, win 229, options [nop,nop,TS val 92970357 ecr 92970357], length 0
21:27:31.655879  In 00:00:00:00:00:01 ethertype IPv4 (0x0800), length 68: 8.8.8.8.1234 > 192.168.1.1.33990: Flags [.], ack 4, win 227, options [nop,nop,TS val 92970357 ecr 92970356], length 0
tcpdump: pcap_loop: truncated dump file; tried to read 104 captured bytes, only got 26
[root@wsfd-advnetlab21 bz1908540]# tcpdump  -r server0.pcap dst host 2001::1 -nnle
reading from file server0.pcap, link-type LINUX_SLL (Linux cooked)
21:27:31.696998  In 00:00:00:00:00:01 ethertype IPv6 (0x86dd), length 88: 2001::a > 2001::1: ICMP6, neighbor advertisement, tgt is 2001::a, length 32
21:27:31.698430  In 00:00:00:00:00:01 ethertype IPv6 (0x86dd), length 96: 8888::7.53990 > 2001::1.1100: Flags [S], seq 1513552003, win 28800, options [mss 1440,sackOK,TS val 92970397 ecr 0,nop,wscale 7], length 0

<=== 8888::7


21:27:31.699529  In 00:00:00:00:00:01 ethertype IPv6 (0x86dd), length 96: 8888::1.1234 > 2001::1.53990: Flags [S.], seq 414057409, ack 1513552004, win 28560, options [mss 1440,sackOK,TS val 92970400 ecr 92970397,nop,wscale 7], length 0
21:27:31.700164  In 00:00:00:00:00:01 ethertype IPv6 (0x86dd), length 88: 8888::7.53990 > 2001::1.1100: Flags [.], ack 414057410, win 225, options [nop,nop,TS val 92970401 ecr 92970400], length 0
tcpdump: pcap_loop: truncated dump file; tried to read 104 captured bytes, only got 26
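
The source address of the hairpinned SYN is picked out by hand in the captures above. As a sketch, the same check could be scripted against tcpdump's text output; `syn_sources` is a hypothetical helper, and it assumes the `-nn` output format shown in this comment (`<src>.<port> > <dst>.<port>: Flags [S], ...`).

```shell
# Sketch: read `tcpdump -nn` text output on stdin and print the distinct
# source addresses of initial SYN packets (Flags [S]) sent to a given
# destination address and port. syn_sources is a hypothetical helper name.
syn_sources() {
    dst="$1"   # e.g. 192.168.1.1.1100 or 2001::1.1100 (tcpdump appends .port)
    awk -v dst="$dst:" '
        # Only initial SYNs; a SYN-ACK is printed as "Flags [S.]," and is skipped.
        /Flags \[S\],/ {
            for (i = 2; i < NF; i++) {
                if ($i == ">" && $(i + 1) == dst) {
                    src = $(i - 1)
                    sub(/\.[0-9]+$/, "", src)   # drop the trailing .port
                    print src
                }
            }
        }
    ' | sort -u
}
```

For example, `tcpdump -r server0.pcap -nn 2>/dev/null | syn_sources 192.168.1.1.1100` would be expected to list only the configured hairpin SNAT IP if the option is honoured.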

Comment 10 Jianlin Shi 2021-02-19 03:21:30 UTC
Also verified on RHEL 8:

+ ovs-ofctl dump-flows br-int table=69
 cookie=0x12269b6f, duration=5.144s, table=69, n_packets=6, n_bytes=428, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]
 cookie=0xae73feee, duration=0.033s, table=69, n_packets=5, n_bytes=462, tcp6,metadata=0x1,ipv6_src=2001::1,ipv6_dst=8888::7,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]
+ pkill tcpdump                                                                                       
60 packets captured                                                                                   
60 packets received by filter                                                                         
0 packets dropped by kernel                                                                           
[root@dell-per740-12 bz1908540]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-host-20.12.0-20.el8fdp.x86_64
ovn2.13-20.12.0-20.el8fdp.x86_64
ovn2.13-central-20.12.0-20.el8fdp.x86_64

Comment 12 errata-xmlrpc 2021-03-15 14:36:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0836

