
Bug 1931599

Summary: Loadbalancer hairpin no longer works when multiple load balancers exist on switches
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Tim Rozet <trozet>
Component: OVN
Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: urgent
Priority: unspecified
Version: RHEL 8.0
CC: ctrautma
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-03-15 14:34:36 UTC
Type: Bug
Bug Blocks: 1934675

Description Tim Rozet 2021-02-22 18:34:15 UTC
Description of problem:
This looks like a regression introduced by the following commit:
https://github.com/ovn-org/ovn/commit/022ea339c8e22824ba6f6f1257da0d1b6c66d401#

Since that commit, a load balancer endpoint can no longer access the load balancer VIP and hairpin back to itself. This is a common test in ovn-kubernetes, where an endpoint of a Kubernetes service accesses the VIP and checks that it can reach all endpoints (including itself).

The pcap shows that the client is able to send a packet to the UDP server, but the UDP server's response is not unSNAT'ed and unDNAT'ed and never arrives at the client:

root@ovn-worker2:/# tcpdump -i any port 9999 -eennv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
17:16:05.368349   P 0a:58:0a:f4:00:14 ethertype IPv4 (0x0800), length 53: (tos 0x0, ttl 64, id 46541, offset 0, flags [DF], proto UDP (17), length 37)
    10.244.0.20.45930 > 10.96.205.26.9999: UDP, length 9
17:16:05.370080 Out 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 53: (tos 0x0, ttl 64, id 46541, offset 0, flags [DF], proto UDP (17), length 37)
    10.96.205.26.45930 > 10.244.0.20.9999: UDP, length 9
17:16:05.370303   P 0a:58:0a:f4:00:14 ethertype IPv4 (0x0800), length 51: (tos 0x0, ttl 64, id 46543, offset 0, flags [DF], proto UDP (17), length 35)
    10.244.0.20.9999 > 10.96.205.26.45930: UDP, length 7
17:16:05.371700  In 0a:58:0a:f4:00:01 ethertype IPv4 (0x0800), length 51: (tos 0x0, ttl 63, id 46543, offset 0, flags [DF], proto UDP (17), length 35)
    10.244.0.20.9999 > 10.96.205.26.45930: UDP, length 7
17:16:05.371760 Out 96:24:e2:5d:bb:75 ethertype IPv4 (0x0800), length 51: (tos 0x0, ttl 62, id 46543, offset 0, flags [DF], proto UDP (17), length 35)
    10.244.0.2.9999 > 10.96.205.26.45930: UDP, length 7
17:16:05.373355  In 96:24:e2:5d:bb:75 ethertype IPv4 (0x0800), length 51: (tos 0x0, ttl 61, id 46543, offset 0, flags [DF], proto UDP (17), length 35)
    169.254.13.131.9999 > 10.96.205.26.45930: UDP, length 7
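
To follow the reply hop by hop, the verbose capture can be reduced to its source/destination pairs. The snippet below is only a sketch: the function name `pcap_pairs` is hypothetical, and it assumes the `tcpdump -v` layout shown above, where the address pair sits on its own indented line. The sample input is the last two packets of the capture.

```shell
# Hypothetical helper: keep only "src -> dst" from tcpdump -v output.
# Assumes the address line has the form "<src> > <dst>: ...".
pcap_pairs() {
    awk '$2 == ">" { sub(/:$/, "", $3); print $1 " -> " $3 }'
}

# Sample input copied from the capture above.
pcap_pairs <<'EOF'
17:16:05.371760 Out 96:24:e2:5d:bb:75 ethertype IPv4 (0x0800), length 51: (tos 0x0, ttl 62, id 46543, offset 0, flags [DF], proto UDP (17), length 35)
    10.244.0.2.9999 > 10.96.205.26.45930: UDP, length 7
17:16:05.373355  In 96:24:e2:5d:bb:75 ethertype IPv4 (0x0800), length 51: (tos 0x0, ttl 61, id 46543, offset 0, flags [DF], proto UDP (17), length 35)
    169.254.13.131.9999 > 10.96.205.26.45930: UDP, length 7
EOF
# prints:
# 10.244.0.2.9999 -> 10.96.205.26.45930
# 169.254.13.131.9999 -> 10.96.205.26.45930
```

Reduced this way, the capture shows the reply's source address changing at each hop while its destination stays 10.96.205.26.45930.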

Comment 1 Jianlin Shi 2021-02-23 02:28:59 UTC
I tried adding multiple load balancers to the same switch with the following script:

systemctl start openvswitch                                  
systemctl start ovn-northd                                  
ovn-nbctl set-connection ptcp:6641                                       
ovn-sbctl set-connection ptcp:6642                                    
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.173.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.173.25
systemctl restart ovn-controller
ip netns add server0                                          
ip link add veth0_s0 netns server0 type veth peer name veth0_s0_p
ip netns exec server0 ip link set lo up
ip netns exec server0 ip link set veth0_s0 up
ip netns exec server0 ip link set veth0_s0 address 00:00:00:01:01:02
ip netns exec server0 ip addr add 192.168.1.1/24 dev veth0_s0
ip netns exec server0 ip -6 addr add 2001::1/64 dev veth0_s0
ip netns exec server0 ip route add default via 192.168.1.254 dev veth0_s0
ip netns exec server0 ip -6 route add default via 2001::a dev veth0_s0
ovs-vsctl add-port br-int veth0_s0_p                                     
ip link set veth0_s0_p up  
ovs-vsctl set interface veth0_s0_p external_ids:iface-id=ls1p1           

ip netns add server1
ip link add veth0_s1 netns server1 type veth peer name veth0_s1_p          
ip netns exec server1 ip link set lo up
ip netns exec server1 ip link set veth0_s1 up                                
ip netns exec server1 ip link set veth0_s1 address 00:00:00:01:02:02
ip netns exec server1 ip addr add 192.168.1.2/24 dev veth0_s1
ip netns exec server1 ip -6 addr add 2001::2/64 dev veth0_s1
ip netns exec server1 ip route add default via 192.168.1.254 dev veth0_s1   
ip netns exec server1 ip -6 route add default via 2001::a dev veth0_s1
ovs-vsctl add-port br-int veth0_s1_p                                          
ip link set veth0_s1_p up        
ovs-vsctl set interface veth0_s1_p external_ids:iface-id=ls1p2
                        
ip netns exec server0 nc -l -k 1100 &
ip netns exec server1 nc -l -k 1100 &                 
ip netns exec server0 nc -l -k 1101 &       
ip netns exec server1 nc -l -k 1101 &        
                                            
ovn-nbctl ls-add ls1                         
ovn-nbctl lsp-add ls1 ls1p1                 
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.1.1 2001::1"
ovn-nbctl lsp-add ls1 ls1p2         
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:02:02 192.168.1.2 2001::2"

ovn-nbctl lr-add lr1                                                                                                                                                                         
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1
                                                                                                                                                                                             
ovn-nbctl lb-add lb0-tcp4 8.8.8.8:1234 192.168.1.1:1100,192.168.1.2:1100 tcp
ovn-nbctl ls-lb-add ls1 lb0-tcp4                                                                                                                                                             
ovn-nbctl lb-add lb0-tcp42 8.8.8.11:1235 192.168.1.1:1101,192.168.1.2:1101 tcp
ovn-nbctl ls-lb-add ls1 lb0-tcp42

ovn-nbctl --wait=hv sync
ovs-ofctl dump-flows br-int table=69                                                                                                                                                         
ip netns exec server0 tcpdump -i any -w server0.pcap &
ip netns exec server0 nc  8.8.8.8 1234 <<< h
ip netns exec server0 nc  8.8.8.11 1235 <<< h
ip netns exec server0 nc  8.8.8.8 1234 <<< h
ip netns exec server0 nc  8.8.8.11 1235 <<< h
ip netns exec server0 nc  8.8.8.8 1234 <<< h
ip netns exec server0 nc  8.8.8.11 1235 <<< h
ovs-ofctl dump-flows br-int table=69

but it works well on 20.12.0-20:

[root@wsfd-advnetlab21 bz1931599]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-20.12.0-20.el8fdp.x86_64
ovn2.13-host-20.12.0-20.el8fdp.x86_64
ovn2.13-central-20.12.0-20.el8fdp.x86_64

[root@wsfd-advnetlab21 bz1931599]# bash -x rep.sh                               
+ systemctl start openvswitch                                 
+ systemctl start ovn-northd                                               
+ ovn-nbctl set-connection ptcp:6641                                    
+ ovn-sbctl set-connection ptcp:6642
......
+ ip netns exec server0 nc 8.8.8.11 1235
h
+ ip netns exec server0 nc 8.8.8.8 1234
h
+ ip netns exec server0 nc 8.8.8.11 1235
h
+ ip netns exec server0 nc 8.8.8.8 1234
h
+ ip netns exec server0 nc 8.8.8.11 1235
h

<=== nc passed

+ ovs-ofctl dump-flows br-int table=69
 cookie=0x1b742c5e, duration=0.122s, table=69, n_packets=8, n_bytes=592, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.11,tp_src=1101 actions=load:0x1->NXM_NX_REG10[7]
 cookie=0x9a39f40e, duration=0.078s, table=69, n_packets=4, n_bytes=272, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.8,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]

Comment 2 Jianlin Shi 2021-02-23 06:00:43 UTC
Reproduced with the following script:

systemctl start openvswitch                                                    
systemctl start ovn-northd                                  
ovn-nbctl set-connection ptcp:6641                                       
ovn-sbctl set-connection ptcp:6642                                    
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.173.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.173.25
systemctl restart ovn-controller                               
ip netns add server0                                          
ip link add veth0_s0 netns server0 type veth peer name veth0_s0_p
ip netns exec server0 ip link set lo up
ip netns exec server0 ip link set veth0_s0 up
ip netns exec server0 ip link set veth0_s0 address 00:00:00:01:01:02
ip netns exec server0 ip addr add 192.168.1.1/24 dev veth0_s0
ip netns exec server0 ip -6 addr add 2001::1/64 dev veth0_s0
ip netns exec server0 ip route add default via 192.168.1.254 dev veth0_s0
ip netns exec server0 ip -6 route add default via 2001::a dev veth0_s0
ovs-vsctl add-port br-int veth0_s0_p                                     
ip link set veth0_s0_p up  
ovs-vsctl set interface veth0_s0_p external_ids:iface-id=ls1p1           
 
ip netns add server1                    
ip link add veth0_s1 netns server1 type veth peer name veth0_s1_p          
ip netns exec server1 ip link set lo up
ip netns exec server1 ip link set veth0_s1 up                                                                                                                              
ip netns exec server1 ip link set veth0_s1 address 00:00:00:01:02:02
ip netns exec server1 ip addr add 192.168.1.2/24 dev veth0_s1
ip netns exec server1 ip -6 addr add 2001::2/64 dev veth0_s1
ip netns exec server1 ip route add default via 192.168.1.254 dev veth0_s1
ip netns exec server1 ip -6 route add default via 2001::a dev veth0_s1
ovs-vsctl add-port br-int veth0_s1_p                                                                                                                                 
ip link set veth0_s1_p up        
ovs-vsctl set interface veth0_s1_p external_ids:iface-id=ls1p2
                                                                 
ip netns exec server0 nc -l -k 1100 &  
ip netns exec server1 nc -l -k 1100 &        
ip netns exec server0 nc -l -k 1101 &                               
ip netns exec server1 nc -l -k 1101 &

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:02 192.168.1.1 2001::1"
ovn-nbctl lsp-add ls1 ls1p2
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:02:02 192.168.1.2 2001::2"

ovn-nbctl lr-add lr1                                                                                                                                                                         
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1
                                                                                                                                                                                             
ovn-nbctl lb-add lb0-tcp4 8.8.8.8:1234 192.168.1.1:1100 tcp
ovn-nbctl ls-lb-add ls1 lb0-tcp4                                                                                                                                                             
ovn-nbctl lb-add lb0-tcp42 8.8.8.11:1235 192.168.1.1:1100 tcp
ovn-nbctl ls-lb-add ls1 lb0-tcp42

ovn-nbctl --wait=hv sync
ovs-ofctl dump-flows br-int table=69                                                                                                                                                         
ip netns exec server0 nc  8.8.8.8 1234 <<< h
ip netns exec server0 nc  8.8.8.11 1235 <<< h
ip netns exec server0 nc  8.8.8.8 1234 <<< h
ip netns exec server0 nc  8.8.8.11 1235 <<< h
ip netns exec server0 nc  8.8.8.8 1234 <<< h
ip netns exec server0 nc  8.8.8.11 1235 <<< h
ovs-ofctl dump-flows br-int table=69

result on 20.12.0-20:

[root@wsfd-advnetlab21 bz1931599]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-central-20.12.0-20.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-20.12.0-20.el8fdp.x86_64
ovn2.13-host-20.12.0-20.el8fdp.x86_64

+ ovs-ofctl dump-flows br-int table=69 
+ ip netns exec server0 nc 8.8.8.8 1234      
h                                                                   
+ ip netns exec server0 nc 8.8.8.11 1235                     
Ncat: Connection timed out.                                 
+ ip netns exec server0 nc 8.8.8.8 1234                                  
h                                                                     
+ ip netns exec server0 nc 8.8.8.11 1235                                 
Ncat: Connection timed out.
+ ip netns exec server0 nc 8.8.8.8 1234                                  
h
+ ip netns exec server0 nc 8.8.8.11 1235
Ncat: Connection timed out.  

<=== nc failed
                                              
+ ovs-ofctl dump-flows br-int table=69 
 cookie=0x207c05ac, duration=30.258s, table=69, n_packets=11, n_bytes=750, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.8,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]

<=== only a flow for 8.8.8.8; no flow for 8.8.8.11
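
The manual comparison of the table=69 dump against the configured VIPs can be scripted. This is a minimal sketch, not part of the original reproducer: the function name `missing_hairpin_vips` is hypothetical, and it assumes that a hairpin flow for a VIP shows up as `nw_dst=<vip>` in the dump, as in the output above. The sample input is the flow from the failing run.

```shell
# Hypothetical helper: read a table=69 flow dump on stdin and print
# every VIP given as an argument that has no nw_dst=<vip> flow.
missing_hairpin_vips() {
    dump=$(cat)
    for vip in "$@"; do
        # Require a "," or " " after the address so that 8.8.8.1
        # does not match a flow for 8.8.8.11.
        if ! printf '%s\n' "$dump" |
            grep -qF -e "nw_dst=${vip}," -e "nw_dst=${vip} "; then
            printf '%s\n' "$vip"
        fi
    done
}

# Sample input copied from the failing run above.
missing_hairpin_vips 8.8.8.8 8.8.8.11 <<'EOF'
 cookie=0x207c05ac, duration=30.258s, table=69, n_packets=11, n_bytes=750, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.8,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]
EOF
# prints:
# 8.8.8.11
```

On a live system the input would come from `ovs-ofctl dump-flows br-int table=69`. In the runs in this report the dump taken before the `nc` attempts is empty, so the check is only meaningful after hairpin traffic has been sent.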

Comment 3 Dumitru Ceara 2021-02-23 13:21:00 UTC
Fix posted for review upstream: http://patchwork.ozlabs.org/project/ovn/list/?series=230666&state=*

Comment 7 Jianlin Shi 2021-03-05 02:13:45 UTC
Verified on ovn2.13-20.12.0-24:

[root@wsfd-advnetlab18 bz1931599]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-central-20.12.0-24.el7fdp.x86_64
openvswitch2.13-2.13.0-84.el7fdp.x86_64
ovn2.13-20.12.0-24.el7fdp.x86_64
ovn2.13-host-20.12.0-24.el7fdp.x86_64

+ ovn-nbctl --wait=hv sync
+ ovs-ofctl dump-flows br-int table=69                                                                
+ ip netns exec server0 nc 8.8.8.8 1234
h                                                                                                     
+ ip netns exec server0 nc 8.8.8.11 1235                                                              
h                                                                                                     
+ ip netns exec server0 nc 8.8.8.8 1234
h                                                                                                     
+ ip netns exec server0 nc 8.8.8.11 1235                                                              
h                                                                                                     
+ ip netns exec server0 nc 8.8.8.8 1234
h                                                                                                     
+ ip netns exec server0 nc 8.8.8.11 1235                                                              
h

<=== passed

+ ovs-ofctl dump-flows br-int table=69
 cookie=0x58507a89, duration=0.250s, table=69, n_packets=8, n_bytes=556, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.8,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]
 cookie=0x8ccc174a, duration=0.204s, table=69, n_packets=7, n_bytes=490, tcp,metadata=0x1,nw_src=192.168.1.1,nw_dst=8.8.8.11,tp_src=1100 actions=load:0x1->NXM_NX_REG10[7]

<=== flow for both 8.8.8.8 and 8.8.8.11

Comment 9 errata-xmlrpc 2021-03-15 14:34:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0839