
Bug 2130045

Summary: the first ping from lsp to external through snat would fail in gateway router mode
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Jianlin Shi <jishi>
Component: ovn22.06
Assignee: xsimonar
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: medium
Docs Contact:
Priority: medium
Version: FDP 21.I
CC: ctrautma, dceara, jiji, xsimonar
Target Milestone: ---
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn22.06-22.06.0-59.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-11-03 00:30:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jianlin Shi 2022-09-27 01:28:45 UTC
Description of problem:
The first ping from a logical switch port (LSP) to an external host through SNAT fails in gateway router mode; the second ping succeeds. This happens for both IPv4 and IPv6.

Version-Release number of selected component (if applicable):
ovn22.06-22.06.0-57.el8

How reproducible:
Always

Steps to Reproduce:
systemctl start openvswitch
systemctl start ovn-northd 
ovn-nbctl set-connection ptcp:6641                                       
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.187.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.187.25
systemctl restart ovn-controller

ovs-vsctl del-br br-test
ovs-vsctl add-br br-test                                                                              
ip link set br-test up                                                                                
ovs-vsctl set open . external-ids:ovn-bridge-mappings=provider:br-test
                                                                                                
ovn-nbctl ls-add ls1                                                                   
ovn-nbctl lsp-add ls1 ls1p1                                                                     
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:01 192.168.1.1 2001::1"
ovn-nbctl lsp-add ls1 ls1p2 
ovn-nbctl lsp-set-addresses ls1p2 "00:00:00:01:01:02 192.168.1.2 2001::2"
                                                    
ovn-nbctl lr-add lr1                                                
ovn-nbctl lrp-add lr1 lr1-ls1 00:00:00:00:00:01 192.168.1.254/24 2001::a/64
ovn-nbctl lsp-add ls1 ls1-lr1
ovn-nbctl lsp-set-addresses ls1-lr1 "00:00:00:00:00:01 192.168.1.254 2001::a"
ovn-nbctl lsp-set-type ls1-lr1 router
ovn-nbctl lsp-set-options ls1-lr1 router-port=lr1-ls1    

ovn-nbctl set logical_router lr1 options:chassis=hv1

ovn-nbctl lrp-add lr1 lr1-pub 00:00:00:00:0f:01 172.16.1.254/24 1711::a/64
ovn-nbctl ls-add pub
ovn-nbctl lsp-add pub pub-lr1
ovn-nbctl lsp-set-type pub-lr1 router
ovn-nbctl lsp-set-options pub-lr1 router-port=lr1-pub
ovn-nbctl lsp-set-addresses pub-lr1 router

ovn-nbctl lsp-add pub ln
ovn-nbctl lsp-set-type ln localnet
ovn-nbctl lsp-set-addresses ln unknown
ovn-nbctl lsp-set-options ln network_name=provider

ovn-nbctl lr-nat-add lr1 snat 172.16.1.10 192.168.1.0/24
ovn-nbctl lr-nat-add lr1 snat 1711::10 2001::/64
                                  
ovs-vsctl add-port br-int ls1p1 -- set interface ls1p1 type=internal external_ids:iface-id=ls1p1
ovs-vsctl add-port br-int ls1p2 -- set interface ls1p2 type=internal external_ids:iface-id=ls1p2
                                                                                
ip netns add ls1p1                     
ip link set ls1p1 netns ls1p1              
ip netns exec ls1p1 ip link set ls1p1 address 00:00:00:01:01:01
ip netns exec ls1p1 ip link set ls1p1 up
ip netns exec ls1p1 ip addr add 192.168.1.1/24 dev ls1p1
ip netns exec ls1p1 ip addr add 2001::1/64 dev ls1p1
ip netns exec ls1p1 ip route add default via 192.168.1.254 dev ls1p1
ip netns exec ls1p1 ip -6 route add default via 2001::a dev ls1p1

ip netns add ls1p2                                                                                    
ip link set ls1p2 netns ls1p2                                                                         
ip netns exec ls1p2 ip link set ls1p2 address 00:00:00:01:01:02                                       
ip netns exec ls1p2 ip link set ls1p2 up                                                              
ip netns exec ls1p2 ip addr add 192.168.1.2/24 dev ls1p2                                              
ip netns exec ls1p2 ip addr add 2001::2/64 dev ls1p2                                                  
ip netns exec ls1p2 ip route add default via 192.168.1.254 dev ls1p2                                  
ip netns exec ls1p2 ip -6 route add default via 2001::a                                               


ovs-vsctl add-port br-test ext1 -- set interface ext1 type=internal
ip netns add ext1
ip link set ext1 netns ext1
ip netns exec ext1 ip link set ext1 up
ip netns exec ext1 ip addr add 172.16.1.1/24 dev ext1
ip netns exec ext1 ip -6 addr add 1711::1/64 dev ext1

ovn-nbctl --wait=hv sync
sleep 2
ip netns exec ls1p1 ping 172.16.1.1 -c 1
ip netns exec ls1p1 ping 172.16.1.1 -c 1
ip netns exec ls1p1 ping6 1711::1 -c 1
ip netns exec ls1p1 ping6 1711::1 -c 1

Actual results:
+ ip netns exec ls1p1 ping 172.16.1.1 -c 1                                                            
PING 172.16.1.1 (172.16.1.1) 56(84) bytes of data.                                                    
                                                                                                      
--- 172.16.1.1 ping statistics ---                                                                    
1 packets transmitted, 0 received, 100% packet loss, time 0ms     

<=== the first ping failed                                    
                                                                                                      
+ ip netns exec ls1p1 ping 172.16.1.1 -c 1                                                            
PING 172.16.1.1 (172.16.1.1) 56(84) bytes of data.                                                    
64 bytes from 172.16.1.1: icmp_seq=1 ttl=63 time=2.65 ms                                              
                                                                                                      
--- 172.16.1.1 ping statistics ---                                                                    
1 packets transmitted, 1 received, 0% packet loss, time 0ms                                           
rtt min/avg/max/mdev = 2.654/2.654/2.654/0.000 ms                                                     
+ ip netns exec ls1p1 ping6 1711::1 -c 1                                                              
PING 1711::1(1711::1) 56 data bytes                                                                   
                                                                                                      
--- 1711::1 ping statistics ---                                                                       
1 packets transmitted, 0 received, 100% packet loss, time 0ms                                         
                                                                                                      
+ ip netns exec ls1p1 ping6 1711::1 -c 1                                                              
PING 1711::1(1711::1) 56 data bytes                                                                   
64 bytes from 1711::1: icmp_seq=1 ttl=63 time=4.20 ms                                                 
                                                                                                      
--- 1711::1 ping statistics ---                                                                       
1 packets transmitted, 1 received, 0% packet loss, time 0ms                                           
rtt min/avg/max/mdev = 4.203/4.203/4.203/0.000 ms 

Expected results:
The first IPv4 ping and the first IPv6 ping both succeed.
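
The pass/fail outcome of each ping can be read mechanically from the summary line ("N packets transmitted, M received"). The following is a minimal sketch, assuming iputils-style ping output as shown in this report. `ping_received` and `first_ping_ok` are hypothetical helper names, not part of OVN or iputils.

```shell
# Hypothetical helper (not part of OVN or iputils): read ping output on
# stdin and print the "received" count from the summary line.
ping_received() {
    sed -n 's/^[0-9][0-9]* packets transmitted, \([0-9][0-9]*\) received.*/\1/p'
}

# Hypothetical helper: send one ping from namespace $1 to address $2 and
# succeed only if exactly one reply came back. $3 optionally names the
# ping binary (defaults to "ping"; pass "ping6" for the IPv6 case).
first_ping_ok() {
    out=$(ip netns exec "$1" "${3:-ping}" "$2" -c 1)
    [ "$(printf '%s\n' "$out" | ping_received)" = "1" ]
}
```

For example, `first_ping_ok ls1p1 172.16.1.1 || echo "first IPv4 ping failed"` and `first_ping_ok ls1p1 1711::1 ping6 || echo "first IPv6 ping failed"` would flag the failures shown under "Actual results".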

Additional info:

[root@wsfd-advnetlab16 test]# rpm -qa | grep -E "openvswitch2.17|ovn22.06"                            
openvswitch2.17-2.17.0-50.el8fdp.x86_64
python3-openvswitch2.17-2.17.0-50.el8fdp.x86_64
ovn22.06-central-22.06.0-57.el8fdp.x86_64
ovn22.06-host-22.06.0-57.el8fdp.x86_64
ovn22.06-22.06.0-57.el8fdp.x86_64

According to tcpdump, the first ping does not reach ext1.

The issue does not occur on ovn22.06-22.06.0-27.el8.

Comment 1 Dumitru Ceara 2022-09-27 08:03:43 UTC
Git bisect points to:
https://github.com/ovn-org/ovn/commit/b89b96e1a16134c0aa8cd6513d920d49ff8c6cda

commit b89b96e1a16134c0aa8cd6513d920d49ff8c6cda
Author: Xavier Simonart <xsimonar>
Date:   Mon Aug 29 05:27:20 2022 -0400

    controller: fix potential segmentation violation when removing ports
    
    If a logical switch port is added and connected to a logical router
    port (through options: router-port) before the router port is
    created, then this might cause further issues such as segmentation
    violation when the switch and router ports are deleted.
    
    Signed-off-by: Xavier Simonart <xsimonar>
    Signed-off-by: Han Zhou <hzhou>
    (cherry picked from commit 04292cc2dc2c3823b0cf86612e50ad0023bcb73f)

 controller/local_data.c | 38 +++++++++------------
 controller/pinctrl.c    | 16 +++++++--
 tests/ovn.at            | 89 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 117 insertions(+), 26 deletions(-)

When reproducing locally with the steps mentioned in the bug description, it's important to ensure that the SB MAC_Binding table is cleared between runs, e.g.:
ovn-sbctl --all  destroy  mac_binding
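
A minimal sketch of wrapping that cleanup so it can be run before every reproduction attempt. The likely reason it matters (an assumption, not stated in this report) is that a binding learned during an earlier run lets the next "first" ping skip neighbor resolution, which would hide the failure. The function names and the `OVN_SBCTL` override variable are hypothetical; only the `ovn-sbctl` invocations themselves are real commands.

```shell
# Hypothetical wrapper: OVN_SBCTL may point at a different binary (or a
# stub for dry runs); it defaults to the real ovn-sbctl.
clear_mac_bindings() {
    "${OVN_SBCTL:-ovn-sbctl}" --all destroy mac_binding
}

# Hypothetical check: print how many MAC_Binding rows remain.
# With --bare and a single column, each row is one non-empty line.
mac_binding_count() {
    "${OVN_SBCTL:-ovn-sbctl}" --bare --columns=_uuid list mac_binding | grep -c .
}
```

A run would then start with `clear_mac_bindings` and confirm that `mac_binding_count` prints 0 before sending the first ping.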

Comment 3 OVN Bot 2022-09-29 04:05:08 UTC
ovn22.09 fast-datapath-rhel-8 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2130752
ovn22.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2130753
ovn22.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2130754

Comment 4 Jianlin Shi 2022-09-30 02:24:15 UTC
The issue is fixed in ovn22.06-22.06.0-59.el8:

+ ovn-nbctl --wait=hv sync                                                                            
+ sleep 2                                                                                             
+ ip netns exec ls1p1 ping 172.16.1.1 -c 1                                                            
PING 172.16.1.1 (172.16.1.1) 56(84) bytes of data.                                                    
64 bytes from 172.16.1.1: icmp_seq=1 ttl=63 time=6.41 ms                                              
                                                                                                      
--- 172.16.1.1 ping statistics ---                                                                    
1 packets transmitted, 1 received, 0% packet loss, time 0ms                                           
rtt min/avg/max/mdev = 6.411/6.411/6.411/0.000 ms                                                     
+ ip netns exec ls1p1 ping 172.16.1.1 -c 1                                                            
PING 172.16.1.1 (172.16.1.1) 56(84) bytes of data.                                                    
64 bytes from 172.16.1.1: icmp_seq=1 ttl=63 time=0.849 ms                                             
                                                                                                      
--- 172.16.1.1 ping statistics ---                                                                    
1 packets transmitted, 1 received, 0% packet loss, time 0ms                                           
rtt min/avg/max/mdev = 0.849/0.849/0.849/0.000 ms                                                     
+ ip netns exec ls1p1 ping6 1711::1 -c 1                                                              
PING 1711::1(1711::1) 56 data bytes                                                                   
64 bytes from 1711::1: icmp_seq=1 ttl=63 time=7.42 ms                                                 
                                                                                                      
--- 1711::1 ping statistics ---                                                                       
1 packets transmitted, 1 received, 0% packet loss, time 0ms                                           
rtt min/avg/max/mdev = 7.420/7.420/7.420/0.000 ms                                                     
+ ip netns exec ls1p1 ping6 1711::1 -c 1                                                              
PING 1711::1(1711::1) 56 data bytes                                                                   
64 bytes from 1711::1: icmp_seq=1 ttl=63 time=0.886 ms                                                
                                                                                                      
--- 1711::1 ping statistics ---                                                                       
1 packets transmitted, 1 received, 0% packet loss, time 0ms                                           
rtt min/avg/max/mdev = 0.886/0.886/0.886/0.000 ms                                                     
[root@dell-per740-12 bz2130045]# rpm -qa | grep -E "openvswitch2.17|ovn22.06"                         
ovn22.06-22.06.0-59.el8fdp.x86_64                                                                     
ovn22.06-host-22.06.0-59.el8fdp.x86_64                                                                
openvswitch2.17-2.17.0-52.el8fdp.x86_64                                                               
ovn22.06-central-22.06.0-59.el8fdp.x86_64

Comment 7 Jianlin Shi 2022-10-14 02:57:42 UTC
Verified on ovn22.06-22.06.0-64.el8:

+ ovn-nbctl --wait=hv sync
+ sleep 2
+ ip netns exec ls1p1 ping 172.16.1.1 -c 1
PING 172.16.1.1 (172.16.1.1) 56(84) bytes of data.
64 bytes from 172.16.1.1: icmp_seq=1 ttl=63 time=5.55 ms

--- 172.16.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 5.553/5.553/5.553/0.000 ms
+ ip netns exec ls1p1 ping 172.16.1.1 -c 1
PING 172.16.1.1 (172.16.1.1) 56(84) bytes of data.
64 bytes from 172.16.1.1: icmp_seq=1 ttl=63 time=1.31 ms

--- 172.16.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.311/1.311/1.311/0.000 ms
+ ip netns exec ls1p1 ping6 1711::1 -c 1
PING 1711::1(1711::1) 56 data bytes
64 bytes from 1711::1: icmp_seq=1 ttl=63 time=8.35 ms

--- 1711::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 8.352/8.352/8.352/0.000 ms
+ ip netns exec ls1p1 ping6 1711::1 -c 1
PING 1711::1(1711::1) 56 data bytes
64 bytes from 1711::1: icmp_seq=1 ttl=63 time=1.42 ms

--- 1711::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.415/1.415/1.415/0.000 ms
[root@dell-per740-12 bz2130046]# rpm -qa | grep -E "openvswitch2.17|ovn22.06"
ovn22.06-central-22.06.0-64.el8fdp.x86_64
ovn22.06-22.06.0-64.el8fdp.x86_64
ovn22.06-host-22.06.0-64.el8fdp.x86_64
openvswitch2.17-2.17.0-58.el8fdp.x86_64

Comment 9 errata-xmlrpc 2022-11-03 00:30:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn22.06), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7395