Bug 1761371

Summary: [RHEL 8] GARP reply packets from switches are handled on all ovn-controllers
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Numan Siddique <nusiddiq>
Component: ovn2.11
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED ERRATA
QA Contact: haidong li <haili>
Severity: unspecified
Priority: unspecified
Version: FDP 19.G
CC: ctrautma, haili, jishi, kfida, mmichels, qding
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovn2.12-2.12.0-2
Clone Of: 1749739
Last Closed: 2019-11-06 05:23:45 UTC
Bug Depends On: 1749739

Description Numan Siddique 2019-10-14 09:36:19 UTC
+++ This bug was initially created as a clone of Bug #1749739 +++

Description of problem:
If all the chassis have external connectivity (ovn-bridge-mappings defined) and if the physical switch keeps sending periodic GARP replies (instead of Requests) they are handled by all chassis's.

It's enough if those GARPs are handled only on the gateway chassis where the distributed router port is scheduled.
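For context, "external connectivity" here means each chassis has ovn-bridge-mappings set, which is what lets the localnet traffic enter br-int in the first place. A minimal sketch of that per-chassis setting follows; the network name physnet1 and provider bridge br-ex are illustrative placeholders, not values taken from this bug:

```shell
# On each chassis: map a logical network name to a provider bridge so
# packets from the physical switch enter the OVN pipeline (br-int).
# "physnet1" and "br-ex" are example names, not from this environment.
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings=physnet1:br-ex
```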



Comment 3 haidong li 2019-10-17 09:44:34 UTC
Hi Numan, to verify this bug, is checking the CPU usage on the chassis while sending GARPs from the physical switch the right approach? Thanks!

Comment 4 Numan Siddique 2019-10-17 13:25:29 UTC
Yes. You can flood GARPs from the physical network and monitor the CPU.
All the chassis should have bridge mappings configured so that the packets enter the OVN pipeline, i.e. they should reach br-int.
But the packets should be processed by only one node, and there should be no CPU hogging.

Thanks
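For reference, the frames being flooded in this test are gratuitous ARP replies: opcode 2 with the sender and target hardware/protocol addresses both set to the advertised MAC/IP. A minimal stdlib-only sketch of such a frame (a hypothetical helper, not part of the actual test setup below, which uses scapy) looks like this:

```python
import struct

def build_garp_reply(mac: bytes, ip: bytes) -> bytes:
    """Build a gratuitous ARP reply frame (illustrative helper).

    A GARP reply has op=2 and both the sender and target address
    fields set to the MAC/IP being advertised.
    """
    # Ethernet header: broadcast destination, advertised source, ARP ethertype
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    # ARP header: htype=Ethernet(1), ptype=IPv4(0x0800), hlen=6, plen=4, op=reply(2)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    # Sender and target addresses are identical in a gratuitous reply
    arp += mac + ip + mac + ip
    return eth + arp

# Example values matching the ones used in the verification below
frame = build_garp_reply(b"\x00\xde\xad\x01\x00\x01",
                         bytes([172, 16, 102, 99]))
```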

Comment 5 haidong li 2019-10-18 08:40:34 UTC
Verified on the latest version:
[root@hp-dl380g10-05 bin]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
openvswitch2.11-2.11.0-24.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-148.noarch
[root@hp-dl380g10-05 bin]# rpm -qa | grep ovn
ovn2.11-2.11.1-8.el8fdp.x86_64
ovn2.11-host-2.11.1-8.el8fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-148.noarch
ovn2.11-central-2.11.1-8.el8fdp.x86_64
[root@hp-dl380g10-05 bin]# 
[root@dell-per730-42 ovn]# ovn-nbctl show
switch 08f2e2c0-0a9f-4f76-ac5f-a74f809d6dd8 (s2)
    port hv1_vm01_vnet1
        addresses: ["00:de:ad:01:01:01 172.16.102.12"]
    port s2_r1
        type: router
        addresses: ["00:de:ad:ff:01:02 172.16.102.1"]
        router-port: r1_s2
    port hv1_vm00_vnet1
        addresses: ["00:de:ad:01:00:01 172.16.102.11"]
switch f89eb566-1792-445a-b0fc-c4dc2132e5ab (s3)
    port hv0_vm01_vnet1
        addresses: ["00:de:ad:00:01:01 172.16.103.12"]
    port hv0_vm00_vnet1
        addresses: ["00:de:ad:00:00:01 172.16.103.11"]
    port s3_r1
        type: router
        addresses: ["00:de:ad:ff:01:03 172.16.103.1"]
        router-port: r1_s3
switch 555cf4a3-aa90-44bc-9735-0b8ac50b3a0d (public)
    port ln_p1
        type: localnet
        addresses: ["unknown"]
    port public_r1
        type: router
        router-port: r1_public
router 38b406b7-e78d-48e6-bf5f-89ebec55ddd9 (r1)
    port r1_public
        mac: "40:44:00:00:00:03"
        networks: ["172.16.104.1/24"]
    port r1_s3
        mac: "00:de:ad:ff:01:03"
        networks: ["172.16.103.1/24"]
    port r1_s2
        mac: "00:de:ad:ff:01:02"
        networks: ["172.16.102.1/24"]
    nat 1e99a341-7b98-4758-9574-d9f85bde501a
        external ip: "172.16.104.200"
        logical ip: "172.16.102.11"
        type: "dnat_and_snat"
    nat c87da4ea-7a83-4fc8-9613-6092c1d07cb6
        external ip: "172.16.104.201"
        logical ip: "172.16.103.11"
        type: "dnat_and_snat"
[root@dell-per730-42 ovn]# 


Generate a large number of GARP reply packets (ARP op=2) from another machine on the physical switch:

>>> from scapy.all import *
>>> for x in range(1000):
...   sendp(Ether(src="00:de:ad:01:00:01",dst="ff:ff:ff:ff:ff:ff")/ARP(op=2,hwsrc="00:de:ad:01:00:01",hwdst="00:de:ad:01:00:01",psrc="172.16.102.99",pdst="172.16.102.99"),iface="p4p1")
... 

[root@dell-per730-42 ~]# top

top - 03:39:47 up 1 day, 23:59,  2 users,  load average: 0.05, 0.10, 0.04
Tasks: 547 total,   1 running, 546 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  31967.2 total,  23898.4 free,   1976.0 used,   6092.8 buff/cache
MiB Swap:  16128.0 total,  16128.0 free,      0.0 used.  29505.8 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                     
38168 openvsw+  10 -10 3408452 157256  33044 S   1.3   0.5   0:41.38 ovs-vswitchd                                                
38237 root      10 -10  264336   5632   3356 S   0.3   0.0   0:01.79 ovn-controller                                              
38412 qemu      20   0 7025884 567320  21220 S   0.3   1.7   1:11.51 qemu-kvm                                                    
38571 qemu      20   0 6169032 585332  21116 S   0.3   1.8   1:01.51 qemu-kvm                                                    
    1 root      20   0  244620  11584   8276 S   0.0   0.0   0:06.95 systemd                                                     
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.07 kthreadd                      
[root@hp-dl380g10-05 ~]# top

top - 04:13:58 up 2 days, 32 min,  2 users,  load average: 0.00, 0.00, 0.00
Tasks: 525 total,   1 running, 524 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  63989.4 total,  55211.2 free,   2454.2 used,   6324.0 buff/cache
MiB Swap:  28608.0 total,  28608.0 free,      0.0 used.  60878.0 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                   
35052 openvsw+  10 -10 3408432 157416  33044 S   2.0   0.2   2:09.00 ovs-vswitchd                                              
 4503 root      20   0  424984  31828  16224 S   0.3   0.0   0:14.86 tuned                                                     
    1 root      20   0  247016  14340   9040 S   0.0   0.0   0:08.47 systemd                                                   
    2 root      20   0       0      0      0 S   0.0   0.0   0:00.15 kthreadd                                                  
    3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                    
    4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                
    6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd                                      
    7 root      20   0       0      0      0 I   0.0   0.0   0:00.01 kworker/u96:0-cpuset_migrate_mm                           
    9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                              
   10 root      20   0       0      0      0 S   0.0   0.0   0:00.04 ksoftirqd/0                                               
   11 root      20   0       0      0      0 I   0.0   0.0   0:34.95 rcu_sched

Comment 7 errata-xmlrpc 2019-11-06 05:23:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3721