Bug 1729846

Summary: [OVN] [RHEL7] ovn-controller is using 100% cpu when receive lots of incorrect arp packets
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: haidong li <haili>
Component: ovn2.11
Assignee: Dumitru Ceara <dceara>
Status: CLOSED ERRATA
QA Contact: haidong li <haili>
Severity: unspecified
Docs Contact:
Priority: high
Version: FDP 19.G
CC: ctrautma, dceara, jishi, kfida, mmichels, nusiddiq, qding
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ovn2.11-2.11.1-7.el7fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1761411
Environment:
Last Closed: 2019-11-06 05:00:08 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1761411

Description haidong li 2019-07-15 06:13:11 UTC
Description of problem:
ovn-controller uses 100% CPU when it receives a large number of invalid ARP packets.

Version-Release number of selected component (if applicable):
[root@dell-per730-19 ovn]# uname -a
Linux dell-per730-19.rhts.eng.pek2.redhat.com 3.10.0-1060.el7.x86_64 #1 SMP Mon Jul 1 18:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@dell-per730-19 ovn]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-11.el7fdp.noarch
openvswitch2.11-2.11.0-14.el7fdp.x86_64
[root@dell-per730-19 ovn]# rpm -qa | grep ovn
ovn2.11-host-2.11.0-19.el7fdp.x86_64
ovn2.11-central-2.11.0-19.el7fdp.x86_64
ovn2.11-2.11.0-19.el7fdp.x86_64
[root@dell-per730-19 ovn]#

How reproducible:
Every time.

Steps to Reproduce:
1. Set up an OVN environment and add a large number of logical switch ports.
2. Connect a guest to a logical switch.
3. From the guest, send a large number of invalid ARP requests with 0.0.0.0 as the "Target IP address".
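The packet in step 3 is an ordinary broadcast ARP request (op=1) whose target protocol address is 0.0.0.0. The sketch below shows which bytes the scapy script in this report puts on the wire, using only the Python standard library. `build_arp_request` is a hypothetical helper written for illustration. It is not part of OVN or scapy.

```python
import socket
import struct


def build_arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying an ARP request (op=1).

    Hypothetical helper for illustration only; it does not pad the frame
    to the 60-byte Ethernet minimum.
    """
    mac = bytes.fromhex(src_mac.replace(":", ""))
    # Ethernet header: broadcast destination, source MAC, EtherType 0x0806 (ARP)
    eth = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, op=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + socket.inet_aton(src_ip)             # sender MAC / sender IP
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)  # target MAC (unknown) / target IP
    return eth + arp


frame = build_arp_request("00:de:ad:01:00:01", "172.16.102.11", "0.0.0.0")
print(len(frame), frame[-4:].hex())  # 42 00000000
```

The last four bytes hold the all-zero target IP address. That field is the only thing separating these packets from normal ARP requests sent by the guest.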

Actual results:
ovn-controller CPU usage is about 101% (100.7% in the top output below).

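Script run in the guest:
from scapy.all import ARP, Ether, sendp
# Broadcast ARP request (op=1) from 172.16.102.11 asking for the invalid target IP 0.0.0.0
pkt = Ether(src="00:de:ad:01:00:01", dst="ff:ff:ff:ff:ff:ff")/ARP(op=1, hwsrc="00:de:ad:01:00:01", hwdst="00:00:00:00:00:00", psrc="172.16.102.11", pdst="0.0.0.0")
for _ in range(1000):
    sendp(pkt, iface="eth1")  # send one frame per iteration, 1000 frames in total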

[root@dell-per730-19 ~]# top

top - 02:11:07 up 3 days,  4:07,  2 users,  load average: 0.18, 0.06, 0.06
Tasks: 465 total,   2 running, 463 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.8 us,  0.3 sy,  0.0 ni, 95.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65708808 total, 56629192 free,  3094836 used,  5984780 buff/cache
KiB Swap: 29241340 total, 29241340 free,        0 used. 62099024 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                           
11493 root      10 -10  285408   9848   1716 R 100.7  0.0   1:58.82 ovn-controller                                    
11650 qemu      20   0 2974064 929068  11900 S  43.4  1.4   2:51.37 qemu-kvm                                          
11419 openvsw+  10 -10 2323772  97776  17916 S  10.9  0.1   1:43.32 ovs-vswitchd                                      
10527 root      20   0 1555548  26604  14540 S   1.0  0.0   0:03.44 libvirtd                                          
11819 qemu      20   0 2950492 614672  11904 S   1.0  0.9   1:37.65 qemu-kvm                                          
 5834 root      20   0  152752   5740   4408 S   0.7  0.0   0:02.07 sshd                                              
 6064 root      20   0  162312   2684   1600 R   0.7  0.0   1:16.23 top                                               
12325 root      20   0       0      0      0 S   0.7  0.0   0:00.42 vhost-11650                                       
13975 root      20   0  347304   6876   5160 S   0.7  0.0   0:00.17 virsh                                             
    9 root      20   0       0      0      0 S   0.3  0.0   0:52.33 rcu_sched                                         
 1443 root      20   0   22296   1960    996 S   0.3  0.0   3:28.58 irqbalance                                        
 3592 root      20   0  548544   9536   6784 S   0.3  0.0   0:24.74 NetworkManager                                    
12048 root      20   0       0      0      0 S   0.3  0.0   0:00.19 kworker/8:1                                       
12355 root      20   0       0      0      0 S   0.3  0.0   0:00.07 vhost-1181

[root@dell-per730-19 ovn]# cat /var/log/openvswitch/ovn-controller.log | grep CPU | tail -4
2019-07-15T06:11:07.320Z|00056|poll_loop|INFO|wakeup due to 0-ms timeout at ovn/controller/pinctrl.c:2489 (100% CPU usage)
2019-07-15T06:11:08.244Z|00058|poll_loop|INFO|wakeup due to 0-ms timeout at ovn/controller/pinctrl.c:2489 (100% CPU usage)
2019-07-15T06:11:14.249Z|00060|poll_loop|INFO|wakeup due to 0-ms timeout at ovn/controller/pinctrl.c:2489 (100% CPU usage)
2019-07-15T06:11:21.322Z|00062|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (20.0.0.25:48756<->20.0.0.25:6642) at lib/stream-fd.c:157 (99% CPU usage)
[root@dell-per730-19 ovn]# 
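The poll_loop messages above show where ovn-controller keeps waking up. Here, most wakeups are 0-ms timeouts at ovn/controller/pinctrl.c:2489. The sketch below counts those wakeups by source location. It assumes the log line format shown above. The regex and `wakeup_sources` are illustrative and are not an OVS or OVN tool.

```python
import re
from collections import Counter

# Matches poll_loop lines such as:
# ...|poll_loop|INFO|wakeup due to 0-ms timeout at ovn/controller/pinctrl.c:2489 (100% CPU usage)
WAKEUP_RE = re.compile(
    r"\|poll_loop\|INFO\|wakeup due to (?P<reason>.+?) at (?P<where>\S+) "
    r"\((?P<cpu>\d+)% CPU usage\)"
)


def wakeup_sources(log_lines, min_cpu=90):
    """Count wakeups per source location among lines reporting >= min_cpu % CPU."""
    counts = Counter()
    for line in log_lines:
        m = WAKEUP_RE.search(line)
        if m and int(m.group("cpu")) >= min_cpu:
            counts[m.group("where")] += 1
    return counts


sample = [
    "2019-07-15T06:11:07.320Z|00056|poll_loop|INFO|wakeup due to 0-ms timeout at ovn/controller/pinctrl.c:2489 (100% CPU usage)",
    "2019-07-15T06:11:08.244Z|00058|poll_loop|INFO|wakeup due to 0-ms timeout at ovn/controller/pinctrl.c:2489 (100% CPU usage)",
    "2019-07-15T06:11:14.249Z|00060|poll_loop|INFO|wakeup due to 0-ms timeout at ovn/controller/pinctrl.c:2489 (100% CPU usage)",
    "2019-07-15T06:11:21.322Z|00062|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (20.0.0.25:48756<->20.0.0.25:6642) at lib/stream-fd.c:157 (99% CPU usage)",
]
print(wakeup_sources(sample).most_common(1))  # [('ovn/controller/pinctrl.c:2489', 3)]
```

On a live host, pass the open log file instead of the sample list, for example with `open("/var/log/openvswitch/ovn-controller.log")`. Iterating a file object yields one line at a time.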

Expected results:
ovn-controller CPU usage stays low.

Additional info:

Comment 4 haidong li 2019-10-23 03:45:57 UTC
Verified with the latest build:
[root@dell-per740-18 ovn]# uname -a
Linux dell-per740-18.rhts.eng.pek2.redhat.com 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@dell-per740-18 ovn]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-14.el7fdp.noarch
openvswitch2.11-2.11.0-26.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-41.noarch
[root@dell-per740-18 ovn]# rpm -qa | grep ovn
ovn2.11-2.11.1-8.el7fdp.x86_64
ovn2.11-central-2.11.1-8.el7fdp.x86_64
ovn2.11-host-2.11.1-8.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn_ha-1.0-41.noarch
[root@dell-per740-18 ovn]# 

[root@dell-per740-18 ovn]# ovn-nbctl show
switch 086e87ca-1eee-4440-9790-6f0d7859360d (s3)
    port hv0_vm00_vnet1
        addresses: ["00:de:ad:00:00:01 172.16.103.11"]
    port hv0_vm01_vnet1
        addresses: ["00:de:ad:00:01:01 172.16.103.12"]
    port s3_r1
        type: router
        addresses: ["00:de:ad:ff:01:03 172.16.103.1"]
        router-port: r1_s3
switch ecc8b593-19fe-4509-8590-a68edcd2185d (public)
    port ln_p1
        type: localnet
        addresses: ["unknown"]
    port public_r1
        type: router
        router-port: r1_public
switch 3827320f-2e1f-4ad2-9394-cfef23e086dc (s2)
    port s2_r1
        type: router
        addresses: ["00:de:ad:ff:01:02 172.16.102.1"]
        router-port: r1_s2
    port hv1_vm01_vnet1
        addresses: ["00:de:ad:01:01:01 172.16.102.12"]
    port hv1_vm00_vnet1
        addresses: ["00:de:ad:01:00:01 172.16.102.11"]
router 5b2f265f-abdd-4f82-b57f-a45ed441f52d (r1)
    port r1_s3
        mac: "00:de:ad:ff:01:03"
        networks: ["172.16.103.1/24"]
    port r1_public
        mac: "40:44:00:00:00:03"
        networks: ["172.16.104.1/24"]
    port r1_s2
        mac: "00:de:ad:ff:01:02"
        networks: ["172.16.102.1/24"]
    nat 599e4d31-0fba-4af3-8dc5-cca7adea2b42
        external ip: "172.16.104.200"
        logical ip: "172.16.102.11"
        type: "dnat_and_snat"
    nat de0cfce7-68c6-4612-bdf8-4eaa5299e6c8
        external ip: "172.16.104.201"
        logical ip: "172.16.103.11"
        type: "dnat_and_snat"
[root@dell-per740-18 ovn]# 

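Script run in the guest:
from scapy.all import ARP, Ether, sendp
# Broadcast ARP request (op=1) from 172.16.102.11 asking for the invalid target IP 0.0.0.0
pkt = Ether(src="00:de:ad:01:00:01", dst="ff:ff:ff:ff:ff:ff")/ARP(op=1, hwsrc="00:de:ad:01:00:01", hwdst="00:00:00:00:00:00", psrc="172.16.102.11", pdst="0.0.0.0")
for _ in range(1000):
    sendp(pkt, iface="eth1")  # send one frame per iteration, 1000 frames in total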

[root@dell-per740-18 ~]# top

top - 23:40:53 up 1 day, 19:50,  2 users,  load average: 0.61, 0.16, 0.08
Tasks: 533 total,   1 running, 532 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65280816 total, 56768988 free,  2849836 used,  5661992 buff/cache
KiB Swap: 32767996 total, 32767996 free,        0 used. 61932168 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                 
 28062 openvsw+  10 -10 3413632 141724  17936 S   3.0  0.2  36:41.50 ovs-vswitchd                                            
 28503 qemu      20   0 2920808 697496  10248 S   1.0  1.1   3:01.08 qemu-kvm                                                
  1525 root      20   0   21884   1620    996 S   0.3  0.0   2:20.03 irqbalance                                              
 25468 root      20   0  102896   5528   3444 S   0.3  0.0   0:00.27 dhclient                                                
 28148 root      10 -10  281840  10420   1820 S   0.3  0.0   0:21.79 ovn-controller                                          
 28333 qemu      20   0 2920804 776872  10252 S   0.3  1.2   4:00.01 qemu-kvm                                                
 87543 root      20   0  162456   2740   1580 R   0.3  0.0   0:00.11 top                                                     
     1 root      20   0  193924   7064   4216 S   0.0  0.0   0:10.96 systemd                                                 
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.06 kthreadd
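The verification compares ovn-controller's %CPU under the same ARP flood. It was 100.7% on the affected build and is 0.3% on ovn2.11-2.11.1-8. The sketch below turns that comparison into a scripted check. It assumes text in the column layout shown above, such as the output of `top -b -n 1`. `cpu_percent` and the 50% threshold are hypothetical choices for illustration and are not part of the QA procedure.

```python
def cpu_percent(top_output: str, command: str) -> float:
    """Return the %CPU of the first process named `command` in `top` output."""
    header = None
    for line in top_output.splitlines():
        fields = line.split()
        if fields[:1] == ["PID"]:
            header = fields  # column names: PID USER PR NI ... %CPU %MEM TIME+ COMMAND
        elif header and len(fields) == len(header) and fields[-1] == command:
            return float(fields[header.index("%CPU")])
    raise ValueError(f"{command} not found in top output")


before = """  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
11493 root      10 -10  285408   9848   1716 R 100.7  0.0   1:58.82 ovn-controller"""
after = """   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 28148 root      10 -10  281840  10420   1820 S   0.3  0.0   0:21.79 ovn-controller"""

THRESHOLD = 50.0  # hypothetical pass/fail limit, not taken from the bug report
print(cpu_percent(before, "ovn-controller"), cpu_percent(after, "ovn-controller"))  # 100.7 0.3
print("PASS" if cpu_percent(after, "ovn-controller") < THRESHOLD else "FAIL")  # PASS
```

A process row is matched only when its field count equals the header's. This skips the summary lines at the top of the output, but it also assumes the COMMAND column contains no spaces.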

Comment 6 errata-xmlrpc 2019-11-06 05:00:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3718