Bug 1748731 - [OVN] CPU usage is high after creating a lot of logical ports
Summary: [OVN] CPU usage is high after creating a lot of logical ports
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 19.F
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks: 1749610 1749840
 
Reported: 2019-09-04 03:51 UTC by haidong li
Modified: 2020-03-02 14:05 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1749610 1749840
Environment:
Last Closed: 2020-03-02 14:05:18 UTC
Target Upstream Version:
Embargoed:



Description haidong li 2019-09-04 03:51:53 UTC
Description of problem:
CPU usage is high after creating a large number of logical ports.

Version-Release number of selected component (if applicable):


How reproducible:
every time

Steps to Reproduce:
1. Set up the OVN environment, create another OVS bridge, and set the bridge mapping.
2. Add a large number of logical switches and add logical ports to them. Add a large number of veth ports and set IPv4 and IPv6 addresses on them.
3. Add a localnet port to each switch, using a different VLAN tag for external traffic.
4. After the configuration, CPU usage goes high (see the sketch below for the per-switch commands).
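A rough per-switch sketch of steps 1-3 (illustrative only; the names br-ext, physnet1, ls1, lsp1, and veth1 and the addresses are hypothetical, not the exact values the test uses):

# extra OVS bridge plus bridge mapping for the localnet traffic
ovs-vsctl add-br br-ext
ovs-vsctl set Open_vSwitch . external_ids:ovn-bridge-mappings=physnet1:br-ext

# one logical switch with one port; the test repeats this in a loop
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 lsp1
ovn-nbctl lsp-set-addresses lsp1 "00:00:00:00:01:01 192.168.1.1 2001:db8:1::1"

# veth pair bound to the logical port, with IPv4 and IPv6 addresses
ip link add veth1 type veth peer name veth1_p
ip link set veth1 address 00:00:00:00:01:01
ip addr add 192.168.1.1/24 dev veth1
ip -6 addr add 2001:db8:1::1/64 dev veth1
ip link set veth1 up
ip link set veth1_p up
ovs-vsctl add-port br-int veth1_p -- set Interface veth1_p external_ids:iface-id=lsp1

# localnet port on the same switch with a VLAN tag for external traffic
ovn-nbctl lsp-add ls1 ln-ls1
ovn-nbctl lsp-set-type ln-ls1 localnet
ovn-nbctl lsp-set-addresses ln-ls1 unknown
ovn-nbctl lsp-set-options ln-ls1 network_name=physnet1
ovn-nbctl set Logical_Switch_Port ln-ls1 tag=101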

[root@dell-per740-04 ~]# top

top - 23:25:17 up 5 days,  2:13,  2 users,  load average: 3.36, 3.78, 3.39
Tasks: 565 total,   4 running, 561 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.1 us,  1.3 sy,  0.0 ni, 92.8 id,  0.0 wa,  0.0 hi,  1.8 si,  0.0 st
KiB Mem : 65213648 total, 30859008 free,  7398928 used, 26955712 buff/cache
KiB Swap: 32767996 total, 32767996 free,        0 used. 54367732 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                              
169046 openvsw+  10 -10 4253016   1.6g  17936 R 207.9  2.6  61361:05 ovs-vswitchd                                         
169120 root      10 -10 1534852   1.2g   1760 R 100.0  1.9   6363:19 ovn-controller                                       
   211 root      20   0       0      0      0 R  99.7  0.0 138:55.56 ksoftirqd/40                                         
  1114 root      20   0  134912  75128  74012 S   0.7  0.1  81:07.19 systemd-journal                                      
     1 root      20   0  216728  29960   4228 S   0.3  0.0   0:50.41 systemd                                              
     9 root      20   0       0      0      0 S   0.3  0.0  28:14.41 rcu_sched                                            
  1617 root      20   0   21928   1640    996 S   0.3  0.0   4:56.83 irqbalance                                           
273925 root      20   0       0      0      0 S   0.3  0.0   0:00.04 kworker/10:1                                         
274063 root      20   0  162452   2760   1580 R   0.3  0.0   0:00.06 top                                                  
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.15 kthreadd                                             
     4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H                                         
     6 root      20   0       0      0      0 S   0.0  0.0   0:05.00 ksoftirqd/0                 


Some info from ovs-vswitchd.log:
2019-09-04T03:36:28.190Z|177224|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[24021198]) at ../lib/ovs-rcu.c:235 (99% CPU usage)
2019-09-04T03:36:28.191Z|177225|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[24021198]) at ../lib/ovs-rcu.c:235 (99% CPU usage)
2019-09-04T03:36:29.193Z|92360|ovs_rcu(urcu6)|WARN|blocked 1002 ms waiting for main to quiesce
2019-09-04T03:36:29.214Z|40325|poll_loop(revalidator99)|INFO|wakeup due to 376-ms timeout at ../ofproto/ofproto-dpif-upcall.c:982 (53% CPU usage)
2019-09-04T03:36:29.326Z|40326|poll_loop(revalidator99)|INFO|wakeup due to [POLLIN] on fd 75 (FIFO pipe:[24021531]) at ../lib/ovs-thread.c:311 (53% CPU usage)
2019-09-04T03:36:29.715Z|40327|poll_loop(revalidator99)|INFO|wakeup due to 375-ms timeout at ../ofproto/ofproto-dpif-upcall.c:982 (53% CPU usage)
2019-09-04T03:36:30.192Z|92361|ovs_rcu(urcu6)|WARN|blocked 2001 ms waiting for main to quiesce
2019-09-04T03:36:31.520Z|40328|timeval(revalidator99)|WARN|Unreasonably long 1305ms poll interval (0ms user, 1235ms system)
2019-09-04T03:36:31.520Z|40329|timeval(revalidator99)|WARN|faults: 7 minor, 0 major
2019-09-04T03:36:31.520Z|40330|timeval(revalidator99)|WARN|context switches: 2280 voluntary, 2 involuntary
2019-09-04T03:36:32.191Z|92362|ovs_rcu(urcu6)|WARN|blocked 4000 ms waiting for main to quiesce
2019-09-04T03:36:36.191Z|92363|ovs_rcu(urcu6)|WARN|blocked 8000 ms waiting for main to quiesce
2019-09-04T03:36:44.192Z|92364|ovs_rcu(urcu6)|WARN|blocked 16000 ms waiting for main to quiesce
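
The repeated "waiting for main to quiesce" warnings point at the ovs-vswitchd main thread being the busy one while other threads wait on it. A few generic checks that can help confirm which threads are spinning (a sketch, not commands from the original report):

top -H -p $(pidof ovs-vswitchd)      # per-thread CPU: main vs. handler/revalidator threads
ovs-appctl upcall/show               # datapath flow and revalidator statistics
ovs-appctl coverage/show             # event counters that can hint at where the churn is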

Actual results:


Expected results:


Additional info:
Use this test case to reproduce the issue. The loop needs to be changed from 10*10 to 200*200, so there will be 4000 logical switches and veth ports.
 
http://pkgs.devel.redhat.com/cgit/tests/kernel/tree/networking/openvswitch/ovn
Function name: "ovn_multi_vlan"

Comment 1 Numan Siddique 2019-09-05 15:59:43 UTC
Hi Haidong Li,

Since the issue is seen with ovs-vswitchd as well, would you mind cloning this bug to the openvswitch component too?

Comment 2 haidong li 2019-09-06 14:47:51 UTC
I have cloned this bug to bz1749840 on the openvswitch2.11 component and to bz1749610 on ovs2.9.

Comment 3 Numan Siddique 2019-09-09 09:46:26 UTC
I logged into the setup and looked into it a bit.
Something seems wrong with ovs-vswitchd: ovn-controller keeps breaking its OpenFlow connection to ovs-vswitchd and reconnecting all the time.
That is why we are seeing high CPU usage in ovn-controller.
It looks like we need to investigate ovs-vswitchd and see what is going on there.

I don't think this is an OVN issue.
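
A quick way to confirm such an OpenFlow reconnect loop from the ovn-controller side (a rough sketch; the log path is the usual one for these packages and may differ on other builds):

grep -c rconn /var/log/openvswitch/ovn-controller.log        # connection setup/teardown is logged by the rconn module
tail -f /var/log/openvswitch/ovn-controller.log | grep -i connect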

Comment 4 Flavio Leitner 2020-02-11 17:22:46 UTC
Can you still reproduce this?
If yes, can I have access to the system while reproducing the issue?

Comment 5 Jianlin Shi 2020-02-12 01:21:39 UTC
The explanation is given in https://bugzilla.redhat.com/show_bug.cgi?id=1749840#c7; I think this bug can be closed.

