Description of problem:
CPU usage is high after creating a lot of logical ports.

Version-Release number of selected component (if applicable):

How reproducible:
Every time

Steps to Reproduce:
1. Set up the OVN environment, create another OVS bridge, and set the bridge mapping.
2. Add a lot of logical switches and add logical ports to them. Add a lot of veth ports and set IPv4 and IPv6 addresses on them.
3. Add a localnet port to each logical switch, using a different VLAN tag for external traffic.
4. After the configuration, CPU usage goes high:

[root@dell-per740-04 ~]# top
top - 23:25:17 up 5 days, 2:13, 2 users, load average: 3.36, 3.78, 3.39
Tasks: 565 total, 4 running, 561 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.1 us, 1.3 sy, 0.0 ni, 92.8 id, 0.0 wa, 0.0 hi, 1.8 si, 0.0 st
KiB Mem : 65213648 total, 30859008 free, 7398928 used, 26955712 buff/cache
KiB Swap: 32767996 total, 32767996 free, 0 used. 54367732 avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
169046 openvsw+  10 -10 4253016   1.6g  17936 R 207.9  2.6  61361:05 ovs-vswitchd
169120 root      10 -10 1534852   1.2g   1760 R 100.0  1.9   6363:19 ovn-controller
   211 root      20   0       0      0      0 R  99.7  0.0 138:55.56 ksoftirqd/40
  1114 root      20   0  134912  75128  74012 S   0.7  0.1  81:07.19 systemd-journal
     1 root      20   0  216728  29960   4228 S   0.3  0.0   0:50.41 systemd
     9 root      20   0       0      0      0 S   0.3  0.0  28:14.41 rcu_sched
  1617 root      20   0   21928   1640    996 S   0.3  0.0   4:56.83 irqbalance
273925 root      20   0       0      0      0 S   0.3  0.0   0:00.04 kworker/10:1
274063 root      20   0  162452   2760   1580 R   0.3  0.0   0:00.06 top
     2 root      20   0       0      0      0 S   0.0  0.0   0:00.15 kthreadd
     4 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
     6 root      20   0       0      0      0 S   0.0  0.0   0:05.00 ksoftirqd/0

Some info from ovs-vswitchd.log:
2019-09-04T03:36:28.190Z|177224|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[24021198]) at ../lib/ovs-rcu.c:235 (99% CPU usage)
2019-09-04T03:36:28.191Z|177225|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[24021198]) at ../lib/ovs-rcu.c:235 (99% CPU usage)
2019-09-04T03:36:29.193Z|92360|ovs_rcu(urcu6)|WARN|blocked 1002 ms waiting for main to quiesce
2019-09-04T03:36:29.214Z|40325|poll_loop(revalidator99)|INFO|wakeup due to 376-ms timeout at ../ofproto/ofproto-dpif-upcall.c:982 (53% CPU usage)
2019-09-04T03:36:29.326Z|40326|poll_loop(revalidator99)|INFO|wakeup due to [POLLIN] on fd 75 (FIFO pipe:[24021531]) at ../lib/ovs-thread.c:311 (53% CPU usage)
2019-09-04T03:36:29.715Z|40327|poll_loop(revalidator99)|INFO|wakeup due to 375-ms timeout at ../ofproto/ofproto-dpif-upcall.c:982 (53% CPU usage)
2019-09-04T03:36:30.192Z|92361|ovs_rcu(urcu6)|WARN|blocked 2001 ms waiting for main to quiesce
2019-09-04T03:36:31.520Z|40328|timeval(revalidator99)|WARN|Unreasonably long 1305ms poll interval (0ms user, 1235ms system)
2019-09-04T03:36:31.520Z|40329|timeval(revalidator99)|WARN|faults: 7 minor, 0 major
2019-09-04T03:36:31.520Z|40330|timeval(revalidator99)|WARN|context switches: 2280 voluntary, 2 involuntary
2019-09-04T03:36:32.191Z|92362|ovs_rcu(urcu6)|WARN|blocked 4000 ms waiting for main to quiesce
2019-09-04T03:36:36.191Z|92363|ovs_rcu(urcu6)|WARN|blocked 8000 ms waiting for main to quiesce
2019-09-04T03:36:44.192Z|92364|ovs_rcu(urcu6)|WARN|blocked 16000 ms waiting for main to quiesce

Actual results:

Expected results:

Additional info:
Use this case to reproduce the issue. The loop needs to be changed from 10*10 to 200*200, so there will be 4000 logical switches and veth ports.
http://pkgs.devel.redhat.com/cgit/tests/kernel/tree/networking/openvswitch/ovn
function name "ovn_multi_vlan"
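For reference, a simplified single-loop sketch of the steps above (the linked test case nests two loops; the switch/port names, addresses, VLAN tags, and the physnet1 mapping name here are illustrative, not taken from the test case):

# assumes OVN is already running and br-int exists; all names/addresses are examples
ovs-vsctl add-br br-ext
ovs-vsctl set open_vswitch . external-ids:ovn-bridge-mappings=physnet1:br-ext

for i in $(seq 1 200); do
    ovn-nbctl ls-add ls$i

    # localnet port with a per-switch VLAN tag for external traffic
    # (depending on OVN version, tag_request may be used instead of tag)
    ovn-nbctl lsp-add ls$i ln$i
    ovn-nbctl lsp-set-type ln$i localnet
    ovn-nbctl lsp-set-addresses ln$i unknown
    ovn-nbctl lsp-set-options ln$i network_name=physnet1
    ovn-nbctl set logical_switch_port ln$i tag=$i

    # logical port bound to a local veth with IPv4 and IPv6 addresses
    ovn-nbctl lsp-add ls$i lsp$i
    ovn-nbctl lsp-set-addresses lsp$i "$(printf '00:00:00:01:01:%02x' $i) 192.168.$i.10 2001:db8:$i::10"
    ip link add veth$i type veth peer name veth${i}_p
    ip addr add 192.168.$i.10/24 dev veth$i
    ip addr add 2001:db8:$i::10/64 dev veth$i
    ip link set veth$i up
    ip link set veth${i}_p up
    ovs-vsctl add-port br-int veth${i}_p -- set interface veth${i}_p external_ids:iface-id=lsp$i
done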
Hi Haidong Li, since the issue is seen with ovs-vswitchd as well, would you mind cloning this bug to the openvswitch component?
I have copied this bug to bz1749840 on the openvswitch2.11 component and to bz1749610 on ovs2.9.
I logged into the setup and looked into it a bit. Something seems wrong with ovs-vswitchd. ovn-controller keeps breaking its OpenFlow connection to ovs-vswitchd and reconnecting, which is why we are seeing high CPU usage in ovn-controller. It looks like we need to investigate ovs-vswitchd and see what is going on there. I don't think this is an OVN issue.
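A few diagnostics that could help confirm this from the host (a sketch; the log path and bridge name assume the default RHEL openvswitch packaging and may differ):

# per-thread CPU of ovs-vswitchd (main thread vs. handler/revalidator threads)
top -H -p $(pidof ovs-vswitchd)

# rough count of OpenFlow (re)connection messages from ovn-controller to br-int.mgmt
grep -c 'br-int.mgmt' /var/log/openvswitch/ovn-controller.log

# size of the OpenFlow table on br-int, which drives ovs-vswitchd main-thread work
ovs-ofctl dump-flows br-int | wc -l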
Can you still reproduce this? If yes, can I have access to the system while reproducing the issue?
The explanation is described in https://bugzilla.redhat.com/show_bug.cgi?id=1749840#c7; I think this bug can be closed.