This bug was initially created as a copy of Bug #1677616.

I am copying this bug because:

Description of problem:

The customer is facing a delay in security groups being applied:

"""
There is a delay in applying security groups on instances created by a heat
template. Thus there is no connection between instances. After some time
(about 30 min) it starts working. Investigation revealed high CPU usage by
the ovn-controller.
"""

I found there are a lot of flows on br-int (Feb 12):

$ wc -l ovs-ofctl_-O_OpenFlow13_dump-flows_br-int
329665 ovs-ofctl_-O_OpenFlow13_dump-flows_br-int

The following can be found in the journal:

Feb 11 16:17:51 cnifthc02cmp004 journal: 2019-02-11T21:17:51Z|1519076|timeval|WARN|Unreasonably long 5381ms poll interval (4889ms user, 492ms system)
Feb 11 16:17:51 cnifthc02cmp004 journal: 2019-02-11T21:17:51Z|1519077|timeval|WARN|faults: 571166 minor, 0 major
Feb 11 16:17:51 cnifthc02cmp004 journal: 2019-02-11T21:17:51Z|1519078|timeval|WARN|context switches: 0 voluntary, 7 involuntary
Feb 11 16:17:51 cnifthc02cmp004 journal: 2019-02-11T21:17:51Z|1519079|poll_loop|INFO|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Feb 11 16:17:51 cnifthc02cmp004 journal: 2019-02-11T21:17:51Z|1519080|poll_loop|INFO|wakeup due to [POLLIN] on fd 8 (192.168.105.14:43410<->192.168.105.251:6642) at lib/stream-fd.c:157 (101% CPU usage)
Feb 11 16:17:51 cnifthc02cmp004 journal: 2019-02-11T21:17:51Z|1519081|pinctrl|INFO|Dropped 2 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Feb 11 16:17:51 cnifthc02cmp004 journal: 2019-02-11T21:17:51Z|1519082|pinctrl|INFO|DHCPACK fa:16:3e:89:44:2a 10.11.154.160
Feb 11 16:17:56 cnifthc02cmp004 journal: 2019-02-11T21:17:56Z|1519083|timeval|WARN|Unreasonably long 5252ms poll interval (4903ms user, 346ms system)
Feb 11 16:17:56 cnifthc02cmp004 journal: 2019-02-11T21:17:56Z|1519084|timeval|WARN|faults: 403610 minor, 0 major
Feb 11 16:17:56 cnifthc02cmp004 journal: 2019-02-11T21:17:56Z|1519085|timeval|WARN|context switches: 0 voluntary, 9 involuntary
Feb 11 16:17:56 cnifthc02cmp004 journal: 2019-02-11T21:17:56Z|1519086|poll_loop|INFO|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Feb 11 16:17:56 cnifthc02cmp004 journal: 2019-02-11T21:17:56Z|1519087|poll_loop|INFO|wakeup due to [POLLIN] on fd 8 (192.168.105.14:43410<->192.168.105.251:6642) at lib/stream-fd.c:157 (101% CPU usage)
Feb 11 16:17:56 cnifthc02cmp004 journal: 2019-02-11T21:17:56Z|1519088|pinctrl|INFO|DHCPACK fa:16:3e:e5:df:b6 10.11.63.32
Feb 11 16:17:59 cnifthc02cmp004 dhclient[742515]: DHCPDISCOVER on enp7s0f0 to 255.255.255.255 port 67 interval 20 (xid=0x1fa5a1b6)
Feb 11 16:18:01 cnifthc02cmp004 journal: 2019-02-11T21:18:01Z|1519089|timeval|WARN|Unreasonably long 5054ms poll interval (4739ms user, 314ms system)
Feb 11 16:18:01 cnifthc02cmp004 journal: 2019-02-11T21:18:01Z|1519090|timeval|WARN|faults: 313461 minor, 0 major
Feb 11 16:18:01 cnifthc02cmp004 journal: 2019-02-11T21:18:01Z|1519091|timeval|WARN|context switches: 0 voluntary, 4 involuntary
Feb 11 16:18:01 cnifthc02cmp004 journal: 2019-02-11T21:18:01Z|1519092|poll_loop|INFO|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Feb 11 16:18:01 cnifthc02cmp004 journal: 2019-02-11T21:18:01Z|1519093|poll_loop|INFO|wakeup due to [POLLIN] on fd 8 (192.168.105.14:43410<->192.168.105.251:6642) at lib/stream-fd.c:157 (101% CPU usage)
..
..
Feb 12 01:43:17 cnifthc02cmp004 journal: 2019-02-12T06:43:17Z|1561264|timeval|WARN|Unreasonably long 4598ms poll interval (4585ms user, 13ms system)
Feb 12 01:43:17 cnifthc02cmp004 journal: 2019-02-12T06:43:17Z|1561265|timeval|WARN|context switches: 0 voluntary, 1 involuntary
Feb 12 01:43:17 cnifthc02cmp004 journal: 2019-02-12T06:43:17Z|1561266|poll_loop|INFO|Dropped 1 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Feb 12 01:43:17 cnifthc02cmp004 journal: 2019-02-12T06:43:17Z|1561267|poll_loop|INFO|wakeup due to [POLLIN] on fd 8 (192.168.105.14:58120<->192.168.105.251:6642) at lib/stream-fd.c:157 (100% CPU usage)
Feb 12 01:43:17 cnifthc02cmp004 journal: 2019-02-12T06:43:17Z|1561268|pinctrl|INFO|Dropped 2 log messages in last 5 seconds (most recently, 5 seconds ago) due to excessive rate
Feb 12 01:43:17 cnifthc02cmp004 journal: 2019-02-12T06:43:17Z|1561269|pinctrl|INFO|DHCPACK fa:16:3e:40:d5:92 10.11.151.184
Feb 12 01:43:22 cnifthc02cmp004 journal: 2019-02-12T06:43:22Z|1561270|timeval|WARN|Unreasonably long 4713ms poll interval (4713ms user, 0ms system)
Feb 12 01:43:22 cnifthc02cmp004 journal: 2019-02-12T06:43:22Z|1561271|timeval|WARN|faults: 52678 minor, 0 major
Feb 12 01:43:22 cnifthc02cmp004 journal: 2019-02-12T06:43:22Z|1561272|timeval|WARN|context switches: 0 voluntary, 3 involuntary
Feb 12 01:43:22 cnifthc02cmp004 journal: 2019-02-12T06:43:22Z|1561273|pinctrl|INFO|Dropped 1 log messages in last 4 seconds (most recently, 4 seconds ago) due to excessive rate
Feb 12 01:43:22 cnifthc02cmp004 journal: 2019-02-12T06:43:22Z|1561274|pinctrl|INFO|DHCPACK fa:16:3e:e5:df:b6 10.11.63.32
Feb 12 01:43:22 cnifthc02cmp004 journal: 2019-02-12T06:43:22Z|1561275|pinctrl|INFO|DHCPACK fa:16:3e:e5:df:b6 10.11.63.32

These are the numbers from today, when the customer faced the issue again:

[root@cnifthc02ctl001 heat-admin]# docker exec -itu root bde0cda6a714 /bin/bash
()[root@ctl001 /]# ovn-nbctl list ACL | grep uuid | wc -l
5051
()[root@ctl001 /]# ovn-nbctl list Logical_Switch | grep uuid | wc -l
62
()[root@ctl001 /]# ovn-nbctl list Logical_Router | grep uuid | wc -l
20
()[root@ctl001 /]# ovn-nbctl list Logical_Switch_Port | grep uuid | wc -l
501

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
This is the second time it has happened.

Steps to Reproduce:
1. Unsure.

Actual results:
High CPU usage by ovn-controller, and a huge delay before security groups are applied.

Expected results:
None of the actual results.

Additional info:
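For reference, the row counts above come from counting the `_uuid` header lines that `ovn-nbctl list <table>` prints, one per record. A minimal sketch of that counting logic (`count_rows` is a hypothetical helper, and the sample output below is abridged, not taken from this deployment):

```python
def count_rows(nbctl_output: str) -> int:
    """Count records in `ovn-nbctl list <table>` output by counting
    the _uuid line that begins each record, mirroring
    `ovn-nbctl list ACL | grep uuid | wc -l`."""
    return sum(1 for line in nbctl_output.splitlines()
               if line.lstrip().startswith("_uuid"))

# Abridged sample of what `ovn-nbctl list ACL` prints: one _uuid
# header per ACL record, followed by its columns.
sample = """\
_uuid               : 6f2c0000-0000-0000-0000-000000000001
action              : allow-related

_uuid               : 9a1b0000-0000-0000-0000-000000000002
action              : drop
"""

print(count_rows(sample))  # → 2
```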
Reproduced on the old version:

[root@hp-dl380pg8-04 opt]# uname -a
Linux hp-dl380pg8-04.rhts.eng.pek2.redhat.com 3.10.0-957.el7.x86_64 #1 SMP Thu Oct 4 20:48:51 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@hp-dl380pg8-04 opt]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-11.el7fdp.noarch
openvswitch2.11-2.11.0-4.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-121.noarch
[root@hp-dl380pg8-04 opt]# rpm -qa | grep ovn
ovn2.11-host-2.11.0-4.el7fdp.x86_64
ovn2.11-common-2.11.0-4.el7fdp.x86_64
ovn2.11-central-2.11.0-4.el7fdp.x86_64
ovn2.11-2.11.0-4.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-121.noarch
[root@hp-dl380pg8-04 opt]#

Add a lot of logical ports and execute the script:

from scapy.all import *

# Send 1000 broadcast DHCP discover packets out of eth1.
for x in range(1000):
    sendp(Ether(src="00:de:ad:01:00:01", dst="ff:ff:ff:ff:ff:ff") /
          IP(src="0.0.0.0", dst="255.255.255.255") /
          UDP(sport=68, dport=67) /
          BOOTP(chaddr="00:de:ad:01:00:01", ciaddr="0.0.0.0") /
          DHCP(options=[("message-type", "discover")]) /
          DHCP(options=[("subnet_mask", "255.255.255.0")]) /
          DHCP(options=[("router", "0.0.0.0")]) /
          DHCP(options=[("name_server", "0.0.0.0"), "end"]),
          iface="eth1")

[root@hp-dl380pg8-04 ovn]# top
top - 02:22:28 up 2 days, 3:03, 4 users, load average: 0.80, 0.27, 0.23
Tasks: 314 total, 2 running, 312 sleeping, 0 stopped, 0 zombie
%Cpu(s): 6.1 us, 0.3 sy, 0.0 ni, 93.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32802316 total, 25893460 free, 1955284 used, 4953572 buff/cache
KiB Swap: 16515068 total, 16515068 free, 0 used. 30065744 avail Mem

  PID USER     PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
59606 root     10 -10  223824  19980  1676 R 100.3  0.1 82:17.21 ovn-controller
60028 qemu     20   0 2932040   1.1g 10236 S  45.8  3.6 21:10.92 qemu-kvm
59533 openvsw+ 10 -10 1938872 241900 17392 S   6.0  0.7 14:44.17 ovs-vswitchd
56162 root     20   0  347080   7036  5188 S   1.0  0.0  0:06.11 virsh
59079 root     20   0 1538776  26328 14312 S   1.0  0.1  0:08.69 libvirtd
59943 root     20   0  150432   5684  4340 S   1.0  0.0  0:11.24 sshd
60230 root     20   0       0      0     0 S   0.7  0.0  0:10.58 vhost-60028
   10 root     20   0       0      0     0 S   0.3  0.0  2:24.81 rcu_sched

[root@hp-dl380pg8-04 opt]# cat /var/log/openvswitch/ovn-controller.log | grep CPU
2019-04-11T06:22:14.381Z|01075|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:14.564Z|01076|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:14.745Z|01077|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:14.926Z|01078|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:15.108Z|01079|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:15.288Z|01080|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:15.469Z|01081|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:15.651Z|01082|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:15.831Z|01085|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:16.011Z|01086|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (101% CPU usage)
2019-04-11T06:22:20.528Z|01090|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (102% CPU usage)
2019-04-11T06:22:26.505Z|01096|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (102% CPU usage)

===================================================================================================================================

Verified on the latest version:

[root@hp-dl380pg8-04 scapy]# rpm -qa | grep openvswitch
openvswitch-selinux-extra-policy-1.0-11.el7fdp.noarch
openvswitch2.11-2.11.0-5.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-121.noarch
[root@hp-dl380pg8-04 scapy]# rpm -qa | grep ovn
ovn2.11-central-2.11.0-7.el7fdp.x86_64
ovn2.11-2.11.0-7.el7fdp.x86_64
ovn2.11-host-2.11.0-7.el7fdp.x86_64
kernel-kernel-networking-openvswitch-ovn-1.0-121.noarch
[root@hp-dl380pg8-04 scapy]#
[root@hp-dl380pg8-04 scapy]# top
top - 06:15:23 up 2 days, 6:56, 4 users, load average: 0.09, 0.38, 0.52
Tasks: 314 total, 1 running, 313 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 0.4 sy, 0.0 ni, 98.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32802316 total, 26416436 free, 1588120 used, 4797760 buff/cache
KiB Swap: 16515068 total, 16515068 free, 0 used. 30434164 avail Mem

  PID USER     PR  NI    VIRT    RES   SHR S %CPU %MEM    TIME+ COMMAND
60896 qemu     20   0 2929992 810684 10236 S 43.9  2.5  1:30.35 qemu-kvm
60409 openvsw+ 10 -10 1939560 234360 17392 S  7.0  0.7  0:25.01 ovs-vswitc+
59570 root     20   0 1547032  24220 14524 S  1.0  0.1  0:01.90 libvirtd
56012 root     20   0  347080   6876  5164 S  0.7  0.0  0:00.54 virsh
59943 root     20   0  150432   5684  4340 S  0.7  0.0  0:11.81 sshd
61121 root     20   0       0      0     0 S  0.7  0.0  0:00.35 vhost-60896
   10 root     20   0       0      0     0 S  0.3  0.0  2:32.25 rcu_sched
  115 root     rt   0       0      0     0 S  0.3  0.0  0:01.14 migration/+
  145 root     20   0       0      0     0 S  0.3  0.0  0:01.99 kworker/11+
 3751 root     20   0       0      0     0 S  0.3  0.0  0:15.79 xfsaild/dm+
 6782 root     20   0       0      0     0 S  0.3  0.0  0:30.08 xfsaild/dm+
50665 root     20   0  162144   2456  1592 R  0.3  0.0  0:00.10 top
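For reference, the BOOTP/DHCP payload that each sendp() call in the reproducer broadcasts has roughly the wire layout sketched below. This is a hedged standard-library sketch of that layout, not the reproducer itself: `dhcp_discover_payload` is a hypothetical helper, the field values mirror the scapy script, and the xid shown is arbitrary.

```python
import struct

def dhcp_discover_payload(mac: bytes) -> bytes:
    """Build the BOOTP/DHCP portion of a DHCPDISCOVER, mirroring the
    layers the scapy reproducer stacks: message-type, subnet_mask,
    router and name_server options, then the 'end' marker."""
    bootp = struct.pack(
        "!BBBBIHH4s4s4s4s",
        1,            # op: BOOTREQUEST
        1,            # htype: Ethernet
        6,            # hlen: MAC address length
        0,            # hops
        0x1fa5a1b6,   # xid (transaction id; arbitrary for this sketch)
        0,            # secs
        0,            # flags
        b"\x00" * 4,  # ciaddr 0.0.0.0 (as in the script)
        b"\x00" * 4,  # yiaddr
        b"\x00" * 4,  # siaddr
        b"\x00" * 4,  # giaddr
    )
    bootp += mac.ljust(16, b"\x00")   # chaddr, padded to 16 bytes
    bootp += b"\x00" * 64             # sname
    bootp += b"\x00" * 128            # file
    options = (
        b"\x63\x82\x53\x63"           # DHCP magic cookie
        b"\x35\x01\x01"               # option 53: message-type = discover
        b"\x01\x04\xff\xff\xff\x00"   # option 1: subnet mask 255.255.255.0
        b"\x03\x04\x00\x00\x00\x00"   # option 3: router 0.0.0.0
        b"\x06\x04\x00\x00\x00\x00"   # option 6: name server 0.0.0.0
        b"\xff"                       # end
    )
    return bootp + options

payload = dhcp_discover_payload(bytes.fromhex("00dead010001"))
print(len(payload))  # → 262 (236-byte fixed BOOTP header + 26 bytes of options)
```

Each such packet is punted to ovn-controller's pinctrl DHCP handler, which is why flooding them drives the CPU spikes shown above.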
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0907