Bug 1731165
| Summary: | CPU usage goes high while constantly adding veth ports to an ovs bridge | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | haidong li <haili> |
| Component: | openvswitch2.11 | Assignee: | Flavio Leitner <fleitner> |
| Status: | CLOSED NOTABUG | QA Contact: | haidong li <haili> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | FDP 19.D | CC: | ctrautma, fleitner, jhsiao, mcroce, ralongi, tredaelli |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-08-21 17:41:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Hi, is this really a bug? What should we expect during the veth insertion?

> I think the veth insertion shouldn't take 100% CPU usage; it's just an ovs port add/remove command.

Adding or removing a port on the switch forces it to rebuild itself and its internal caches, and to commit changes to the database. I don't know the goal of this test, but if the goal is simply to add many interfaces to the bridge, then add them in batches in an atomic transaction.
E.g.:

```shell
ovs-vsctl add-port br-int veth$vlanid -- \
          add-port br-int veth$((vlanid+1)) -- \
          add-port br-int veth$((vlanid+2)) -- \
          add-port br-int veth$((vlanid+3)) -- \
          add-port br-int veth$((vlanid+4)) -- \
          add-port br-int veth$((vlanid+5)) -- \
          add-port br-int veth$((vlanid+6)) -- \
          add-port br-int veth$((vlanid+7)) -- \
          add-port br-int veth$((vlanid+8)) -- \
          add-port br-int veth$((vlanid+9))
```
You could try adding 10 interfaces at once, or maybe 50, and see if it helps.
Of course there will still be a spike in CPU usage, but for a significantly shorter period of time.
HTH
fbl
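The batching suggested above can also be generated with a loop instead of writing out every `-- add-port` clause by hand. This is only a sketch: `BATCH`, `br-int` and the `veth1..vethN` names are assumptions carried over from the reproducer, and the final command is echoed rather than executed.

```shell
#!/bin/sh
# Build one ovs-vsctl invocation that adds BATCH ports in a single
# database transaction, instead of one transaction per port.
BATCH=10
cmd="ovs-vsctl"
for i in $(seq 1 "$BATCH"); do
    # each port becomes another "-- add-port" clause of the same command
    cmd="$cmd -- add-port br-int veth$i"
done
# echoed for illustration; run it for real once the veth devices exist
echo "$cmd"
```

Running the echoed command performs all ten adds in one ovsdb transaction, so ovs-vswitchd reconfigures once per batch rather than once per port.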
Hi, It has been almost a month without updates, so I am going to close this. Feel free to re-open this bug if you need anything and I will be happy to continue to help. Thanks! fbl |
Description of problem:
CPU usage goes high while constantly adding veth ports to an ovs bridge.

Version-Release number of selected component (if applicable):

```
[root@dell-per730-42 ~]# uname -a
Linux dell-per730-42.rhts.eng.pek2.redhat.com 3.10.0-1061.el7.x86_64 #1 SMP Thu Jul 11 21:02:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@dell-per730-42 ~]# rpm -qa | grep openvswitch
openvswitch2.11-2.11.0-14.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-11.el7fdp.noarch
```

How reproducible:
every time

Steps to Reproduce:
1. add an ovs bridge
2. constantly add veth ports to the ovs bridge

```
[root@dell-per730-42 ~]# top
top - 07:50:36 up 47 min,  2 users,  load average: 0.43, 0.16, 0.10
Tasks: 464 total,   9 running, 455 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.5 us,  1.9 sy,  0.0 ni, 96.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 32744248 total, 31240408 free,   885072 used,   618768 buff/cache
KiB Swap: 16515068 total, 16515068 free,        0 used. 31469072 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
12944 openvsw+  10 -10 3390808 138164  17080 R 132.9  0.4   0:59.22 ovs-vswitchd
 2388 root      20   0  559508  22424   6704 R  14.0  0.1   0:15.53 NetworkManager
12862 openvsw+  10 -10   71432   5164   1736 S   3.3  0.0   0:01.15 ovsdb-server
  517 root      20   0       0      0      0 S   1.7  0.0   0:00.72 kworker/35:1
11632 root      20   0  115844   2464   1656 S   1.3  0.0   0:03.92 bash
  157 root      20   0       0      0      0 S   0.7  0.0   0:00.06 kworker/29:0
  318 root      20   0       0      0      0 S   0.7  0.0   0:00.90 kworker/32:1
  321 root      20   0       0      0      0 S   0.7  0.0   0:00.18 kworker/27:1
  516 root      20   0       0      0      0 S   0.7  0.0   0:01.02 kworker/33:1
 1433 root      20   0   47664  12044  11664 R   0.7  0.0   0:01.39 systemd-journal
 2369 dbus      20   0   66596   2656   1876 R   0.7  0.0   0:01.59 dbus-daemon
 7239 root      20   0   49024   3888    488 R   0.7  0.0   0:00.21 systemd-udevd
```

```
[root@dell-per730-42 openvswitch]# cat ovs-vswitchd.log | grep CPU | tail -20
2019-07-18T13:33:37.542Z|00790|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (91% CPU usage)
2019-07-18T13:33:37.542Z|00791|poll_loop|INFO|wakeup due to [POLLIN] on fd 15 (<->/var/run/openvswitch/db.sock) at ../lib/stream-fd.c:157 (91% CPU usage)
2019-07-18T13:33:37.542Z|00792|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (91% CPU usage)
2019-07-18T13:33:43.261Z|00833|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (99% CPU usage)
2019-07-18T13:33:49.316Z|00877|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (100% CPU usage)
2019-07-18T13:33:55.142Z|00920|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (98% CPU usage)
2019-07-18T13:34:22.595Z|00943|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (74% CPU usage)
2019-07-18T13:34:22.595Z|00944|poll_loop|INFO|wakeup due to [POLLIN] on fd 15 (<->/var/run/openvswitch/db.sock) at ../lib/stream-fd.c:157 (74% CPU usage)
2019-07-18T13:34:22.722Z|00946|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (74% CPU usage)
2019-07-18T13:34:22.722Z|00947|poll_loop|INFO|wakeup due to [POLLIN] on fd 15 (<->/var/run/openvswitch/db.sock) at ../lib/stream-fd.c:157 (74% CPU usage)
2019-07-18T13:34:25.271Z|00966|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (74% CPU usage)
2019-07-18T13:34:31.474Z|01005|poll_loop|INFO|wakeup due to [POLLIN] on fd 16 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (99% CPU usage)
2019-07-18T13:36:23.283Z|01007|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[72262]) at ../vswitchd/bridge.c:384 (87% CPU usage)
2019-07-18T13:36:23.783Z|01008|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[72262]) at ../vswitchd/bridge.c:384 (87% CPU usage)
2019-07-18T13:36:24.283Z|01009|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[72262]) at ../vswitchd/bridge.c:384 (87% CPU usage)
2019-07-18T13:36:24.503Z|01010|poll_loop|INFO|wakeup due to 220-ms timeout at ../vswitchd/bridge.c:2828 (87% CPU usage)
2019-07-18T13:36:24.783Z|01011|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[72262]) at ../vswitchd/bridge.c:384 (87% CPU usage)
2019-07-18T13:36:25.283Z|01012|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[72262]) at ../vswitchd/bridge.c:384 (87% CPU usage)
2019-07-18T13:36:25.784Z|01013|poll_loop|INFO|wakeup due to [POLLIN] on fd 22 (FIFO pipe:[72262]) at ../vswitchd/bridge.c:384 (87% CPU usage)
2019-07-18T13:36:59.802Z|01195|poll_loop|INFO|wakeup due to [POLLIN] on fd 14 (NETLINK_ROUTE<->NETLINK_ROUTE) at ../lib/netlink-socket.c:1401 (101% CPU usage)
```

script:

```shell
vlanid=0
for ((i=1; i<=20; i++)); do
    for ((j=1; j<=200; j++)); do
        ((vlanid+=1))
        # create a veth pair, assign an address and bring both ends up
        ip link add name veth$vlanid type veth peer name peer$vlanid
        ip addr add 80.$i.$j.2/24 dev peer$vlanid
        ip link set veth$vlanid up
        ip link set peer$vlanid up
        # zero-padded hex octets for the MAC address
        X=$(printf %02x $i)
        Y=$(printf %02x $j)
        mac_peer_hv0=00:00:00:11:$X:$Y
        ip link set peer$vlanid address $mac_peer_hv0
        # one ovs-vsctl transaction per port
        ovs-vsctl add-port br-int veth$vlanid
        ovs-vsctl set interface veth$vlanid external_ids:iface-id=lsprmt$vlanid
    done
done
```

Actual results:

Expected results:

Additional info:
This issue can also be reproduced by constantly adding and removing a single veth port from the ovs bridge.
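Besides `top` and the poll_loop log messages, the CPU spike can be sampled directly from `/proc`. The sketch below samples the current shell as a placeholder; on the test box the PID of ovs-vswitchd (e.g. from `pidof ovs-vswitchd`) would be substituted.

```shell
#!/bin/sh
# Sample a process's CPU usage over one second via /proc/<pid>/stat.
# Fields 14 and 15 are utime and stime in clock ticks (field counting is
# valid as long as the process name contains no spaces, which holds for
# ovs-vswitchd). pid=$$ is a stand-in so the sketch is self-contained.
pid=$$
ticks() { awk '{print $14 + $15}' "/proc/$1/stat"; }
t0=$(ticks "$pid")
sleep 1
t1=$(ticks "$pid")
hz=$(getconf CLK_TCK)   # clock ticks per second, usually 100
echo "CPU usage over 1s: $(( (t1 - t0) * 100 / hz ))%"
```

Sampling ovs-vswitchd this way before and after switching to batched `ovs-vsctl` transactions would show how much shorter the reconfiguration spike becomes.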