Description of problem:
CPU usage becomes very high after running "ovn-nbctl acl-add pg1 to-lport 3 ip reject".

Version-Release number of selected component (if applicable):
# rpm -qa|grep ovn
ovn2.13-central-20.12.0-24.el8fdp.x86_64
ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
[root@dell-per740-53 acl]# rpm -qa|grep openvs
openvswitch2.13-2.13.0-95.el8fdp.x86_64
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64

How reproducible:
always

Steps to Reproduce:
server:
ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01
ovn-nbctl lsp-add ls vm2
ovn-nbctl lsp-set-addresses vm2 00:00:00:00:00:02
ovn-nbctl lsp-add ls vm3
ovn-nbctl lsp-set-addresses vm3 00:00:00:00:00:03
ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.1/24 dev vm1
ip netns exec vm1 ip link set vm1 up
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovn-nbctl pg-add pg1 vm1 vm2

client:
ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 42.42.42.2/24 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip link set lo up
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2
ip netns add vm3
ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
ip link set vm3 netns vm3
ip netns exec vm3 ip link set vm3 address 00:00:00:00:00:03
ip netns exec vm3 ip addr add 42.42.42.3/24 dev vm3
ip netns exec vm3 ip link set vm3 up
ip netns exec vm3 ip link set lo up
ip netns exec vm2 ip route add default via 42.42.42.5
ip netns exec vm3 ip route add default via 42.42.42.5
ovs-vsctl set Interface vm3 external_ids:iface-id=vm3

Then, on the server, add the ACL:
# ovn-nbctl acl-add pg1 to-lport 3 ip reject

After a few seconds, CPU usage climbs very high:
# top
top - 04:46:14 up 8:25, 1 user, load average: 2.17, 3.71, 5.58
Tasks: 592 total, 3 running, 589 sleeping, 0 stopped, 0 zombie
%Cpu(s): 2.7 us, 5.3 sy, 0.0 ni, 90.6 id, 0.0 wa, 0.2 hi, 1.2 si, 0.0 st
MiB Mem : 128155.4 total, 125945.5 free, 1470.1 used, 739.7 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 125743.8 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
82023 openvsw+  10 -10 3404912 159656  36036 R 427.6   0.1  27:21.14 ovs-vswitchd
82124 openvsw+  10 -10  282456   7988   3476 S  31.8   0.0   2:07.27 ovn-controller
  134 root      20   0       0      0      0 R   2.3   0.0   0:12.59 ksoftirqd/20
83886 root      20   0   65908   5272   4092 R   0.7   0.0   0:00.04 top

And ovs-vswitchd logs the following:
2021-03-02T09:47:25.183Z|00204|poll_loop|INFO|Dropped 57187 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-03-02T09:47:25.183Z|00205|poll_loop|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif.c:1941 (81% CPU usage)
2021-03-02T09:47:31.182Z|00206|poll_loop|INFO|Dropped 55136 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-03-02T09:47:31.183Z|00207|poll_loop|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif.c:1941 (84% CPU usage)

Actual results:
CPU usage is very high (sometimes much higher than shown above) and stays high until the ACL rule is deleted.

Expected results:
CPU usage stays normal.

Additional info:
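A minimal sketch of triggering and clearing the condition, assuming the topology above (the ping invocation is illustrative; per the actual results, deleting the ACL is what lets CPU usage recover):

# any IP packet between the port-group members can start the storm (illustrative)
ip netns exec vm2 ping -c 1 42.42.42.1

# removing the offending ACL stops the storm and CPU usage drops back to normal
ovn-nbctl acl-del pg1 to-lport 3 ip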
Hi, this is a self comment for future debugging: what appears to happen here is that when two or more ports are added to the port group and an ACL is added to reject IP traffic, any IP packet sent from one group member to another is answered with an ICMP unreachable message. That ICMP message is itself IP traffic between group members, so the ACL rejects it too and answers it with yet another ICMP unreachable message. The group members enter an infinite loop of unreachable messages, ovs-vswitchd receives a huge amount of packets, and the system CPUs are overloaded.
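A quick way to confirm the loop (a sketch; the interface and addresses come from the reproducer above) is to capture ICMP inside one of the namespaces while the ACL is active:

# an endless stream of "destination unreachable" messages bouncing between
# 42.42.42.1 and 42.42.42.2 confirms the reject loop
ip netns exec vm1 tcpdump -ni vm1 icmp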
suggested fix submitted upstream: http://patchwork.ozlabs.org/project/ovn/patch/20211006152815.2134267-1-mheib@redhat.com/
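When checking whether the fix changes behavior, the ACL and the logical flows it generates can be inspected with standard OVN tools (a sketch; the switch name ls and port group pg1 come from the reproducer above):

# list the ACLs attached to the port group
ovn-nbctl acl-list pg1

# dump the logical flows for the switch and look at the ACL stages
ovn-sbctl lflow-list ls | grep -i acl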
reproduced on version:
# rpm -qa|grep ovn
ovn-2021-central-21.09.0-12.el8fdp.x86_64
ovn-2021-host-21.09.0-12.el8fdp.x86_64
ovn-2021-21.09.0-12.el8fdp.x86_64

# top
top - 02:18:01 up 11 days, 23:22, 1 user, load average: 2.46, 2.18, 1.85
Tasks: 556 total, 2 running, 554 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 0.7 sy, 0.0 ni, 98.4 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
MiB Mem : 128155.6 total, 94641.8 free, 25787.9 used, 7725.9 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 101463.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3157955 openvsw+  10 -10  302164  74132  41028 S  41.7   0.1   9:14.07 ovs-vswitchd
3158057 openvsw+  10 -10  282684  28604   3552 S  21.5   0.0   4:54.42 ovn-controller
     10 root      20   0       0      0      0 R   1.0   0.0   7:08.65 ksoftirqd/0
3164366 root      20   0   65940   5164   3968 R   0.7   0.0   0:00.04 top
     11 root      20   0       0      0      0 I   0.3   0.0   4:04.50 rcu_sched
      1 root      20   0  175684  14324   9008 S   0.0   0.0   0:11.47 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.19 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
     12 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0
     13 root      rt   0       0      0      0 S   0.0   0.0   0:00.09 watchdog/0
     14 root      20   0       0      0      0 S   0.0   0.0   0:00.01 cpuhp/0
     15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1
     16 root      rt   0       0      0      0 S   0.0   0.0   0:00.63 watchdog/1
     17 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/1
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/1
     19 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0
     20 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0H
     21 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/2
     22 root      rt   0       0      0      0 S   0.0   0.0   0:00.26 watchdog/2
     23 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/2
     24 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/2
     26 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/2:0H-events_highpri
     27 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/3
     28 root      rt   0       0      0      0 S   0.0   0.0   0:00.65 watchdog/3
     29 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/3
     30 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/3
     31 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0
     32 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0H

verified on version:
# rpm -qa|grep ovn
ovn-2021-host-21.12.0-11.el8fdp.x86_64
ovn-2021-21.12.0-11.el8fdp.x86_64
ovn-2021-central-21.12.0-11.el8fdp.x86_64

[root@dell-per740-53 acl]# top
top - 01:32:53 up 11 days, 22:37, 1 user, load average: 0.00, 0.13, 0.81
Tasks: 555 total, 2 running, 553 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.3 sy, 0.0 ni, 99.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 128155.6 total, 94682.8 free, 25756.5 used, 7716.3 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 101503.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3155120 root      20   0   65940   5016   3824 R   5.9   0.0   0:00.02 top
      1 root      20   0  175684  14324   9008 S   0.0   0.0   0:10.70 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.19 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq
     10 root      20   0       0      0      0 S   0.0   0.0   6:48.61 ksoftirqd/0
     11 root      20   0       0      0      0 R   0.0   0.0   4:02.91 rcu_sched
     12 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0
     13 root      rt   0       0      0      0 S   0.0   0.0   0:00.09 watchdog/0
     14 root      20   0       0      0      0 S   0.0   0.0   0:00.01 cpuhp/0
     15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1
     16 root      rt   0       0      0      0 S   0.0   0.0   0:00.63 watchdog/1
     17 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/1
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/1
     19 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0
     20 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0H
     21 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/2
     22 root      rt   0       0      0      0 S   0.0   0.0   0:00.25 watchdog/2
     23 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/2
     24 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/2
     26 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/2:0H-events_highpri
     27 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/3
     28 root      rt   0       0      0      0 S   0.0   0.0   0:00.65 watchdog/3
     29 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/3
     30 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/3
     31 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0
     32 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0H
     33 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/4
     34 root      rt   0       0      0      0 S   0.0   0.0   0:00.89 watchdog/4
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0674