Bug 1934011

Summary: [OVN] CPU will be very high after set "ovn-nbctl acl-add pg1 to-lport 3 ip reject"
Product: Red Hat Enterprise Linux Fast Datapath Reporter: ying xu <yinxu>
Component: ovn2.13Assignee: Mohammad Heib <mheib>
Status: CLOSED ERRATA QA Contact: ying xu <yinxu>
Severity: high Docs Contact:
Priority: high    
Version: FDP 21.BCC: ctrautma, jiji, jishi, mheib, mmichels, ralongi
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-02-24 17:47:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description ying xu 2021-03-02 09:50:09 UTC
Description of problem:
CPU will be very high after set "ovn-nbctl  acl-add pg1 to-lport 3 ip reject"

Version-Release number of selected component (if applicable):
# rpm -qa|grep ovn
ovn2.13-central-20.12.0-24.el8fdp.x86_64
ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
[root@dell-per740-53 acl]# rpm -qa|grep openvs
openvswitch2.13-2.13.0-95.el8fdp.x86_64
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64



How reproducible:

always

Steps to Reproduce:
server:
ovn-nbctl ls-add ls
                ovn-nbctl lsp-add ls vm1
                ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01
                ovn-nbctl lsp-add ls vm2
                ovn-nbctl lsp-set-addresses vm2 00:00:00:00:00:02

                ovn-nbctl lsp-add ls vm3
                ovn-nbctl lsp-set-addresses vm3 00:00:00:00:00:03
                ip netns add vm1
                ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
                ip link set vm1 netns vm1
                ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
                ip netns exec vm1 ip addr add 42.42.42.1/24 dev vm1
                ip netns exec vm1 ip link set vm1 up
                ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovn-nbctl pg-add pg1 vm1 vm2

client:
ip netns add vm2
                ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
                ip link set vm2 netns vm2
                ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
                ip netns exec vm2 ip addr add 42.42.42.2/24 dev vm2
                ip netns exec vm2 ip link set vm2 up
                ip netns exec vm2 ip link set lo up
                ovs-vsctl set Interface vm2 external_ids:iface-id=vm2
                ip netns add vm3
                ovs-vsctl add-port br-int vm3 -- set interface vm3 type=internal
                ip link set vm3 netns vm3
                ip netns exec vm3 ip link set vm3 address 00:00:00:00:00:03
                ip netns exec vm3 ip addr add 42.42.42.3/24 dev vm3
                ip netns exec vm3 ip link set vm3 up
                ip netns exec vm3 ip link set lo up
                ip netns exec vm2 ip route add default via 42.42.42.5
                ip netns exec vm3 ip route add default via 42.42.42.5
                ovs-vsctl set Interface vm3 external_ids:iface-id=vm3


then on server set the command:
# ovn-nbctl  acl-add pg1 to-lport 3 ip reject

after a few seconds,we can see that cpu comes up to very high
# top

top - 04:46:14 up  8:25,  1 user,  load average: 2.17, 3.71, 5.58
Tasks: 592 total,   3 running, 589 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.7 us,  5.3 sy,  0.0 ni, 90.6 id,  0.0 wa,  0.2 hi,  1.2 si,  0.0 st
MiB Mem : 128155.4 total, 125945.5 free,   1470.1 used,    739.7 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 125743.8 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                    
  82023 openvsw+  10 -10 3404912 159656  36036 R 427.6   0.1  27:21.14 ovs-vswitchd                                                               
  82124 openvsw+  10 -10  282456   7988   3476 S  31.8   0.0   2:07.27 ovn-controller                                                             
    134 root      20   0       0      0      0 R   2.3   0.0   0:12.59 ksoftirqd/20                                                               
  83886 root      20   0   65908   5272   4092 R   0.7   0.0   0:00.04 top  

and we can see logs in ovs:
   2021-03-02T09:47:25.183Z|00204|poll_loop|INFO|Dropped 57187 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-03-02T09:47:25.183Z|00205|poll_loop|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif.c:1941 (81% CPU usage)
2021-03-02T09:47:31.182Z|00206|poll_loop|INFO|Dropped 55136 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-03-02T09:47:31.183Z|00207|poll_loop|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif.c:1941 (84% CPU usage)
   
Actual results:
cpu is very high, sometimes much higher than that showed in the description
and last all the time until delete the acl rule

Expected results:
cpu is not very high

Additional info:

Comment 1 Mohammad Heib 2021-10-05 21:33:18 UTC
Hi,
this is a self comment for future debugging:

apparently what is happened here is that when you add two or more ports to the pg and add acl to reject the ip traffic if one of the group members has sent an IP packet to any other group member it will get ip/icmp unreachable msg which in turn will replay with ip/icmp unreachable msg because we added the acl and here we will enter to infinity loop of unreachable msgs between group members which will cause ovs-vswitch to receive huge amount of packets and kill the system CPUs.

Comment 2 Mohammad Heib 2021-10-06 15:33:36 UTC
suggested fix submitted upstream:
http://patchwork.ozlabs.org/project/ovn/patch/20211006152815.2134267-1-mheib@redhat.com/

Comment 5 ying xu 2022-01-17 07:38:21 UTC
reproduced on version:
# rpm -qa|grep ovn
ovn-2021-central-21.09.0-12.el8fdp.x86_64
ovn-2021-host-21.09.0-12.el8fdp.x86_64
ovn-2021-21.09.0-12.el8fdp.x86_64

# top

top - 02:18:01 up 11 days, 23:22,  1 user,  load average: 2.46, 2.18, 1.85
Tasks: 556 total,   2 running, 554 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.7 us,  0.7 sy,  0.0 ni, 98.4 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem : 128155.6 total,  94641.8 free,  25787.9 used,   7725.9 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 101463.5 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                
3157955 openvsw+  10 -10  302164  74132  41028 S  41.7   0.1   9:14.07 ovs-vswitchd                                                                           
3158057 openvsw+  10 -10  282684  28604   3552 S  21.5   0.0   4:54.42 ovn-controller                                                                         
     10 root      20   0       0      0      0 R   1.0   0.0   7:08.65 ksoftirqd/0                                                                            
3164366 root      20   0   65940   5164   3968 R   0.7   0.0   0:00.04 top                                                                                    
     11 root      20   0       0      0      0 I   0.3   0.0   4:04.50 rcu_sched                                                                              
      1 root      20   0  175684  14324   9008 S   0.0   0.0   0:11.47 systemd                                                                                
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.19 kthreadd                                                                               
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                 
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                             
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri                                                            
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                           
     12 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0                                                                            
     13 root      rt   0       0      0      0 S   0.0   0.0   0:00.09 watchdog/0                                                                             
     14 root      20   0       0      0      0 S   0.0   0.0   0:00.01 cpuhp/0                                                                                
     15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1                                                                                
     16 root      rt   0       0      0      0 S   0.0   0.0   0:00.63 watchdog/1                                                                             
     17 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/1                                                                            
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/1                                                                            
     19 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0                                                                            
     20 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0H                                                                           
     21 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/2                                                                                
     22 root      rt   0       0      0      0 S   0.0   0.0   0:00.26 watchdog/2                                                                             
     23 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/2                                                                            
     24 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/2                                                                            
     26 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/2:0H-events_highpri                                                            
     27 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/3                                                                                
     28 root      rt   0       0      0      0 S   0.0   0.0   0:00.65 watchdog/3                                                                             
     29 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/3                                                                            
     30 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/3                                                                            
     31 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0                                                                            
     32 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0H       

verified on version:
# rpm -qa|grep ovn
ovn-2021-host-21.12.0-11.el8fdp.x86_64
ovn-2021-21.12.0-11.el8fdp.x86_64
ovn-2021-central-21.12.0-11.el8fdp.x86_64

[root@dell-per740-53 acl]# top

top - 01:32:53 up 11 days, 22:37,  1 user,  load average: 0.00, 0.13, 0.81
Tasks: 555 total,   2 running, 553 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.3 sy,  0.0 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 128155.6 total,  94682.8 free,  25756.5 used,   7716.3 buff/cache
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used. 101503.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                
3155120 root      20   0   65940   5016   3824 R   5.9   0.0   0:00.02 top                                                                                    
      1 root      20   0  175684  14324   9008 S   0.0   0.0   0:10.70 systemd                                                                                
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.19 kthreadd                                                                               
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                 
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                             
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-events_highpri                                                            
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                           
     10 root      20   0       0      0      0 S   0.0   0.0   6:48.61 ksoftirqd/0                                                                            
     11 root      20   0       0      0      0 R   0.0   0.0   4:02.91 rcu_sched                                                                              
     12 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/0                                                                            
     13 root      rt   0       0      0      0 S   0.0   0.0   0:00.09 watchdog/0                                                                             
     14 root      20   0       0      0      0 S   0.0   0.0   0:00.01 cpuhp/0                                                                                
     15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1                                                                                
     16 root      rt   0       0      0      0 S   0.0   0.0   0:00.63 watchdog/1                                                                             
     17 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/1                                                                            
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/1                                                                            
     19 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0                                                                            
     20 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/1:0H                                                                           
     21 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/2                                                                                
     22 root      rt   0       0      0      0 S   0.0   0.0   0:00.25 watchdog/2                                                                             
     23 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/2                                                                            
     24 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/2                                                                            
     26 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/2:0H-events_highpri                                                            
     27 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/3                                                                                
     28 root      rt   0       0      0      0 S   0.0   0.0   0:00.65 watchdog/3                                                                             
     29 root      rt   0       0      0      0 S   0.0   0.0   0:00.00 migration/3                                                                            
     30 root      20   0       0      0      0 S   0.0   0.0   0:00.00 ksoftirqd/3                                                                            
     31 root      20   0       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0                                                                            
     32 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/3:0H                                                                           
     33 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/4                                                                                
     34 root      rt   0       0      0      0 S   0.0   0.0   0:00.89 watchdog/4

Comment 7 errata-xmlrpc 2022-02-24 17:47:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0674