Bug 1919055

Summary: ovn-northd crashes if a logical_switch_port is given the same name as a logical_router_port
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovn2.13
Version: FDP 21.A
Status: CLOSED ERRATA
Severity: high
Priority: high
Hardware: Unspecified
OS: Unspecified
Reporter: Jianlin Shi <jishi>
Assignee: Dumitru Ceara <dceara>
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, dceara, jishi, ralongi, yinxu
Type: Bug
Last Closed: 2021-03-15 14:36:03 UTC

Description Jianlin Shi 2021-01-22 01:47:46 UTC
Description of problem:
ovn-northd crashes if a logical_switch_port is given the same name as an existing logical_router_port.

Version-Release number of selected component (if applicable):
ovn2.13-20.12.0-1

How reproducible:
Always

Steps to Reproduce:
systemctl start ovn-northd
ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lr-add lr1
ovn-nbctl lrp-add lr1 lr1p1 00:00:11:11:11:11 1.2.3.4/24
ovn-nbctl set logical_switch_port ls1p1 name=lr1p1
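
The last step leaves a switch port and a router port sharing one name. A minimal sketch for spotting such collisions follows, assuming the port names are available as two files with one name per line. The helper name dup_port_names is hypothetical and not part of OVN; the ovn-nbctl invocations in the comments are one way the two lists might be produced.

```shell
#!/bin/sh
# dup_port_names: hypothetical helper, not part of OVN.
# $1: file with logical switch port names, one per line
# $2: file with logical router port names, one per line
# Prints every name that appears in both files; blank lines are ignored.
#
# The lists could be produced with, for example:
#   ovn-nbctl --bare --columns=name list Logical_Switch_Port > lsp.txt
#   ovn-nbctl --bare --columns=name list Logical_Router_Port > lrp.txt
#   dup_port_names lsp.txt lrp.txt
dup_port_names() {
    lsp_sorted=$(mktemp) || return 1
    lrp_sorted=$(mktemp) || return 1
    # comm requires sorted input; use the same locale for sort and comm.
    sed '/^$/d' "$1" | LC_ALL=C sort -u > "$lsp_sorted"
    sed '/^$/d' "$2" | LC_ALL=C sort -u > "$lrp_sorted"
    # -12 suppresses lines unique to either file, leaving the common names.
    LC_ALL=C comm -12 "$lsp_sorted" "$lrp_sorted"
    rm -f "$lsp_sorted" "$lrp_sorted"
}
```

With the reproducer above, the only name reported would be lr1p1.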

Actual results:
ovn-northd crashes with a segmentation fault and is restarted repeatedly by the monitor process.

Expected results:
ovn-northd does not crash.

Additional info:


[root@wsfd-advnetlab20 ~]# grep crash /var/log/ovn/ovn-northd.log
2021-01-22T01:40:50.113Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid 172608 died, killed (Segmentation fault), restarting
2021-01-22T01:40:50.128Z|00005|daemon_unix(monitor)|WARN|2 crashes: pid 172640 died, killed (Segmentation fault), waiting until 10 seconds since last restart
2021-01-22T01:41:00.129Z|00006|daemon_unix(monitor)|ERR|2 crashes: pid 172640 died, killed (Segmentation fault), restarting
2021-01-22T01:41:00.143Z|00008|daemon_unix(monitor)|WARN|3 crashes: pid 172645 died, killed (Segmentation fault), waiting until 10 seconds since last restart
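
The monitor messages above carry a running crash counter. A minimal sketch for extracting the most recent counter value from a log file follows. The helper name last_crash_count is hypothetical, and the pattern is based only on the message format shown in this report, so it may need adjusting for other versions.

```shell
#!/bin/sh
# last_crash_count: hypothetical helper, not part of OVN or Open vSwitch.
# $1: path to an ovn-northd log file
# Prints the counter from the last "N crashes: pid P died" monitor message,
# or nothing if the log contains no such message.
last_crash_count() {
    # Capture the digits between the final "|" separator and " crashes:".
    sed -n 's/.*|\([0-9][0-9]*\) crashes: pid [0-9][0-9]* died.*/\1/p' "$1" |
        tail -n 1
}
```

For example, last_crash_count /var/log/ovn/ovn-northd.log could be polled while reproducing to see whether the counter keeps growing.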

[root@wsfd-advnetlab20 test]# rpm -qa | grep -E "ovn2.13|openvswitch2.13"                             
openvswitch2.13-2.13.0-80.el8fdp.x86_64                                                               
ovn2.13-20.12.0-1.el8fdp.x86_64                                                                       
ovn2.13-central-20.12.0-1.el8fdp.x86_64                                                               
ovn2.13-host-20.12.0-1.el8fdp.x86_64

After changing the logical_switch_port name back, ovn-northd consumes 100% CPU:

[root@wsfd-advnetlab20 test]# ovn-nbctl set logical_switch_port ls1p1 name=ls1p1 
[root@wsfd-advnetlab20 ~]# tail /var/log/ovn/ovn-northd.log
2021-01-22T01:46:43.187Z|00133|poll_loop(ovn-northd)|INFO|Dropped 38977 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-01-22T01:46:43.187Z|00134|poll_loop(ovn-northd)|INFO|wakeup due to [POLLIN] on fd 11 (FIFO pipe:[2681941]) at lib/fatal-signal.c:324 (99% CPU usage)
2021-01-22T01:46:49.187Z|00135|poll_loop(ovn-northd)|INFO|Dropped 38905 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-01-22T01:46:49.187Z|00136|poll_loop(ovn-northd)|INFO|wakeup due to [POLLIN] on fd 11 (FIFO pipe:[2681941]) at lib/fatal-signal.c:324 (99% CPU usage)
2021-01-22T01:46:55.187Z|00137|poll_loop(ovn-northd)|INFO|Dropped 38786 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-01-22T01:46:55.187Z|00138|poll_loop(ovn-northd)|INFO|wakeup due to [POLLIN] on fd 11 (FIFO pipe:[2681941]) at lib/fatal-signal.c:324 (99% CPU usage)
2021-01-22T01:47:01.187Z|00139|poll_loop(ovn-northd)|INFO|Dropped 38928 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-01-22T01:47:01.187Z|00140|poll_loop(ovn-northd)|INFO|wakeup due to [POLLIN] on fd 11 (FIFO pipe:[2681941]) at lib/fatal-signal.c:324 (99% CPU usage)
2021-01-22T01:47:07.187Z|00141|poll_loop(ovn-northd)|INFO|Dropped 39055 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2021-01-22T01:47:07.188Z|00142|poll_loop(ovn-northd)|INFO|wakeup due to [POLLIN] on fd 11 (FIFO pipe:[2681941]) at lib/fatal-signal.c:324 (99% CPU usage)

[root@wsfd-advnetlab20 test]# top -n 1                                            
top - 20:47:26 up 4 days, 36 min,  3 users,  load average: 1.26, 0.93, 0.51                     
Tasks: 570 total,   2 running, 568 sleeping,   0 stopped,   0 zombie          
%Cpu(s):  2.1 us,  0.3 sy,  0.0 ni, 97.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st  
MiB Mem :  95015.8 total,  92314.4 free,   1107.8 used,   1593.6 buff/cache       
MiB Swap:   4096.0 total,   4096.0 free,      0.0 used.  93127.1 avail Mem        
                                                                                                
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                        
 172669 openvsw+  10 -10   60784   4808   3060 R 100.0   0.0   5:45.42 ovn-northd

Comment 1 Dumitru Ceara 2021-01-22 08:45:23 UTC
The fix for bug 1918582 also fixes this one. Moving to POST as the fix is up for review: http://patchwork.ozlabs.org/project/ovn/patch/1611232451-9414-1-git-send-email-dceara@redhat.com/

Comment 3 Jianlin Shi 2021-02-01 02:59:26 UTC
Verified on 20.12.0-9:

[root@wsfd-advnetlab20 bz1919055]# rpm -qa | grep ovn2.13                                             
ovn2.13-20.12.0-9.el7fdp.x86_64                                                                       
ovn2.13-host-20.12.0-9.el7fdp.x86_64                                                                  
ovn2.13-central-20.12.0-9.el7fdp.x86_64

[root@wsfd-advnetlab20 bz1919055]# bash -x rep.sh                                                     
+ systemctl start ovn-northd                                                                          
+ ovn-nbctl ls-add ls1
+ ovn-nbctl lsp-add ls1 ls1p1                                                                         
+ ovn-nbctl lr-add lr1
+ ovn-nbctl lrp-add lr1 lr1p1 00:00:11:11:11:11 1.2.3.4/24                                            
+ ovn-nbctl set logical_switch_port ls1p1 name=lr1p1

[root@wsfd-advnetlab20 ~]# tail -f /var/log/ovn/ovn-northd.log                                        
2021-02-01T02:56:09.918Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-northd.log                  
2021-02-01T02:56:09.925Z|00002|ovn_northd|INFO|OVN internal version is : [20.12.0-20.13.0-52.0]       
2021-02-01T02:56:09.925Z|00003|reconnect|INFO|unix:/run/ovn/ovnnb_db.sock: connecting...              
2021-02-01T02:56:09.925Z|00004|reconnect|INFO|unix:/run/ovn/ovnsb_db.sock: connecting...
2021-02-01T02:56:09.925Z|00005|reconnect|INFO|unix:/run/ovn/ovnnb_db.sock: connected
2021-02-01T02:56:09.925Z|00006|reconnect|INFO|unix:/run/ovn/ovnsb_db.sock: connected
2021-02-01T02:56:09.926Z|00007|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2021-02-01T02:56:10.000Z|00008|ovn_northd|WARN|duplicate logical port lr1p1 

<==== no crash
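
The manual check above (warning present, no crash) could be scripted for repeated verification runs. A minimal sketch follows, assuming the fixed behaviour is exactly what this log shows: a "duplicate logical port" warning and no "Segmentation fault" message. The helper name verify_northd_log and its PASS/FAIL/INCONCLUSIVE strings are hypothetical.

```shell
#!/bin/sh
# verify_northd_log: hypothetical helper, not part of OVN.
# $1: path to an ovn-northd log file written after running the reproducer
# Returns 0 if the duplicate-port warning is present and no crash was logged,
# 1 if a crash was logged, 2 if neither message is found.
verify_northd_log() {
    if grep -q 'Segmentation fault' "$1"; then
        echo "FAIL: crash recorded"
        return 1
    fi
    if grep -q 'duplicate logical port' "$1"; then
        echo "PASS: duplicate port reported, no crash"
        return 0
    fi
    echo "INCONCLUSIVE: no duplicate-port warning found"
    return 2
}
```

The log should be fresh for each run, since crash messages left over from an earlier unfixed build would make the check fail.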

Comment 6 Jianlin Shi 2021-02-22 02:26:14 UTC
Set to VERIFIED per comment 3.

Comment 8 errata-xmlrpc 2021-03-15 14:36:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0836