
Bug 2126450

Summary: Potential crash in ovn-controller when handling deleted port bindings
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovn22.06
Version: FDP 22.H
Status: CLOSED ERRATA
Severity: unspecified
Priority: urgent
Hardware: Unspecified
OS: Unspecified
Type: Bug
Reporter: xsimonar
Assignee: xsimonar
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, dceara, i.maximets, jiji, mmichels
Last Closed: 2022-11-21 18:42:34 UTC

Description xsimonar 2022-09-13 12:48:49 UTC
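Adding and then deleting a port can crash ovn-controller if the Southbound database (SB) notifications for both changes reach ovn-controller in the same batch. In the backtrace below, handle_deleted_lport() crashes while handling a deleted Port_Binding whose datapath pointer is NULL.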

Backtrace:

#0  0x000000000040d947 in handle_deleted_lport (pb=0xc87480, b_ctx_in=0x7ffe44f6a4e0, b_ctx_out=0x7ffe44f6a470) at controller/binding.c:2514
#1  0x000000000040e204 in handle_deleted_vif_lport (pb=0xc87480, lport_type=lport_type@entry=LP_VIF, b_ctx_in=b_ctx_in@entry=0x7ffe44f6a4e0, b_ctx_out=b_ctx_out@entry=0x7ffe44f6a470) at controller/binding.c:2587
#2  0x0000000000412286 in binding_handle_port_binding_changes (b_ctx_in=b_ctx_in@entry=0x7ffe44f6a4e0, b_ctx_out=b_ctx_out@entry=0x7ffe44f6a470) at controller/binding.c:2921
#3  0x000000000043daf9 in runtime_data_sb_port_binding_handler (node=0x7ffe44f6d9a0, data=0xbbb380) at controller/ovn-controller.c:1617
#4  0x000000000045df8e in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:414
#5  engine_run_node (recompute_allowed=true, node=0x7ffe44f6d9a0) at lib/inc-proc-eng.c:476
#6  engine_run (recompute_allowed=recompute_allowed@entry=true) at lib/inc-proc-eng.c:501
#7  0x000000000040a835 in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:4127

(gdb) p pb->datapath
$3 = (struct sbrec_datapath_binding *) 0x0

The issue can be reproduced with the following OVN test case:
OVN_FOR_EACH_NORTHD([
AT_SETUP([ovn-controller port addition and deletion])
ovn_start
net_add n1

sim_add hv1
as hv1
ovs-vsctl add-br br-phys
ovn_attach n1 br-phys 192.168.0.1
ovn-appctl vlog/set dbg

check ovs-vsctl add-port br-int p1 -- set interface p1 external-ids:iface-id=sw0-port1
check ovn-nbctl --wait=hv sync
ovn-appctl debug/pause
OVS_WAIT_UNTIL([test x$(as hv1 ovn-appctl -t ovn-controller debug/status) = "xpaused"])

ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"
ovn-nbctl lsp-del sw0-port1

ovn-appctl debug/resume
check ovn-nbctl --wait=hv sync

ovn-nbctl ls-del sw0
check ovn-nbctl --wait=hv sync

OVN_CLEANUP([hv1])
AT_CLEANUP
])

Comment 1 xsimonar 2022-09-16 08:54:10 UTC
A fix for the underlying OVS IDL issue has been posted upstream for review: https://patchwork.ozlabs.org/project/openvswitch/patch/20220916084006.884447-1-xsimonar@redhat.com/

Comment 2 xsimonar 2022-10-18 14:38:33 UTC
Although the issue is fixed by the code change linked in https://bugzilla.redhat.com/show_bug.cgi?id=2126450#c1, OVN also needs a patch that updates its OVS submodule to a version containing that fix.

Comment 3 xsimonar 2022-10-21 13:19:45 UTC
The OVN patch bumping the OVS submodule has been posted and merged in 22.06: https://github.com/ovn-org/ovn/commit/d6b59b35fb4c0bc174e9f1cace4a1b9b244f3c58
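To confirm that a particular OVN source tree includes the OVS IDL fix, check whether the fix commit is an ancestor of the commit the ovs submodule is pinned to. The sketch below is minimal and makes some assumptions: an OVN clone in ./ovn with the submodule initialized. `has_commit` is a hypothetical helper name. OVS_FIX is a placeholder for the upstream OVS commit hash, which this report does not give.

```shell
#!/bin/sh
# Hypothetical helper: succeed if commit $2 is contained in (is an ancestor
# of, or equal to) HEAD of the git checkout at path $1.
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD
}

# Example, assuming an OVN clone in ./ovn after "git submodule update --init"
# and OVS_FIX set to the OVS fix commit (not listed in this report):
#   git -C ovn submodule status ovs    # prints the pinned OVS commit
#   has_commit ovn/ovs "$OVS_FIX" && echo "OVS fix included"
```

After `git submodule update`, the submodule is checked out at its pinned commit, so the submodule's HEAD identifies the OVS code that OVN builds against.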

Comment 6 Jianlin Shi 2022-11-04 01:45:11 UTC
Reproduced on ovn22.06-22.06.0-64.el8 using the reproducer from https://bugzilla.redhat.com/show_bug.cgi?id=2132964#c4:
+ grep ovn-controller
openvsw+   97120  0.0  0.0 282252  5740 ?        S<sl 21:42   0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root       97125  0.0  0.0  12140  1044 pts/0    S+   21:42   0:00 grep ovn-controller
+ ovn-appctl vlog/set dbg
+ ovs-vsctl add-port br-int p1 -- set interface p1 type=internal external-ids:iface-id=sw0-port1
+ ovn-nbctl --wait=hv sync
+ ovn-appctl debug/pause
+ ovn-appctl -t ovn-controller debug/status
paused
+ ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- lsp-set-addresses sw0-port1 '50:54:00:00:00:01 192.168.0.2'
+ ovn-nbctl lsp-del sw0-port1
+ ovn-appctl debug/resume
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+   97267  0.0  0.0 282400  5072 ?        S<sl 21:42   0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root       97272  0.0  0.0  12140  1092 pts/0    S+   21:42   0:00 grep ovn-controller
+ ovn-nbctl ls-del sw0
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+   97267  0.0  0.0 282400  5072 ?        S<sl 21:42   0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root       97276  0.0  0.0  12140  1100 pts/0    S+   21:42   0:00 grep ovn-controller
+ check_coredump
+ coredumpctl list
TIME                            PID   UID   GID SIG COREFILE  EXE
Thu 2022-11-03 21:42:18 EDT   97120   993   990  11 none      /usr/bin/ovn-controller
[root@dell-per750-18 bz2132964]# rpm -qa | grep -E "openvswitch2.17|ovn22.06"                         
ovn22.06-central-22.06.0-64.el8fdp.x86_64                                                             
ovn22.06-22.06.0-64.el8fdp.x86_64                                                                     
ovn22.06-host-22.06.0-64.el8fdp.x86_64                                                                
openvswitch2.17-2.17.0-61.el8fdp.x86_64

Verified on ovn22.06-22.06.0-75.el8:

+ grep ovn-controller
openvsw+   98310  0.0  0.0 282296  5428 ?        S<sl 21:44   0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root       98315  0.0  0.0  12140  1208 pts/0    S+   21:44   0:00 grep ovn-controller
+ ovn-appctl vlog/set dbg
+ ovs-vsctl add-port br-int p1 -- set interface p1 type=internal external-ids:iface-id=sw0-port1
+ ovn-nbctl --wait=hv sync
+ ovn-appctl debug/pause
+ ovn-appctl -t ovn-controller debug/status
paused
+ ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- lsp-set-addresses sw0-port1 '50:54:00:00:00:01 192.168.0.2'
+ ovn-nbctl lsp-del sw0-port1
+ ovn-appctl debug/resume
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+   98310  0.0  0.0 282436  5428 ?        S<sl 21:44   0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root       98420  0.0  0.0  12140  1208 pts/0    S+   21:44   0:00 grep ovn-controller
+ ovn-nbctl ls-del sw0
+ ovn-nbctl --wait=hv sync
+ ps aux
+ grep ovn-controller
openvsw+   98310  0.0  0.0 282436  5428 ?        S<sl 21:44   0:00 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn/ovn-controller.pid --detach
root       98424  0.0  0.0  12140  1176 pts/0    S+   21:44   0:00 grep ovn-controller
+ check_coredump
+ coredumpctl list
No coredumps found.
[root@dell-per750-18 bz2132964]# rpm -qa | grep -E "openvswitch2.17|ovn22.06"
ovn22.06-host-22.06.0-75.el8fdp.x86_64
ovn22.06-central-22.06.0-75.el8fdp.x86_64
openvswitch2.17-2.17.0-61.el8fdp.x86_64
ovn22.06-22.06.0-75.el8fdp.x86_64
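The traces above show how a crash is detected. On the affected build, the ovn-controller PID changes from 97120 to 97267 and coredumpctl records a SIGSEGV (signal 11) for PID 97120. On the fixed build, the PID stays at 98310 and no core is recorded. The sequence can be scripted roughly as follows. This is a sketch only and not the actual reproducer from bz2132964#c4:

- The OVN commands are copied from the trace.
- The helper names (`controller_pid`, `same_pid`, `check_coredump`, `reproduce`) and the `--run` switch are hypothetical.
- This `check_coredump` is an assumed implementation built on `coredumpctl`.

The script assumes that OVN central and ovn-controller are already running on the host, that br-int exists, and that systemd-coredump collects crashes.

```shell
#!/bin/sh
# Sketch of the reproduce/verify loop from the traces (hypothetical helper names).
# Assumes OVN central and ovn-controller are running, br-int exists, and
# systemd-coredump collects crashes.

# Print the PID of the running ovn-controller; empty if it is not running.
controller_pid() {
    pidof ovn-controller
}

# Succeed only if both PIDs are non-empty and identical, i.e. ovn-controller
# was not restarted between the two samples.
same_pid() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Assumed helper: succeed if no ovn-controller core was recorded since time $1.
check_coredump() {
    ! coredumpctl --no-legend --since="$1" list 2>/dev/null \
        | grep -q ovn-controller
}

reproduce() {
    start=$(date '+%Y-%m-%d %H:%M:%S')
    before=$(controller_pid)

    ovn-appctl vlog/set dbg
    # --may-exist lets the script be re-run; the trace used plain add-port.
    ovs-vsctl --may-exist add-port br-int p1 -- set interface p1 \
        type=internal external-ids:iface-id=sw0-port1
    ovn-nbctl --wait=hv sync

    # Pause ovn-controller so the add and the delete reach it in one batch.
    ovn-appctl debug/pause
    ovn-nbctl ls-add sw0 -- lsp-add sw0 sw0-port1 -- \
        lsp-set-addresses sw0-port1 '50:54:00:00:00:01 192.168.0.2'
    ovn-nbctl lsp-del sw0-port1
    ovn-appctl debug/resume
    ovn-nbctl --wait=hv sync

    ovn-nbctl ls-del sw0
    ovn-nbctl --wait=hv sync
    after=$(controller_pid)

    if same_pid "$before" "$after" && check_coredump "$start"; then
        echo "PASS: ovn-controller still running as pid $after"
    else
        echo "FAIL: ovn-controller restarted or dumped core" >&2
        return 1
    fi
}

# Only touch the live system when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    reproduce
fi
```

Restricting `coredumpctl` to entries since the start of the run keeps an older crash, such as the one from the ovn22.06.0-64 run, from failing a later check. Both conditions are needed. A PID change alone could also mean a deliberate restart, and a missing core alone could mean core collection is disabled.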

Comment 8 errata-xmlrpc 2022-11-21 18:42:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn22.06 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8572