If an interface with a qos option is deleted at the same time as an ofport notification from ovs is received (causing a runtime_data recompute), the binding module tries to delete the same qos queue twice. The second delete trips the "row->new_datum != NULL" assertion in ovsdb_idl_txn_delete() and ovn-controller aborts.

#0  0x00007f378d15b2a2 in raise () from /lib64/libc.so.6
#1  0x00007f378d1448a4 in abort () from /lib64/libc.so.6
#2  0x0000000000509c4e in ovs_abort_valist (err_no=err_no@entry=0, format=format@entry=0x604270 "%s: assertion %s failed in %s()", args=args@entry=0x7ffe27e25d78) at lib/util.c:447
#3  0x00000000005116a1 in vlog_abort_valist (module_=<optimized out>, message=0x604270 "%s: assertion %s failed in %s()", args=args@entry=0x7ffe27e25d78) at lib/vlog.c:1286
#4  0x0000000000511737 in vlog_abort (module=module@entry=0x6d73e0 <this_module>, message=message@entry=0x604270 "%s: assertion %s failed in %s()") at lib/vlog.c:1300
#5  0x0000000000509981 in ovs_assert_failure (where=where@entry=0x5ffcb4 "lib/ovsdb-idl.c:3774", function=function@entry=0x6008f0 <__func__.3> "ovsdb_idl_txn_delete", condition=condition@entry=0x5ffa37 "row->new_datum != NULL") at lib/util.c:89
#6  0x00000000004f37f4 in ovsdb_idl_txn_delete (row_=0x1a353c0) at lib/ovsdb-idl.c:3774
#7  0x0000000000411cfb in ovs_qos_entries_gc (queue_map=0x1996670, qos_table=<optimized out>, ovsrec_port_by_qos=0x193ad10, ovs_idl_txn=<optimized out>) at controller/binding.c:427
#8  binding_run (b_ctx_in=b_ctx_in@entry=0x7ffe27e25fc0, b_ctx_out=b_ctx_out@entry=0x7ffe27e25f50) at controller/binding.c:2128
#9  0x000000000043b309 in en_runtime_data_run (node=0x7ffe27e2a610, data=0x1996590) at controller/ovn-controller.c:1670
#10 0x0000000000463c58 in engine_recompute (node=node@entry=0x7ffe27e2a610, allowed=allowed@entry=true, reason_fmt=reason_fmt@entry=0x5d91a3 "failed handler for input %s") at lib/inc-proc-eng.c:415
#11 0x00000000004645ed in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:454
#12 engine_run_node (recompute_allowed=true, node=0x7ffe27e2a610) at lib/inc-proc-eng.c:503
#13 engine_run (recompute_allowed=recompute_allowed@entry=true) at lib/inc-proc-eng.c:528
#14 0x000000000040ac0f in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:5242

Reproduced on origin/main using the following unit test:

sleep_controller() {
    hv=$1
    echo Controller $hv going to sleep
    as $hv
    check ovn-appctl debug/pause
    OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) = "xpaused"])
}

wake_up_controller() {
    hv=$1
    as $hv
    echo Controller $hv waking up
    ovn-appctl debug/resume
    OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) = "xrunning"])
}

sleep_ovs() {
    hv=$1
    echo ovs $hv going to sleep
    AT_CHECK([kill -STOP $(cat $hv/ovs-vswitchd.pid)])
}

wake_up_ovs() {
    hv=$1
    echo ovs $hv waking up
    AT_CHECK([kill -CONT $(cat $hv/ovs-vswitchd.pid)])
}

OVN_FOR_EACH_NORTHD([
AT_SETUP([OVN QoS port deletion])
ovn_start

check ovn-nbctl ls-add ls1
check ovn-nbctl lsp-add ls1 public1
check ovn-nbctl lsp-set-addresses public1 unknown
check ovn-nbctl lsp-set-type public1 localnet
check ovn-nbctl lsp-set-options public1 network_name=phys

net_add n

# two hypervisors, each connected to the same network
for i in 1 2; do
    sim_add hv-$i
    as hv-$i
    ovs-vsctl add-br br-phys
    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
    ovn_attach n br-phys 192.168.0.$i
done

check ovn-nbctl lsp-add ls1 lsp1
check ovn-nbctl lsp-set-addresses lsp1 f0:00:00:00:00:03
as hv-1
ovs-vsctl add-port br-int vif1 -- \
    set Interface vif1 external-ids:iface-id=lsp1 \
    ofport-request=3

OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp1` = xup])

check ovn-nbctl set Logical_Switch_Port lsp1 options:qos_max_rate=800000
check ovn-nbctl --wait=hv set Logical_Switch_Port lsp1 options:qos_burst=9000000

AS_BOX([$(date +%H:%M:%S.%03N) checking deletion of port with qos options])
check ovn-nbctl ls-add ls2
check ovn-nbctl lsp-add ls2 lsp2
check ovn-nbctl lsp-set-addresses lsp2 f0:00:00:00:00:05
as hv-1
ovs-vsctl add-port br-int vif2 -- \
    set Interface vif2 external-ids:iface-id=lsp2 \
    ofport-request=5
OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp2` = xup])

# Sleep ovs to postpone ofport notification to ovn
sleep_ovs hv-1

# Create localnet; this will cause patch-port creation
check ovn-nbctl lsp-add ls2 public2
check ovn-nbctl lsp-set-addresses public2 unknown
check ovn-nbctl lsp-set-type public2 localnet
check ovn-nbctl --wait=sb set Logical_Switch_Port public2 options:qos_min_rate=6000000000 options:qos_max_rate=7000000000 options:qos_burst=8000000000 options:network_name=phys

# Let's now send ovn controller to sleep, so it will receive both the ofport notification and the lsp deletion simultaneously
sleep_controller hv-1

# Time to wake up ovs
wake_up_ovs hv-1

# Delete lsp1
check ovn-nbctl --wait=sb lsp-del lsp1

# And finally wake up controller
wake_up_controller hv-1

# Make sure ovn-controller is still OK
ovn-nbctl --wait=hv sync
OVS_WAIT_UNTIL([test $(as hv-1 ovs-vsctl list qos | grep -c linux-htb) -eq 1])

AT_CLEANUP
])
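For illustration only, here is a minimal toy model of the failure mode in the backtrace above and of one possible guard. It does not use the real OVS/OVN types: struct toy_row, toy_txn_delete(), toy_delete_once() and toy_gc() are hypothetical names invented for this sketch, and the guard shown is an assumption, not necessarily how the upstream series addresses the problem. The only detail taken from the report is the asserted condition "row->new_datum != NULL".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an IDL row; NOT the real struct ovsdb_idl_row. */
struct toy_row {
    const char *new_datum;   /* NULL once the row was deleted in this txn. */
};

/* Mimics the precondition from the backtrace: a second delete of the
 * same row within one transaction aborts. */
static void
toy_txn_delete(struct toy_row *row)
{
    assert(row->new_datum != NULL);
    row->new_datum = NULL;
}

/* Hypothetical guard: issue the delete only if the row is still live.
 * Returns true if a delete was actually issued. */
static bool
toy_delete_once(struct toy_row *row)
{
    if (row->new_datum == NULL) {
        return false;        /* Already deleted by another code path. */
    }
    toy_txn_delete(row);
    return true;
}

/* Garbage-collects 'n' rows, tolerating the same row appearing more
 * than once.  Returns the number of deletes actually issued. */
static size_t
toy_gc(struct toy_row **rows, size_t n)
{
    size_t deleted = 0;

    for (size_t i = 0; i < n; i++) {
        if (toy_delete_once(rows[i])) {
            deleted++;
        }
    }
    return deleted;
}
```

In this model, calling toy_txn_delete() twice on the same row reproduces the abort, while going through toy_delete_once() makes the second request a no-op.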
Upstream patch series posted here: https://patchwork.ozlabs.org/project/ovn/list/?series=358637
ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2223477
*** This bug has been marked as a duplicate of bug 2223477 ***