
Bug 2213219

Summary: Potential exception in ovn-controller when deleting a switch port on which qos was applied
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: xsimonar
Component: ovn23.06
Assignee: xsimonar
Status: CLOSED DUPLICATE
QA Contact: ying xu <yinxu>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: FDP 22.H
CC: ctrautma, jiji, jishi, mmichels
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovn23.06-23.06.0-36.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-08-17 13:59:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description xsimonar 2023-06-07 14:02:36 UTC
If an interface with a qos option is deleted at the same time as an ofport notification from ovs is received (which causes a runtime_data recompute), the binding module tries to delete the same qos queue twice. The second delete trips the assertion "row->new_datum != NULL" in ovsdb_idl_txn_delete() and ovn-controller aborts:

#0  0x00007f378d15b2a2 in raise () from /lib64/libc.so.6
#1  0x00007f378d1448a4 in abort () from /lib64/libc.so.6
#2  0x0000000000509c4e in ovs_abort_valist (err_no=err_no@entry=0, format=format@entry=0x604270 "%s: assertion %s failed in %s()", args=args@entry=0x7ffe27e25d78) at lib/util.c:447
#3  0x00000000005116a1 in vlog_abort_valist (module_=<optimized out>, message=0x604270 "%s: assertion %s failed in %s()", args=args@entry=0x7ffe27e25d78) at lib/vlog.c:1286
#4  0x0000000000511737 in vlog_abort (module=module@entry=0x6d73e0 <this_module>, message=message@entry=0x604270 "%s: assertion %s failed in %s()") at lib/vlog.c:1300
#5  0x0000000000509981 in ovs_assert_failure (where=where@entry=0x5ffcb4 "lib/ovsdb-idl.c:3774", function=function@entry=0x6008f0 <__func__.3> "ovsdb_idl_txn_delete", condition=condition@entry=0x5ffa37 "row->new_datum != NULL") at lib/util.c:89
#6  0x00000000004f37f4 in ovsdb_idl_txn_delete (row_=0x1a353c0) at lib/ovsdb-idl.c:3774
#7  0x0000000000411cfb in ovs_qos_entries_gc (queue_map=0x1996670, qos_table=<optimized out>, ovsrec_port_by_qos=0x193ad10, ovs_idl_txn=<optimized out>) at controller/binding.c:427
#8  binding_run (b_ctx_in=b_ctx_in@entry=0x7ffe27e25fc0, b_ctx_out=b_ctx_out@entry=0x7ffe27e25f50) at controller/binding.c:2128
#9  0x000000000043b309 in en_runtime_data_run (node=0x7ffe27e2a610, data=0x1996590) at controller/ovn-controller.c:1670
#10 0x0000000000463c58 in engine_recompute (node=node@entry=0x7ffe27e2a610, allowed=allowed@entry=true, reason_fmt=reason_fmt@entry=0x5d91a3 "failed handler for input %s") at lib/inc-proc-eng.c:415
#11 0x00000000004645ed in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:454
#12 engine_run_node (recompute_allowed=true, node=0x7ffe27e2a610) at lib/inc-proc-eng.c:503
#13 engine_run (recompute_allowed=recompute_allowed@entry=true) at lib/inc-proc-eng.c:528
#14 0x000000000040ac0f in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:5242
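
The failed condition in frame #6 can be illustrated with a small toy model. This is only a sketch in Python, not OVS or OVN code: ToyRow, toy_txn_delete, gc_unguarded and gc_guarded are hypothetical names, and the guard shown is one generic way to avoid a double delete, not necessarily what the upstream patch series does. It assumes that deleting a row clears its new_datum, so that a second delete of the same row in the same transaction hits the assertion.

```python
class ToyRow:
    """Hypothetical stand-in for an IDL row; new_datum is None once deleted."""

    def __init__(self, name):
        self.name = name
        self.new_datum = {"name": name}


def toy_txn_delete(row):
    # Models the failed condition from the backtrace: row->new_datum != NULL.
    if row.new_datum is None:
        raise AssertionError("row->new_datum != NULL")
    row.new_datum = None


def gc_unguarded(stale_rows):
    # Deletes every entry it is handed, even if the same row is listed twice.
    for row in stale_rows:
        toy_txn_delete(row)


def gc_guarded(stale_rows):
    # Skips rows already deleted in this transaction; returns how many were deleted.
    deleted = 0
    for row in stale_rows:
        if row.new_datum is None:
            continue
        toy_txn_delete(row)
        deleted += 1
    return deleted


# The same queue row is scheduled for deletion twice, as in the bug description.
queue = ToyRow("qos-queue")
try:
    gc_unguarded([queue, queue])
    crashed = False
except AssertionError:
    crashed = True

queue2 = ToyRow("qos-queue")
deleted = gc_guarded([queue2, queue2])
print(crashed, deleted)
```

In this model the unguarded pass fails on the second delete, while the guarded pass deletes the row once and skips the repeat.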

Reproduced using the following unit test on origin/main:

sleep_controller() {
  # Set hv before using it in the message below
  hv=$1
  echo Controller $hv going to sleep
  as $hv
  check ovn-appctl debug/pause
  OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) = "xpaused"])
}
wake_up_controller() {
  hv=$1
  as $hv
  echo Controller $hv waking up
  ovn-appctl debug/resume
  OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) = "xrunning"])
}
sleep_ovs() {
  hv=$1
  echo ovs $hv going to sleep
  AT_CHECK([kill -STOP $(cat $hv/ovs-vswitchd.pid)])
}

wake_up_ovs() {
  hv=$1
  echo ovs $hv waking up
  AT_CHECK([kill -CONT $(cat $hv/ovs-vswitchd.pid)])
}

OVN_FOR_EACH_NORTHD([
AT_SETUP([OVN QoS port deletion])
ovn_start

check ovn-nbctl ls-add ls1
check ovn-nbctl lsp-add ls1 public1
check ovn-nbctl lsp-set-addresses public1 unknown
check ovn-nbctl lsp-set-type public1 localnet
check ovn-nbctl lsp-set-options public1 network_name=phys
net_add n

# two hypervisors, each connected to the same network
for i in 1 2; do
    sim_add hv-$i
    as hv-$i
    ovs-vsctl add-br br-phys
    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
    ovn_attach n br-phys 192.168.0.$i
done

check ovn-nbctl lsp-add ls1 lsp1
check ovn-nbctl lsp-set-addresses lsp1 f0:00:00:00:00:03
as hv-1
ovs-vsctl add-port br-int vif1 -- \
    set Interface vif1 external-ids:iface-id=lsp1 \
    ofport-request=3

OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp1` = xup])

check ovn-nbctl set Logical_Switch_Port lsp1 options:qos_max_rate=800000
check ovn-nbctl --wait=hv set Logical_Switch_Port lsp1 options:qos_burst=9000000

AS_BOX([$(date +%H:%M:%S.%03N) checking deletion of port with qos options])
check ovn-nbctl ls-add ls2
check ovn-nbctl lsp-add ls2 lsp2
check ovn-nbctl lsp-set-addresses lsp2 f0:00:00:00:00:05
as hv-1
ovs-vsctl add-port br-int vif2 -- \
    set Interface vif2 external-ids:iface-id=lsp2 \
    ofport-request=5
OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp2` = xup])

# Sleep ovs to postpone ofport notification to ovn
sleep_ovs hv-1
# Create localnet; this will cause patch-port creation
check ovn-nbctl lsp-add ls2 public2
check ovn-nbctl lsp-set-addresses public2 unknown
check ovn-nbctl lsp-set-type public2 localnet
check ovn-nbctl --wait=sb set Logical_Switch_Port public2 options:qos_min_rate=6000000000 options:qos_max_rate=7000000000 options:qos_burst=8000000000 options:network_name=phys

# Now pause ovn-controller, so that it receives the ofport notification and the lsp1 deletion at the same time
sleep_controller hv-1

# Time to wake up ovs
wake_up_ovs hv-1

# Delete lsp1
check ovn-nbctl --wait=sb lsp-del lsp1

# And finally wake up controller
wake_up_controller hv-1

# Make sure ovn-controller is still OK
ovn-nbctl --wait=hv sync
OVS_WAIT_UNTIL([test $(as hv-1 ovs-vsctl list qos | grep -c linux-htb) -eq 1])

AT_CLEANUP
])

Comment 1 Mark Michelson 2023-06-09 12:48:43 UTC
Upstream patch series posted here: https://patchwork.ozlabs.org/project/ovn/list/?series=358637

Comment 2 OVN Bot 2023-07-18 04:08:38 UTC
ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2223477

Comment 3 Mark Michelson 2023-08-17 13:59:09 UTC

*** This bug has been marked as a duplicate of bug 2223477 ***