Bug 2213219 - Potential exception in ovn-controller when deleting a switch port on which qos was applied
Summary: Potential exception in ovn-controller when deleting a switch port on which qos...
Keywords:
Status: CLOSED DUPLICATE of bug 2223477
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn23.06
Version: FDP 22.H
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: xsimonar
QA Contact: ying xu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-06-07 14:02 UTC by xsimonar
Modified: 2023-08-17 13:59 UTC (History)

Fixed In Version: ovn23.06-23.06.0-36.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-17 13:59:09 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-2942 0 None None None 2023-06-09 12:50:53 UTC

Description xsimonar 2023-06-07 14:02:36 UTC
If an interface with a QoS option is deleted at the same time as an ofport notification from OVS is received (which causes a runtime_data recompute), the binding module tries to delete the same QoS queue twice. The second delete trips the "row->new_datum != NULL" assertion in ovsdb_idl_txn_delete(), and ovn-controller aborts:

#0  0x00007f378d15b2a2 in raise () from /lib64/libc.so.6
#1  0x00007f378d1448a4 in abort () from /lib64/libc.so.6
#2  0x0000000000509c4e in ovs_abort_valist (err_no=err_no@entry=0, format=format@entry=0x604270 "%s: assertion %s failed in %s()", args=args@entry=0x7ffe27e25d78) at lib/util.c:447
#3  0x00000000005116a1 in vlog_abort_valist (module_=<optimized out>, message=0x604270 "%s: assertion %s failed in %s()", args=args@entry=0x7ffe27e25d78) at lib/vlog.c:1286
#4  0x0000000000511737 in vlog_abort (module=module@entry=0x6d73e0 <this_module>, message=message@entry=0x604270 "%s: assertion %s failed in %s()") at lib/vlog.c:1300
#5  0x0000000000509981 in ovs_assert_failure (where=where@entry=0x5ffcb4 "lib/ovsdb-idl.c:3774", function=function@entry=0x6008f0 <__func__.3> "ovsdb_idl_txn_delete", condition=condition@entry=0x5ffa37 "row->new_datum != NULL")
    at lib/util.c:89
#6  0x00000000004f37f4 in ovsdb_idl_txn_delete (row_=0x1a353c0) at lib/ovsdb-idl.c:3774
#7  0x0000000000411cfb in ovs_qos_entries_gc (queue_map=0x1996670, qos_table=<optimized out>, ovsrec_port_by_qos=0x193ad10, ovs_idl_txn=<optimized out>) at controller/binding.c:427
#8  binding_run (b_ctx_in=b_ctx_in@entry=0x7ffe27e25fc0, b_ctx_out=b_ctx_out@entry=0x7ffe27e25f50) at controller/binding.c:2128
#9  0x000000000043b309 in en_runtime_data_run (node=0x7ffe27e2a610, data=0x1996590) at controller/ovn-controller.c:1670
#10 0x0000000000463c58 in engine_recompute (node=node@entry=0x7ffe27e2a610, allowed=allowed@entry=true, reason_fmt=reason_fmt@entry=0x5d91a3 "failed handler for input %s") at lib/inc-proc-eng.c:415
#11 0x00000000004645ed in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:454
#12 engine_run_node (recompute_allowed=true, node=0x7ffe27e2a610) at lib/inc-proc-eng.c:503
#13 engine_run (recompute_allowed=recompute_allowed@entry=true) at lib/inc-proc-eng.c:528
#14 0x000000000040ac0f in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:5242

Reproduced using the following unit test on origin/main:

sleep_controller() {
  hv=$1
  echo Controller $hv going to sleep
  as $hv
  check ovn-appctl debug/pause
  OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) = "xpaused"])
}
wake_up_controller() {
  hv=$1
  as $hv
  echo Controller $hv waking up
  ovn-appctl debug/resume
  OVS_WAIT_UNTIL([test x$(ovn-appctl -t ovn-controller debug/status) = "xrunning"])
}
sleep_ovs() {
  hv=$1
  echo ovs $hv going to sleep
  AT_CHECK([kill -STOP $(cat $hv/ovs-vswitchd.pid)])
}

wake_up_ovs() {
  hv=$1
  echo ovs $hv waking up
  AT_CHECK([kill -CONT $(cat $hv/ovs-vswitchd.pid)])
}

OVN_FOR_EACH_NORTHD([
AT_SETUP([OVN QoS port deletion])
ovn_start

check ovn-nbctl ls-add ls1
check ovn-nbctl lsp-add ls1 public1
check ovn-nbctl lsp-set-addresses public1 unknown
check ovn-nbctl lsp-set-type public1 localnet
check ovn-nbctl lsp-set-options public1 network_name=phys
net_add n

# two hypervisors, each connected to the same network
for i in 1 2; do
    sim_add hv-$i
    as hv-$i
    ovs-vsctl add-br br-phys
    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
    ovn_attach n br-phys 192.168.0.$i
done

check ovn-nbctl lsp-add ls1 lsp1
check ovn-nbctl lsp-set-addresses lsp1 f0:00:00:00:00:03
as hv-1
ovs-vsctl add-port br-int vif1 -- \
    set Interface vif1 external-ids:iface-id=lsp1 \
    ofport-request=3

OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp1` = xup])

check ovn-nbctl set Logical_Switch_Port lsp1 options:qos_max_rate=800000
check ovn-nbctl --wait=hv set Logical_Switch_Port lsp1 options:qos_burst=9000000

AS_BOX([$(date +%H:%M:%S.%03N) checking deletion of port with qos options])
check ovn-nbctl ls-add ls2
check ovn-nbctl lsp-add ls2 lsp2
check ovn-nbctl lsp-set-addresses lsp2 f0:00:00:00:00:05
as hv-1
ovs-vsctl add-port br-int vif2 -- \
    set Interface vif2 external-ids:iface-id=lsp2 \
    ofport-request=5
OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up lsp2` = xup])

# Sleep ovs to postpone ofport notification to ovn
sleep_ovs hv-1
# Create localnet; this will cause patch-port creation
check ovn-nbctl lsp-add ls2 public2
check ovn-nbctl lsp-set-addresses public2 unknown
check ovn-nbctl lsp-set-type public2 localnet
check ovn-nbctl --wait=sb set Logical_Switch_Port public2 options:qos_min_rate=6000000000 options:qos_max_rate=7000000000 options:qos_burst=8000000000 options:network_name=phys

# Now send ovn-controller to sleep, so that it receives both the ofport notification and the lsp deletion at the same time
sleep_controller hv-1

# Time to wake up ovs
wake_up_ovs hv-1

# Delete lsp1
check ovn-nbctl --wait=sb lsp-del lsp1

# And finally wake up controller
wake_up_controller hv-1

# Make sure ovn-controller is still OK
ovn-nbctl --wait=hv sync
OVS_WAIT_UNTIL([test $(as hv-1 ovs-vsctl list qos | grep -c linux-htb) -eq 1])

AT_CLEANUP
])

Comment 1 Mark Michelson 2023-06-09 12:48:43 UTC
Upstream patch series posted here: https://patchwork.ozlabs.org/project/ovn/list/?series=358637

Comment 2 OVN Bot 2023-07-18 04:08:38 UTC
ovn23.06 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2223477

Comment 3 Mark Michelson 2023-08-17 13:59:09 UTC

*** This bug has been marked as a duplicate of bug 2223477 ***

