Created attachment 1714556 [details]
core dump

Description of problem:
With the recent addition of incremental processing for OVS flow installation (ofctrl.c), a bug was introduced that causes ovn-controller to abort when an assertion fails while removing a logical flow. This BZ is to track the upstream work to fix the issue.

#0  0x00007f90031cd625 in raise () from /lib64/libc.so.6
#1  0x00007f90031b68d9 in abort () from /lib64/libc.so.6
#2  0x00005607d7416b84 in ovs_abort_valist (err_no=err_no@entry=0, format=format@entry=0x5607d74f6b70 "%s: assertion %s failed in %s()", args=args@entry=0x7ffeccadb7a0) at lib/util.c:419
#3  0x00005607d741e965 in vlog_abort_valist (module_=<optimized out>, message=0x5607d74f6b70 "%s: assertion %s failed in %s()", args=args@entry=0x7ffeccadb7a0) at lib/vlog.c:1249
#4  0x00005607d741ea0a in vlog_abort (module=module@entry=0x5607d75b0cc0 <this_module>, message=message@entry=0x5607d74f6b70 "%s: assertion %s failed in %s()") at lib/vlog.c:1263
#5  0x00005607d741689b in ovs_assert_failure (where=where@entry=0x5607d74d2244 "controller/ofctrl.c:1108", function=function@entry=0x5607d74d29e0 <__func__.33006> "flood_remove_flows_for_sb_uuid", condition=condition@entry=0x5607d74d27f0 "ovs_list_is_empty(&f->list_node)") at lib/util.c:86
#6  0x00005607d73502fa in flood_remove_flows_for_sb_uuid (flow_table=flow_table@entry=0x5607d8c4b380, sb_uuid=sb_uuid@entry=0x5607d8f91430, flood_remove_nodes=flood_remove_nodes@entry=0x7ffeccadb9c0) at controller/ofctrl.c:1135
#7  0x00005607d73503f2 in ofctrl_flood_remove_flows (flow_table=0x5607d8c4b380, flood_remove_nodes=flood_remove_nodes@entry=0x7ffeccadb9c0) at controller/ofctrl.c:1160
#8  0x00005607d734a8ea in lflow_handle_changed_flows (l_ctx_in=<optimized out>, l_ctx_out=0x7ffeccadba50) at controller/lflow.c:467
#9  0x00005607d7364375 in flow_output_sb_logical_flow_handler (node=0x7ffeccae0f90, data=0x5607d8c4b380) at controller/ovn-controller.c:1865
#10 0x00005607d737c933 in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:306
#11 engine_run_node (recompute_allowed=<optimized out>, node=0x7ffeccae0f90) at lib/inc-proc-eng.c:352
#12 engine_run (recompute_allowed=<optimized out>) at lib/inc-proc-eng.c:377
#13 0x00005607d733fbcf in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:2546

Version-Release number of selected component (if applicable):
The code that introduced this is only available in upstream:
OVS revision: 5198e8a06928e3324e6fd11f6209c336611dffd2
OVN revision: 520189bf313054702f5f802acd7944cca3b6baaa

Steps to load the core dump:
docker run --detach --name ovn-bug --rm fedora:31 sleep infinity
docker cp ovn-controller-6.core.1698.ovn-master.gz ovn-bug:/
docker exec -it ovn-bug bash

# in the container
dnf install -y gcc gdb git make libtool autoconf rpm-build
dnf install -y checkpolicy desktop-file-utils gcc-c++ graphviz groff libcap-ng-devel openssl-devel procps-ng python3-devel selinux-policy-devel systemd-units unbound unbound-devel python3-sphinx

git clone https://github.com/openvswitch/ovs
pushd ovs
git checkout 5198e8a06928e3324e6fd11f6209c336611dffd2
./boot.sh && ./configure && make rpm-fedora
popd

git clone https://github.com/ovn-org/ovn
pushd ovn
git checkout 520189bf313054702f5f802acd7944cca3b6baaa
./boot.sh && ./configure --with-ovs-source=/ovs && make rpm-fedora
popd

dnf localinstall -y \
    /ovs/rpm/rpmbuild/RPMS/x86_64/openvswitch-debuginfo-2.14.90-1.fc31.x86_64.rpm \
    /ovs/rpm/rpmbuild/RPMS/noarch/openvswitch-selinux-policy-2.14.90-1.fc31.noarch.rpm \
    /ovs/rpm/rpmbuild/RPMS/x86_64/openvswitch-2.14.90-1.fc31.x86_64.rpm \
    /ovn/rpm/rpmbuild/RPMS/x86_64/ovn-20.06.90-1.fc31.x86_64.rpm \
    /ovn/rpm/rpmbuild/RPMS/x86_64/ovn-debuginfo-20.06.90-1.fc31.x86_64.rpm \
    /ovn/rpm/rpmbuild/RPMS/x86_64/ovn-host-20.06.90-1.fc31.x86_64.rpm \
    /ovn/rpm/rpmbuild/RPMS/x86_64/ovn-host-debuginfo-20.06.90-1.fc31.x86_64.rpm

gunzip ovn-controller-6.core.1698.ovn-master.gz
gdb /usr/bin/ovn-controller ovn-controller-6.core.1698.ovn-master

Also attaching the NB/SB databases and the conf.db of the node where ovn-controller crashed.
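
For readers unfamiliar with intrusive lists: frame #5 reports that the failed condition was ovs_list_is_empty(&f->list_node), i.e. a flow's list node was expected to be unlinked at that point. The following is only a minimal sketch of what such a check guards against, using a hypothetical stand-in list and a hypothetical queue_for_removal() helper. It is not OVN code and makes no claim about the root cause of this bug.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an intrusive circular list node
 * (illustrative only, not the OVS implementation). */
struct list_node {
    struct list_node *prev, *next;
};

/* An initialized node points at itself, which means "not on any list". */
static void list_init(struct list_node *n)
{
    n->prev = n->next = n;
}

static bool list_is_empty(const struct list_node *n)
{
    return n->next == n;
}

static void list_push_back(struct list_node *head, struct list_node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink the node and reset it so it reads as "not on any list" again. */
static void list_remove(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical flow record with an embedded list node. */
struct flow {
    int id;
    struct list_node list_node;
};

/* Queue a flow on a pending-removal list. Returns false instead of
 * aborting when the node is already linked somewhere; linking it a
 * second time would corrupt the list it is currently on. */
static bool queue_for_removal(struct list_node *pending, struct flow *f)
{
    if (!list_is_empty(&f->list_node)) {
        return false;
    }
    list_push_back(pending, &f->list_node);
    return true;
}
```

In this sketch, queuing the same flow twice is rejected by the second call. An assertion on the same condition, as in the backtrace, would abort the process at that point instead.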
Created attachment 1714559 [details] NB/SB/conf databases.
Simpler way to replicate the bug:

make sandbox
ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl acl-add lsp-set-addresses vm1 "0a:58:fc:09:1d:4e fd00:10:244:1::5"
ovn-nbctl lsp-set-addresses vm1 "0a:58:fc:09:1d:4e fd00:10:244:1::5"
ovn-nbctl lsp-set-port-security vm1 "0a:58:fc:09:1d:4e fd00:10:244:1::5"
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal -- set interface vm1 external-ids:iface-id=vm1
sleep 1
ovs-vsctl set interface vm1 external-ids:iface-id=foo
sleep 1
ovs-vsctl set interface vm1 external-ids:iface-id=vm1
sleep 1
ovs-vsctl set interface vm1 external-ids:iface-id=foo
Fixed upstream by: https://github.com/ovn-org/ovn/commit/5b0c2dc286770663656befb8080b309869845c4a