Description of problem:
When an OCP cluster running ovn2.13-20.12.0-24 is upgraded from 4.6 to 4.7, ovn-controller may crash on the following assertion:

(gdb) bt
#0  0x00007fed2fcf284f in raise () from /lib64/libc.so.6
#1  0x00007fed2fcdcc45 in abort () from /lib64/libc.so.6
#2  0x0000556945a939a4 in ovs_abort_valist (err_no=err_no@entry=0, format=format@entry=0x556945b73eb0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffd05af0b10) at lib/util.c:419
#3  0x0000556945a9b794 in vlog_abort_valist (module_=<optimized out>, message=0x556945b73eb0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffd05af0b10) at lib/vlog.c:1249
#4  0x0000556945a9b83a in vlog_abort (module=module@entry=0x556945e30e80 <this_module>, message=message@entry=0x556945b73eb0 "%s: assertion %s failed in %s()") at lib/vlog.c:1263
#5  0x0000556945a936bb in ovs_assert_failure (where=where@entry=0x556945b4e9a6 "controller/ofctrl.c:1917", function=function@entry=0x556945b4eff0 <__func__.34362> "merge_tracked_flows", condition=condition@entry=0x556945b4e990 "del_f->installed_flow") at lib/util.c:86
#6  0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
#7  update_installed_flows_by_track (msgs=0x7ffd05af0c40, flow_table=0x5569470c5b40) at controller/ofctrl.c:1946
#8  ofctrl_put (flow_table=flow_table@entry=0x5569470c5b40, pending_ct_zones=pending_ct_zones@entry=0x5569470ee560, meter_table=<optimized out>, req_cfg=req_cfg@entry=0, flow_changed=flow_changed@entry=true) at controller/ofctrl.c:2130
#9  0x00005569459abdae in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:2931

(gdb) frame 6
#6  0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
283         hmap_expand_at(hmap, where);
(gdb) p del_f->installed_flow
$2 = (struct installed_flow *) 0x0
(gdb) list ofctrl.c:1917
1912                continue;
1913            }
1914
1915            /* del_f must have been installed, otherwise it should have been
1916             * removed during track_flow_add_or_modify. */
1917            ovs_assert(del_f->installed_flow);
1918
1919            if (!f->installed_flow) {
1920                /* f is not installed yet. */
1921                replace_installed_to_desired(del_f->installed_flow, del_f, f);

In other words, merge_tracked_flows() found a tracked deleted flow (del_f) whose installed_flow is NULL, a state that the comment at line 1915 says should never be reached.

Coredumps are stored in the original OCP BZ: https://bugzilla.redhat.com/attachment.cgi?id=1772554
Steps to decode the coredump on CoreOS: https://bugzilla.redhat.com/show_bug.cgi?id=1943413#c36

This BZ is opened to investigate whether the crash is a new issue or whether it is indirectly addressed by recent fixes to ofctrl incremental processing (I-P):
[1] https://github.com/ovn-org/ovn/commit/c6c61b4e3462fb5201a61a226c2acaf6f4caf917
[2] https://github.com/ovn-org/ovn/commit/858d1dd716db1a1e664a7c1737fd34f04fcbda5e
[3] https://github.com/ovn-org/ovn/commit/6975c649f932633042ca54df2d8f8f0eb866c344

Version-Release number of selected component (if applicable):
ovn2.13-20.12.0-24.el8fdp.x86_64

How reproducible:
The exact steps are not yet known. So far the problem has only been observed on OCP 4.7 deployments after upgrading from 4.6.
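To make the invariant behind the failing ovs_assert(del_f->installed_flow) easier to reason about, the following is a heavily simplified C model of the merge step, not OVN's ofctrl.c. It assumes that a desired flow deleted and a desired flow added with the same match in one iteration are merged by handing the deleted flow's installed flow over to the added one, which is what the replace_installed_to_desired() call in the listing suggests. Everything except the field name installed_flow is hypothetical: the struct layouts, the merged and match fields, merge_tracked_flows_sketch(), run_scenario() and the match string are illustrative only. The real code uses hmaps and also handles the case where the added flow is already installed; the sketch skips that case and reports the crashing state instead of aborting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct desired_flow;

/* Hypothetical, simplified stand-ins; NOT the real OVN definitions. */
struct installed_flow {
    int id;
    const struct desired_flow *owner;       /* desired flow backing it */
};

struct desired_flow {
    const char *match;                      /* stands in for the flow key */
    bool is_deleted;                        /* tracked as deleted */
    bool merged;                            /* already handed over */
    struct installed_flow *installed_flow;  /* NULL if never installed */
};

/* For each added flow, find a deleted flow with the same match and move
 * its installed flow to the added one. Returns how many deleted flows
 * had no installed flow, i.e. how often the real ovs_assert() at
 * ofctrl.c:1917 would have aborted. */
static int
merge_tracked_flows_sketch(struct desired_flow *flows, size_t n)
{
    int violations = 0;

    for (size_t i = 0; i < n; i++) {
        struct desired_flow *f = &flows[i];
        if (f->is_deleted) {
            continue;
        }
        for (size_t j = 0; j < n; j++) {
            struct desired_flow *del_f = &flows[j];
            if (!del_f->is_deleted || del_f->merged
                || strcmp(del_f->match, f->match) != 0) {
                continue;
            }
            if (!del_f->installed_flow) {
                /* The state observed in the coredump. */
                fprintf(stderr, "deleted flow '%s' was never installed\n",
                        del_f->match);
                violations++;
                break;
            }
            if (!f->installed_flow) {
                /* Hand the installed flow over instead of delete + add. */
                f->installed_flow = del_f->installed_flow;
                f->installed_flow->owner = f;
                del_f->installed_flow = NULL;
            }
            del_f->merged = true;
            break;
        }
    }
    return violations;
}

/* Hypothetical scenario: one flow deleted and re-added with the same
 * match in the same iteration. Returns the violation count, or -1 if a
 * successful merge did not move ownership to the added flow. */
int
run_scenario(bool deleted_flow_was_installed)
{
    struct installed_flow inst = { .id = 1, .owner = NULL };
    struct desired_flow flows[2] = {
        { .match = "ip,nw_dst=10.0.0.1", .is_deleted = true,
          .installed_flow = deleted_flow_was_installed ? &inst : NULL },
        { .match = "ip,nw_dst=10.0.0.1", .is_deleted = false,
          .installed_flow = NULL },
    };
    if (deleted_flow_was_installed) {
        inst.owner = &flows[0];
    }

    int violations = merge_tracked_flows_sketch(flows, 2);
    if (!violations
        && (flows[1].installed_flow != &inst || inst.owner != &flows[1])) {
        return -1;
    }
    return violations;
}
```

With run_scenario(true) the installed flow moves to the re-added flow and no violation is reported; with run_scenario(false) the deleted flow was never installed, which is the condition the assertion rejects.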
After closer investigation, the -24 build already includes all of the upstream ofctrl fixes listed above, so this appears to be a new issue.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days