Bug 1951502
| Summary: | [ovn-controller] Crash in merge_tracked_flows() due to tracked deleted flow not having an installed reference. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Dumitru Ceara <dceara> |
| Component: | ovn2.13 | Assignee: | Numan Siddique <nusiddiq> |
| Status: | CLOSED NOTABUG | QA Contact: | Jianlin Shi <jishi> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | FDP 20.H | CC: | ctrautma, dcbw, jishi, nusiddiq, ralongi, rkhan, sbatsche, wking, zzhao |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-19 21:36:57 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1943413 | ||
After closer investigation, the -24 version includes all upstream ofctrl fixes. This looks like a new issue.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.
Description of problem:

When upgrading an OCP cluster from 4.6 to 4.7 (running ovn2.13-20.12.0-24), ovn-controller might crash as follows:

```
(gdb) bt
#0  0x00007fed2fcf284f in raise () from /lib64/libc.so.6
#1  0x00007fed2fcdcc45 in abort () from /lib64/libc.so.6
#2  0x0000556945a939a4 in ovs_abort_valist (err_no=err_no@entry=0, format=format@entry=0x556945b73eb0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffd05af0b10) at lib/util.c:419
#3  0x0000556945a9b794 in vlog_abort_valist (module_=<optimized out>, message=0x556945b73eb0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffd05af0b10) at lib/vlog.c:1249
#4  0x0000556945a9b83a in vlog_abort (module=module@entry=0x556945e30e80 <this_module>, message=message@entry=0x556945b73eb0 "%s: assertion %s failed in %s()") at lib/vlog.c:1263
#5  0x0000556945a936bb in ovs_assert_failure (where=where@entry=0x556945b4e9a6 "controller/ofctrl.c:1917", function=function@entry=0x556945b4eff0 <__func__.34362> "merge_tracked_flows", condition=condition@entry=0x556945b4e990 "del_f->installed_flow") at lib/util.c:86
#6  0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
#7  update_installed_flows_by_track (msgs=0x7ffd05af0c40, flow_table=0x5569470c5b40) at controller/ofctrl.c:1946
#8  ofctrl_put (flow_table=flow_table@entry=0x5569470c5b40, pending_ct_zones=pending_ct_zones@entry=0x5569470ee560, meter_table=<optimized out>, req_cfg=req_cfg@entry=0, flow_changed=flow_changed@entry=true) at controller/ofctrl.c:2130
#9  0x00005569459abdae in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:2931
(gdb) frame 6
#6  0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
283         hmap_expand_at(hmap, where);
(gdb) p del_f->installed_flow
$2 = (struct installed_flow *) 0x0
(gdb) list ofctrl.c:1917
1912                continue;
1913            }
1914
1915            /* del_f must have been installed, otherwise it should have been
1916             * removed during track_flow_add_or_modify. */
1917            ovs_assert(del_f->installed_flow);
1918
1919            if (!f->installed_flow) {
1920                /* f is not installed yet. */
1921                replace_installed_to_desired(del_f->installed_flow, del_f, f);
```

Coredumps are stored in the original OCP BZ at: https://bugzilla.redhat.com/attachment.cgi?id=1772554

Steps to decode the coredump on CoreOS: https://bugzilla.redhat.com/show_bug.cgi?id=1943413#c36

This BZ is opened to investigate whether the crash is a new issue or whether it is indirectly addressed by recent fixes to ofctrl incremental processing (I-P):

- [1] https://github.com/ovn-org/ovn/commit/c6c61b4e3462fb5201a61a226c2acaf6f4caf917
- [2] https://github.com/ovn-org/ovn/commit/858d1dd716db1a1e664a7c1737fd34f04fcbda5e
- [3] https://github.com/ovn-org/ovn/commit/6975c649f932633042ca54df2d8f8f0eb866c344

Version-Release number of selected component (if applicable): ovn2.13-20.12.0-24.el8fdp.x86_64

How reproducible: The exact steps are not known yet; so far the problem has only been spotted on OCP 4.7 deployments after upgrades from 4.6.
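The invariant behind the failing `ovs_assert(del_f->installed_flow)` can be illustrated with a heavily simplified C model. Everything below is hypothetical: the struct layouts, the `deleted` and `owner` fields, and `merge_pair()` are stand-ins, not the real `ofctrl.c` definitions, and the hand-over step only approximates what `replace_installed_to_desired()` is assumed to do, based on its name and the comment in the listing above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the ofctrl.c structs.
 * Assumption: a single "owner" pointer replaces whatever bookkeeping the
 * real installed_flow uses to reference its desired flows. */
struct installed_flow {
    const char *match;   /* Match of the flow programmed into OVS. */
    const void *owner;   /* Desired flow currently backing it. */
};

struct desired_flow {
    const char *match;
    bool deleted;                          /* Tracked as deleted. */
    struct installed_flow *installed_flow; /* NULL if never installed. */
};

/* Models one step of merging a tracked-deleted flow (del_f) with a newly
 * added flow (f) that has the same match.  Returns false exactly where the
 * real code would fail ovs_assert(del_f->installed_flow) and abort. */
static bool
merge_pair(struct desired_flow *del_f, struct desired_flow *f)
{
    /* Preconditions of this model: del_f is the deleted side, f is the
     * added side, and both describe the same match. */
    assert(del_f->deleted && !f->deleted);
    assert(strcmp(del_f->match, f->match) == 0);

    if (!del_f->installed_flow) {
        /* Invariant violated: a deleted flow that was never installed
         * should already have been dropped from tracking. */
        return false;
    }

    if (!f->installed_flow) {
        /* f is not installed yet: hand the installed flow over to f. */
        f->installed_flow = del_f->installed_flow;
        f->installed_flow->owner = f;
        del_f->installed_flow = NULL;
    }
    /* The case where f is already installed is not modeled here. */
    return true;
}
```

In this model the crash corresponds to `merge_pair()` returning false: a tracked-deleted flow that shares its match with a newly added flow but never got an installed reference. Per the comment at `ofctrl.c:1915`, such a flow is expected to have been removed earlier, in `track_flow_add_or_modify`, so reaching this state means some earlier step left it behind.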