Bug 1951502 - [ovn-controller] Crash in merge_tracked_flows() due to tracked deleted flow not having an installed reference.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: FDP 20.H
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks: 1943413
 
Reported: 2021-04-20 09:49 UTC by Dumitru Ceara
Modified: 2023-09-15 01:05 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-19 21:36:57 UTC
Target Upstream Version:
Embargoed:



Description Dumitru Ceara 2021-04-20 09:49:33 UTC
Description of problem:

When upgrading an OCP cluster running ovn2.13-20.12.0-24 from 4.6 to 4.7, ovn-controller may crash with the following backtrace:

(gdb) bt
#0  0x00007fed2fcf284f in raise () from /lib64/libc.so.6
#1  0x00007fed2fcdcc45 in abort () from /lib64/libc.so.6
#2  0x0000556945a939a4 in ovs_abort_valist (err_no=err_no@entry=0, format=format@entry=0x556945b73eb0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffd05af0b10) at lib/util.c:419
#3  0x0000556945a9b794 in vlog_abort_valist (module_=<optimized out>, message=0x556945b73eb0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffd05af0b10) at lib/vlog.c:1249
#4  0x0000556945a9b83a in vlog_abort (module=module@entry=0x556945e30e80 <this_module>, message=message@entry=0x556945b73eb0 "%s: assertion %s failed in %s()") at lib/vlog.c:1263
#5  0x0000556945a936bb in ovs_assert_failure (where=where@entry=0x556945b4e9a6 "controller/ofctrl.c:1917", function=function@entry=0x556945b4eff0 <__func__.34362> "merge_tracked_flows", 
    condition=condition@entry=0x556945b4e990 "del_f->installed_flow") at lib/util.c:86
#6  0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
#7  update_installed_flows_by_track (msgs=0x7ffd05af0c40, flow_table=0x5569470c5b40) at controller/ofctrl.c:1946
#8  ofctrl_put (flow_table=flow_table@entry=0x5569470c5b40, pending_ct_zones=pending_ct_zones@entry=0x5569470ee560, meter_table=<optimized out>, req_cfg=req_cfg@entry=0, flow_changed=flow_changed@entry=true) at controller/ofctrl.c:2130
#9  0x00005569459abdae in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:2931

(gdb) frame 6
#6  0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
283             hmap_expand_at(hmap, where);

(gdb) p del_f->installed_flow
$2 = (struct installed_flow *) 0x0
(gdb) list ofctrl.c:1917
1912                    continue;
1913                }
1914
1915                /* del_f must have been installed, otherwise it should have been
1916                 * removed during track_flow_add_or_modify. */
1917                ovs_assert(del_f->installed_flow);
1918
1919                if (!f->installed_flow) {
1920                    /* f is not installed yet. */
1921                    replace_installed_to_desired(del_f->installed_flow, del_f, f);
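
The invariant behind the failed assertion can be modelled in isolation. What follows is a minimal sketch, not the ofctrl.c code: the struct layouts, the `is_deleted` field and the helper `count_uninstalled_tracked_deletes` are hypothetical, and only the field name `installed_flow` is taken from the backtrace above. It counts tracked deleted flows that have no installed reference, which is the state `ovs_assert(del_f->installed_flow)` aborts on.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ofctrl.c structures;
 * only the 'installed_flow' field name comes from the backtrace. */
struct installed_flow {
    int id;
};

struct tracked_flow {
    struct installed_flow *installed_flow; /* NULL if never installed. */
    int is_deleted;                        /* Nonzero for a tracked delete. */
};

/* Returns how many tracked deleted flows lack an installed reference.
 * A nonzero result corresponds to the state that trips the assertion
 * in merge_tracked_flows(). */
size_t
count_uninstalled_tracked_deletes(const struct tracked_flow *flows, size_t n)
{
    size_t bad = 0;

    for (size_t i = 0; i < n; i++) {
        if (flows[i].is_deleted && !flows[i].installed_flow) {
            bad++;
        }
    }
    return bad;
}
```

A check of this shape, run before the merge step, would report the offending flows instead of aborting. Whether that is appropriate for ovn-controller is not established in this report.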

Coredumps are stored in the original OCP BZ, at:
https://bugzilla.redhat.com/attachment.cgi?id=1772554

Steps to decode the coredump on CoreOS:
https://bugzilla.redhat.com/show_bug.cgi?id=1943413#c36

This BZ is opened to investigate whether the crash is a new issue or is indirectly addressed by recent fixes to ofctrl incremental processing (I-P):
[1] https://github.com/ovn-org/ovn/commit/c6c61b4e3462fb5201a61a226c2acaf6f4caf917
[2] https://github.com/ovn-org/ovn/commit/858d1dd716db1a1e664a7c1737fd34f04fcbda5e
[3] https://github.com/ovn-org/ovn/commit/6975c649f932633042ca54df2d8f8f0eb866c344

Version-Release number of selected component (if applicable):

ovn2.13-20.12.0-24.el8fdp.x86_64


How reproducible:

Exact steps are not yet known. So far the problem has only been seen on OCP 4.7 deployments after upgrades from 4.6.

Comment 1 Dumitru Ceara 2021-04-20 12:41:08 UTC
Closer investigation shows that the -24 version already includes all
upstream ofctrl fixes, so this looks like a new issue.

Comment 16 Red Hat Bugzilla 2023-09-15 01:05:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

