
Bug 1951502

Summary: [ovn-controller] Crash in merge_tracked_flows() due to tracked deleted flow not having an installed reference.
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Dumitru Ceara <dceara>
Component: ovn2.13
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED NOTABUG
QA Contact: Jianlin Shi <jishi>
Severity: urgent
Priority: urgent
Version: FDP 20.H
CC: ctrautma, dcbw, jishi, nusiddiq, ralongi, rkhan, sbatsche, wking, zzhao
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-05-19 21:36:57 UTC
Type: Bug
Bug Blocks: 1943413

Description Dumitru Ceara 2021-04-20 09:49:33 UTC
Description of problem:

When upgrading an OCP cluster that runs ovn2.13-20.12.0-24 from 4.6 to 4.7, ovn-controller might crash with the following backtrace:

(gdb) bt
#0  0x00007fed2fcf284f in raise () from /lib64/libc.so.6
#1  0x00007fed2fcdcc45 in abort () from /lib64/libc.so.6
#2  0x0000556945a939a4 in ovs_abort_valist (err_no=err_no@entry=0, format=format@entry=0x556945b73eb0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffd05af0b10) at lib/util.c:419
#3  0x0000556945a9b794 in vlog_abort_valist (module_=<optimized out>, message=0x556945b73eb0 "%s: assertion %s failed in %s()", args=args@entry=0x7ffd05af0b10) at lib/vlog.c:1249
#4  0x0000556945a9b83a in vlog_abort (module=module@entry=0x556945e30e80 <this_module>, message=message@entry=0x556945b73eb0 "%s: assertion %s failed in %s()") at lib/vlog.c:1263
#5  0x0000556945a936bb in ovs_assert_failure (where=where@entry=0x556945b4e9a6 "controller/ofctrl.c:1917", function=function@entry=0x556945b4eff0 <__func__.34362> "merge_tracked_flows", 
    condition=condition@entry=0x556945b4e990 "del_f->installed_flow") at lib/util.c:86
#6  0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
#7  update_installed_flows_by_track (msgs=0x7ffd05af0c40, flow_table=0x5569470c5b40) at controller/ofctrl.c:1946
#8  ofctrl_put (flow_table=flow_table@entry=0x5569470c5b40, pending_ct_zones=pending_ct_zones@entry=0x5569470ee560, meter_table=<optimized out>, req_cfg=req_cfg@entry=0, flow_changed=flow_changed@entry=true) at controller/ofctrl.c:2130
#9  0x00005569459abdae in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:2931

(gdb) frame 6
#6  0x00005569459bfbaa in merge_tracked_flows (flow_table=0x5569470c5b40) at /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
283             hmap_expand_at(hmap, where);

(gdb) p del_f->installed_flow
$2 = (struct installed_flow *) 0x0
(gdb) list ofctrl.c:1917
1912                    continue;
1913                }
1914
1915                /* del_f must have been installed, otherwise it should have been
1916                 * removed during track_flow_add_or_modify. */
1917                ovs_assert(del_f->installed_flow);
1918
1919                if (!f->installed_flow) {
1920                    /* f is not installed yet. */
1921                    replace_installed_to_desired(del_f->installed_flow, del_f, f);
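
To make the failed assertion easier to reason about, here is a minimal sketch of the invariant it checks. This is not the code from controller/ofctrl.c: the struct layouts, the is_deleted field, and the helpers tracked_delete_is_mergeable() and merge_one() are hypothetical simplifications, and only the "f is not installed yet" branch from the listing above is modeled. The sketch returns false where the real code would hit ovs_assert(del_f->installed_flow).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ofctrl structures. The real
 * definitions in controller/ofctrl.c carry many more fields. */
struct installed_flow {
    int id;
};

struct desired_flow {
    struct installed_flow *installed_flow; /* NULL if never installed. */
    bool is_deleted;                        /* Tracked as deleted. */
};

/* Returns true if 'del_f' satisfies the invariant asserted at
 * ofctrl.c:1917: a tracked deleted flow that is merged with a re-added
 * flow must still reference an installed flow. */
static bool
tracked_delete_is_mergeable(const struct desired_flow *del_f)
{
    return del_f->is_deleted && del_f->installed_flow != NULL;
}

/* Sketch of one merge step (assumption: only the "f is not installed yet"
 * branch is modeled).  Returns false instead of aborting when the
 * invariant does not hold, which is the state seen in the coredump. */
static bool
merge_one(struct desired_flow *del_f, struct desired_flow *f)
{
    if (!tracked_delete_is_mergeable(del_f)) {
        return false;
    }
    if (!f->installed_flow) {
        /* 'f' takes over the installed flow that 'del_f' referenced. */
        f->installed_flow = del_f->installed_flow;
        del_f->installed_flow = NULL;
    }
    return true;
}
```

In this model, the crash corresponds to calling merge_one() with a del_f whose installed_flow is NULL, which matches the "p del_f->installed_flow" output above.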

Coredumps are stored in the original OCP BZ, at:
https://bugzilla.redhat.com/attachment.cgi?id=1772554

Steps to decode the coredump on coreos:
https://bugzilla.redhat.com/show_bug.cgi?id=1943413#c36

This BZ is opened to investigate whether the crash is a new issue or whether it is indirectly addressed by recent fixes to ofctrl incremental processing (I-P):
[1] https://github.com/ovn-org/ovn/commit/c6c61b4e3462fb5201a61a226c2acaf6f4caf917
[2] https://github.com/ovn-org/ovn/commit/858d1dd716db1a1e664a7c1737fd34f04fcbda5e
[3] https://github.com/ovn-org/ovn/commit/6975c649f932633042ca54df2d8f8f0eb866c344

Version-Release number of selected component (if applicable):

ovn2.13-20.12.0-24.el8fdp.x86_64


How reproducible:

The exact steps are not yet known. So far the problem has only been seen on OCP 4.7 deployments after upgrades from 4.6.

Comment 1 Dumitru Ceara 2021-04-20 12:41:08 UTC
After closer investigation, the -24 version already includes all upstream ofctrl
fixes, so this looks like a new issue.

Comment 16 Red Hat Bugzilla 2023-09-15 01:05:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days