Bug 1930030

Summary: ovn-controller is crashing with the assertion - EMER|controller/binding.c:2507: assertion lb->pb && lb->iface failed in binding_seqno_run()
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovn2.13
Version: FDP 20.H
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Numan Siddique <nusiddiq>
Assignee: Dumitru Ceara <dceara>
QA Contact: Zhiqiang Fang <zfang>
CC: ctrautma, dceara, jishi, ralongi
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-03-15 14:34:36 UTC
Type: Bug
Attachments:
  Description: The test script and output
  Flags: none

Description Numan Siddique 2021-02-18 09:08:08 UTC
Description of problem:

ovn-controller crashes with the assertion in the summary when running ovn2.13-20.12.0-20.

Backtrace:

(gdb) bt
#0  0x00007fc2cad7670f in raise () from /lib64/libc.so.6
#1  0x00007fc2cad60b25 in abort () from /lib64/libc.so.6
#2  0x00005613f7c76654 in ovs_abort_valist ()
#3  0x00005613f7c7e444 in vlog_abort_valist ()
#4  0x00005613f7c7e4ea in vlog_abort ()
#5  0x00005613f7c7636b in ovs_assert_failure ()
#6  0x00005613f7b9684a in binding_seqno_run ()
#7  0x00005613f7b90aa9 in main ()

Comment 1 Dumitru Ceara 2021-02-18 16:43:49 UTC
Fix posted for review upstream: http://patchwork.ozlabs.org/project/ovn/patch/1613666578-23685-1-git-send-email-dceara@redhat.com/

Comment 6 Jianlin Shi 2021-03-05 06:52:46 UTC
Hi Dumitru,

Is there a reproducer for the crash?

Comment 7 Dumitru Ceara 2021-03-05 08:09:38 UTC
(In reply to Jianlin Shi from comment #6)
> Hi Dumitru,

Hi Jianlin,

> 
> Is there reproducer for the crash?

Unfortunately, no.  The window to reproduce the crash is quite narrow as it
requires an OVSDB transaction towards the local OVS DB to still be in progress
when a logical port is removed from the Southbound DB.
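
The race described above can be illustrated with a minimal sketch. All
types and function names below (local_binding as modeled here,
sb_port_removed, would_assert, pending_update_run) are hypothetical
stand-ins made up for illustration, not the actual ovn-controller code.
The sketch only assumes that a binding with an update still pending on the
local OVS DB loses its Southbound record before that update completes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real ovn-controller types. */
struct port_binding { const char *logical_port; };
struct ovs_iface { const char *name; };

struct local_binding {
    const struct port_binding *pb;  /* Southbound record; NULL once removed. */
    const struct ovs_iface *iface;  /* Local OVS interface; NULL if absent. */
    bool update_pending;            /* Local OVS DB transaction in flight. */
};

/* Hypothetical handler: the logical port is removed from the Southbound DB.
 * With cancel_pending == false the pending entry is left behind (the race). */
static void
sb_port_removed(struct local_binding *lb, bool cancel_pending)
{
    lb->pb = NULL;
    if (cancel_pending) {
        lb->update_pending = false;
    }
}

/* True if completing the pending update would trip the assertion. */
static bool
would_assert(const struct local_binding *lb)
{
    return lb->update_pending && !(lb->pb && lb->iface);
}

/* Hypothetical completion step, run when the local OVS DB transaction
 * finishes.  Mirrors the shape of the failing check from the summary. */
static void
pending_update_run(struct local_binding *lb)
{
    if (!lb->update_pending) {
        return;
    }
    assert(lb->pb && lb->iface);
    lb->update_pending = false;
}
```

In this sketch, calling sb_port_removed(lb, false) while update_pending is
set leaves the binding in a state where pending_update_run() would abort.
Dropping the pending entry on removal is one possible guard; whether the
upstream patch takes that approach is not stated in this report.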

The only way I could reproduce the crash was by instrumenting the
ovn-controller code.

I guess it would be ideal if we could do a smoke test of the whole
functionality of the RFE in bug 1839102.

Thanks,
Dumitru

Comment 8 Zhiqiang Fang 2021-03-10 18:10:40 UTC
We ran the tests from bug 1839102, repeatedly adding 100 logical switch ports (LSPs) or modifying the DB. The tests passed and we did not see a core dump.

[root@netqe20 ~]# rpm -qa | egrep "ovn|openv"
ovn2.13-20.12.0-22.el8fdn.x86_64
ovn2.13-central-20.12.0-22.el8fdn.x86_64
openvswitch2.13-2.13.0-79.5.el8fdp.x86_64
ovn2.13-host-20.12.0-22.el8fdn.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch

Attaching the test script and output.
Thanks.

Comment 9 Zhiqiang Fang 2021-03-10 18:12:02 UTC
Created attachment 1762395
The test script and output

Comment 11 errata-xmlrpc 2021-03-15 14:34:36 UTC
Since the problem described in this bug report should be resolved in a
recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update),
and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0839