Bug 1916900

Summary: [OSP16.1.3] ovn-controller restarts intermittently
Product: Red Hat OpenStack
Reporter: camorris@redhat.co <camorris>
Component: openvswitch
Assignee: ffernand <ffernand>
Status: CLOSED ERRATA
QA Contact: Eran Kuris <ekuris>
Severity: urgent
Priority: urgent
Version: 16.1 (Train)
CC: apevec, bhaley, camorris, chrisw, ctrautma, dalvarez, dceara, ekuris, ffernand, fhallal, jishi, jlibosva, ksambor, lhh, majopela, mamorim, pgrist, pmannidi, ralongi, rhos-maint, scohen, spower, tvignaud, zfang
Target Milestone: z4
Keywords: Regression, TestOnly, Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Flags: fhallal: needinfo+
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ovn2.13-20.12.0-17.el8fdp
Clone Of: 1915389
: 1917533
Last Closed: 2021-03-17 20:46:59 UTC
Bug Depends On: 1901880, 1915389, 1917533    
Bug Blocks: 1901881    

Description camorris@redhat.co 2021-01-15 19:09:43 UTC
+++ This bug was initially created as a clone of Bug #1915389 +++

+++ This bug was initially created as a clone of Bug #1901880 +++

Description of problem:

Previously, ovn-controller crashed with the following segfault:
~~~
Segfault is seen with the below trace.  This patch fixes the issue by
checking 'ovnsb_idl_txn' is not NULL before continuing in the function
send_garp_locally().

    #0  ovsdb_idl_txn_insert (txn=0x0, class=0x64b170 <sbrec_table_classes+816>, uuid=0x0) at ../lib/ovsdb-idl.c:3504
    #1  0x000000000041b068 in mac_binding_add (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0,
    sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, logical_port=0xfd0d70 "lr0-pub", dp=0xfd0f20,
    ea=..., ip=0xf67be0 "172.24.4.221") at ../controller/pinctrl.c:3877
    #2  0x000000000041b18b in send_garp_locally (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0,
    sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, local_datapaths=local_datapaths@entry=0xf766f0,
    in_pb=in_pb@entry=0xfd3370, ea=..., ip=3708033196) at ../controller/pinctrl.c:3913
    #3  0x000000000041d1be in send_garp_rarp_update (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0,
    sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, local_datapaths=local_datapaths@entry=0xf766f0,
    binding_rec=0xfd3370, nat_addresses=nat_addresses@entry=0x7ffdb9295b80) at ../controller/pinctrl.c:4118
    #4  0x0000000000425c88 in send_garp_rarp_prepare (active_tunnels=0xf76770, local_datapaths=0xf766f0, chassis=0xfdb4e0,
    br_int=<optimized out>, sbrec_mac_binding_by_lport_ip=0xf6a7e0, sbrec_port_binding_by_name=0xf6a090,
    sbrec_port_binding_by_datapath=<optimized out>, ovnsb_idl_txn=0x0) at ../controller/pinctrl.c:5491
    #5  pinctrl_run (ovnsb_idl_txn=0x0, sbrec_datapath_binding_by_key=<optimized out>,
    sbrec_port_binding_by_datapath=<optimized out>, sbrec_port_binding_by_key=<optimized out>,
    sbrec_port_binding_by_name=0xf6a090, sbrec_mac_binding_by_lport_ip=0xf6a7e0, sbrec_igmp_groups=0xf6ab90,
    sbrec_ip_multicast_opts=0xf6a9c0, dns_table=0xf33960, ce_table=0xf33960, svc_mon_table=0xf33960, br_int=0xf67d50,
    chassis=0xfdb4e0, local_datapaths=0xf766f0, active_tunnels=0xf76770) at ../controller/pinctrl.c:3169
    #6  0x0000000000408e91 in main (argc=<optimized out>, argv=<optimized out>) at ../controller/ovn-controller.c:2789

Originally reported upstream:
http://patchwork.ozlabs.org/project/ovn/patch/20201126085207.645479-1-numans@ovn.org/
~~~
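The backtrace shows a NULL Southbound transaction (`ovnsb_idl_txn=0x0`) being passed from pinctrl_run() through send_garp_rarp_prepare(), send_garp_rarp_update() and send_garp_locally() into mac_binding_add(), until ovsdb_idl_txn_insert() dereferences it in frame #0. The C sketch below illustrates the guard the upstream patch describes: return early from send_garp_locally() when no transaction is open. It is not the actual OVN code; `struct sb_txn`, `mac_binding_add_sketch()` and `send_garp_locally_sketch()` are hypothetical, simplified stand-ins. The working assumption is that the transaction can legitimately be NULL in a given main-loop iteration (typically while an earlier Southbound commit is still in flight), so every caller that writes to the database must tolerate it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the opaque struct ovsdb_idl_txn that
 * ovn-controller uses to batch Southbound DB writes. */
struct sb_txn {
    int pending_inserts;        /* rows queued in this transaction */
    const char *last_port;      /* logical port of the last queued row */
    const char *last_ip;        /* IP of the last queued row */
};

/* Hypothetical stand-in for mac_binding_add(). Like ovsdb_idl_txn_insert()
 * in frame #0, it uses 'txn' unconditionally, so a NULL 'txn' would crash. */
static void
mac_binding_add_sketch(struct sb_txn *txn, const char *logical_port,
                       const char *ip)
{
    txn->pending_inserts++;
    txn->last_port = logical_port;
    txn->last_ip = ip;
}

/* Sketch of the fix: check the transaction before doing any work.
 * Returns true if a MAC_Binding row was queued, false if skipped. */
static bool
send_garp_locally_sketch(struct sb_txn *txn, const char *logical_port,
                         const char *ip)
{
    if (!txn) {
        /* No open SB transaction in this iteration: skip the update
         * rather than passing NULL down to the insert. */
        return false;
    }
    mac_binding_add_sketch(txn, logical_port, ip);
    return true;
}
```

With `txn == NULL` the sketch returns `false` without touching any state; with a valid transaction it queues exactly one row for the given port and IP.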

With the hotfix containing ovn2.13-20.12.0-1.el8fdp, ovn-controller restarts are less frequent but still occur.

A core dump will be attached to this Bugzilla for further analysis.

Comment 6 ffernand 2021-01-26 15:52:23 UTC
This BZ is [TestOnly].

It depends on https://bugzilla.redhat.com/show_bug.cgi?id=1917533

Comment 11 Fouad Hallal 2021-02-05 13:43:48 UTC
I see that Daniel has already answered the needinfo question.  We are releasing 21.a.1 next week and it will have the necessary patches.

Comment 21 errata-xmlrpc 2021-03-17 20:46:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 containers bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0919