Bug 1916900 - [OSP16.1.3] ovn-controller restarts intermittently
Summary: [OSP16.1.3] ovn-controller restarts intermittently
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: ffernand
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On: 1901880 1915389 1917533
Blocks: 1901881
Reported: 2021-01-15 19:09 UTC by camorris@redhat.co
Modified: 2021-03-17 20:47 UTC (History)

Fixed In Version: ovn2.13-20.12.0-17.el8fdp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1915389
Clones: 1917533
Environment:
Last Closed: 2021-03-17 20:46:59 UTC
Target Upstream Version:
Embargoed:
fhallal: needinfo+


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:0919 0 None None None 2021-03-17 20:47:09 UTC

Description camorris@redhat.co 2021-01-15 19:09:43 UTC
+++ This bug was initially created as a clone of Bug #1915389 +++

+++ This bug was initially created as a clone of Bug #1901880 +++

Description of problem:

Previously, ovn-controller hit the following segfault:
~~~
Segfault is seen with the below trace.  This patch fixes the issue by
checking 'ovnsb_idl_txn' is not NULL before continuing in the function
send_garp_locally().

    #0  ovsdb_idl_txn_insert (txn=0x0, class=0x64b170 <sbrec_table_classes+816>, uuid=0x0) at ../lib/ovsdb-idl.c:3504
    #1  0x000000000041b068 in mac_binding_add (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0,
    sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, logical_port=0xfd0d70 "lr0-pub", dp=0xfd0f20,
    ea=..., ip=0xf67be0 "172.24.4.221") at ../controller/pinctrl.c:3877
    #2  0x000000000041b18b in send_garp_locally (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0,
    sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, local_datapaths=local_datapaths@entry=0xf766f0,
    in_pb=in_pb@entry=0xfd3370, ea=..., ip=3708033196) at ../controller/pinctrl.c:3913
    #3  0x000000000041d1be in send_garp_rarp_update (ovnsb_idl_txn=ovnsb_idl_txn@entry=0x0,
    sbrec_mac_binding_by_lport_ip=sbrec_mac_binding_by_lport_ip@entry=0xf6a7e0, local_datapaths=local_datapaths@entry=0xf766f0,
    binding_rec=0xfd3370, nat_addresses=nat_addresses@entry=0x7ffdb9295b80) at ../controller/pinctrl.c:4118
    #4  0x0000000000425c88 in send_garp_rarp_prepare (active_tunnels=0xf76770, local_datapaths=0xf766f0, chassis=0xfdb4e0,
    br_int=<optimized out>, sbrec_mac_binding_by_lport_ip=0xf6a7e0, sbrec_port_binding_by_name=0xf6a090,
    sbrec_port_binding_by_datapath=<optimized out>, ovnsb_idl_txn=0x0) at ../controller/pinctrl.c:5491
    #5  pinctrl_run (ovnsb_idl_txn=0x0, sbrec_datapath_binding_by_key=<optimized out>,
    sbrec_port_binding_by_datapath=<optimized out>, sbrec_port_binding_by_key=<optimized out>,
    sbrec_port_binding_by_name=0xf6a090, sbrec_mac_binding_by_lport_ip=0xf6a7e0, sbrec_igmp_groups=0xf6ab90,
    sbrec_ip_multicast_opts=0xf6a9c0, dns_table=0xf33960, ce_table=0xf33960, svc_mon_table=0xf33960, br_int=0xf67d50,
    chassis=0xfdb4e0, local_datapaths=0xf766f0, active_tunnels=0xf76770) at ../controller/pinctrl.c:3169
    #6  0x0000000000408e91 in main (argc=<optimized out>, argv=<optimized out>) at ../controller/ovn-controller.c:2789

Originally reported upstream:
http://patchwork.ozlabs.org/project/ovn/patch/20201126085207.645479-1-numans@ovn.org/
~~~
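
For illustration, the kind of guard described in the quoted patch text (return early from send_garp_locally() when 'ovnsb_idl_txn' is NULL) can be sketched as below. This is a minimal standalone sketch, not the upstream patch: struct mock_txn, mock_mac_binding_add() and mock_send_garp_locally() are hypothetical stand-ins invented here, and the real OVN functions take more arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an OVSDB IDL transaction.
 * This is NOT the real struct ovsdb_idl_txn. */
struct mock_txn {
    int n_inserts;              /* Number of rows queued in this transaction. */
};

/* Hypothetical stand-in for mac_binding_add(): it dereferences 'txn',
 * so calling it with a NULL transaction is the crash scenario. */
void
mock_mac_binding_add(struct mock_txn *txn)
{
    assert(txn != NULL);        /* Precondition: caller must check first. */
    txn->n_inserts++;
}

/* Hypothetical stand-in for send_garp_locally().
 * Returns true if a MAC binding was queued, false if the call was
 * skipped because no transaction is currently available. */
bool
mock_send_garp_locally(struct mock_txn *txn)
{
    if (!txn) {
        /* No open transaction in this iteration: skip instead of
         * passing NULL down to the insert path. */
        return false;
    }

    mock_mac_binding_add(txn);
    return true;
}
```

In this sketch, calling mock_send_garp_locally(NULL) returns false without touching the insert path, while a call with a valid transaction queues exactly one row.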

With the hotfix containing ovn2.13-20.12.0-1.el8fdp, ovn-controller restarts are less frequent but still occur.

A coredump will be attached to the bugzilla for further analysis.

Comment 6 ffernand 2021-01-26 15:52:23 UTC
This bz is a [TestOnly]. 

It depends on https://bugzilla.redhat.com/show_bug.cgi?id=1917533

Comment 11 Fouad Hallal 2021-02-05 13:43:48 UTC
I see that Daniel has already answered the needinfo question.  We are releasing 21.a.1 next week and it will have the necessary patches.

Comment 21 errata-xmlrpc 2021-03-17 20:46:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.4 containers bug fix advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0919

