Bug 1929978 - [OVN] ovn-controller crashing with "failed in flood_remove_flows_for_sb_uuid()"
Summary: [OVN] ovn-controller crashing with "failed in flood_remove_flows_for_sb_uuid()"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: FDP 20.H
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On: 1928012
Blocks:
Reported: 2021-02-18 05:22 UTC by Numan Siddique
Modified: 2024-10-01 17:30 UTC (History)
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1928012
Environment:
Last Closed: 2021-03-15 14:34:36 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   FD-1103         0        None      None    None     2024-06-14 00:24:19 UTC
Red Hat Product Errata  RHBA-2021:0839  0        None      None    None     2021-03-15 14:34:59 UTC

Comment 2 Numan Siddique 2021-02-22 07:28:06 UTC
Submitted the patch for review - https://patchwork.ozlabs.org/project/ovn/patch/20210221113424.234801-1-numans@ovn.org/

Comment 5 Jianlin Shi 2021-02-24 02:07:36 UTC
Tested with the following script:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.173.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.173.25
systemctl restart ovn-controller

ovs-vsctl \
    -- add-port br-int vif1 \
    -- set Interface vif1 type=internal external_ids:iface-id=sw0-p1 \
    ofport-request=1

ovs-vsctl set open . external_ids:ovn-monitor-all=true

ovn-nbctl ls-add sw0
ovn-nbctl pg-add pg1
ovn-nbctl pg-add pg2
ovn-nbctl lsp-add sw0 sw0-p2
ovn-nbctl lsp-set-addresses sw0-p2 "00:00:00:00:00:02 192.168.47.2"
ovn-nbctl lsp-add sw0 sw0-p3
ovn-nbctl lsp-set-addresses sw0-p3 "00:00:00:00:00:03 192.168.47.3"

# Pause ovn-northd. When it is resumed, all the below NB updates
# will be sent in one transaction.

ovn-appctl -t ovn-northd pause

ovn-nbctl lsp-add sw0 sw0-p1
ovn-nbctl lsp-set-addresses sw0-p1 "00:00:00:00:00:01 192.168.47.1"
ovn-nbctl pg-set-ports pg1 sw0-p1 sw0-p2
ovn-nbctl pg-set-ports pg2 sw0-p3
ovn-nbctl acl-add pg1 to-lport 1002 "outport == @pg1 && ip4 && ip4.src == \$pg2_ip4 && udp && udp.dst >= 1 && udp.dst <= 65535" allow-related

# Resume ovn-northd now. This should result in a single update message
# from SB ovsdb-server to ovn-controller for all the above NB updates.
ovn-appctl -t ovn-northd resume
sleep 5

ovn-nbctl --wait=hv pg-set-ports pg1 sw0-p1 sw0-p2 sw0-p3

Reproduced on 20.12.0-20:

[root@wsfd-advnetlab21 bz1929978]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-central-20.12.0-20.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64                            
ovn2.13-20.12.0-20.el8fdp.x86_64
ovn2.13-host-20.12.0-20.el8fdp.x86_64

[root@wsfd-advnetlab21 bz1929978]# coredumpctl info
           PID: 189852 (ovn-controller)
           UID: 991 (openvswitch)
           GID: 989 (openvswitch)
        Signal: 6 (ABRT)
     Timestamp: Tue 2021-02-23 21:03:17 EST (1min 31s ago)
  Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi>
    Executable: /usr/bin/ovn-controller
 Control Group: /system.slice/ovn-controller.service
          Unit: ovn-controller.service
         Slice: system.slice
       Boot ID: 63e0e8f704cf4baeae3173198acebc75
    Machine ID: 532695c076ab4d7696e8a30b5934d994
      Hostname: wsfd-advnetlab21.anl.lab.eng.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.ovn-controller.991.63e0e8f704cf4baeae3173198ac>
       Message: Process 189852 (ovn-controller) of user 991 dumped core.

                Stack trace of thread 189852:
                #0  0x00007fb4a630e7ff raise (libc.so.6)
                #1  0x00007fb4a62f8c35 abort (libc.so.6)
                #2  0x000055e7bd513654 ovs_abort_valist (ovn-controller)
                #3  0x000055e7bd51b444 vlog_abort_valist (ovn-controller)
                #4  0x000055e7bd51b4ea vlog_abort (ovn-controller)
                #5  0x000055e7bd51336b ovs_assert_failure (ovn-controller)
                #6  0x000055e7bd43fa22 flood_remove_flows_for_sb_uuid (ovn-controller)
                #7  0x000055e7bd43fe42 ofctrl_flood_remove_flows (ovn-controller)
                #8  0x000055e7bd43aaa4 lflow_handle_changed_ref (ovn-controller)
                #9  0x000055e7bd457978 _flow_output_resource_ref_handler (ovn-controller)
                #10 0x000055e7bd470853 engine_run (ovn-controller)
                #11 0x000055e7bd42d26c main (ovn-controller)
                #12 0x00007fb4a62fa7b3 __libc_start_main (libc.so.6)
                #13 0x000055e7bd42e94e _start (ovn-controller)

[root@wsfd-advnetlab21 bz1929978]# grep EMER /var/log/ovn/ovn-controller.log 
2021-02-24T02:03:17.122Z|00017|util|EMER|controller/ofctrl.c:1199: assertion ovs_list_is_empty(&f->list_node) failed in flood_remove_flows_for_sb_uuid()
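
The EMER line above carries the source location, the failed assertion, and the function name. A minimal sketch for pulling the location and function out of a log follows. It assumes the log lines have the same shape as the one quoted in this comment; the helper name ovn_assert_failures is hypothetical and not part of OVN.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVN): list assertion failures logged at
# EMER level in an ovn-controller log file.
# Assumed line format, taken from the log line quoted in this report:
#   <timestamp>|<seq>|util|EMER|<file>:<line>: assertion <expr> failed in <func>()
ovn_assert_failures() {
    log=$1
    # For each matching line, print "<file>:<line> <func>".
    # Lines that do not match the pattern are not printed (-n ... p).
    sed -n 's/^.*|EMER|\([^ ]*\): assertion .* failed in \([A-Za-z0-9_]*\)().*$/\1 \2/p' "$log"
}

# Usage (path as used in this report):
#   ovn_assert_failures /var/log/ovn/ovn-controller.log
```

Empty output means no assertion failure of this shape was logged, which is the same condition the plain grep for EMER checks by eye.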

Verified on 20.12.0-23:

[root@wsfd-advnetlab21 bz1929978]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-host-20.12.0-23.el8fdp.x86_64
ovn2.13-central-20.12.0-23.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-20.12.0-23.el8fdp.x86_64

[root@wsfd-advnetlab21 bz1929978]# coredumpctl list
No coredumps found.
[root@wsfd-advnetlab21 bz1929978]# grep EMER /var/log/ovn/ovn-controller.log 

<=== ovn-controller didn't crash
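
The two runs above differ only in the ovn2.13 build: crash on 20.12.0-20, no crash on 20.12.0-23. A quick way to check whether a given build is at least the verified one is sketched below. It assumes a sort that supports -V (version sort, as in GNU coreutils) and uses 20.12.0-23 as the threshold only because that is the build verified in this comment; the exact build where the fix first landed is not stated here. The helper name ovn_build_has_fix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given ovn2.13 version-release string
# (for example "20.12.0-23", without the dist suffix) is at least the build
# verified in this comment.
# Assumption: sort supports -V (version sort).
ovn_build_has_fix() {
    have=$1
    want=20.12.0-23
    # Version-sort the two strings; the lower one comes first.
    # The build is new enough when the lowest entry is the required one.
    lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}

# Usage:
#   ovn_build_has_fix 20.12.0-20 || echo "build predates the verified one"
```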

Comment 11 Jianlin Shi 2021-03-08 02:52:17 UTC
Also no crash on 20.12.0-24 with the reproducer in comment 5.

Setting status to VERIFIED.

Comment 13 errata-xmlrpc 2021-03-15 14:34:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update),
and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0839

