
Bug 1929978

Summary: [OVN] ovn-controller crashing with "failed in flood_remove_flows_for_sb_uuid()"
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovn2.13
Version: FDP 20.H
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Numan Siddique <nusiddiq>
Assignee: Numan Siddique <nusiddiq>
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, dsedgmen, ffernand, jishi, mflusche, nusiddiq, pmannidi, ralongi, rkhan
Hardware: Unspecified
OS: Unspecified
Clone Of: 1928012
Bug Depends On: 1928012
Last Closed: 2021-03-15 14:34:36 UTC

Comment 2 Numan Siddique 2021-02-22 07:28:06 UTC
Submitted the patch for review - https://patchwork.ozlabs.org/project/ovn/patch/20210221113424.234801-1-numans@ovn.org/

Comment 5 Jianlin Shi 2021-02-24 02:07:36 UTC
Tested with the following script:

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:20.0.173.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=20.0.173.25
systemctl restart ovn-controller

ovs-vsctl \
    -- add-port br-int vif1 \
    -- set Interface vif1 type=internal external_ids:iface-id=sw0-p1 \
    ofport-request=1

ovs-vsctl set open . external_ids:ovn-monitor-all=true

ovn-nbctl ls-add sw0
ovn-nbctl pg-add pg1
ovn-nbctl pg-add pg2
ovn-nbctl lsp-add sw0 sw0-p2
ovn-nbctl lsp-set-addresses sw0-p2 "00:00:00:00:00:02 192.168.47.2"
ovn-nbctl lsp-add sw0 sw0-p3
ovn-nbctl lsp-set-addresses sw0-p3 "00:00:00:00:00:03 192.168.47.3"

# Pause ovn-northd. When it is resumed, the NB updates below will
# all be sent in one transaction.

ovn-appctl -t ovn-northd pause

ovn-nbctl lsp-add sw0 sw0-p1
ovn-nbctl lsp-set-addresses sw0-p1 "00:00:00:00:00:01 192.168.47.1"
ovn-nbctl pg-set-ports pg1 sw0-p1 sw0-p2
ovn-nbctl pg-set-ports pg2 sw0-p3
ovn-nbctl acl-add pg1 to-lport 1002 "outport == @pg1 && ip4 && ip4.src == \$pg2_ip4 && udp && udp.dst >= 1 && udp.dst <= 65535" allow-related

# Resume ovn-northd now. This should result in a single update message
# from the SB ovsdb-server to ovn-controller for all of the NB updates above.
ovn-appctl -t ovn-northd resume
sleep 5

ovn-nbctl --wait=hv pg-set-ports pg1 sw0-p1 sw0-p2 sw0-p3
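
The reproducer itself does not report whether ovn-controller crashed; below, the result is checked with `coredumpctl` and by grepping the log for EMER messages. A minimal sketch of the log check as a reusable shell function, assuming the log path used in this bug (/var/log/ovn/ovn-controller.log). The function name `ovn_controller_emer_check` is hypothetical and not part of OVN.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVN): report EMER-level lines in an
# ovn-controller log. An assertion failure like the one in this bug is
# logged at EMER level (via vlog_abort) right before the process aborts.
# Usage: ovn_controller_emer_check [LOGFILE]
# Returns 1 and prints the matching lines if any EMER line is found,
# 0 if none is found, and 2 if the log file cannot be read.
ovn_controller_emer_check() {
    log=${1:-/var/log/ovn/ovn-controller.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # In the vlog file format the level is a '|'-delimited field.
    if grep -F '|EMER|' "$log"; then
        return 1
    fi
    return 0
}
```

For example, after the final `--wait=hv` step: `ovn_controller_emer_check || echo "EMER line found or log unreadable"`. It complements rather than replaces `coredumpctl list`, which can also catch crashes that leave no EMER line in the log.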

Reproduced on 20.12.0-20:

[root@wsfd-advnetlab21 bz1929978]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-central-20.12.0-20.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64                            
ovn2.13-20.12.0-20.el8fdp.x86_64
ovn2.13-host-20.12.0-20.el8fdp.x86_64

[root@wsfd-advnetlab21 bz1929978]# coredumpctl info                                   
           PID: 189852 (ovn-controller)                                          
           UID: 991 (openvswitch)                                               
           GID: 989 (openvswitch)                                                        
        Signal: 6 (ABRT)                                          
     Timestamp: Tue 2021-02-23 21:03:17 EST (1min 31s ago)  
  Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi>
    Executable: /usr/bin/ovn-controller                       
 Control Group: /system.slice/ovn-controller.service
          Unit: ovn-controller.service       
         Slice: system.slice                             
       Boot ID: 63e0e8f704cf4baeae3173198acebc75                 
    Machine ID: 532695c076ab4d7696e8a30b5934d994                  
      Hostname: wsfd-advnetlab21.anl.lab.eng.bos.redhat.com             
       Storage: /var/lib/systemd/coredump/core.ovn-controller.991.63e0e8f704cf4baeae3173198ac>
       Message: Process 189852 (ovn-controller) of user 991 dumped core.
                                                          
                Stack trace of thread 189852:
                #0  0x00007fb4a630e7ff raise (libc.so.6)
                #1  0x00007fb4a62f8c35 abort (libc.so.6) 
                #2  0x000055e7bd513654 ovs_abort_valist (ovn-controller)
                #3  0x000055e7bd51b444 vlog_abort_valist (ovn-controller)
                #4  0x000055e7bd51b4ea vlog_abort (ovn-controller)     
                #5  0x000055e7bd51336b ovs_assert_failure (ovn-controller)
                #6  0x000055e7bd43fa22 flood_remove_flows_for_sb_uuid (ovn-controller)
                #7  0x000055e7bd43fe42 ofctrl_flood_remove_flows (ovn-controller)
                #8  0x000055e7bd43aaa4 lflow_handle_changed_ref (ovn-controller)
                #9  0x000055e7bd457978 _flow_output_resource_ref_handler (ovn-controller)
                #10 0x000055e7bd470853 engine_run (ovn-controller)
                #11 0x000055e7bd42d26c main (ovn-controller)     
                #12 0x00007fb4a62fa7b3 __libc_start_main (libc.so.6)
                #13 0x000055e7bd42e94e _start (ovn-controller) 

[root@wsfd-advnetlab21 bz1929978]# grep EMER /var/log/ovn/ovn-controller.log 
2021-02-24T02:03:17.122Z|00017|util|EMER|controller/ofctrl.c:1199: assertion ovs_list_is_empty(&f->list_node) failed in flood_remove_flows_for_sb_uuid()
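
This EMER line uses the Open vSwitch vlog file layout, in which `|` separates a timestamp, a per-process message sequence number, the logging module (`util` here, matching `ovs_assert_failure` in the stack trace above), the severity, and the message. As an illustration only, a hypothetical shell function (not an OVS or OVN tool) that splits such a line into those fields:

```shell
#!/bin/sh
# Hypothetical helper: split one vlog file line of the form
#   TIMESTAMP|SEQUENCE|MODULE|LEVEL|MESSAGE
# and print each field on its own labelled line. The last variable given
# to `read` receives the rest of the line, so a '|' inside MESSAGE is kept.
parse_vlog_line() {
    printf '%s\n' "$1" | {
        IFS='|' read -r ts seq module level msg
        printf 'timestamp=%s\nsequence=%s\nmodule=%s\nlevel=%s\nmessage=%s\n' \
            "$ts" "$seq" "$module" "$level" "$msg"
    }
}
```

For the EMER line above, this prints `module=util`, `level=EMER` and the `controller/ofctrl.c:1199` assertion text as the message.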

Verified on 20.12.0-23:

[root@wsfd-advnetlab21 bz1929978]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-host-20.12.0-23.el8fdp.x86_64
ovn2.13-central-20.12.0-23.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-20.12.0-23.el8fdp.x86_64

[root@wsfd-advnetlab21 bz1929978]# coredumpctl list
No coredumps found.
[root@wsfd-advnetlab21 bz1929978]# grep EMER /var/log/ovn/ovn-controller.log 

<=== no core dump and no EMER log lines: ovn-controller did not crash

Comment 11 Jianlin Shi 2021-03-08 02:52:17 UTC
Also no crash on 20.13.0-24 with the reproducer in comment 5.

Setting to VERIFIED.

Comment 13 errata-xmlrpc 2021-03-15 14:34:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0839