Bug 1936331 - ovn-controller crashes due to use-after-free with a container logical port
Summary: ovn-controller crashes due to use-after-free with a container logical port
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn2.13
Version: FDP 20.H
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-08 08:34 UTC by Numan Siddique
Modified: 2021-05-26 07:20 UTC
CC List: 6 users

Fixed In Version: ovn2.13-20.12.0-99
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-20 19:28:16 UTC
Target Upstream Version:
Embargoed:


Links
Red Hat Product Errata RHBA-2021:2080 (last updated 2021-05-20 19:28:27 UTC)

Description Numan Siddique 2021-03-08 08:34:09 UTC
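Description of problem:

ovn-controller can crash (use-after-free with a container logical port) when the following commands are run. The commands assume that an OVS interface named vm1 already exists on br-int; the script in comment 1 creates it with "ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal".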

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-add ls vm-cont vm1 1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovn-nbctl clear logical_switch_port vm-cont parent_name
ovs-vsctl set Interface vm1 external_ids:iface-id=foo
ovn-nbctl lsp-del vm-cont
ovn-nbctl ls-del ls
ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-add ls vm-cont vm1 1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovn-appctl -t ovn-controller debug/pause
ovn-nbctl clear logical_switch_port vm-cont parent_name
ovn-nbctl lsp-del vm-cont
ovn-appctl -t ovn-controller debug/resume
ovs-vsctl set Interface vm1 external_ids:iface-id=foo

Comment 1 Jianlin Shi 2021-03-08 10:05:38 UTC
Tested with the following script:

# Enable unlimited core dumps and route them to systemd-coredump, so that a
# crash of ovn-controller shows up in "coredumpctl list".
enable_coredump()
{
        ulimit -c unlimited
        ulimit -s unlimited
        sysctl -w fs.suid_dumpable=2
        if ! sysctl kernel.core_pattern | grep systemd-coredump
        then
                sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
        fi
        # Start from an empty coredump store and journal.
        rm -rf /var/lib/systemd/coredump/*
        rm -rf /run/log/journal/*
        rm -rf /var/log/journal/*
        systemctl restart systemd-journald
}
enable_coredump

# Run the reproducer up to 10 times and stop at the first core dump.
for i in {1..10}
do
        systemctl start openvswitch
        systemctl start ovn-northd
        ovn-nbctl set-connection ptcp:6641
        ovn-sbctl set-connection ptcp:6642
        ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.169.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.169.25
        systemctl restart ovn-controller

        systemctl status ovn-controller

        # Reproducer from the description; vm1 is created here as an
        # internal port on br-int.
        ovn-nbctl ls-add ls
        ovn-nbctl lsp-add ls vm1
        ovn-nbctl lsp-add ls vm-cont vm1 1
        ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
        ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
        ovn-nbctl clear logical_switch_port vm-cont parent_name
        ovs-vsctl set Interface vm1 external_ids:iface-id=foo
        ovn-nbctl lsp-del vm-cont
        ovn-nbctl ls-del ls
        ovn-nbctl ls-add ls
        ovn-nbctl lsp-add ls vm1
        ovn-nbctl lsp-add ls vm-cont vm1 1
        ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
        ovn-appctl -t ovn-controller debug/pause
        ovn-nbctl clear logical_switch_port vm-cont parent_name
        ovn-nbctl lsp-del vm-cont
        ovn-appctl -t ovn-controller debug/resume
        ovs-vsctl set Interface vm1 external_ids:iface-id=foo

        # coredumpctl list exits non-zero when it finds no core dumps.
        if coredumpctl list
        then
                break
        fi

        # Tear down all state before the next iteration.
        systemctl stop ovn-controller &>/dev/null
        systemctl stop ovn-northd &>/dev/null
        systemctl stop openvswitch &>/dev/null
        sleep 1
        rm -rf /etc/openvswitch/*.db
        rm -rf /etc/openvswitch/*.pem
        rm -rf /var/lib/openvswitch/*
        rm -rf /var/lib/ovn/*
        rm -rf /etc/ovn/*.db
        rm -rf /etc/ovn/*.pem
        # clean up log
        rm -rf /var/log/ovn/*
        rm -rf /var/log/openvswitch/*
        netns_clean.sh
        sync
done
echo $i
coredumpctl list
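The loop above treats any row in "coredumpctl list" as a hit. A tighter check would count only cores whose EXE column is /usr/bin/ovn-controller. This is a minimal sketch, not part of the original test: count_cores is a hypothetical helper, and it assumes EXE is the last column, as in the "coredumpctl list" output captured in this comment (other systemd releases may print additional columns, which would break that assumption).

```shell
#!/bin/bash
# count_cores EXE: read `coredumpctl list` output on stdin and print the
# number of rows whose last whitespace-separated field equals EXE.
# Assumes EXE is the last column, as in the output captured in comment 1.
# The header row ends in the literal word "EXE", so it never matches a path.
count_cores() {
    local exe="$1"
    awk -v exe="$exe" '$NF == exe { n++ } END { print n + 0 }'
}

# Example usage inside the reproducer loop:
#   cores=$(coredumpctl list 2>/dev/null | count_cores /usr/bin/ovn-controller)
#   if [ "$cores" -gt 0 ]; then break; fi
```

Matching on the executable path rather than on the command's exit status keeps an unrelated crash (for example of ovs-vswitchd) from ending the loop early.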

Reproduced on ovn2.13-20.12.0-24:

[root@wsfd-advnetlab16 bz1936331]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-central-20.12.0-24.el8fdp.x86_64

+ ovs-vsctl set Interface vm1 external_ids:iface-id=foo
+ coredumpctl list
TIME                            PID   UID   GID SIG COREFILE  EXE
Mon 2021-03-08 05:00:45 EST  122255   992   989   6 present   /usr/bin/ovn-controller
+ break
+ echo 3
3

[root@wsfd-advnetlab16 bz1936331]# coredumpctl info
           PID: 122255 (ovn-controller)
           UID: 992 (openvswitch)
           GID: 989 (openvswitch)
        Signal: 6 (ABRT)
     Timestamp: Mon 2021-03-08 05:00:45 EST (1min 44s ago)
  Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info >
    Executable: /usr/bin/ovn-controller
 Control Group: /system.slice/ovn-controller.service
          Unit: ovn-controller.service
         Slice: system.slice
       Boot ID: 1a50ae1ea3394ba7a53685e24a1bc9b4
    Machine ID: 0350fa343ed14eea8d477b906349f017
      Hostname: wsfd-advnetlab16.anl.lab.eng.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.ovn-controller.992.1a50ae1ea3394ba7a53685e24a1bc9b4.12>
       Message: Process 122255 (ovn-controller) of user 992 dumped core.
                
                Stack trace of thread 122255:
                #0  0x00007f60ff65d7ff raise (libc.so.6)
                #1  0x00007f60ff647c35 abort (libc.so.6)
                #2  0x000056317026b9a4 ovs_abort_valist (ovn-controller)
                #3  0x0000563170273794 vlog_abort_valist (ovn-controller)
                #4  0x000056317027383a vlog_abort (ovn-controller)
                #5  0x000056317026b6bb ovs_assert_failure (ovn-controller)
                #6  0x0000563170251c0a ovsdb_idl_txn_write__ (ovn-controller)
                #7  0x00005631701e313d sbrec_port_binding_set_up (ovn-controller)
                #8  0x0000563170189eea binding_seqno_install (ovn-controller)
                #9  0x0000563170183dc3 main (ovn-controller)
                #10 0x00007f60ff6497b3 __libc_start_main (libc.so.6)
                #11 0x0000563170184bfe _start (ovn-controller)
                
                Stack trace of thread 122259:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x000056317025b578 stopwatch_thread (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)
                
                Stack trace of thread 122256:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x00005631701a4d16 pinctrl_handler (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)
                
                Stack trace of thread 122257:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x0000563170242dda ovsrcu_postpone_thread (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)
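When the reproducer is run in a loop, a core left by an unrelated crash would also end the loop. The sketch below checks "coredumpctl info" output for the three ovn-controller frames in the crashing thread above (ovsdb_idl_txn_write__, sbrec_port_binding_set_up and binding_seqno_install). is_bz1936331_trace is a hypothetical helper name; it only checks that the names appear, not their order or which thread they belong to, so a match is a strong hint rather than proof that it is the same bug.

```shell
#!/bin/bash
# is_bz1936331_trace: read `coredumpctl info` output on stdin and succeed
# (return 0) only if all three frames from the trace in comment 1 appear.
# Only function names are checked, not frame order or thread.
is_bz1936331_trace() {
    local trace frame
    trace=$(cat)
    for frame in ovsdb_idl_txn_write__ sbrec_port_binding_set_up binding_seqno_install; do
        # -w: match whole identifiers only, so a longer name that merely
        # starts with one of these does not count as a match.
        printf '%s\n' "$trace" | grep -qw -- "$frame" || return 1
    done
    return 0
}

# Example usage:
#   coredumpctl info 122255 | is_bz1936331_trace && echo "matches the trace in this bug"
```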

Comment 3 Dan Williams 2021-03-30 20:54:27 UTC
Updated patch on review: http://patchwork.ozlabs.org/project/ovn/patch/20210329132159.2005894-1-numans@ovn.org/

Comment 5 Dan Williams 2021-04-13 20:50:09 UTC
I believe the issue was fixed in ovn2.13-20.12.0-99 by the same patch as https://bugzilla.redhat.com/show_bug.cgi?id=1936328

Comment 8 Jianlin Shi 2021-04-26 00:49:56 UTC
Verified on ovn-2021-21.03.0-21.el8fdp.x86_64: no crash after running the reproducer from comment 1.

[root@wsfd-advnetlab21 bz1936331]# rpm -qa | grep ovn-2021
ovn-2021-21.03.0-21.el8fdp.x86_64
ovn-2021-host-21.03.0-21.el8fdp.x86_64
ovn-2021-central-21.03.0-21.el8fdp.x86_64

Comment 9 Jianlin Shi 2021-04-26 00:55:21 UTC
Also verified on ovn2.13-20.12.0-104.el8fdp and ovn2.13-20.12.0-104.el7fdp (x86_64):

[root@wsfd-advnetlab21 bz1936331]# rpm -qa | grep ovn2.13
ovn2.13-20.12.0-104.el8fdp.x86_64
ovn2.13-central-20.12.0-104.el8fdp.x86_64
ovn2.13-host-20.12.0-104.el8fdp.x86_64

[root@wsfd-advnetlab16 bz1936331]# rpm -qa | grep ovn2.13
ovn2.13-host-20.12.0-104.el7fdp.x86_64
ovn2.13-central-20.12.0-104.el7fdp.x86_64
ovn2.13-20.12.0-104.el7fdp.x86_64
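To check a host against the "Fixed In Version" field (ovn2.13-20.12.0-99), the version-release of the installed package can be compared with GNU "sort -V". This is a rough sketch under stated assumptions: version_at_least is a hypothetical helper, and "sort -V" only approximates rpm's own version comparison (rpmdev-vercmp from rpmdevtools follows rpm's rules exactly). It is adequate for plain dotted versions such as 20.12.0-99 vs. 20.12.0-104.

```shell
#!/bin/bash
# version_at_least HAVE WANT: succeed if HAVE sorts at or after WANT.
# Uses GNU `sort -V`, an approximation of rpm's version comparison.
version_at_least() {
    local have="$1" want="$2"
    # The older version sorts first; HAVE is new enough if WANT comes first.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example usage (assumes the ovn2.13 package is installed):
#   have=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' ovn2.13)   # e.g. 20.12.0-104.el8fdp
#   version_at_least "${have%%.el*}" 20.12.0-99 && echo "includes the fix"
```

Stripping the ".el8fdp"/".el7fdp" dist tag before comparing keeps the check independent of the RHEL release.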

Comment 11 errata-xmlrpc 2021-05-20 19:28:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2080

