
Bug 1936331

Summary: ovn-controller crashes due to use-after-free with a container logical port
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Numan Siddique <nusiddiq>
Component: ovn2.13
Assignee: Numan Siddique <nusiddiq>
Status: CLOSED ERRATA
QA Contact: Jianlin Shi <jishi>
Severity: urgent
Docs Contact:
Priority: urgent
Version: FDP 20.H
CC: ctrautma, dcbw, dsedgmen, jishi, ralongi, rkhan
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn2.13-20.12.0-99
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-20 19:28:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Numan Siddique 2021-03-08 08:34:09 UTC
Description of problem:

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-add ls vm-cont vm1 1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovn-nbctl clear logical_switch_port vm-cont parent_name
ovs-vsctl set Interface vm1 external_ids:iface-id=foo
ovn-nbctl lsp-del vm-cont
ovn-nbctl ls-del ls
ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-add ls vm-cont vm1 1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovn-appctl -t ovn-controller debug/pause
ovn-nbctl clear logical_switch_port vm-cont parent_name
ovn-nbctl lsp-del vm-cont
ovn-appctl -t ovn-controller debug/resume
ovs-vsctl set Interface vm1 external_ids:iface-id=foo

Comment 1 Jianlin Shi 2021-03-08 10:05:38 UTC
Tested with the following script:

# Route core dumps to systemd-coredump and start from an empty dump list.
enable_coredump()
{
        ulimit -c unlimited
        ulimit -s unlimited
        sysctl -w fs.suid_dumpable=2
        if ! sysctl kernel.core_pattern | grep systemd-coredump
        then
                sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
        fi
        # Remove old dumps and journals so any dump listed later is from this run.
        rm -rf /var/lib/systemd/coredump/*
        rm -rf /run/log/journal/*
        rm -rf /var/log/journal/*
        systemctl restart systemd-journald
}
enable_coredump

for i in {1..10}
do
        # Bring up a fresh single-node OVS/OVN deployment.
        systemctl start openvswitch
        systemctl start ovn-northd
        ovn-nbctl set-connection ptcp:6641
        ovn-sbctl set-connection ptcp:6642
        ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.169.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.169.25
        systemctl restart ovn-controller

        systemctl status ovn-controller

        # First pass: detach the container port from its parent, rebind the
        # interface, then delete the container port and the switch.
        ovn-nbctl ls-add ls
        ovn-nbctl lsp-add ls vm1
        ovn-nbctl lsp-add ls vm-cont vm1 1
        ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
        ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
        ovn-nbctl clear logical_switch_port vm-cont parent_name
        ovs-vsctl set Interface vm1 external_ids:iface-id=foo
        ovn-nbctl lsp-del vm-cont
        ovn-nbctl ls-del ls

        # Second pass: recreate the ports, then detach and delete the container
        # port while ovn-controller is paused, resume, and rebind the interface.
        ovn-nbctl ls-add ls
        ovn-nbctl lsp-add ls vm1
        ovn-nbctl lsp-add ls vm-cont vm1 1
        ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
        ovn-appctl -t ovn-controller debug/pause
        ovn-nbctl clear logical_switch_port vm-cont parent_name
        ovn-nbctl lsp-del vm-cont
        ovn-appctl -t ovn-controller debug/resume
        ovs-vsctl set Interface vm1 external_ids:iface-id=foo

        # Stop as soon as a core dump has been recorded.
        if coredumpctl list
        then
                break
        fi

        # Otherwise tear everything down and retry from a clean state.
        systemctl stop ovn-controller &>/dev/null
        systemctl stop ovn-northd &>/dev/null
        systemctl stop openvswitch &>/dev/null
        sleep 1
        rm -rf /etc/openvswitch/*.db
        rm -rf /etc/openvswitch/*.pem
        rm -rf /var/lib/openvswitch/*
        rm -rf /var/lib/ovn/*
        rm -rf /etc/ovn/*.db
        rm -rf /etc/ovn/*.pem
        # clean up log
        rm -rf /var/log/ovn/*
        rm -rf /var/log/openvswitch/*
        netns_clean.sh
        sync
done
echo $i
coredumpctl list

Reproduced on ovn2.13-20.12.0-24:

[root@wsfd-advnetlab16 bz1936331]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-central-20.12.0-24.el8fdp.x86_64

+ ovs-vsctl set Interface vm1 external_ids:iface-id=foo
+ coredumpctl list        
TIME                            PID   UID   GID SIG COREFILE  EXE
Mon 2021-03-08 05:00:45 EST  122255   992   989   6 present   /usr/bin/ovn-controller
+ break
+ echo 3                        
3

[root@wsfd-advnetlab16 bz1936331]# coredumpctl info
           PID: 122255 (ovn-controller)
           UID: 992 (openvswitch)
           GID: 989 (openvswitch)
        Signal: 6 (ABRT)
     Timestamp: Mon 2021-03-08 05:00:45 EST (1min 44s ago)
  Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info >
    Executable: /usr/bin/ovn-controller
 Control Group: /system.slice/ovn-controller.service
          Unit: ovn-controller.service
         Slice: system.slice
       Boot ID: 1a50ae1ea3394ba7a53685e24a1bc9b4
    Machine ID: 0350fa343ed14eea8d477b906349f017
      Hostname: wsfd-advnetlab16.anl.lab.eng.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.ovn-controller.992.1a50ae1ea3394ba7a53685e24a1bc9b4.12>
       Message: Process 122255 (ovn-controller) of user 992 dumped core.
                
                Stack trace of thread 122255:
                #0  0x00007f60ff65d7ff raise (libc.so.6)
                #1  0x00007f60ff647c35 abort (libc.so.6)
                #2  0x000056317026b9a4 ovs_abort_valist (ovn-controller)
                #3  0x0000563170273794 vlog_abort_valist (ovn-controller)
                #4  0x000056317027383a vlog_abort (ovn-controller)
                #5  0x000056317026b6bb ovs_assert_failure (ovn-controller)
                #6  0x0000563170251c0a ovsdb_idl_txn_write__ (ovn-controller)
                #7  0x00005631701e313d sbrec_port_binding_set_up (ovn-controller)
                #8  0x0000563170189eea binding_seqno_install (ovn-controller)
                #9  0x0000563170183dc3 main (ovn-controller)
                #10 0x00007f60ff6497b3 __libc_start_main (libc.so.6)
                #11 0x0000563170184bfe _start (ovn-controller)
                
                Stack trace of thread 122259:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x000056317025b578 stopwatch_thread (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)
                
                Stack trace of thread 122256:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x00005631701a4d16 pinctrl_handler (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)
                
                Stack trace of thread 122257:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x0000563170242dda ovsrcu_postpone_thread (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)
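
To pull only the crashing thread's frames out of `coredumpctl info` output like the above, a small filter can help. This is a minimal sketch, not part of the reproducer: `first_thread_frames` is a hypothetical helper name, and it assumes the layout shown here (a "Stack trace of thread N:" header followed by "#n address symbol (module)" lines).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the symbol names from the first
# "Stack trace of thread" block read on stdin.
first_thread_frames() {
    awk '
        /Stack trace of thread/ { blocks++; next }    # count block headers
        blocks > 1 { exit }                           # stop after the first block
        blocks == 1 && $1 ~ /^#[0-9]+$/ { print $3 }  # field 3 is the symbol
    '
}

# Usage (assumes a core dump for ovn-controller has been recorded):
#   coredumpctl info /usr/bin/ovn-controller | first_thread_frames
```

Applied to the trace in this comment, the filter would list the frames of thread 122255 only, which is the thread that hit the assertion in ovsdb_idl_txn_write__.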

Comment 3 Dan Williams 2021-03-30 20:54:27 UTC
Updated patch under review: http://patchwork.ozlabs.org/project/ovn/patch/20210329132159.2005894-1-numans@ovn.org/

Comment 5 Dan Williams 2021-04-13 20:50:09 UTC
I believe the issue was fixed in ovn2.13-20.12.0-99. It is the same patch as in https://bugzilla.redhat.com/show_bug.cgi?id=1936328
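
A quick way to check whether a host already carries that build is to compare the installed version-release against 20.12.0-99. The following is a rough sketch under stated assumptions: `version_ge` is a hypothetical helper, it relies on GNU `sort -V` (which approximates, but is not identical to, RPM's own version comparison), and it applies only to the ovn2.13 package stream, not to ovn-2021.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version-release string $1 is >= $2.
# Uses GNU sort -V, an approximation of RPM version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

fixed="20.12.0-99"   # from "Fixed In Version" in this bug

# Usage (assumes an RPM-based host with ovn2.13 installed):
#   installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' ovn2.13)
#   # Strip the dist suffix (e.g. ".el8fdp") before comparing.
#   version_ge "${installed%%.el*}" "$fixed" && echo "fix present"
```

With the builds mentioned in this bug, 20.12.0-24 compares as older than the fixed build and 20.12.0-104 as newer.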

Comment 8 Jianlin Shi 2021-04-26 00:49:56 UTC
Verified on ovn-2021-21.03.0-21.el8fdp.x86_64: no crash after running the reproducer from comment 1.

[root@wsfd-advnetlab21 bz1936331]# rpm -qa | grep ovn-2021
ovn-2021-21.03.0-21.el8fdp.x86_64
ovn-2021-host-21.03.0-21.el8fdp.x86_64
ovn-2021-central-21.03.0-21.el8fdp.x86_64

Comment 9 Jianlin Shi 2021-04-26 00:55:21 UTC
Also verified on ovn2.13-20.12.0-104.el8fdp.x86_64 and ovn2.13-20.12.0-104.el7fdp.x86_64:

[root@wsfd-advnetlab21 bz1936331]# rpm -qa | grep ovn2.13
ovn2.13-20.12.0-104.el8fdp.x86_64
ovn2.13-central-20.12.0-104.el8fdp.x86_64
ovn2.13-host-20.12.0-104.el8fdp.x86_64

[root@wsfd-advnetlab16 bz1936331]# rpm -qa | grep ovn2.13
ovn2.13-host-20.12.0-104.el7fdp.x86_64
ovn2.13-central-20.12.0-104.el7fdp.x86_64
ovn2.13-20.12.0-104.el7fdp.x86_64

Comment 11 errata-xmlrpc 2021-05-20 19:28:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2080