Bug 1936331
| Summary: | ovn-controller crashes due to use-after-free with a container logical port | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Numan Siddique <nusiddiq> |
| Component: | ovn2.13 | Assignee: | Numan Siddique <nusiddiq> |
| Status: | CLOSED ERRATA | QA Contact: | Jianlin Shi <jishi> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | FDP 20.H | CC: | ctrautma, dcbw, dsedgmen, jishi, ralongi, rkhan |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovn2.13-20.12.0-99 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-20 19:28:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Numan Siddique
2021-03-08 08:34:09 UTC
Tested with the following script:
enable_coredump()
{
    ulimit -c unlimited
    ulimit -s unlimited
    sysctl -w fs.suid_dumpable=2
    if ! sysctl kernel.core_pattern | grep systemd-coredump
    then
        sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
    fi
    rm -rf /var/lib/systemd/coredump/*
    rm -rf /run/log/journal/*
    rm -rf /var/log/journal/*
    systemctl restart systemd-journald
}
enable_coredump
for i in {1..10}
do
    systemctl start openvswitch
    systemctl start ovn-northd
    ovn-nbctl set-connection ptcp:6641
    ovn-sbctl set-connection ptcp:6642
    ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.169.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.169.25
    systemctl restart ovn-controller
    systemctl status ovn-controller
    ovn-nbctl ls-add ls
    ovn-nbctl lsp-add ls vm1
    ovn-nbctl lsp-add ls vm-cont vm1 1
    ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
    ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
    ovn-nbctl clear logical_switch_port vm-cont parent_name
    ovs-vsctl set Interface vm1 external_ids:iface-id=foo
    ovn-nbctl lsp-del vm-cont
    ovn-nbctl ls-del ls
    ovn-nbctl ls-add ls
    ovn-nbctl lsp-add ls vm1
    ovn-nbctl lsp-add ls vm-cont vm1 1
    ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
    ovn-appctl -t ovn-controller debug/pause
    ovn-nbctl clear logical_switch_port vm-cont parent_name
    ovn-nbctl lsp-del vm-cont
    ovn-appctl -t ovn-controller debug/resume
    ovs-vsctl set Interface vm1 external_ids:iface-id=foo
    if coredumpctl list
    then
        break
    fi
    systemctl stop ovn-controller &>/dev/null
    systemctl stop ovn-northd &>/dev/null
    systemctl stop openvswitch &>/dev/null
    sleep 1
    rm -rf /etc/openvswitch/*.db
    rm -rf /etc/openvswitch/*.pem
    rm -rf /var/lib/openvswitch/*
    rm -rf /var/lib/ovn/*
    rm -rf /etc/ovn/*.db
    rm -rf /etc/ovn/*.pem
    # clean up log
    rm -rf /var/log/ovn/*
    rm -rf /var/log/openvswitch/*
    netns_clean.sh
    sync
done
echo $i
coredumpctl list
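
The reproducer stops on any entry that `coredumpctl list` reports. As a hedged sketch that is not part of the original script, a stricter check could count only dumps whose executable is `ovn-controller`. The helper name `count_core_dumps` is hypothetical; it reads `coredumpctl list` text on stdin and assumes the executable path is the last column, as in the `coredumpctl list` output captured in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original reproducer).
# Reads `coredumpctl list` output on stdin and prints how many entries
# belong to the executable named in $1 (matched on the path's basename).
count_core_dumps() {
    local exe_name="$1"
    awk -v name="$exe_name" '
        $1 == "TIME" { next }              # skip the column header
        {
            n = split($NF, parts, "/")     # EXE is assumed to be the last field
            if (n > 0 && parts[n] == name) count++
        }
        END { print count + 0 }            # print 0 when nothing matched
    '
}
```

Inside the loop this could be used as `[ "$(coredumpctl list 2>/dev/null | count_core_dumps ovn-controller)" -gt 0 ] && break`, which avoids stopping on an unrelated crash.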
Reproduced on 20.12.0-24:
[root@wsfd-advnetlab16 bz1936331]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-central-20.12.0-24.el8fdp.x86_64
+ ovs-vsctl set Interface vm1 external_ids:iface-id=foo
+ coredumpctl list
TIME PID UID GID SIG COREFILE EXE
Mon 2021-03-08 05:00:45 EST 122255 992 989 6 present /usr/bin/ovn-controller
+ break
+ echo 3
3
[root@wsfd-advnetlab16 bz1936331]# coredumpctl info
PID: 122255 (ovn-controller)
UID: 992 (openvswitch)
GID: 989 (openvswitch)
Signal: 6 (ABRT)
Timestamp: Mon 2021-03-08 05:00:45 EST (1min 44s ago)
Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info >
Executable: /usr/bin/ovn-controller
Control Group: /system.slice/ovn-controller.service
Unit: ovn-controller.service
Slice: system.slice
Boot ID: 1a50ae1ea3394ba7a53685e24a1bc9b4
Machine ID: 0350fa343ed14eea8d477b906349f017
Hostname: wsfd-advnetlab16.anl.lab.eng.bos.redhat.com
Storage: /var/lib/systemd/coredump/core.ovn-controller.992.1a50ae1ea3394ba7a53685e24a1bc9b4.12>
Message: Process 122255 (ovn-controller) of user 992 dumped core.
Stack trace of thread 122255:
#0 0x00007f60ff65d7ff raise (libc.so.6)
#1 0x00007f60ff647c35 abort (libc.so.6)
#2 0x000056317026b9a4 ovs_abort_valist (ovn-controller)
#3 0x0000563170273794 vlog_abort_valist (ovn-controller)
#4 0x000056317027383a vlog_abort (ovn-controller)
#5 0x000056317026b6bb ovs_assert_failure (ovn-controller)
#6 0x0000563170251c0a ovsdb_idl_txn_write__ (ovn-controller)
#7 0x00005631701e313d sbrec_port_binding_set_up (ovn-controller)
#8 0x0000563170189eea binding_seqno_install (ovn-controller)
#9 0x0000563170183dc3 main (ovn-controller)
#10 0x00007f60ff6497b3 __libc_start_main (libc.so.6)
#11 0x0000563170184bfe _start (ovn-controller)
Stack trace of thread 122259:
#0 0x00007f60ff717ca1 __poll (libc.so.6)
#1 0x0000563170266de5 time_poll (ovn-controller)
#2 0x000056317025c3fc poll_block (ovn-controller)
#3 0x000056317025b578 stopwatch_thread (ovn-controller)
#4 0x00005631702453e3 ovsthread_wrapper (ovn-controller)
#5 0x00007f61002b014a start_thread (libpthread.so.0)
#6 0x00007f60ff722f23 __clone (libc.so.6)
Stack trace of thread 122256:
#0 0x00007f60ff717ca1 __poll (libc.so.6)
#1 0x0000563170266de5 time_poll (ovn-controller)
#2 0x000056317025c3fc poll_block (ovn-controller)
#3 0x00005631701a4d16 pinctrl_handler (ovn-controller)
#4 0x00005631702453e3 ovsthread_wrapper (ovn-controller)
#5 0x00007f61002b014a start_thread (libpthread.so.0)
#6 0x00007f60ff722f23 __clone (libc.so.6)
Stack trace of thread 122257:
#0 0x00007f60ff717ca1 __poll (libc.so.6)
#1 0x0000563170266de5 time_poll (ovn-controller)
#2 0x000056317025c3fc poll_block (ovn-controller)
#3 0x0000563170242dda ovsrcu_postpone_thread (ovn-controller)
#4 0x00005631702453e3 ovsthread_wrapper (ovn-controller)
#5 0x00007f61002b014a start_thread (libpthread.so.0)
#6 0x00007f60ff722f23 __clone (libc.so.6)
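
In the output above, the first thread listed (122255) is the one that aborted; the other threads are idle in `poll`. A minimal sketch for pulling just that thread's call chain out of `coredumpctl info` text follows. The function name `first_thread_frames` is hypothetical, and the parsing assumes the `#N address function (module)` frame layout shown in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `coredumpctl info` output on stdin and prints
# the function names of the first "Stack trace of thread" block, one per line.
first_thread_frames() {
    awk '
        /Stack trace of thread/ { block++; next }   # count thread blocks
        block == 1 && $1 ~ /^#[0-9]+$/ { print $3 } # frame line: "#N addr func (module)"
    '
}
```

Run as `coredumpctl info | first_thread_frames`, this would list `raise`, `abort`, and so on down to `_start` for the trace in this report, which makes the path through `binding_seqno_install` and `sbrec_port_binding_set_up` into the failed assertion in `ovsdb_idl_txn_write__` easy to see.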
Updated patch on review: http://patchwork.ozlabs.org/project/ovn/patch/20210329132159.2005894-1-numans@ovn.org/

I believe the issue was fixed in ovn2.13-20.12.0-99. Same patch as https://bugzilla.redhat.com/show_bug.cgi?id=1936328

Verified on ovn-2021-21.03.0-21.el8fdp.x86_64: no crash after running the reproducer in comment 1.
[root@wsfd-advnetlab21 bz1936331]# rpm -qa | grep ovn-2021
ovn-2021-21.03.0-21.el8fdp.x86_64
ovn-2021-host-21.03.0-21.el8fdp.x86_64
ovn-2021-central-21.03.0-21.el8fdp.x86_64

Also verified on ovn2.13-20.12.0-104.el8 and ovn2.13-host-20.12.0-104.el7fdp.x86_64:
[root@wsfd-advnetlab21 bz1936331]# rpm -qa | grep ovn2.13
ovn2.13-20.12.0-104.el8fdp.x86_64
ovn2.13-central-20.12.0-104.el8fdp.x86_64
ovn2.13-host-20.12.0-104.el8fdp.x86_64
[root@wsfd-advnetlab16 bz1936331]# rpm -qa | grep ovn2.13
ovn2.13-host-20.12.0-104.el7fdp.x86_64
ovn2.13-central-20.12.0-104.el7fdp.x86_64
ovn2.13-20.12.0-104.el7fdp.x86_64

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:2080 |
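
To check whether a host already carries the fix, the installed version-release can be compared against the "Fixed In Version" value of this bug (20.12.0-99). The sketch below is an assumption-laden illustration: the function name `version_at_least` is hypothetical, and it relies on GNU `sort -V`, whose ordering is close to but not identical to RPM's own version comparison.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when version-release $1 is at or
# above $2, using GNU sort -V ordering (an approximation of RPM's rules).
version_at_least() {
    local have="$1" want="$2"
    # If the wanted version sorts first (or both are equal), we are at or above it.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

For example, `version_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' ovn2.13)" 20.12.0-99 && echo fixed` would print `fixed` on the 20.12.0-104 hosts used for verification and nothing on the 20.12.0-24 host where the crash was reproduced.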