Bug 1936328
| Summary: | ovn-controller crashes when a container port is changed to normal port and then deleted | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Numan Siddique <nusiddiq> |
| Component: | ovn2.13 | Assignee: | Numan Siddique <nusiddiq> |
| Status: | CLOSED ERRATA | QA Contact: | Ehsan Elahi <eelahi> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | FDP 20.H | CC: | ctrautma, darya, dcbw, dsedgmen, jishi, ovnteam, ralongi, rkhan |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovn2.13-20.12.0-99.fdp8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-05-20 19:28:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Numan Siddique
2021-03-08 08:25:59 UTC
Tested with the following script:
enable_coredump()
{
    ulimit -c unlimited
    ulimit -s unlimited
    sysctl -w fs.suid_dumpable=2
    if ! sysctl kernel.core_pattern | grep systemd-coredump
    then
        sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
    fi
    rm -rf /var/lib/systemd/coredump/*
    rm -rf /run/log/journal/*
    rm -rf /var/log/journal/*
    systemctl restart systemd-journald
}
enable_coredump
for i in {1..10}
do
    systemctl start openvswitch
    systemctl start ovn-northd
    ovn-nbctl set-connection ptcp:6641
    ovn-sbctl set-connection ptcp:6642
    ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.169.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.169.25
    systemctl restart ovn-controller
    systemctl status ovn-controller
    ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
    ovn-nbctl ls-add ls
    ovn-nbctl lsp-add ls vm1
    ovn-nbctl lsp-add ls vm-cont vm1 1
    ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
    ovn-appctl -t ovn-controller debug/pause
    ovn-nbctl clear logical_switch_port vm-cont parent_name
    ovs-vsctl set Interface vm1 external_ids:iface-id=foo
    ovn-nbctl lsp-del vm-cont
    ovn-appctl -t ovn-controller debug/resume
    if coredumpctl list
    then
        break
    fi
    systemctl stop ovn-controller &>/dev/null
    systemctl stop ovn-northd &>/dev/null
    systemctl stop openvswitch &>/dev/null
    sleep 1
    rm -rf /etc/openvswitch/*.db
    rm -rf /etc/openvswitch/*.pem
    rm -rf /var/lib/openvswitch/*
    rm -rf /var/lib/ovn/*
    rm -rf /etc/ovn/*.db
    rm -rf /etc/ovn/*.pem
    # clean up log
    rm -rf /var/log/ovn/*
    rm -rf /var/log/openvswitch/*
    netns_clean.sh
    sync
done
echo $i
Reproduced on 20.12.0-24:
[root@wsfd-advnetlab16 bz1936328]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-central-20.12.0-24.el8fdp.x86_64
+ ovn-appctl -t ovn-controller debug/resume
+ coredumpctl list
TIME PID UID GID SIG COREFILE EXE
Mon 2021-03-08 05:04:22 EST 123461 992 989 6 present /usr/bin/ovn-controller
+ break
+ echo 2
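The run above reproduced the crash on release 24 of the 20.12.0 build, while the report's "Fixed In Version" field names ovn2.13-20.12.0-99.fdp8. A minimal sketch for checking an installed package against that release follows. The names `FIXED_RELEASE`, `release_number`, and `has_fix` are hypothetical and not part of the bug report, and the comparison assumes the package belongs to the ovn2.13 20.12.0 stream (it looks only at the release number, not at the version).

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of the bug report.
# Release number taken from "Fixed In Version: ovn2.13-20.12.0-99.fdp8".
FIXED_RELEASE=99

# Extract the numeric release from a package string,
# e.g. "ovn2.13-20.12.0-24.el8fdp.x86_64" -> "24".
release_number() {
    local rel=${1##*-}   # drop everything up to the last '-': "24.el8fdp.x86_64"
    echo "${rel%%.*}"    # keep the text before the first '.': "24"
}

# Succeed when the given ovn2.13 20.12.0 build is at or past the fixed release.
# Assumption: the argument is from the 20.12.0 stream; other streams need their own threshold.
has_fix() {
    [ "$(release_number "$1")" -ge "$FIXED_RELEASE" ]
}
```

A possible use on a test host is `has_fix "$(rpm -q ovn2.13)" && echo "fix present"`.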
[root@wsfd-advnetlab16 bz1936328]# coredumpctl info
PID: 123461 (ovn-controller)
UID: 992 (openvswitch)
GID: 989 (openvswitch)
Signal: 6 (ABRT)
Timestamp: Mon 2021-03-08 05:04:22 EST (18s ago)
Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info >
Executable: /usr/bin/ovn-controller
Control Group: /system.slice/ovn-controller.service
Unit: ovn-controller.service
Slice: system.slice
Boot ID: 1a50ae1ea3394ba7a53685e24a1bc9b4
Machine ID: 0350fa343ed14eea8d477b906349f017
Hostname: wsfd-advnetlab16.anl.lab.eng.bos.redhat.com
Storage: /var/lib/systemd/coredump/core.ovn-controller.992.1a50ae1ea3394ba7a53685e24a1bc9b4.12>
Message: Process 123461 (ovn-controller) of user 992 dumped core.
Stack trace of thread 123461:
#0 0x00007f52086867ff raise (libc.so.6)
#1 0x00007f5208670c35 abort (libc.so.6)
#2 0x0000563d5a2b59a4 ovs_abort_valist (ovn-controller)
#3 0x0000563d5a2bd794 vlog_abort_valist (ovn-controller)
#4 0x0000563d5a2bd83a vlog_abort (ovn-controller)
#5 0x0000563d5a2b56bb ovs_assert_failure (ovn-controller)
#6 0x0000563d5a29bc0a ovsdb_idl_txn_write__ (ovn-controller)
#7 0x0000563d5a22c9e0 sbrec_port_binding_set_chassis (ovn-controller)
#8 0x0000563d5a1cfe67 release_lport (ovn-controller)
#9 0x0000563d5a1d07db release_local_binding_children (ovn-controller)
#10 0x0000563d5a1d2904 binding_handle_ovs_interface_changes (ovn-controller)
#11 0x0000563d5a1f7a0b runtime_data_ovs_interface_handler (ovn-controller)
#12 0x0000563d5a2122b3 engine_run (ovn-controller)
#13 0x0000563d5a1cd50b main (ovn-controller)
#14 0x00007f52086727b3 __libc_start_main (libc.so.6)
#15 0x0000563d5a1cebfe _start (ovn-controller)
Stack trace of thread 123465:
#0 0x00007f5208740ca1 __poll (libc.so.6)
#1 0x0000563d5a2b0de5 time_poll (ovn-controller)
#2 0x0000563d5a2a63fc poll_block (ovn-controller)
#3 0x0000563d5a2a5578 stopwatch_thread (ovn-controller)
#4 0x0000563d5a28f3e3 ovsthread_wrapper (ovn-controller)
#5 0x00007f52092d914a start_thread (libpthread.so.0)
#6 0x00007f520874bf23 __clone (libc.so.6)
Stack trace of thread 123463:
#0 0x00007f5208740ca1 __poll (libc.so.6)
#1 0x0000563d5a2b0de5 time_poll (ovn-controller)
#2 0x0000563d5a2a63fc poll_block (ovn-controller)
#3 0x0000563d5a28cdda ovsrcu_postpone_thread (ovn-controller)
#4 0x0000563d5a28f3e3 ovsthread_wrapper (ovn-controller)
#5 0x00007f52092d914a start_thread (libpthread.so.0)
#6 0x00007f520874bf23 __clone (libc.so.6)
Stack trace of thread 123462:
#0 0x00007f5208740ca1 __poll (libc.so.6)
#1 0x0000563d5a2b0de5 time_poll (ovn-controller)
#2 0x0000563d5a2a63fc poll_block (ovn-controller)
#3 0x0000563d5a1eed16 pinctrl_handler (ovn-controller)
#4 0x0000563d5a28f3e3 ovsthread_wrapper (ovn-controller)
#5 0x00007f52092d914a start_thread (libpthread.so.0)
#6 0x00007f520874bf23 __clone (libc.so.6)
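Read from the bottom up, the first thread's trace shows the abort path: engine_run, runtime_data_ovs_interface_handler, binding_handle_ovs_interface_changes, release_local_binding_children, release_lport, sbrec_port_binding_set_chassis, and finally an assertion failure inside ovsdb_idl_txn_write__. A small sketch for pulling just those function names out of the output is shown below. The function name `first_thread_frames` is hypothetical, and the parsing assumes the "Stack trace of thread N:" and "#N 0xADDR function (object)" layout seen above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the bug report.
# Reads `coredumpctl info` output on stdin and prints the function names
# of the first listed thread, innermost frame first.
first_thread_frames() {
    awk '
        /Stack trace of thread/ { n++; next }      # count thread headers
        n == 1 && $1 ~ /^#[0-9]+$/ { print $3 }    # frames of the first thread only
    '
}
```

For example, `coredumpctl info | first_thread_frames` would list `raise` through `_start` for the dump above, which makes it easier to compare crashes across runs.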
Updated patch on review: http://patchwork.ozlabs.org/project/ovn/patch/20210329132159.2005894-1-numans@ovn.org/
Patch committed to upstream OVN, now waiting on downstream backport.
Fixed in ovn2.13-20.12.0-99.fdp8
Tested on 3 versions: ovn2.13-20.12.0-97.fdp8, ovn2.13-20.12.0-101.fdp8 and ovn2.13-20.12.0-109.fdp.
ovn-controller crashes in versions earlier than ovn2.13-20.12.0-99.fdp8 but works fine on later versions. Here is how the bug was reproduced:
enable_coredump()
{
    ulimit -c unlimited
    ulimit -s unlimited
    sysctl -w fs.suid_dumpable=2
    if ! sysctl kernel.core_pattern | grep systemd-coredump
    then
        sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
    fi
    rm -rf /var/lib/systemd/coredump/*
    rm -rf /run/log/journal/*
    rm -rf /var/log/journal/*
    systemctl restart systemd-journald
}
enable_coredump
systemctl start openvswitch
systemctl start ovn-northd
ovn-sbctl set-connection ptcp:6642
ovn-nbctl set-connection ptcp:6641
sleep 5
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.169.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.169.25
systemctl start ovn-controller
sleep 20
systemctl status openvswitch
systemctl status ovn-northd
systemctl status ovn-controller
ovs-vsctl add-port br-int iface1 -- set interface iface1 type=internal
ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls lsp1
ovn-nbctl lsp-add ls lsp1c lsp1 1
ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1
ovn-appctl -t ovn-controller debug/pause
# change the child (container) port to a normal port
ovn-nbctl clear logical_switch_port lsp1c parent_name
# bind the port to an interface
ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1c
# delete the same port
ovn-nbctl lsp-del lsp1c
ovn-appctl -t ovn-controller debug/resume
# report a crash if a core dump was recorded (there is no loop to break out of here)
if coredumpctl list
then
    echo "core dump found"
fi
# check if there is any error in ovn-controller.log
cat /var/log/ovn/ovn-controller.log | grep ERR
# clean up everything
systemctl stop ovn-controller &>/dev/null
systemctl stop ovn-northd &>/dev/null
systemctl stop openvswitch &>/dev/null
sleep 5
rm -rf /etc/openvswitch/*.db
rm -rf /etc/openvswitch/*.pem
rm -rf /var/lib/openvswitch/*
rm -rf /var/lib/ovn/*
rm -rf /etc/ovn/*.db
rm -rf /etc/ovn/*.pem
# clean up log
rm -rf /var/log/ovn/*
rm -rf /var/log/openvswitch/*
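The script checks ovn-controller.log by matching the substring ERR anywhere on a line, which can also match message text that merely mentions it. A slightly stricter sketch is shown below. The function name `err_records` is hypothetical, and it assumes the log uses the default OVS/OVN file layout of `timestamp|sequence|module|LEVEL|message`; a customized log pattern would need a different field number.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the bug report.
# Print only the log records whose severity field is exactly ERR.
# Assumption: default file log layout, timestamp|sequence|module|LEVEL|message.
err_records() {
    awk -F'|' '$4 == "ERR"' "$@"
}
```

Unlike `grep`, `awk` exits with status 0 even when nothing matches, so a pass/fail check would test the output instead, for example `[ -z "$(err_records /var/log/ovn/ovn-controller.log)" ]`.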
Tested and verified on following versions:
ovn2.13-20.12.0-101.el8fdp
ovn2.13-20.12.0-104.el8fdp
ovn2.13-20.12.0-109.el8fdp
ovn-2021-21.03.0-21.el8fdp
ovn2.13-20.12.0-104.el7fdp
Automated as per comment 9. Test result as below:
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn_setup hv1' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovs-vsctl add-port br-int iface1 -- set interface iface1 type=internal'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovs-vsctl add-port br-int iface1 -- set interface iface1 type=internal' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovs-vsctl add-port br-int iface2 -- set interface iface2 type=internal'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovs-vsctl add-port br-int iface2 -- set interface iface2 type=internal' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl ls-add ls'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl ls-add ls' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl lsp-add ls lsp1'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl lsp-add ls lsp1' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl lsp-add ls lsp1c lsp1 1'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl lsp-add ls lsp1c lsp1 1' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-appctl -t ovn-controller debug/pause'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-appctl -t ovn-controller debug/pause' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl clear logical_switch_port lsp1c parent_name'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl clear logical_switch_port lsp1c parent_name' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1c'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1c' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl lsp-del lsp1c'
:: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl lsp-del lsp1c' (Expected 0, got 0)
:: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-appctl -t ovn-controller debug/resume'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'ovn-appctl -t ovn-controller debug/resume' (Expected 0, got 0)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "Duplicate IP set on switch" /var/log/ovn/ovn-northd.log'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "Duplicate IP set on switch" /var/log/ovn/ovn-northd.log' (Expected 1, got 1)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "ovsdb_idl.*transaction error" /var/log/ovn/ovn-northd.log'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "ovsdb_idl.*transaction error" /var/log/ovn/ovn-northd.log' (Expected 1, got 1)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "ERR.*crash" /var/log/ovn/ovn-northd.log'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "ERR.*crash" /var/log/ovn/ovn-northd.log' (Expected 1, got 1)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "OpenFlow error" /var/log/ovn/ovn-controller.log'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "OpenFlow error" /var/log/ovn/ovn-controller.log' (Expected 1, got 1)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "service check: Unsupported protocol" /var/log/ovn/ovn-controller.log'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "service check: Unsupported protocol" /var/log/ovn/ovn-controller.log' (Expected 1, got 1)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "Failed to acquire udpif_key corresponding to unexpected flow" /var/log/openvswitch/ovs-vswitchd.log'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "Failed to acquire udpif_key corresponding to unexpected flow" /var/log/openvswitch/ovs-vswitchd.log' (Expected 1, got 1)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "integrity violation" /var/log/ovn/ovsdb-server*'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "integrity violation" /var/log/ovn/ovsdb-server*' (Expected 1, got 1)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'cat /var/log/ovn/ovn-controller.log|grep ERR'
:: [ 09:04:54 ] :: [ PASS ] :: Command 'cat /var/log/ovn/ovn-controller.log|grep ERR' (Expected 1, got 1)
:: [ 09:04:54 ] :: [ BEGIN ] :: Running 'ovn_cleanup'
ovn_cleanup ... ovn_cleanup ... end
:: [ 09:04:59 ] :: [ PASS ] :: Command 'ovn_cleanup' (Expected 0, got 0)
:: [ 09:04:59 ] :: [ BEGIN ] :: Running 'coredumpctl list'
No coredumps found.
:: [ 09:04:59 ] :: [ PASS ] :: Command 'coredumpctl list' (Expected 1, got 1)
:: [ 09:04:59 ] :: [ BEGIN ] :: Running 'coredumpctl list'
No coredumps found.
:: [ 09:04:59 ] :: [ PASS ] :: Command 'coredumpctl list' (Expected 0-255, got 1)
fs.suid_dumpable = 0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:2080
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days |