Bug 1936328
Summary: | ovn-controller crashes when a container port is changed to normal port and then deleted | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Numan Siddique <nusiddiq> |
Component: | ovn2.13 | Assignee: | Numan Siddique <nusiddiq> |
Status: | CLOSED ERRATA | QA Contact: | Ehsan Elahi <eelahi> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | FDP 20.H | CC: | ctrautma, darya, dcbw, dsedgmen, jishi, ovnteam, ralongi, rkhan |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | ovn2.13-20.12.0-99.fdp8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-05-20 19:28:16 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Numan Siddique
2021-03-08 08:25:59 UTC
tested with following script: enable_coredump() { ulimit -c unlimited ulimit -s unlimited sysctl -w fs.suid_dumpable=2 if ! sysctl kernel.core_pattern | grep systemd-coredump then sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" fi rm -rf /var/lib/systemd/coredump/* rm -rf /run/log/journal/* rm -rf /var/log/journal/* systemctl restart systemd-journald } enable_coredump for i in {1..10} do systemctl start openvswitch systemctl start ovn-northd ovn-nbctl set-connection ptcp:6641 ovn-sbctl set-connection ptcp:6642 ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.169.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.169.25 systemctl restart ovn-controller systemctl status ovn-controller ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal ovn-nbctl ls-add ls ovn-nbctl lsp-add ls vm1 ovn-nbctl lsp-add ls vm-cont vm1 1 ovs-vsctl set Interface vm1 external_ids:iface-id=vm1 ovn-appctl -t ovn-controller debug/pause ovn-nbctl clear logical_switch_port vm-cont parent_name ovs-vsctl set Interface vm1 external_ids:iface-id=foo ovn-nbctl lsp-del vm-cont ovn-appctl -t ovn-controller debug/resume if coredumpctl list then break fi systemctl stop ovn-controller &>/dev/null systemctl stop ovn-northd &>/dev/null systemctl stop openvswitch &>/dev/null sleep 1 rm -rf /etc/openvswitch/*.db rm -rf /etc/openvswitch/*.pem rm -rf /var/lib/openvswitch/* rm -rf /var/lib/ovn/* rm -rf /etc/ovn/*.db rm -rf /etc/ovn/*.pem # clean up log rm -rf /var/log/ovn/* rm -rf /var/log/openvswitch/* netns_clean.sh sync done echo $i reproduced on 20.12.0-24; [root@wsfd-advnetlab16 bz1936328]# rpm -qa | grep -E "openvswitch2.13|ovn2.13" ovn2.13-20.12.0-24.el8fdp.x86_64 ovn2.13-host-20.12.0-24.el8fdp.x86_64 openvswitch2.13-2.13.0-95.el8fdp.x86_64 python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64 ovn2.13-central-20.12.0-24.el8fdp.x86_64 + ovn-appctl -t ovn-controller debug/resume + coredumpctl list TIME PID UID GID SIG COREFILE EXE Mon 2021-03-08 05:04:22 EST 123461 992 989 6 present /usr/bin/ovn-controller + break + echo 2 [root@wsfd-advnetlab16 bz1936328]# coredumpctl info [63/1808] PID: 123461 (ovn-controller) UID: 992 (openvswitch) GID: 989 (openvswitch) Signal: 6 (ABRT) Timestamp: Mon 2021-03-08 05:04:22 EST (18s ago) Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info > Executable: /usr/bin/ovn-controller Control Group: /system.slice/ovn-controller.service Unit: ovn-controller.service Slice: system.slice Boot ID: 1a50ae1ea3394ba7a53685e24a1bc9b4 Machine ID: 0350fa343ed14eea8d477b906349f017 Hostname: wsfd-advnetlab16.anl.lab.eng.bos.redhat.com Storage: /var/lib/systemd/coredump/core.ovn-controller.992.1a50ae1ea3394ba7a53685e24a1bc9b4.12> Message: Process 123461 (ovn-controller) of user 992 dumped core. Stack trace of thread 123461: #0 0x00007f52086867ff raise (libc.so.6) #1 0x00007f5208670c35 abort (libc.so.6) #2 0x0000563d5a2b59a4 ovs_abort_valist (ovn-controller) #3 0x0000563d5a2bd794 vlog_abort_valist (ovn-controller) #4 0x0000563d5a2bd83a vlog_abort (ovn-controller) #5 0x0000563d5a2b56bb ovs_assert_failure (ovn-controller) #6 0x0000563d5a29bc0a ovsdb_idl_txn_write__ (ovn-controller) #7 0x0000563d5a22c9e0 sbrec_port_binding_set_chassis (ovn-controller) #8 0x0000563d5a1cfe67 release_lport (ovn-controller) #9 0x0000563d5a1d07db release_local_binding_children (ovn-controller) #10 0x0000563d5a1d2904 binding_handle_ovs_interface_changes (ovn-controller) #11 0x0000563d5a1f7a0b runtime_data_ovs_interface_handler (ovn-controller) #12 0x0000563d5a2122b3 engine_run (ovn-controller) #13 0x0000563d5a1cd50b main (ovn-controller) #14 0x00007f52086727b3 __libc_start_main (libc.so.6) #15 0x0000563d5a1cebfe _start (ovn-controller) Stack trace of thread 123465: #0 0x00007f5208740ca1 __poll (libc.so.6) #1 0x0000563d5a2b0de5 time_poll (ovn-controller) #2 0x0000563d5a2a63fc poll_block (ovn-controller) #3 0x0000563d5a2a5578 stopwatch_thread (ovn-controller) #4 0x0000563d5a28f3e3 ovsthread_wrapper (ovn-controller) #5 0x00007f52092d914a start_thread (libpthread.so.0) #6 0x00007f520874bf23 __clone (libc.so.6) Stack trace of thread 123463: #0 0x00007f5208740ca1 __poll (libc.so.6) #1 0x0000563d5a2b0de5 time_poll (ovn-controller) #2 0x0000563d5a2a63fc poll_block (ovn-controller) #3 0x0000563d5a28cdda ovsrcu_postpone_thread (ovn-controller) #4 0x0000563d5a28f3e3 ovsthread_wrapper (ovn-controller) #5 0x00007f52092d914a start_thread (libpthread.so.0) #6 0x00007f520874bf23 __clone (libc.so.6) Stack trace of thread 123462: #0 0x00007f5208740ca1 __poll (libc.so.6) #1 0x0000563d5a2b0de5 time_poll (ovn-controller) #2 0x0000563d5a2a63fc poll_block (ovn-controller) #3 0x0000563d5a1eed16 pinctrl_handler (ovn-controller) #4 0x0000563d5a28f3e3 ovsthread_wrapper (ovn-controller) #5 0x00007f52092d914a start_thread (libpthread.so.0) #6 0x00007f520874bf23 __clone (libc.so.6) Updated patch on review: http://patchwork.ozlabs.org/project/ovn/patch/20210329132159.2005894-1-numans@ovn.org/ Patch committed to upstream OVN, now waiting on downstream backport. Fixed in ovn2.13-20.12.0-99.fdp8 Tested on 3 versions: ovn2.13-20.12.0-97.fdp8, ovn2.13-20.12.0-101.fdp8 and ovn2.13-20.12.0-109.fdp. ovn-controller crashes in version earlier than ovn2.13-20.12.0-99.fdp8 but works fine on later versions. Here's how the bug reproduced: enable_coredump() { ulimit -c unlimited ulimit -s unlimited sysctl -w fs.suid_dumpable=2 if ! sysctl kernel.core_pattern | grep systemd-coredump then sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" fi rm -rf /var/lib/systemd/coredump/* rm -rf /run/log/journal/* rm -rf /var/log/journal/* systemctl restart systemd-journald } enable_coredump systemctl start openvswitch systemctl start ovn-northd ovn-sbctl set-connection ptcp:6642 ovn-nbctl set-connection ptcp:6641 sleep 5 ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.169.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.169.25 systemctl start ovn-controller sleep 20 systemctl status openvswitch systemctl status ovn-northd systemctl status ovn-controller ovs-vsctl add-port br-int iface1 -- set interface iface1 type=internal ovn-nbctl ls-add ls ovn-nbctl lsp-add ls lsp1 ovn-nbctl lsp-add ls lsp1c lsp1 1 ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1 ovn-appctl -t ovn-controller debug/pause # changed child port to normal ovn-nbctl clear logical_switch_port lsp1c parent_name #bind to an interface ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1c # delete the same port ovn-nbctl lsp-del lsp1c ovn-appctl -t ovn-controller debug/resume if coredumpctl list then break fi #check if there's any error in the ovn-controller.log cat /var/log/ovn/ovn-controller.log|grep ERR" # cleanup everything systemctl stop ovn-controller &>/dev/null systemctl stop ovn-northd &>/dev/null systemctl stop openvswitch &>/dev/null sleep 5 rm -rf /etc/openvswitch/*.db rm -rf /etc/openvswitch/*.pem rm -rf /var/lib/openvswitch/* rm -rf /var/lib/ovn/* rm -rf /etc/ovn/*.db rm -rf /etc/ovn/*.pem # clean up log rm -rf /var/log/ovn/* rm -rf /var/log/openvswitch/* Tested and verified on following versions: ovn2.13-20.12.0-101.el8fdp ovn2.13-20.12.0-104.el8fdp ovn2.13-20.12.0-109.el8fdp ovn-2021-21.03.0-21.el8fdp ovn2.13-20.12.0-104.el7fdp Automated as per comment 9. Test result as below: :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn_setup hv1' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovs-vsctl add-port br-int iface1 -- set interface iface1 type=internal' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovs-vsctl add-port br-int iface1 -- set interface iface1 type=internal' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovs-vsctl add-port br-int iface2 -- set interface iface2 type=internal' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovs-vsctl add-port br-int iface2 -- set interface iface2 type=internal' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl ls-add ls' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl ls-add ls' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl lsp-add ls lsp1' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl lsp-add ls lsp1' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl lsp-add ls lsp1c lsp1 1' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl lsp-add ls lsp1c lsp1 1' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-appctl -t ovn-controller debug/pause' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-appctl -t ovn-controller debug/pause' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl clear logical_switch_port lsp1c parent_name' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl clear logical_switch_port lsp1c parent_name' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1c' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovs-vsctl set Interface iface1 external_ids:iface-id=lsp1c' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-nbctl lsp-del lsp1c' :: [ 09:04:53 ] :: [ PASS ] :: Command 'ovn-nbctl lsp-del lsp1c' (Expected 0, got 0) :: [ 09:04:53 ] :: [ BEGIN ] :: Running 'ovn-appctl -t ovn-controller debug/resume' :: [ 09:04:54 ] :: [ PASS ] :: Command 'ovn-appctl -t ovn-controller debug/resume' (Expected 0, got 0) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "Duplicate IP set on switch" /var/log/ovn/ovn-northd.log' :: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "Duplicate IP set on switch" /var/log/ovn/ovn-northd.log' (Expected 1, got 1) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "ovsdb_idl.*transaction error" /var/log/ovn/ovn-northd.log' :: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "ovsdb_idl.*transaction error" /var/log/ovn/ovn-northd.log' (Expected 1, got 1) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "ERR.*crash" /var/log/ovn/ovn-northd.log' :: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "ERR.*crash" /var/log/ovn/ovn-northd.log' (Expected 1, got 1) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "OpenFlow error" /var/log/ovn/ovn-controller.log' :: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "OpenFlow error" /var/log/ovn/ovn-controller.log' (Expected 1, got 1) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "service check: Unsupported protocol" /var/log/ovn/ovn-controller.log' :: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "service check: Unsupported protocol" /var/log/ovn/ovn-controller.log' (Expected 1, got 1) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "Failed to acquire udpif_key corresponding to unexpected flow" /var/log/openvswitch/ovs-vswitchd.log' :: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "Failed to acquire udpif_key corresponding to unexpected flow" /var/log/openvswitch/ovs-vswitchd.log' (Expected 1, got 1) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'grep -E -i "integrity violation" /var/log/ovn/ovsdb-server*' :: [ 09:04:54 ] :: [ PASS ] :: Command 'grep -E -i "integrity violation" /var/log/ovn/ovsdb-server*' (Expected 1, got 1) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'cat /var/log/ovn/ovn-controller.log|grep ERR' :: [ 09:04:54 ] :: [ PASS ] :: Command 'cat /var/log/ovn/ovn-controller.log|grep ERR' (Expected 1, got 1) :: [ 09:04:54 ] :: [ BEGIN ] :: Running 'ovn_cleanup' ovn_cleanup ... ovn_cleanup ... end :: [ 09:04:59 ] :: [ PASS ] :: Command 'ovn_cleanup' (Expected 0, got 0) :: [ 09:04:59 ] :: [ BEGIN ] :: Running 'coredumpctl list' No coredumps found. :: [ 09:04:59 ] :: [ PASS ] :: Command 'coredumpctl list' (Expected 1, got 1) :: [ 09:04:59 ] :: [ BEGIN ] :: Running 'coredumpctl list' No coredumps found. :: [ 09:04:59 ] :: [ PASS ] :: Command 'coredumpctl list' (Expected 0-255, got 1) fs.suid_dumpable = 0 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2080 The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days |