Description of problem:

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-add ls vm-cont vm1 1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovn-nbctl clear logical_switch_port vm-cont parent_name
ovs-vsctl set Interface vm1 external_ids:iface-id=foo
ovn-nbctl lsp-del vm-cont
ovn-nbctl ls-del ls

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-add ls vm-cont vm1 1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ovn-appctl -t ovn-controller debug/pause
ovn-nbctl clear logical_switch_port vm-cont parent_name
ovn-nbctl lsp-del vm-cont
ovn-appctl -t ovn-controller debug/resume
ovs-vsctl set Interface vm1 external_ids:iface-id=foo

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Tested with the following script:

enable_coredump()
{
    ulimit -c unlimited
    ulimit -s unlimited
    sysctl -w fs.suid_dumpable=2
    if ! sysctl kernel.core_pattern | grep systemd-coredump
    then
        sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
    fi
    rm -rf /var/lib/systemd/coredump/*
    rm -rf /run/log/journal/*
    rm -rf /var/log/journal/*
    systemctl restart systemd-journald
}

enable_coredump

for i in {1..10}
do
    systemctl start openvswitch
    systemctl start ovn-northd
    ovn-nbctl set-connection ptcp:6641
    ovn-sbctl set-connection ptcp:6642
    ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:1.1.169.25:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=1.1.169.25
    systemctl restart ovn-controller
    systemctl status ovn-controller

    ovn-nbctl ls-add ls
    ovn-nbctl lsp-add ls vm1
    ovn-nbctl lsp-add ls vm-cont vm1 1
    ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
    ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
    ovn-nbctl clear logical_switch_port vm-cont parent_name
    ovs-vsctl set Interface vm1 external_ids:iface-id=foo
    ovn-nbctl lsp-del vm-cont
    ovn-nbctl ls-del ls

    ovn-nbctl ls-add ls
    ovn-nbctl lsp-add ls vm1
    ovn-nbctl lsp-add ls vm-cont vm1 1
    ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
    ovn-appctl -t ovn-controller debug/pause
    ovn-nbctl clear logical_switch_port vm-cont parent_name
    ovn-nbctl lsp-del vm-cont
    ovn-appctl -t ovn-controller debug/resume
    ovs-vsctl set Interface vm1 external_ids:iface-id=foo

    if coredumpctl list
    then
        break
    fi

    systemctl stop ovn-controller &>/dev/null
    systemctl stop ovn-northd &>/dev/null
    systemctl stop openvswitch &>/dev/null
    sleep 1
    rm -rf /etc/openvswitch/*.db
    rm -rf /etc/openvswitch/*.pem
    rm -rf /var/lib/openvswitch/*
    rm -rf /var/lib/ovn/*
    rm -rf /etc/ovn/*.db
    rm -rf /etc/ovn/*.pem
    # clean up log
    rm -rf /var/log/ovn/*
    rm -rf /var/log/openvswitch/*
    netns_clean.sh
    sync
done

echo $i
coredumpctl list

Reproduced on 20.12.0-24:

[root@wsfd-advnetlab16 bz1936331]# rpm -qa | grep -E "openvswitch2.13|ovn2.13"
ovn2.13-20.12.0-24.el8fdp.x86_64
ovn2.13-host-20.12.0-24.el8fdp.x86_64
openvswitch2.13-2.13.0-95.el8fdp.x86_64
python3-openvswitch2.13-2.13.0-95.el8fdp.x86_64
ovn2.13-central-20.12.0-24.el8fdp.x86_64

+ ovs-vsctl set Interface vm1 external_ids:iface-id=foo
+ coredumpctl list
TIME                            PID   UID   GID SIG COREFILE  EXE
Mon 2021-03-08 05:00:45 EST  122255   992   989   6 present   /usr/bin/ovn-controller
+ break
+ echo 3
3

[root@wsfd-advnetlab16 bz1936331]# coredumpctl info
           PID: 122255 (ovn-controller)
           UID: 992 (openvswitch)
           GID: 989 (openvswitch)
        Signal: 6 (ABRT)
     Timestamp: Mon 2021-03-08 05:00:45 EST (1min 44s ago)
  Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info >
    Executable: /usr/bin/ovn-controller
 Control Group: /system.slice/ovn-controller.service
          Unit: ovn-controller.service
         Slice: system.slice
       Boot ID: 1a50ae1ea3394ba7a53685e24a1bc9b4
    Machine ID: 0350fa343ed14eea8d477b906349f017
      Hostname: wsfd-advnetlab16.anl.lab.eng.bos.redhat.com
       Storage: /var/lib/systemd/coredump/core.ovn-controller.992.1a50ae1ea3394ba7a53685e24a1bc9b4.12>
       Message: Process 122255 (ovn-controller) of user 992 dumped core.

                Stack trace of thread 122255:
                #0  0x00007f60ff65d7ff raise (libc.so.6)
                #1  0x00007f60ff647c35 abort (libc.so.6)
                #2  0x000056317026b9a4 ovs_abort_valist (ovn-controller)
                #3  0x0000563170273794 vlog_abort_valist (ovn-controller)
                #4  0x000056317027383a vlog_abort (ovn-controller)
                #5  0x000056317026b6bb ovs_assert_failure (ovn-controller)
                #6  0x0000563170251c0a ovsdb_idl_txn_write__ (ovn-controller)
                #7  0x00005631701e313d sbrec_port_binding_set_up (ovn-controller)
                #8  0x0000563170189eea binding_seqno_install (ovn-controller)
                #9  0x0000563170183dc3 main (ovn-controller)
                #10 0x00007f60ff6497b3 __libc_start_main (libc.so.6)
                #11 0x0000563170184bfe _start (ovn-controller)

                Stack trace of thread 122259:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x000056317025b578 stopwatch_thread (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)

                Stack trace of thread 122256:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x00005631701a4d16 pinctrl_handler (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)

                Stack trace of thread 122257:
                #0  0x00007f60ff717ca1 __poll (libc.so.6)
                #1  0x0000563170266de5 time_poll (ovn-controller)
                #2  0x000056317025c3fc poll_block (ovn-controller)
                #3  0x0000563170242dda ovsrcu_postpone_thread (ovn-controller)
                #4  0x00005631702453e3 ovsthread_wrapper (ovn-controller)
                #5  0x00007f61002b014a start_thread (libpthread.so.0)
                #6  0x00007f60ff722f23 __clone (libc.so.6)
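To compare crashes across runs of the reproducer, it can help to reduce the `coredumpctl info` output to the call chain of the aborting thread only. The following is a minimal sketch, not part of the original report: the helper name `first_thread_frames` is hypothetical, and it assumes the aborting thread is the first "Stack trace of thread" block, as it is in the trace above (thread 122255, which ends in `ovs_assert_failure` via `sbrec_port_binding_set_up`).

```shell
# Hypothetical helper: print the function names of the first
# "Stack trace of thread" block read from stdin.
# Assumes frame lines look like "#N 0xADDRESS function (module)".
first_thread_frames() {
    awk '
        /Stack trace of thread/ { n++; next }     # count thread headers, skip the header line
        n > 1 { exit }                            # stop once the second thread starts
        n == 1 && $1 ~ /^#[0-9]+$/ { print $3 }   # third field is the function name
    '
}

# Example usage (requires a stored core dump for ovn-controller):
#   coredumpctl info /usr/bin/ovn-controller | first_thread_frames
```

For the trace in this report, the output would be expected to start with `raise` and `abort` and to include `binding_seqno_install`, which makes it easy to check whether a later crash takes the same path.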
Updated patch on review: http://patchwork.ozlabs.org/project/ovn/patch/20210329132159.2005894-1-numans@ovn.org/
I believe the issue was fixed in ovn2.13-20.12.0-99. Same patch as https://bugzilla.redhat.com/show_bug.cgi?id=1936328
Verified on ovn-2021-21.03.0-21.el8fdp.x86_64: no crash after running the reproducer from comment 1.

[root@wsfd-advnetlab21 bz1936331]# rpm -qa | grep ovn-2021
ovn-2021-21.03.0-21.el8fdp.x86_64
ovn-2021-host-21.03.0-21.el8fdp.x86_64
ovn-2021-central-21.03.0-21.el8fdp.x86_64
Also verified on ovn2.13-20.12.0-104.el8 and ovn2.13-host-20.12.0-104.el7fdp.x86_64:

[root@wsfd-advnetlab21 bz1936331]# rpm -qa | grep ovn2.13
ovn2.13-20.12.0-104.el8fdp.x86_64
ovn2.13-central-20.12.0-104.el8fdp.x86_64
ovn2.13-host-20.12.0-104.el8fdp.x86_64

[root@wsfd-advnetlab16 bz1936331]# rpm -qa | grep ovn2.13
ovn2.13-host-20.12.0-104.el7fdp.x86_64
ovn2.13-central-20.12.0-104.el7fdp.x86_64
ovn2.13-20.12.0-104.el7fdp.x86_64
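A quick way to tell whether a host is likely running an affected ovn2.13 build is to compare the release field of the installed package with the build mentioned earlier in this thread (ovn2.13-20.12.0-99, which a comment only "believes" contains the fix). The sketch below is an illustration under that assumption; the helper names `nvr_release` and `release_at_least` are hypothetical, and the comparison is only meaningful between builds of the same upstream version (20.12.0).

```shell
# Hypothetical helpers for comparing the release field of a package string
# such as "ovn2.13-20.12.0-104.el8fdp.x86_64" (release number: 104).

nvr_release() {
    # Drop everything up to the last "-", then keep the digits before the first ".".
    rel=${1##*-}
    printf '%s\n' "${rel%%.*}"
}

release_at_least() {
    # Succeeds when the release number of package string $1 is >= $2.
    [ "$(nvr_release "$1")" -ge "$2" ]
}

# Example usage (threshold 99 is taken from the comment in this thread
# and is an unconfirmed assumption):
#   if release_at_least "$(rpm -q ovn2.13)" 99; then
#       echo "release is at or above the build believed to carry the fix"
#   fi
```

With the packages listed in this report, the 20.12.0-24 build used for the reproduction falls below that threshold, while the 20.12.0-104 builds used for verification are above it.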
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2080