Description of problem:
ovs-vswitchd segfaults and has to be restarted often:

(gdb) bt
#0  rte_mempool_get_priv (mp=0x9e094c001003) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1653
#1  rte_pktmbuf_priv_size (mp=0x9e094c001003) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:987
#2  rte_pktmbuf_detach (m=0x7fdb53508d00) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:1180
#3  0x000055cd7a90fc18 in __rte_pktmbuf_prefree_seg (m=0x7fdb53508d00) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:1204
#4  ixgbe_tx_free_bufs (txq=0x7fe19294d140) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h:146
#5  ixgbe_xmit_pkts_vec (tx_queue=0x7fe19294d140, tx_pkts=0x7fe845ffa3b0, nb_pkts=<optimized out>) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:555
#6  0x000055cd7aa2fedf in rte_eth_tx_burst (nb_pkts=<optimized out>, tx_pkts=0x7fe845ffa3b0, queue_id=8, port_id=<optimized out>) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2819
#7  netdev_dpdk_eth_tx_burst (cnt=1, pkts=0x7fe845ffa3b0, qid=8, dev=0x7fe192a26700) at lib/netdev-dpdk.c:1215
#8  netdev_dpdk_send__ (concurrent_txq=<optimized out>, may_steal=<optimized out>, batch=<optimized out>, qid=8, dev=<optimized out>) at lib/netdev-dpdk.c:1701
#9  netdev_dpdk_eth_send (netdev=<optimized out>, qid=<optimized out>, batch=<optimized out>, may_steal=<optimized out>, concurrent_txq=false) at lib/netdev-dpdk.c:1725
#10 0x000055cd7a995cd2 in netdev_send (netdev=<optimized out>, qid=qid@entry=8, batch=batch@entry=0x7fe845ffa3a8, may_steal=may_steal@entry=true, concurrent_txq=concurrent_txq@entry=false) at lib/netdev.c:718
#11 0x000055cd7a975a3f in dp_execute_cb (aux_=aux_@entry=0x7fe845ffa750, packets_=packets_@entry=0x7fe845ffa3a8, a=a@entry=0x7fe840006f6c, may_steal=<optimized out>) at lib/dpif-netdev.c:4470
#12 0x000055cd7a99d68e in odp_execute_actions (dp=dp@entry=0x7fe845ffa750, batch=batch@entry=0x7fe845ffa3a8, steal=steal@entry=true, actions=<optimized out>, actions_len=<optimized out>, dp_execute_action=dp_execute_action@entry=0x55cd7a9757f0 <dp_execute_cb>) at lib/odp-execute.c:538
#13 0x000055cd7a9745f2 in dp_netdev_execute_actions (now=23068681, actions_len=<optimized out>, actions=<optimized out>, flow=0x7fe840018e80, may_steal=true, packets=<optimized out>, pmd=0x55cd7cab0be0) at lib/dpif-netdev.c:4671
#14 packet_batch_per_flow_execute (now=23068681, pmd=<optimized out>, batch=0x7fe845ffa398) at lib/dpif-netdev.c:3981
#15 dp_netdev_input__ (pmd=pmd@entry=0x55cd7cab0be0, packets=packets@entry=0x7fe845ffa7b0, md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at lib/dpif-netdev.c:4274
#16 0x000055cd7a9749fd in dp_netdev_input (port_no=<optimized out>, packets=0x7fe845ffa7b0, pmd=0x55cd7cab0be0) at lib/dpif-netdev.c:4283
#17 dp_netdev_process_rxq_port (pmd=pmd@entry=0x55cd7cab0be0, rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at lib/dpif-netdev.c:2871
#18 0x000055cd7a974c9f in pmd_thread_main (f_=0x55cd7cab0be0) at lib/dpif-netdev.c:3133
#19 0x000055cd7a9e1d56 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:342
#20 0x00007fe8e021be25 in start_thread (arg=0x7fe845ffb700) at pthread_create.c:308
#21 0x00007fe8df61d34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Version-Release number of selected component (if applicable):

How reproducible:
Often

Steps to Reproduce:
1. In this environment, it happens by itself

Actual results:
ovs-vswitchd segfaults

Expected results:
Shouldn't segfault

Additional info:

[dhill@collab-shell overcloud-compute-0]$ cat installed-rpms | grep openvs
openstack-neutron-openvswitch-9.4.1-5.el7ost.noarch  Wed Nov 8 18:14:35 2017
openvswitch-2.6.1-16.git20161206.el7ost.x86_64  Wed Nov 8 17:56:35 2017
openvswitch-ovn-central-2.6.1-16.git20161206.el7ost.x86_64  Wed Nov 8 18:14:35 2017
openvswitch-ovn-common-2.6.1-16.git20161206.el7ost.x86_64  Wed Nov 8 17:56:50 2017
openvswitch-ovn-host-2.6.1-16.git20161206.el7ost.x86_64  Wed Nov 8 18:14:34 2017
python-openvswitch-2.6.1-16.git20161206.el7ost.noarch  Wed Nov 8 17:50:40 2017

[dhill@collab-shell overcloud-compute-0]$ cat installed-rpms | grep -i kernel
erlang-kernel-18.3.4.5-3.el7ost.1.x86_64  Wed Nov 8 18:02:08 2017
kernel-3.10.0-693.5.2.el7.x86_64  Wed Nov 8 17:16:53 2017
kernel-devel-3.10.0-693.5.2.el7.x86_64  Wed Nov 8 18:21:11 2017
kernel-headers-3.10.0-693.5.2.el7.x86_64  Wed Nov 8 18:20:56 2017
kernel-tools-3.10.0-693.5.2.el7.x86_64  Wed Nov 8 17:17:03 2017
kernel-tools-libs-3.10.0-693.5.2.el7.x86_64  Wed Nov 8 17:15:25 2017

[dhill@collab-shell overcloud-compute-0]$ cat installed-rpms | grep -i dpdk
dpdk-16.11.2-4.el7.x86_64  Wed Nov 8 18:14:36 2017
The issue seems to occur when VMs start on the second NUMA node. It doesn't happen every time, but it happens often enough that the second NUMA node is unusable in our environment.
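When trying to reproduce this, it can help to confirm which CPUs belong to the second NUMA node so that a test VM can be placed there deliberately. The following is a minimal Python sketch, assuming the standard Linux sysfs layout (/sys/devices/system/node/nodeN/cpulist). The helper names parse_cpulist and cpus_on_node are illustrative and are not part of OVS, DPDK, or nova.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as "0-3,8,10-11" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or stray comma
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_on_node(node, sysfs_root="/sys/devices/system/node"):
    """Return the CPU ids of one NUMA node, or [] if the node is not present."""
    path = Path(sysfs_root) / f"node{node}" / "cpulist"
    if not path.exists():
        return []
    return parse_cpulist(path.read_text())


if __name__ == "__main__":
    # Node ids start at 0, so node 1 is the "second NUMA node" mentioned above.
    print("node1 CPUs:", cpus_on_node(1))
```

The resulting CPU list can then be compared by hand against the PMD and VM CPU pinning on the compute node. The sketch only reads sysfs and does not query OVS itself.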
It doesn't always crash in the same place:

#0  rte_mempool_default_cache (mp=<optimized out>, mp=<optimized out>, lcore_id=<optimized out>) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1017
#1  rte_mempool_put_bulk (n=1, obj_table=0x7f8c3dff96e8, mp=0x0) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1174
#2  rte_mempool_put (obj=0x7f7f54be7f00, mp=0x0) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1227
#3  ixgbe_tx_free_bufs (txq=0x7f859294d140) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h:148
#4  ixgbe_xmit_pkts_vec (tx_queue=0x7f859294d140, tx_pkts=0x7f8c3dffa3b0, nb_pkts=<optimized out>) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:555
#5  0x000055b69a8b2edf in rte_eth_tx_burst (nb_pkts=<optimized out>, tx_pkts=0x7f8c3dffa3b0, queue_id=8, port_id=<optimized out>) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2819
#6  netdev_dpdk_eth_tx_burst (cnt=1, pkts=0x7f8c3dffa3b0, qid=8, dev=0x7f8592a26700) at lib/netdev-dpdk.c:1215
#7  netdev_dpdk_send__ (concurrent_txq=<optimized out>, may_steal=<optimized out>, batch=<optimized out>, qid=8, dev=<optimized out>) at lib/netdev-dpdk.c:1701
#8  netdev_dpdk_eth_send (netdev=<optimized out>, qid=<optimized out>, batch=<optimized out>, may_steal=<optimized out>, concurrent_txq=false) at lib/netdev-dpdk.c:1725
#9  0x000055b69a818cd2 in netdev_send (netdev=<optimized out>, qid=qid@entry=8, batch=batch@entry=0x7f8c3dffa3a8, may_steal=may_steal@entry=true, concurrent_txq=concurrent_txq@entry=false) at lib/netdev.c:718
#10 0x000055b69a7f8a3f in dp_execute_cb (aux_=aux_@entry=0x7f8c3dffa750, packets_=packets_@entry=0x7f8c3dffa3a8, a=a@entry=0x7f8c28004c0c, may_steal=<optimized out>) at lib/dpif-netdev.c:4470
#11 0x000055b69a82068e in odp_execute_actions (dp=dp@entry=0x7f8c3dffa750, batch=batch@entry=0x7f8c3dffa3a8, steal=steal@entry=true, actions=<optimized out>, actions_len=<optimized out>, dp_execute_action=dp_execute_action@entry=0x55b69a7f87f0 <dp_execute_cb>) at lib/odp-execute.c:538
#12 0x000055b69a7f75f2 in dp_netdev_execute_actions (now=52435103, actions_len=<optimized out>, actions=<optimized out>, flow=0x7f8c28003800, may_steal=true, packets=<optimized out>, pmd=0x55b69d463a50) at lib/dpif-netdev.c:4671
#13 packet_batch_per_flow_execute (now=52435103, pmd=<optimized out>, batch=0x7f8c3dffa398) at lib/dpif-netdev.c:3981
#14 dp_netdev_input__ (pmd=pmd@entry=0x55b69d463a50, packets=packets@entry=0x7f8c3dffa7b0, md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at lib/dpif-netdev.c:4274
#15 0x000055b69a7f79fd in dp_netdev_input (port_no=<optimized out>, packets=0x7f8c3dffa7b0, pmd=0x55b69d463a50) at lib/dpif-netdev.c:4283
#16 dp_netdev_process_rxq_port (pmd=pmd@entry=0x55b69d463a50, rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at lib/dpif-netdev.c:2871
#17 0x000055b69a7f7c9f in pmd_thread_main (f_=0x55b69d463a50) at lib/dpif-netdev.c:3133
#18 0x000055b69a864d56 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:342
#19 0x00007f8cd7cfae25 in start_thread (arg=0x7f8c3dffb700) at pthread_create.c:308
#20 0x00007f8cd70fc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
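Although the two crashes fault in different inline helpers, both traces pass through the same driver function. Below is a small Python sketch for comparing gdb backtraces by function name. It assumes frames look like the ones in this report (`#N [0xADDR in] func (args) at file:line`). The helper names are illustrative, and the sample strings are the top frames from this report with the source paths shortened to basenames.

```python
import re

# Matches "#N func (" and "#N 0xADDR in func (", capturing N and func.
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?([A-Za-z_]\w*) \(")


def frame_functions(bt_text):
    """Return function names from a gdb backtrace, innermost frame first."""
    return [m.group(2) for m in FRAME_RE.finditer(bt_text)]


def first_common_frame(bt_a, bt_b):
    """Return the innermost function in bt_a that also appears in bt_b, or None."""
    seen = set(frame_functions(bt_b))
    for name in frame_functions(bt_a):
        if name in seen:
            return name
    return None


# Top frames of the two crashes in this report (paths shortened to basenames).
CRASH_1 = """
#0  rte_mempool_get_priv (mp=0x9e094c001003) at rte_mempool.h:1653
#1  rte_pktmbuf_priv_size (mp=0x9e094c001003) at rte_mbuf.h:987
#2  rte_pktmbuf_detach (m=0x7fdb53508d00) at rte_mbuf.h:1180
#3  0x000055cd7a90fc18 in __rte_pktmbuf_prefree_seg (m=0x7fdb53508d00) at rte_mbuf.h:1204
#4  ixgbe_tx_free_bufs (txq=0x7fe19294d140) at ixgbe_rxtx_vec_common.h:146
"""
CRASH_2 = """
#0  rte_mempool_default_cache (mp=<optimized out>, mp=<optimized out>, lcore_id=<optimized out>) at rte_mempool.h:1017
#1  rte_mempool_put_bulk (n=1, obj_table=0x7f8c3dff96e8, mp=0x0) at rte_mempool.h:1174
#2  rte_mempool_put (obj=0x7f7f54be7f00, mp=0x0) at rte_mempool.h:1227
#3  ixgbe_tx_free_bufs (txq=0x7f859294d140) at ixgbe_rxtx_vec_common.h:148
"""

if __name__ == "__main__":
    print(first_common_frame(CRASH_1, CRASH_2))
```

For these two traces the first shared frame is ixgbe_tx_free_bufs (frame #4 in the first trace, frame #3 in the second). The mempool pointer passed down from there is 0x9e094c001003 in one crash and 0x0 in the other.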
Verified on puddle 2018-06-18.1 with the following packages. Be aware of the TripleO nova CPU quota when verifying.

[root@compute-0 ~]# rpm -qa | grep openvswitch-2.9
openvswitch-2.9.0-19.el7fdp.1.x86_64
python-openvswitch-2.9.0-19.el7fdp.1.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2102
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days