Bug 1551682 - ovs-vswitchd segfaults after any instance is spawned on numa 1 socket
Summary: ovs-vswitchd segfaults after any instance is spawned on numa 1 socket
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: async
Target Release: 10.0 (Newton)
Assignee: Kevin Traynor
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-05 17:12 UTC by David Hill
Modified: 2023-09-15 01:26 UTC
CC List: 22 users

Fixed In Version: openvswitch-2.6.1-30.git20180130.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 23:33:22 UTC
Target Upstream Version:
Embargoed:




Links
System                             ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker              NFV-2352        0        None      None    None     2021-12-10 15:55:56 UTC
Red Hat Issue Tracker              OSP-11353       0        None      None    None     2021-12-10 15:56:05 UTC
Red Hat Knowledge Base (Solution)  3374491         0        None      None    None     2018-03-07 14:24:15 UTC
Red Hat Product Errata             RHSA-2018:2102  0        None      None    None     2018-06-27 23:35:37 UTC

Description David Hill 2018-03-05 17:12:18 UTC
Description of problem:

ovs-vswitchd crashes with a segmentation fault (SIGSEGV) and has to be restarted often:

(gdb) bt
#0  rte_mempool_get_priv (mp=0x9e094c001003) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1653
#1  rte_pktmbuf_priv_size (mp=0x9e094c001003) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:987
#2  rte_pktmbuf_detach (m=0x7fdb53508d00) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:1180
#3  0x000055cd7a90fc18 in __rte_pktmbuf_prefree_seg (m=0x7fdb53508d00) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:1204
#4  ixgbe_tx_free_bufs (txq=0x7fe19294d140) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h:146
#5  ixgbe_xmit_pkts_vec (tx_queue=0x7fe19294d140, tx_pkts=0x7fe845ffa3b0, nb_pkts=<optimized out>) at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:555
#6  0x000055cd7aa2fedf in rte_eth_tx_burst (nb_pkts=<optimized out>, tx_pkts=0x7fe845ffa3b0, queue_id=8, port_id=<optimized out>)
    at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2819
#7  netdev_dpdk_eth_tx_burst (cnt=1, pkts=0x7fe845ffa3b0, qid=8, dev=0x7fe192a26700) at lib/netdev-dpdk.c:1215
#8  netdev_dpdk_send__ (concurrent_txq=<optimized out>, may_steal=<optimized out>, batch=<optimized out>, qid=8, dev=<optimized out>) at lib/netdev-dpdk.c:1701
#9  netdev_dpdk_eth_send (netdev=<optimized out>, qid=<optimized out>, batch=<optimized out>, may_steal=<optimized out>, concurrent_txq=false) at lib/netdev-dpdk.c:1725
#10 0x000055cd7a995cd2 in netdev_send (netdev=<optimized out>, qid=qid@entry=8, batch=batch@entry=0x7fe845ffa3a8, may_steal=may_steal@entry=true, concurrent_txq=concurrent_txq@entry=false)
    at lib/netdev.c:718
#11 0x000055cd7a975a3f in dp_execute_cb (aux_=aux_@entry=0x7fe845ffa750, packets_=packets_@entry=0x7fe845ffa3a8, a=a@entry=0x7fe840006f6c, may_steal=<optimized out>)
    at lib/dpif-netdev.c:4470
#12 0x000055cd7a99d68e in odp_execute_actions (dp=dp@entry=0x7fe845ffa750, batch=batch@entry=0x7fe845ffa3a8, steal=steal@entry=true, actions=<optimized out>, actions_len=<optimized out>, 
    dp_execute_action=dp_execute_action@entry=0x55cd7a9757f0 <dp_execute_cb>) at lib/odp-execute.c:538
#13 0x000055cd7a9745f2 in dp_netdev_execute_actions (now=23068681, actions_len=<optimized out>, actions=<optimized out>, flow=0x7fe840018e80, may_steal=true, packets=<optimized out>, 
    pmd=0x55cd7cab0be0) at lib/dpif-netdev.c:4671
#14 packet_batch_per_flow_execute (now=23068681, pmd=<optimized out>, batch=0x7fe845ffa398) at lib/dpif-netdev.c:3981
#15 dp_netdev_input__ (pmd=pmd@entry=0x55cd7cab0be0, packets=packets@entry=0x7fe845ffa7b0, md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at lib/dpif-netdev.c:4274
#16 0x000055cd7a9749fd in dp_netdev_input (port_no=<optimized out>, packets=0x7fe845ffa7b0, pmd=0x55cd7cab0be0) at lib/dpif-netdev.c:4283
#17 dp_netdev_process_rxq_port (pmd=pmd@entry=0x55cd7cab0be0, rxq=<optimized out>, port=<optimized out>, port=<optimized out>) at lib/dpif-netdev.c:2871
#18 0x000055cd7a974c9f in pmd_thread_main (f_=0x55cd7cab0be0) at lib/dpif-netdev.c:3133
#19 0x000055cd7a9e1d56 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:342
#20 0x00007fe8e021be25 in start_thread (arg=0x7fe845ffb700) at pthread_create.c:308
#21 0x00007fe8df61d34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Version-Release number of selected component (if applicable):
openvswitch-2.6.1-16.git20161206.el7ost.x86_64

How reproducible:
Often

Steps to Reproduce:
1. Spawn any instance on NUMA node 1 (the second socket).
2. Wait: in this environment ovs-vswitchd then segfaults by itself, with no further action.

Actual results:
ovs-vswitchd segfaults

Expected results:
ovs-vswitchd should not segfault.

Additional info:
[dhill@collab-shell overcloud-compute-0]$ cat installed-rpms  | grep openvs
openstack-neutron-openvswitch-9.4.1-5.el7ost.noarch         Wed Nov  8 18:14:35 2017
openvswitch-2.6.1-16.git20161206.el7ost.x86_64              Wed Nov  8 17:56:35 2017
openvswitch-ovn-central-2.6.1-16.git20161206.el7ost.x86_64  Wed Nov  8 18:14:35 2017
openvswitch-ovn-common-2.6.1-16.git20161206.el7ost.x86_64   Wed Nov  8 17:56:50 2017
openvswitch-ovn-host-2.6.1-16.git20161206.el7ost.x86_64     Wed Nov  8 18:14:34 2017
python-openvswitch-2.6.1-16.git20161206.el7ost.noarch       Wed Nov  8 17:50:40 2017
[dhill@collab-shell overcloud-compute-0]$ cat installed-rpms  | grep -i kernel
erlang-kernel-18.3.4.5-3.el7ost.1.x86_64                    Wed Nov  8 18:02:08 2017
kernel-3.10.0-693.5.2.el7.x86_64                            Wed Nov  8 17:16:53 2017
kernel-devel-3.10.0-693.5.2.el7.x86_64                      Wed Nov  8 18:21:11 2017
kernel-headers-3.10.0-693.5.2.el7.x86_64                    Wed Nov  8 18:20:56 2017
kernel-tools-3.10.0-693.5.2.el7.x86_64                      Wed Nov  8 17:17:03 2017
kernel-tools-libs-3.10.0-693.5.2.el7.x86_64                 Wed Nov  8 17:15:25 2017
[dhill@collab-shell overcloud-compute-0]$ cat installed-rpms  | grep -i dpdk
dpdk-16.11.2-4.el7.x86_64                                   Wed Nov  8 18:14:36 2017

Comment 3 colleen.o.malley 2018-03-07 08:11:32 UTC
The issue seems to occur when VMs start on the second NUMA node. It doesn't happen every time, but it happens often enough that the second NUMA node is unusable in our environment.
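
When correlating the crashes with NUMA placement, it helps to know which NUMA node the physical NIC (the ixgbe port in the backtraces) is attached to. Linux reports this in sysfs as /sys/bus/pci/devices/<PCI address>/numa_node, where -1 means no affinity is known. A minimal C sketch that reads that file follows. The helper name read_numa_node and the PCI address in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the NUMA node a PCI device is attached to
 * from its sysfs numa_node file.
 * Returns the node number, -1 if the kernel reports no NUMA affinity,
 * or -2 if the file cannot be opened or does not start with an integer. */
static int read_numa_node(const char *path)
{
    FILE *f;
    int node;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -2;
    if (fscanf(f, "%d", &node) != 1)
        node = -2;
    fclose(f);
    return node;
}
```

For example, call read_numa_node("/sys/bus/pci/devices/0000:05:00.0/numa_node") with a placeholder address. Use lspci -D to list the real addresses. Compare the result with the node the failing instance is pinned to. This shows whether the affected traffic crosses NUMA nodes.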

Comment 6 David Hill 2018-03-07 14:26:18 UTC
It's not crashing at the same place:

#0  rte_mempool_default_cache (mp=<optimized out>, mp=<optimized out>, lcore_id=<optimized out>)
    at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1017
#1  rte_mempool_put_bulk (n=1, obj_table=0x7f8c3dff96e8, mp=0x0)
    at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1174
#2  rte_mempool_put (obj=0x7f7f54be7f00, mp=0x0)
    at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1227
#3  ixgbe_tx_free_bufs (txq=0x7f859294d140)
    at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/drivers/net/ixgbe/ixgbe_rxtx_vec_common.h:148
#4  ixgbe_xmit_pkts_vec (tx_queue=0x7f859294d140, tx_pkts=0x7f8c3dffa3b0, nb_pkts=<optimized out>)
    at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c:555
#5  0x000055b69a8b2edf in rte_eth_tx_burst (nb_pkts=<optimized out>, tx_pkts=0x7f8c3dffa3b0, 
    queue_id=8, port_id=<optimized out>)
    at /usr/src/debug/openvswitch-2.6.1/dpdk-16.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:2819
#6  netdev_dpdk_eth_tx_burst (cnt=1, pkts=0x7f8c3dffa3b0, qid=8, dev=0x7f8592a26700)
    at lib/netdev-dpdk.c:1215
#7  netdev_dpdk_send__ (concurrent_txq=<optimized out>, may_steal=<optimized out>, 
    batch=<optimized out>, qid=8, dev=<optimized out>) at lib/netdev-dpdk.c:1701
#8  netdev_dpdk_eth_send (netdev=<optimized out>, qid=<optimized out>, batch=<optimized out>, 
    may_steal=<optimized out>, concurrent_txq=false) at lib/netdev-dpdk.c:1725
#9  0x000055b69a818cd2 in netdev_send (netdev=<optimized out>, qid=qid@entry=8, 
    batch=batch@entry=0x7f8c3dffa3a8, may_steal=may_steal@entry=true, 
    concurrent_txq=concurrent_txq@entry=false) at lib/netdev.c:718
#10 0x000055b69a7f8a3f in dp_execute_cb (aux_=aux_@entry=0x7f8c3dffa750, 
    packets_=packets_@entry=0x7f8c3dffa3a8, a=a@entry=0x7f8c28004c0c, may_steal=<optimized out>)
    at lib/dpif-netdev.c:4470
#11 0x000055b69a82068e in odp_execute_actions (dp=dp@entry=0x7f8c3dffa750, 
    batch=batch@entry=0x7f8c3dffa3a8, steal=steal@entry=true, actions=<optimized out>, 
    actions_len=<optimized out>, 
    dp_execute_action=dp_execute_action@entry=0x55b69a7f87f0 <dp_execute_cb>) at lib/odp-execute.c:538
#12 0x000055b69a7f75f2 in dp_netdev_execute_actions (now=52435103, actions_len=<optimized out>, 
    actions=<optimized out>, flow=0x7f8c28003800, may_steal=true, packets=<optimized out>, 
    pmd=0x55b69d463a50) at lib/dpif-netdev.c:4671
#13 packet_batch_per_flow_execute (now=52435103, pmd=<optimized out>, batch=0x7f8c3dffa398)
    at lib/dpif-netdev.c:3981
#14 dp_netdev_input__ (pmd=pmd@entry=0x55b69d463a50, packets=packets@entry=0x7f8c3dffa7b0, 
    md_is_valid=md_is_valid@entry=false, port_no=<optimized out>) at lib/dpif-netdev.c:4274
#15 0x000055b69a7f79fd in dp_netdev_input (port_no=<optimized out>, packets=0x7f8c3dffa7b0, 
    pmd=0x55b69d463a50) at lib/dpif-netdev.c:4283
#16 dp_netdev_process_rxq_port (pmd=pmd@entry=0x55b69d463a50, rxq=<optimized out>, 
    port=<optimized out>, port=<optimized out>) at lib/dpif-netdev.c:2871
#17 0x000055b69a7f7c9f in pmd_thread_main (f_=0x55b69d463a50) at lib/dpif-netdev.c:3133
#18 0x000055b69a864d56 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:342
#19 0x00007f8cd7cfae25 in start_thread (arg=0x7f8c3dffb700) at pthread_create.c:308
#20 0x00007f8cd70fc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
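
Both backtraces fail inside ixgbe_tx_free_bufs. That is where the driver returns already-transmitted mbufs to their mempool:

- The first trace faults while dereferencing an mbuf's pool pointer (mp=0x9e094c001003).
- The second faults in rte_mempool_put with mp=0x0.

The toy C model below is not DPDK code, and all its types and functions are hypothetical. It illustrates the pattern: each buffer records the pool it came from, and the TX-completion path trusts that pointer. The check in toy_mempool_release is one defensive policy. It is offered as an assumption, not as the fix shipped in openvswitch-2.6.1-30.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types, not the DPDK rte_mempool/rte_mbuf API. */
struct toy_mempool {
    const char *name;
    int freed;      /* set once the pool has been released */
    size_t in_use;  /* buffers handed out and not yet returned */
};

struct toy_mbuf {
    struct toy_mempool *pool;  /* pool the buffer must be returned to */
};

/* Take a buffer from a live pool; the buffer remembers its origin pool. */
static void toy_mbuf_alloc(struct toy_mempool *mp, struct toy_mbuf *m)
{
    assert(mp != NULL && !mp->freed);
    m->pool = mp;
    mp->in_use++;
}

/* Model of the TX-completion step (ixgbe_tx_free_bufs -> rte_mempool_put
 * in the traces): a sent buffer goes back to m->pool. The traces show the
 * real path faulting on a bad pool pointer; this model returns -1 instead
 * when the pointer is NULL or the pool was already released. */
static int toy_tx_free(struct toy_mbuf *m)
{
    if (m->pool == NULL || m->pool->freed)
        return -1;
    m->pool->in_use--;
    m->pool = NULL;
    return 0;
}

/* Release a pool only when no buffers from it are still outstanding,
 * for example still sitting in a NIC TX ring waiting to be freed. */
static int toy_mempool_release(struct toy_mempool *mp)
{
    if (mp->in_use > 0)
        return -1;  /* still referenced: keep the pool alive */
    mp->freed = 1;
    return 0;
}
```

In this model, a release attempted while one buffer is still outstanding fails. The outstanding buffer plays the role of an mbuf queued in a TX ring. The release succeeds once toy_tx_free has returned that buffer.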

Comment 36 Yariv 2018-06-24 08:33:16 UTC
Verified on puddle 2018-06-18.1 with the packages below.

Be aware of the TripleO Nova CPU quota when verifying.
[root@compute-0 ~]# rpm -qa | grep openvswitch-2.9
openvswitch-2.9.0-19.el7fdp.1.x86_64
python-openvswitch-2.9.0-19.el7fdp.1.noarch
[root@compute-0 ~]#

Comment 40 errata-xmlrpc 2018-06-27 23:33:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2102

Comment 47 Red Hat Bugzilla 2023-09-15 01:26:55 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

