Bug 1491898 - In PVP testing, dpdk's testpmd will "Segmentation fault" after booting VM
Summary: In PVP testing, dpdk's testpmd will "Segmentation fault" after booting VM
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dpdk
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Assignee: Kevin Traynor
QA Contact: Jean-Tsung Hsiao
URL:
Whiteboard:
Keywords: Extras, Regression
Depends On:
Blocks:
 
Reported: 2017-09-15 01:45 UTC by Pei Zhang
Modified: 2018-01-30 16:48 UTC
CC List: 9 users

Clone Of:
Last Closed: 2018-01-30 16:48:19 UTC


Attachments
XML file of VM (3.07 KB, text/html)
2017-09-15 01:45 UTC, Pei Zhang

Description Pei Zhang 2017-09-15 01:45:09 UTC
Created attachment 1326261 [details]
XML file of VM

Description of problem:
This is PVP testing. Boot dpdk's testpmd in the host acting as a vhost-user client, then start the VM; testpmd core dumps.

Version-Release number of selected component (if applicable):
dpdk-17.05-3.el7fdb.x86_64
qemu-kvm-rhev-2.9.0-16.el7.x86_64
3.10.0-693.el7.x86_64
libvirt-3.7.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. In host, boot testpmd as vhost-user client mode
# testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 \
--vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
--portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
--nb-cores=2 --forward-mode=io

2. Boot the VM with its vhost-user interface in server mode; see the attachment for the full XML.
    <interface type='vhostuser'>
      <mac address='38:88:da:5f:dd:01'/>
      <source type='unix' path='/tmp/vhost-user1' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
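
For reference, libvirt turns this interface definition into qemu-kvm arguments roughly like the following (a sketch only; the id values are hypothetical, the exact command line libvirt generates may differ, and the guest memory must also be backed by shared hugepages for vhost-user to work):

-chardev socket,id=charnet0,path=/tmp/vhost-user1,server \
-netdev type=vhost-user,id=hostnet0,chardev=charnet0 \
-device virtio-net-pci,netdev=hostnet0,mac=38:88:da:5f:dd:01,bus=pci.0,addr=0x3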

3. testpmd core dumps:

testpmd> PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 10000 Mbps - full-duplex
PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 10000 Mbps - full-duplex

Port 0: LSC event
VHOST_CONFIG: /tmp/vhost-user1: connected
VHOST_CONFIG: new device, handle is 0
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:26
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 0
PMD: vring0 is enabled

Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 1
PMD: vring1 is enabled

Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 0
PMD: vring0 is enabled

Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 1
PMD: vring1 is enabled

Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
VHOST_CONFIG: guest memory region 0, size: 0x40000000
	 guest physical addr: 0x100000000
	 guest virtual  addr: 0x7fb5c0000000
	 host  virtual  addr: 0x2aab80000000
	 mmap addr : 0x2aaac0000000
	 mmap size : 0x100000000
	 mmap align: 0x40000000
	 mmap off  : 0xc0000000
VHOST_CONFIG: guest memory region 1, size: 0xa0000
	 guest physical addr: 0x0
	 guest virtual  addr: 0x7fb500000000
	 host  virtual  addr: 0x2aabc0000000
	 mmap addr : 0x2aabc0000000
	 mmap size : 0x40000000
	 mmap align: 0x40000000
	 mmap off  : 0x0
VHOST_CONFIG: guest memory region 2, size: 0xbff40000
	 guest physical addr: 0xc0000
	 guest virtual  addr: 0x7fb5000c0000
	 host  virtual  addr: 0x2aac000c0000
	 mmap addr : 0x2aac00000000
	 mmap size : 0xc0000000
	 mmap align: 0x40000000
	 mmap off  : 0xc0000
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
VHOST_CONFIG: reallocate dev from 0 to 1 node
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:0 file:30
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:31
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
pvp_client.sh: line 4: 166463 Segmentation fault      testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- --portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=2 --forward-mode=io


Actual results:
testpmd core dump

Expected results:
testpmd should work well.

Additional info:
1. This is a regression; dpdk-16.11-4.el7fdp.x86_64 works well.

Comment 3 Aaron Conole 2017-09-15 13:34:32 UTC
Can you modify your PVP script to execute 'ulimit -c unlimited' at the very beginning.
Then please attach the core file which is created.
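
For example, something along these lines (a sketch; the core file name depends on the host's kernel.core_pattern, and dpdk debuginfo is needed for a readable backtrace):

# ulimit -c unlimited
# sh pvp_client.sh              (reproduce the segfault)
# gdb $(which testpmd) <core file>
(gdb) bt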

Comment 4 Pei Zhang 2017-09-18 02:13:59 UTC
(In reply to Aaron Conole from comment #3)
> Can you modify your PVP script to execute 'ulimit -c unlimited' at the very
> beginning.
> Then please attach the core file which is created.

Hi Aaron,

The core file is available here:

http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/pezhang/bug1491898/


How I start the testpmd and execute 'ulimit -c unlimited':
# cat pvp_client.sh 
ulimit -c unlimited
testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 \
--vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
--portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
--nb-cores=2 --forward-mode=io

# sh pvp_client.sh 


Best Regards,
Pei

Comment 5 Aaron Conole 2017-09-18 13:48:49 UTC
Thanks for posting the core file.  A 5-minute glance:

stack trace:

..
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007ff3200c44c2 in qva_to_vva (qva=<optimized out>, dev=<optimized out>)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:315
> 315		for (i = 0; i < dev->mem->nregions; i++) {
> (gdb) bt
> Thread 1 (Thread 0x7ff31764c700 (LWP 8778)):
> ---Type <return> to continue, or q <return> to quit---
> #0  0x00007ff3200c44c2 in qva_to_vva (qva=<optimized out>, dev=<optimized out>)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:315
> #1  vhost_user_set_vring_addr (addr=0x7ff31764bb6c, dev=<optimized out>)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:356
> #2  vhost_user_msg_handler (vid=<optimized out>, fd=fd@entry=23)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:1040
> #3  0x00007ff3200c1dff in vhost_user_read_cb (connfd=23, dat=0x7ff3080008c0, 
>     remove=0x7ff31764bd50)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/socket.c:274
> #4  0x00007ff3200c18f0 in fdset_event_dispatch (
>     arg=0x7ff3202ce1a0 <vhost_user+8192>)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/fd_man.c:273
> #5  0x00007ff31e165e25 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007ff31de9334d in clone () from /lib64/libc.so.6

Looking at dev->mem (x/i $rip, p/x $rdi), it appears to be 0 (NULL).

This looks like it's going to be a bug in dpdk's virtio driver, but which side is not clear yet.
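
For reference, that inspection from the core looks roughly like this (a sketch; addresses and the exact register holding dev->mem depend on the build):

(gdb) frame 0          (qva_to_vva() at vhost_user.c:315)
(gdb) x/i $rip         (the faulting instruction)
(gdb) p/x $rdi         (the pointer being dereferenced, i.e. dev->mem; prints 0x0 here)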

Comment 6 Aaron Conole 2017-09-18 13:57:18 UTC
Also, what version of the OvS package are you running in the host?

Comment 7 Pei Zhang 2017-09-18 14:54:05 UTC
(In reply to Aaron Conole from comment #6)
> Also, what version of the OvS package are you running in the host?

I wasn't running OVS, only DPDK in the host.

Comment 8 Kevin Traynor 2017-09-19 10:40:22 UTC
There were some known bugs related to vhost memory reallocation and multi-NUMA in DPDK 17.05.0, and I suspect you might be hitting one of them. They were fixed in DPDK 17.05.1.

Can you confirm you are running on a multi-NUMA system?

As a workaround, I think you can avoid reallocation with DPDK 17.05.0 by pinning the testpmd lcores, and the emulatorpin and vcpus in libvirt, to all use the same NUMA node.
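
For example (a sketch; the cpu and node numbers below are hypothetical and must be taken from the actual host topology):

# numactl --hardware                          (see which CPUs belong to which NUMA node)
# testpmd -l 2,4,6 --socket-mem=1024,0 ...    (lcores and socket-mem on one node only)

and in the guest XML, pin the vcpus, the emulator thread and the guest memory to the same node:

    <cputune>
      <vcpupin vcpu='0' cpuset='8'/>
      <vcpupin vcpu='1' cpuset='10'/>
      <emulatorpin cpuset='12'/>
    </cputune>
    <numatune>
      <memory mode='strict' nodeset='0'/>
    </numatune>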

Comment 12 Kevin Traynor 2017-10-25 16:57:42 UTC
Hi Pei,

I reproduced the issue with dpdk-17.05-3. There was an unrelated build problem with the local test rpm I built, so that was not a valid rpm.

The reported issue is with NUMA reallocation. With dpdk-17.05-3 I reproduced the pass/fail criteria depending on whether NUMA reallocation was needed. The fixes for this are included with other bugfixes in DPDK 17.05.2, so I updated to that, tested, and confirmed the seg fault does not occur. The new rpm is available here: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=613371

thanks,
Kevin.

Comment 13 Pei Zhang 2017-11-07 10:09:56 UTC
(In reply to Kevin Traynor from comment #12)
> Hi Pei,
> 
> I reproduced the issue with dpdk-17.05-3. There was an unrelated build
> problem with the local test rpm I built, so that was not a valid rpm.
> 
> The reported issue is with numa reallocation. I reproduced pass/fail
> criteria with dpdk-17.05-3 depending on whether numa reallocation was
> needed. The fixes for this are included with other bugfixs in DPDK 17.05.2,
> so I updated to that, tested and confirmed the seg fault does not occur. New
> rpm is available here:
> https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=613371

Hi Kevin, this build works well; the issue is gone. Thank you.

And sorry for the late reply; I just got back to work this week.


Best Regards,
Pei


> thanks,
> Kevin.

Comment 14 Kevin Traynor 2018-01-30 16:48:19 UTC
The commits are in DPDK 17.11.

