Bug 1491898 - In PVP testing, dpdk's testpmd will "Segmentation fault" after booting VM
Summary: In PVP testing, dpdk's testpmd will "Segmentation fault" after booting VM
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: dpdk
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Kevin Traynor
QA Contact: Jean-Tsung Hsiao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-09-15 01:45 UTC by Pei Zhang
Modified: 2018-01-30 16:48 UTC
CC: 9 users

Fixed In Version: dpdk-17.05.2-4.el7fdb
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-30 16:48:19 UTC
Target Upstream Version:
Embargoed:


Attachments
XML file of VM (3.07 KB, text/html), uploaded 2017-09-15 01:45 UTC by Pei Zhang

Description Pei Zhang 2017-09-15 01:45:09 UTC
Created attachment 1326261 [details]
XML file of VM

Description of problem:
This is PVP testing. Boot dpdk's testpmd on the host acting in vhost-user client mode, then start the VM; testpmd will core dump.

Version-Release number of selected component (if applicable):
dpdk-17.05-3.el7fdb.x86_64
qemu-kvm-rhev-2.9.0-16.el7.x86_64
kernel-3.10.0-693.el7.x86_64
libvirt-3.7.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. In host, boot testpmd as vhost-user client mode
# testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 \
--vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
--portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
--nb-cores=2 --forward-mode=io

2. Boot the VM in vhost-user server mode (for the full XML file, see the attachment).
    <interface type='vhostuser'>
      <mac address='38:88:da:5f:dd:01'/>
      <source type='unix' path='/tmp/vhost-user1' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

3. testpmd core dumps:

testpmd> PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 10000 Mbps - full-duplex
PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 10000 Mbps - full-duplex

Port 0: LSC event
VHOST_CONFIG: /tmp/vhost-user1: connected
VHOST_CONFIG: new device, handle is 0
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:26
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 0
PMD: vring0 is enabled

Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 1
PMD: vring1 is enabled

Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 0
PMD: vring0 is enabled

Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 1
PMD: vring1 is enabled

Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
VHOST_CONFIG: guest memory region 0, size: 0x40000000
	 guest physical addr: 0x100000000
	 guest virtual  addr: 0x7fb5c0000000
	 host  virtual  addr: 0x2aab80000000
	 mmap addr : 0x2aaac0000000
	 mmap size : 0x100000000
	 mmap align: 0x40000000
	 mmap off  : 0xc0000000
VHOST_CONFIG: guest memory region 1, size: 0xa0000
	 guest physical addr: 0x0
	 guest virtual  addr: 0x7fb500000000
	 host  virtual  addr: 0x2aabc0000000
	 mmap addr : 0x2aabc0000000
	 mmap size : 0x40000000
	 mmap align: 0x40000000
	 mmap off  : 0x0
VHOST_CONFIG: guest memory region 2, size: 0xbff40000
	 guest physical addr: 0xc0000
	 guest virtual  addr: 0x7fb5000c0000
	 host  virtual  addr: 0x2aac000c0000
	 mmap addr : 0x2aac00000000
	 mmap size : 0xc0000000
	 mmap align: 0x40000000
	 mmap off  : 0xc0000
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
VHOST_CONFIG: reallocate dev from 0 to 1 node
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:0 file:30
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:31
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
pvp_client.sh: line 4: 166463 Segmentation fault      testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- --portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=2 --forward-mode=io


Actual results:
testpmd core dump

Expected results:
testpmd should work well.

Additional info:
1. This is a regression: dpdk-16.11-4.el7fdp.x86_64 works well.
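
One quick way to double-check the regression window is to drop back to the known-good build and re-run the same script; a minimal sketch, assuming the older package is still available from the configured repos:

# Downgrade to the last known-good build and re-run the reproducer:
yum downgrade -y dpdk-16.11-4.el7fdp
sh pvp_client.sh    # testpmd should stay up after the VM boots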

Comment 3 Aaron Conole 2017-09-15 13:34:32 UTC
Can you modify your PVP script to execute 'ulimit -c unlimited' at the very beginning?
Then please attach the core file that is created.

Comment 4 Pei Zhang 2017-09-18 02:13:59 UTC
(In reply to Aaron Conole from comment #3)
> Can you modify your PVP script to execute 'ulimit -c unlimited' at the very
> beginning.
> Then please attach the core file which is created.

Hi Aaron,

The core file is available at:

http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/pezhang/bug1491898/


This is how I start testpmd and execute 'ulimit -c unlimited':
# cat pvp_client.sh 
ulimit -c unlimited
testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 \
--vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
--portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
--nb-cores=2 --forward-mode=io

# sh pvp_client.sh 


Best Regards,
Pei

Comment 5 Aaron Conole 2017-09-18 13:48:49 UTC
Thanks for posting the core file. From a five-minute glance:

stack trace:

..
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007ff3200c44c2 in qva_to_vva (qva=<optimized out>, dev=<optimized out>)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:315
> 315		for (i = 0; i < dev->mem->nregions; i++) {
> (gdb) bt
> Thread 1 (Thread 0x7ff31764c700 (LWP 8778)):
> ---Type <return> to continue, or q <return> to quit---
> #0  0x00007ff3200c44c2 in qva_to_vva (qva=<optimized out>, dev=<optimized out>)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:315
> #1  vhost_user_set_vring_addr (addr=0x7ff31764bb6c, dev=<optimized out>)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:356
> #2  vhost_user_msg_handler (vid=<optimized out>, fd=fd@entry=23)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:1040
> #3  0x00007ff3200c1dff in vhost_user_read_cb (connfd=23, dat=0x7ff3080008c0, 
>     remove=0x7ff31764bd50)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/socket.c:274
> #4  0x00007ff3200c18f0 in fdset_event_dispatch (
>     arg=0x7ff3202ce1a0 <vhost_user+8192>)
>     at /usr/src/debug/dpdk-17.05/lib/librte_vhost/fd_man.c:273
> #5  0x00007ff31e165e25 in start_thread () from /lib64/libpthread.so.0
> #6  0x00007ff31de9334d in clone () from /lib64/libc.so.6

Looking at dev->mem (via x/i $rip and p/x $rdi), it appears to be 0 (NULL).

This looks like it's going to be a bug in dpdk's virtio driver, but which side is not clear yet.
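
For anyone following along, the same inspection can be replayed non-interactively against the core; a minimal sketch, assuming these binary and core file paths (both are examples):

# Batch-mode replay of the gdb inspection above; adjust the
# binary and core paths to match your system.
gdb -batch -ex 'bt' -ex 'frame 0' -ex 'x/i $rip' -ex 'p/x $rdi' \
/usr/bin/testpmd /var/crash/core.166463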

Comment 6 Aaron Conole 2017-09-18 13:57:18 UTC
Also, what version of the OvS package are you running in the host?

Comment 7 Pei Zhang 2017-09-18 14:54:05 UTC
(In reply to Aaron Conole from comment #6)
> Also, what version of the OvS package are you running in the host?

I wasn't running OvS, only DPDK on the host.

Comment 8 Kevin Traynor 2017-09-19 10:40:22 UTC
There were some known bugs related to vhost memory reallocation and multi-NUMA in DPDK 17.05.0, and I suspect you might be hitting one of them. They were fixed in DPDK 17.05.1.

Can you confirm you are running on a multi-numa system?

As a workaround, I think you can avoid the reallocation with DPDK 17.05.0 by running the testpmd lcores, the libvirt emulatorpin, and the vCPUs all on the same NUMA node; a sketch follows.
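
A minimal sketch of that pinning, assuming cores 2, 4, 6 and cores 8-10 all sit on node 0 and the domain is named rhel7-vm (all hypothetical values; check your topology first):

# 1) Confirm the host is multi-NUMA and identify the cores on node 0:
numactl --hardware

# 2) Keep every testpmd lcore and its socket memory on node 0
#    (cores 2,4,6 assumed to be on node 0):
testpmd -l 2,4,6 --socket-mem=1024,0 -n 4 \
--vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
--portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
--nb-cores=2 --forward-mode=io

# 3) From another shell, pin the emulator thread and both vCPUs
#    to node-0 cores as well (domain name is hypothetical):
virsh emulatorpin rhel7-vm 8-10 --live
virsh vcpupin rhel7-vm 0 8 --live
virsh vcpupin rhel7-vm 1 10 --live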

Comment 12 Kevin Traynor 2017-10-25 16:57:42 UTC
Hi Pei,

I reproduced the issue with dpdk-17.05-3. There was an unrelated build problem with the local test rpm I built, so that was not a valid rpm.

The reported issue is with NUMA reallocation. I reproduced the pass/fail criteria with dpdk-17.05-3 depending on whether NUMA reallocation was needed. The fixes for this are included with other bugfixes in DPDK 17.05.2, so I updated to that, tested, and confirmed the seg fault does not occur. The new rpm is available here: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=613371

thanks,
Kevin.
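
A short verification sketch against the fixed build, assuming the rpm from the Brew link above has been downloaded locally (file name inferred from the Fixed In Version field):

# Install the fixed build and re-run the original reproducer:
yum localinstall -y dpdk-17.05.2-4.el7fdb.x86_64.rpm
rpm -q dpdk         # should report dpdk-17.05.2-4.el7fdb
sh pvp_client.sh    # boot the VM again; no segfault is expected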

Comment 13 Pei Zhang 2017-11-07 10:09:56 UTC
(In reply to Kevin Traynor from comment #12)
> Hi Pei,
> 
> I reproduced the issue with dpdk-17.05-3. There was an unrelated build
> problem with the local test rpm I built, so that was not a valid rpm.
> 
> The reported issue is with numa reallocation. I reproduced pass/fail
> criteria with dpdk-17.05-3 depending on whether numa reallocation was
> needed. The fixes for this are included with other bugfixs in DPDK 17.05.2,
> so I updated to that, tested and confirmed the seg fault does not occur. New
> rpm is available here:
> https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=613371

Hi Kevin, this build works well; the issue is gone. Thank you.

And sorry for the late reply; I just got back to work this week.


Best Regards,
Pei


> thanks,
> Kevin.

Comment 14 Kevin Traynor 2018-01-30 16:48:19 UTC
The commits are in DPDK 17.11.

