Bug 1491898
| Summary: | In PVP testing, dpdk's testpmd will "Segmentation fault" after booting VM | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Pei Zhang <pezhang> |
| Component: | dpdk | Assignee: | Kevin Traynor <ktraynor> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Jean-Tsung Hsiao <jhsiao> |
| Severity: | high | Priority: | high |
| Version: | 7.5 | CC: | aconole, chayang, jfreiman, juzhang, michen, pezhang, qding, rkhan, tredaelli |
| Target Milestone: | rc | Keywords: | Extras, Regression |
| Target Release: | --- | Hardware: | Unspecified |
| OS: | Unspecified | Type: | Bug |
| Fixed In Version: | dpdk-17.05.2-4.el7fdb | Doc Type: | If docs needed, set a value |
| Last Closed: | 2018-01-30 16:48:19 UTC | | |
Can you modify your PVP script to execute 'ulimit -c unlimited' at the very beginning? Then please attach the core file which is created.

(In reply to Aaron Conole from comment #3)
> Can you modify your PVP script to execute 'ulimit -c unlimited' at the very
> beginning. Then please attach the core file which is created.

Hi Aaron,

The core file is available at:
http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/pezhang/bug1491898/

How I start testpmd and execute 'ulimit -c unlimited':

```
# cat pvp_client.sh
ulimit -c unlimited
testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 \
    --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
    --portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
    --nb-cores=2 --forward-mode=io

# sh pvp_client.sh
```

Best Regards,
Pei

Thanks for posting the core file. A quick five-minute glance at the stack trace:
> Program terminated with signal 11, Segmentation fault.
> #0 0x00007ff3200c44c2 in qva_to_vva (qva=<optimized out>, dev=<optimized out>)
> at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:315
> 315 for (i = 0; i < dev->mem->nregions; i++) {
> (gdb) bt
> Thread 1 (Thread 0x7ff31764c700 (LWP 8778)):
> #0 0x00007ff3200c44c2 in qva_to_vva (qva=<optimized out>, dev=<optimized out>)
> at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:315
> #1 vhost_user_set_vring_addr (addr=0x7ff31764bb6c, dev=<optimized out>)
> at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:356
> #2 vhost_user_msg_handler (vid=<optimized out>, fd=fd@entry=23)
> at /usr/src/debug/dpdk-17.05/lib/librte_vhost/vhost_user.c:1040
> #3 0x00007ff3200c1dff in vhost_user_read_cb (connfd=23, dat=0x7ff3080008c0,
> remove=0x7ff31764bd50)
> at /usr/src/debug/dpdk-17.05/lib/librte_vhost/socket.c:274
> #4 0x00007ff3200c18f0 in fdset_event_dispatch (
> arg=0x7ff3202ce1a0 <vhost_user+8192>)
> at /usr/src/debug/dpdk-17.05/lib/librte_vhost/fd_man.c:273
> #5 0x00007ff31e165e25 in start_thread () from /lib64/libpthread.so.0
> #6 0x00007ff31de9334d in clone () from /lib64/libc.so.6
Looking at dev->mem (x/i $rip, p/x $rdi), it appears to be 0 (NULL).
This looks like a bug in dpdk's virtio driver, but which side is responsible is not clear yet.
Also, what version of the OvS package are you running in the host?

(In reply to Aaron Conole from comment #6)
> Also, what version of the OvS package are you running in the host?

I wasn't running OVS, only DPDK in the host.

There were some known bugs related to vhost memory reallocation and multi-NUMA in DPDK 17.05.0, and I suspect you might be hitting one of them. They were fixed in DPDK 17.05.1. Can you confirm you are running on a multi-NUMA system? As a workaround, I think you can avoid the reallocation with DPDK 17.05.0 by pinning the testpmd lcores, and setting emulatorpin and the vcpus in libvirt, so they all use the same NUMA node.

Hi Pei,

I reproduced the issue with dpdk-17.05-3. There was an unrelated build problem with the local test rpm I built, so that was not a valid rpm.

The reported issue is with NUMA reallocation. I reproduced the pass/fail criteria with dpdk-17.05-3 depending on whether NUMA reallocation was needed. The fixes for this are included with other bugfixes in DPDK 17.05.2, so I updated to that, tested, and confirmed the seg fault does not occur. The new rpm is available here:
https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=613371

thanks,
Kevin.

(In reply to Kevin Traynor from comment #12)
> I reproduced the issue with dpdk-17.05-3. There was an unrelated build
> problem with the local test rpm I built, so that was not a valid rpm.
>
> The reported issue is with NUMA reallocation. I reproduced the pass/fail
> criteria with dpdk-17.05-3 depending on whether NUMA reallocation was
> needed. The fixes for this are included with other bugfixes in DPDK 17.05.2,
> so I updated to that, tested, and confirmed the seg fault does not occur.
> The new rpm is available here:
> https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=613371

Hi Kevin, this build works well; the issue is gone. Thank you. And sorry for the late reply, as I just got back to work this week.

Best Regards,
Pei

The commits are in DPDK 17.11.
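The pinning workaround suggested in the thread (keeping the testpmd lcores, the emulator thread, and the vCPUs on a single NUMA node so no reallocation is triggered) can be sketched on the libvirt side with a `<cputune>` block. This is a hypothetical fragment, assuming pCPUs 2, 4, and 6 belong to NUMA node 0; the actual CPU and node numbers depend on the machine's topology.

```xml
<domain type='kvm'>
  <!-- ... -->
  <vcpu placement='static'>2</vcpu>
  <cputune>
    <!-- Pin both vCPUs and the emulator thread to pCPUs on NUMA node 0.
         Launch testpmd with lcores from the same node, e.g. -l 2,4,6. -->
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <emulatorpin cpuset='6'/>
  </cputune>
  <numatune>
    <!-- Keep guest memory on the same node as well. -->
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <!-- ... -->
</domain>
```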
Created attachment 1326261 [details]
XML file of VM

Description of problem:
This is PVP testing. Boot dpdk's testpmd in the host acting in vhost-user client mode, then start the VM; testpmd will core dump.

Version-Release number of selected component (if applicable):
dpdk-17.05-3.el7fdb.x86_64
qemu-kvm-rhev-2.9.0-16.el7.x86_64
3.10.0-693.el7.x86_64
libvirt-3.7.0-2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. In the host, boot testpmd in vhost-user client mode:

```
# testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 \
      --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- \
      --portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 \
      --nb-cores=2 --forward-mode=io
```

2. Boot the VM in vhost-user server mode (full XML file in the attachment):

```xml
<interface type='vhostuser'>
  <mac address='38:88:da:5f:dd:01'/>
  <source type='unix' path='/tmp/vhost-user1' mode='server'/>
  <model type='virtio'/>
  <driver name='vhost'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
```

3. testpmd core dumps:
```
testpmd> PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 10000 Mbps - full-duplex
PMD: ixgbe_dev_link_status_print(): Port 0: Link Up - speed 10000 Mbps - full-duplex
Port 0: LSC event
VHOST_CONFIG: /tmp/vhost-user1: connected
VHOST_CONFIG: new device, handle is 0
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_PROTOCOL_FEATURES
VHOST_CONFIG: read message VHOST_USER_GET_QUEUE_NUM
VHOST_CONFIG: read message VHOST_USER_SET_OWNER
VHOST_CONFIG: read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:25
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:1 file:26
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 0
PMD: vring0 is enabled
Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 1
PMD: vring1 is enabled
Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 0
PMD: vring0 is enabled
Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: set queue enable: 1 to qp idx: 1
PMD: vring1 is enabled
Port 1: Queue state event
VHOST_CONFIG: read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: read message VHOST_USER_SET_MEM_TABLE
VHOST_CONFIG: guest memory region 0, size: 0x40000000
	 guest physical addr: 0x100000000
	 guest virtual  addr: 0x7fb5c0000000
	 host  virtual  addr: 0x2aab80000000
	 mmap addr : 0x2aaac0000000
	 mmap size : 0x100000000
	 mmap align: 0x40000000
	 mmap off  : 0xc0000000
VHOST_CONFIG: guest memory region 1, size: 0xa0000
	 guest physical addr: 0x0
	 guest virtual  addr: 0x7fb500000000
	 host  virtual  addr: 0x2aabc0000000
	 mmap addr : 0x2aabc0000000
	 mmap size : 0x40000000
	 mmap align: 0x40000000
	 mmap off  : 0x0
VHOST_CONFIG: guest memory region 2, size: 0xbff40000
	 guest physical addr: 0xc0000
	 guest virtual  addr: 0x7fb5000c0000
	 host  virtual  addr: 0x2aac000c0000
	 mmap addr : 0x2aac00000000
	 mmap size : 0xc0000000
	 mmap align: 0x40000000
	 mmap off  : 0xc0000
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
VHOST_CONFIG: reallocate dev from 0 to 1 node
VHOST_CONFIG: read message VHOST_USER_SET_VRING_KICK
VHOST_CONFIG: vring kick idx:0 file:30
VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL
VHOST_CONFIG: vring call idx:0 file:31
VHOST_CONFIG: read message VHOST_USER_SET_VRING_NUM
VHOST_CONFIG: read message VHOST_USER_SET_VRING_BASE
VHOST_CONFIG: read message VHOST_USER_SET_VRING_ADDR
VHOST_CONFIG: reallocate vq from 0 to 1 node
pvp_client.sh: line 4: 166463 Segmentation fault      testpmd -l 19,17,15 --socket-mem=1024,1024 -n 4 --vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' -- --portmask=3 --disable-hw-vlan -i --rxq=1 --txq=1 --nb-cores=2 --forward-mode=io
```

Actual results:
testpmd core dumps.

Expected results:
testpmd should work well.

Additional info:
1. This is a regression issue; dpdk-16.11-4.el7fdp.x86_64 works well.