Bug 1525446
Summary: | Host dpdk's testpmd "Segmentation fault" when migrating VM with vhost-user and packets flow | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Pei Zhang <pezhang> |
Component: | dpdk | Assignee: | Aaron Conole <aconole> |
Status: | CLOSED ERRATA | QA Contact: | Pei Zhang <pezhang> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 7.5 | CC: | atragler, chayang, fbaudin, jhsiao, juzhang, maxime.coquelin, michen, pezhang, vkaplans |
Target Milestone: | rc | Keywords: | Extras, Regression |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | dpdk-17.11-6.el7.x86_64 | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2018-04-10 23:59:23 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: |
Additional info:
1. This is a regression. dpdk-16.11.2-6.el7.x86_64 works well.
2. More details about the error:
(1) After migrating from src to des, dpdk's testpmd on the src host hits "Segmentation fault".
(2) Another error shows up in dmesg:
# dmesg
[15979.767740] lcore-slave-4[27362]: segfault at 2 ip 00007fdb728fbab1 sp 00007fdb6d7fe950 error 4
[15979.767813] lcore-slave-6[27363]: segfault at 2 ip 00007fdb728fd93a sp 00007fdb6cffe900 error 4
[15979.767816] in librte_vhost.so.4[7fdb728f2000+10000]
[15979.790956] in librte_vhost.so.4[7fdb728f2000+10000]

Looking at the logs, it crashed within the same second that SET_VRING_ADDR was being handled while the device was running.

I already reproduced such a crash with DPDK v17.11 and posted a fix for this specific one [0]. However, this patch has been discarded, as Victor has fixed asynchronous virtio_net struct changes more generally by introducing a new lock.

Victor's patch has been accepted upstream and queued for the v17.11 LTS release:
https://dpdk.org/dev/patchwork/patch/33921/

This patch is also needed for Bz1450680.

Adding Victor in cc.

Regards,
Maxime

[0]: http://dpdk.org/dev/patchwork/patch/31659/

==Verification==
Versions:
3.10.0-843.el7.x86_64
qemu-kvm-rhev-2.10.0-19.el7.x86_64
libvirt-3.9.0-11.el7.x86_64
dpdk-17.11-7.el7.x86_64

Steps:
Followed the steps in the Description. All 10 migration runs worked well; both dpdk and the guest kept working, without any errors.

So this bug has been fixed. Moving status to 'VERIFIED'.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2018:1065
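The verification above reports that no errors were seen across 10 migration runs. A check of that kind could be sketched as follows. This is only an illustration under stated assumptions: `count_segfaults` and `log_is_clean` are hypothetical helper names, not part of DPDK or of the original test procedure, and they only look for the `segfault at` marker that appears in the kernel log lines quoted in this report.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the bug report.

# Counts kernel log lines that record a segfault. Reads log text on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function's exit status at 0 in that case.
count_segfaults() {
    grep -c 'segfault at' || true
}

# Succeeds only when the log text on stdin contains no segfault records.
log_is_clean() {
    [ "$(count_segfaults)" -eq 0 ]
}
```

After each migration run, something like `dmesg | log_is_clean || echo "segfault recorded"` could be run on the src host. Matching only on `segfault at` is a deliberate choice: in the log above, the `in librte_vhost.so.4[...]` part landed on separate lines because two threads faulted at almost the same time.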
Created attachment 1367272: XML of VM

Description of problem:
Boot dpdk's testpmd with vhost-user on the host. Next, boot a VM using the same vhost-user socket. Then generate packets from another host to this VM. testpmd hits "Segmentation fault" after the migration finishes.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-12.el7.x86_64
3.10.0-820.el7.x86_64
dpdk-17.11-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot testpmd on the src and des hosts
# /usr/bin/testpmd -l 2,4,6,8,10,12,14 \
--socket-mem 1024,1024 \
-n 4 \
--vdev 'net_vhost0,iface=/tmp/vhost-user1,client=1' \
--vdev 'net_vhost1,iface=/tmp/vhost-user2,client=1' \
--vdev 'net_vhost2,iface=/tmp/vhost-user3,client=1' \
-- \
--portmask=3F \
--disable-hw-vlan \
-i \
--rxq=1 --txq=1 \
--nb-cores=6 \
--forward-mode=io

2. Boot the VM
See the attachment of this comment.

3. Start testpmd in the guest
modprobe vfio enable_unsafe_noiommu_mode=Y
modprobe vfio-pci
# /usr/bin/testpmd \
-l 1,2,3 \
-n 4 \
-d /usr/lib64/librte_pmd_virtio.so.1 \
-w 0000:00:03.0 -w 0000:00:06.0 \
-- \
--nb-cores=2 \
--disable-hw-vlan \
-i \
--disable-rss \
--rxq=1 --txq=1

4. Generate packets from another host
./build/MoonGen examples/l2-load-latency.lua 0 1 64

5. Migrate from the src to the des host
# virsh migrate --verbose --persistent --live rhel7.5_nonrt qemu+ssh://192.168.1.2/system

6. After the migration finishes, dpdk quits with "Segmentation fault". There is also error info in dmesg.
# dmesg
[16105.282031] testpmd[24507]: segfault at 24 ip 00007fef3bcf42d7 sp 00007fef333f6c40 error 4 in librte_vhost.so.4[7fef3bcef000+10000]

Actual results:
dpdk's testpmd quits unexpectedly.

Expected results:
dpdk's testpmd should keep running without errors.

Additional info:
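The dmesg line in step 6 gives both the faulting instruction pointer (`ip`) and the base address of the `librte_vhost.so.4` mapping, so the offset of the fault inside that mapping is simply `ip - base`. The sketch below shows that arithmetic. `fault_offset` is a hypothetical helper name introduced here for illustration, and it assumes the single-line log format shown in step 6, where the `in name[base+size]` part is on the same line as the `ip` value.

```shell
#!/bin/sh
# Hypothetical helper, not part of DPDK or the original report.
# Takes one dmesg segfault line as its argument and prints the offset of
# the faulting instruction inside the mapped library (ip - base), in hex.
fault_offset() {
    line=$1
    # Instruction pointer: the hex value that follows " ip ".
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    # Mapping base: the hex value between "[" and "+" in "name[base+size]".
    base=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*].*/\1/p')
    # Fail when the line does not have the expected shape.
    [ -n "$ip" ] && [ -n "$base" ] || return 1
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
}
```

For the line in step 6, `fault_offset` prints `0x52d7` (0x7fef3bcf42d7 minus 0x7fef3bcef000). With matching debuginfo installed, such an offset could be passed to a symbolizer like `addr2line -e` pointed at the library file; whether the offset maps directly to an address in the file depends on how the library is laid out, so treat the result as a starting point.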