Red Hat Bugzilla – Bug 1447592
vhost-user/reply-ack: Wait for ack even if no request sent (one-time requests)
Last modified: 2017-08-02 00:38:29 EDT
Description of problem:
When the vhost-user backend supports VHOST_USER_PROTOCOL_F_REPLY_ACK, QEMU (for some requests) sets the VHOST_USER_NEED_REPLY flag in the message header. Once the message is sent, it waits for the reply from the backend. The problem is that this feature's implementation is currently broken in QEMU when used with one-time requests. When a request listed as one-time is sent a second time, such as VHOST_USER_SET_MEM_TABLE and VHOST_USER_NET_SET_MTU with multiqueue, the request is not sent by QEMU, but QEMU still waits for an ACK from the backend, which will never arrive.

Version-Release number of selected component (if applicable):
Reproducible with vhost-user backends supporting the REPLY_ACK feature, like upstream DPDK v17.02. Note that QEMU's libvhost-user doesn't implement the feature, so it should not be used to reproduce/test the bug.

How reproducible:
Start DPDK v17.02's testpmd application with a vhost interface with multiple queues. See: http://dpdk.org/doc/guides/nics/vhost.html

Additional info:
Upstream discussions:
DPDK ML: http://dpdk.org/dev/patchwork/patch/23941/
QEMU ML: http://patchwork.ozlabs.org/patch/756249/
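To make the failure mode concrete, here is a minimal, self-contained C sketch of the broken control flow and of the fix. It only models QEMU's behaviour; VhostDev, send_request and the rest are hypothetical names for illustration, not QEMU's actual code.

#include <stdbool.h>
#include <stdio.h>

/* Stand-alone model of the deadlock; identifiers are illustrative. */

typedef struct {
    int vq_index;   /* queue pair this vhost_dev instance drives */
} VhostDev;

/*
 * One-time requests (VHOST_USER_SET_MEM_TABLE, VHOST_USER_NET_SET_MTU,
 * ...) are only written to the socket for the first queue pair; for the
 * others the write is silently skipped. Returns true only if the
 * message, with its NEED_REPLY flag, actually went out.
 */
static bool send_request(VhostDev *dev, bool one_time)
{
    if (one_time && dev->vq_index != 0) {
        return false;   /* skipped: the backend never sees it */
    }
    /* ... write the message to the vhost-user socket here ... */
    return true;
}

int main(void)
{
    VhostDev second_qp = { .vq_index = 2 };
    bool sent = send_request(&second_qp, true);

    /*
     * Broken flow: wait for the REPLY_ACK unconditionally. Since the
     * request was never sent, the ACK never arrives and QEMU blocks
     * forever. The fix is to wait only when the request went out:
     */
    if (sent) {
        puts("request sent, waiting for the u64 ACK payload");
    } else {
        puts("one-time request already delivered, skip the ACK wait");
    }
    return 0;
}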
Steps to Reproduce:
1. Start testpmd:
#> testpmd -l 0-3 -n 2 -m 4096 --vdev 'net_vhost0,iface=/tmp/sock0,queues=2' -- -i
2. Start QEMU:
#> qemu-system-x86_64 \
   -enable-kvm \
   -cpu host \
   -m 3072 \
   -smp 2 \
   -object memory-backend-file,id=mem,size=3072M,mem-path=/dev/hugepages,share=on \
   -numa node,memdev=mem \
   -mem-prealloc \
   -chardev socket,id=chr0,path=/tmp/sock0 \
   -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=2 \
   -device virtio-net-pci,netdev=net0,mq=on,vectors=6 \
   <image file>
3. Wait for the system to boot.

Actual results:
The guest hangs during boot, with QEMU waiting for a reply from DPDK.

Expected results:
The guest starts with a functioning network device.
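For context on the backend half of the handshake this reproduction relies on: per the vhost-user specification, a backend that negotiated VHOST_USER_PROTOCOL_F_REPLY_ACK answers any message carrying the NEED_REPLY flag (and having no reply of its own) with a u64 payload, 0 meaning success. DPDK v17.02 implements this, and that ack is exactly what QEMU waits for — but it can only arrive if the request actually reached the backend. A minimal sketch of that ack path follows, assuming the spec's 12-byte header layout; maybe_send_reply_ack and the main() harness are illustrative, not DPDK's actual code.

#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Flag bits from the vhost-user spec's message header. */
#define VHOST_USER_VERSION          0x1u
#define VHOST_USER_REPLY_MASK       (0x1u << 2)
#define VHOST_USER_NEED_REPLY_MASK  (0x1u << 3)

typedef struct __attribute__((packed)) {
    uint32_t request;   /* message type, echoed back in the ack */
    uint32_t flags;     /* version + REPLY/NEED_REPLY bits */
    uint32_t size;      /* payload size following the header */
} VhostUserHdr;

/*
 * Called after the backend has processed a request. If the master set
 * NEED_REPLY on a message that has no reply of its own, answer with a
 * u64 payload: 0 for success, non-zero for failure.
 */
static int maybe_send_reply_ack(int fd, const VhostUserHdr *req, uint64_t ret)
{
    if (!(req->flags & VHOST_USER_NEED_REPLY_MASK)) {
        return 0;   /* master did not ask for an ack */
    }
    struct __attribute__((packed)) {
        VhostUserHdr hdr;
        uint64_t     ret;
    } ack = {
        .hdr = {
            .request = req->request,
            .flags   = VHOST_USER_VERSION | VHOST_USER_REPLY_MASK,
            .size    = sizeof(uint64_t),
        },
        .ret = ret,
    };
    return write(fd, &ack, sizeof(ack)) == sizeof(ack) ? 0 : -1;
}

int main(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return 1;
    }
    VhostUserHdr req = {
        .request = 5, /* VHOST_USER_SET_MEM_TABLE in the spec's numbering */
        .flags   = VHOST_USER_VERSION | VHOST_USER_NEED_REPLY_MASK,
        .size    = 0,
    };
    maybe_send_reply_ack(sv[0], &req, 0);

    /* Master side: read back the ack (12-byte header + u64 payload). */
    unsigned char buf[sizeof(VhostUserHdr) + sizeof(uint64_t)];
    ssize_t n = read(sv[1], buf, sizeof(buf));
    printf("ack bytes received: %zd\n", n);
    return 0;
}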
Patch is reviewed upstream but not yet applied:
https://patchwork.kernel.org/patch/9712283/
The patch was only posted upstream, not yet internally, so I'm changing the status back to ASSIGNED.
Patch posted to rhvirt-patches.
Fix included in qemu-kvm-rhev-2.9.0-6.el7
Changing back to ASSIGNED because a follow-up patch fixing this one was posted upstream:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg452448.html
Let's wait for it to be applied upstream, then apply both patches.
The follow-on patch mentioned in comment #8 has been posted to rhvirt-patches.
Fix included in qemu-kvm-rhev-2.9.0-8.el7
==Reproduce==

Versions:
dpdk-17.02.1.tar.xz
qemu-kvm-rhev-2.8.0-6.el7.x86_64

Steps:
1. Boot testpmd with net_vhost0 using 2 queues:
# /root/dpdk-stable-17.02.1/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
  -l 2,4,6,8 \
  -d /root/dpdk-stable-17.02.1/x86_64-native-linuxapp-gcc/lib/librte_pmd_vhost.so \
  --vdev 'net_vhost0,iface=/tmp/sock0,queues=2' \
  -n 4 \
  --socket-mem 1024,1024 \
  -- -i
2. Boot the VM using this socket:
# /usr/libexec/qemu-kvm \
  -cpu host \
  -m 3072 \
  -smp 8 \
  -object memory-backend-file,id=mem,size=3072M,mem-path=/dev/hugepages,share=on \
  -numa node,memdev=mem \
  -mem-prealloc \
  -chardev socket,id=chr0,path=/tmp/sock0 \
  -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=2 \
  -device virtio-net-pci,netdev=net0,mq=on,vectors=6 \
  /mnt/nfv/rhel7.4_nonrt.qcow2 \
  -monitor stdio \
  -vnc :2
3. Both QEMU and the guest hang, so the bug is reproduced.

==Verification==

Versions:
dpdk-17.02.1.tar.xz
qemu-kvm-rhev-2.9.0-10.el7.x86_64

Steps:
1. Boot testpmd with net_vhost0 using 2 queues.
2. Boot the VM using this socket.
3. Both QEMU and the VM work well, so the bug is fixed.

Thanks Maxime for helping me on IRC to reproduce this bug.
Moving the status of this bug to 'VERIFIED' per comment 12.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392