Bug 1447592

Summary: vhost-user/reply-ack: Wait for ack even if no request sent (one-time requests)
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.4
Status: CLOSED ERRATA
Type: Bug
Reporter: Maxime Coquelin <maxime.coquelin>
Assignee: Jens Freimann <jfreiman>
QA Contact: Pei Zhang <pezhang>
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-rhev-2.9.0-8.el7
Last Closed: 2017-08-02 04:38:29 UTC
CC: ailan, chayang, drjones, jasowang, juzhang, lmiksik, marcandre.lureau, michen, mrezanin, mst, pezhang, virt-maint, xiywang

Description Maxime Coquelin 2017-05-03 09:35:08 UTC
Description of problem:

When the vhost-user backend supports VHOST_USER_PROTOCOL_F_REPLY_ACK,
QEMU sets the VHOST_USER_NEED_REPLY flag in some request messages.
Once such a message is sent, QEMU waits for the reply from the backend.

The problem is that this feature's implementation in QEMU is currently
broken for one-time requests. When a request listed as one-time is
issued a second time, as happens with VHOST_USER_SET_MEM_TABLE and
VHOST_USER_NET_SET_MTU when multiqueue is enabled, QEMU does not send
the request but still waits for an ACK from the backend, which never
arrives.
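The failure can be modeled in a few lines of C. This is a simplified sketch, not QEMU's code: `struct msg`, `struct dev`, `send_msg()`, `will_wait_for_ack()`, `hangs()` and the `fix_enabled` switch are all hypothetical names. The request codes and the need_reply bit are assumed to follow the vhost-user protocol specification. Clearing the flag when the send is skipped is only one possible way to keep the sender and the reply waiter consistent, shown for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Request codes as numbered in the vhost-user spec (assumed). */
#define VHOST_USER_SET_MEM_TABLE 5
#define VHOST_USER_NET_SET_MTU   20

/* Bit 3 of the message flags is the need_reply flag (assumed, per spec). */
#define NEED_REPLY_MASK (1u << 3)

/* Hypothetical, simplified message and per-queue-pair device. */
struct msg {
    uint32_t request;
    uint32_t flags;
};

struct dev {
    int vq_index; /* 0 for the first queue pair, nonzero for later ones */
};

/* Hypothetical helper: requests that must reach the backend only once. */
static bool is_one_time_request(uint32_t request)
{
    return request == VHOST_USER_SET_MEM_TABLE ||
           request == VHOST_USER_NET_SET_MTU;
}

/*
 * Returns true if the message was written to the socket.
 * One-time requests are skipped for every queue pair but the first.
 * With fix_enabled, a skipped message also has NEED_REPLY cleared,
 * so the caller knows no ACK will come back.
 */
static bool send_msg(const struct dev *d, struct msg *m, bool fix_enabled)
{
    if (is_one_time_request(m->request) && d->vq_index != 0) {
        if (fix_enabled)
            m->flags &= ~NEED_REPLY_MASK;
        return false;
    }
    /* A real implementation would write m to the UNIX socket here. */
    return true;
}

/* True if the caller would block waiting for an ACK from the backend. */
static bool will_wait_for_ack(const struct msg *m)
{
    return (m->flags & NEED_REPLY_MASK) != 0;
}

/* Simulate SET_MEM_TABLE for one queue pair; true means QEMU would hang. */
bool hangs(int vq_index, bool fix_enabled)
{
    struct dev d = { vq_index };
    struct msg m = { VHOST_USER_SET_MEM_TABLE, NEED_REPLY_MASK };
    bool sent = send_msg(&d, &m, fix_enabled);
    /* Waiting for an ACK to a message that was never sent blocks forever. */
    return !sent && will_wait_for_ack(&m);
}
```

With the bug, `hangs(2, false)` is true: the second queue pair skips the send but keeps NEED_REPLY set. With the flag cleared on skip, `hangs(2, true)` is false, and the first queue pair is unaffected either way.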

Version-Release number of selected component (if applicable):

Reproducible with vhost-user backends that support the REPLY_ACK feature,
such as upstream DPDK v17.02. Note that QEMU's libvhost-user does not
implement the feature, so it should not be used to reproduce or test the bug.
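For context, the REPLY_ACK handshake that the backend must support can be sketched as below. This is an illustration under stated assumptions, not QEMU or DPDK code: `request_flags()` and `ack_ok()` are hypothetical helpers, and the bit positions (REPLY_ACK as protocol feature bit 3, the reply flag as bit 2, need_reply as bit 3, protocol version 1 in the low two bits) plus the "u64 payload of zero means success" rule are taken from the vhost-user specification as understood here and should be checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Protocol feature bit for REPLY_ACK (assumed, per vhost-user spec). */
#define VHOST_USER_PROTOCOL_F_REPLY_ACK 3

/* Message flag bits (assumed, per vhost-user spec). */
#define VHOST_USER_VERSION         0x1u
#define VHOST_USER_REPLY_MASK      (1u << 2)
#define VHOST_USER_NEED_REPLY_MASK (1u << 3)

/* Hypothetical helper: the front end requests an ACK only if the backend
 * advertised REPLY_ACK during protocol feature negotiation. */
uint32_t request_flags(uint64_t backend_protocol_features)
{
    uint32_t flags = VHOST_USER_VERSION;
    if (backend_protocol_features & (1ull << VHOST_USER_PROTOCOL_F_REPLY_ACK))
        flags |= VHOST_USER_NEED_REPLY_MASK;
    return flags;
}

/* Hypothetical helper: an ACK is valid if it answers the same request,
 * carries the reply flag, and its u64 payload is zero (success). */
bool ack_ok(uint32_t sent_request, uint32_t reply_request,
            uint32_t reply_flags, uint64_t payload)
{
    return reply_request == sent_request &&
           (reply_flags & VHOST_USER_REPLY_MASK) != 0 &&
           payload == 0;
}
```

This is why a backend without REPLY_ACK (such as libvhost-user at the time) cannot trigger the bug: the front end never sets need_reply, so it never waits for an ACK.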

How reproducible:

Start DPDK v17.02's testpmd application with a vhost interface with multiple queues.
See: http://dpdk.org/doc/guides/nics/vhost.html

Additional info:

Upstream discussions:
DPDK ML: http://dpdk.org/dev/patchwork/patch/23941/
QEMU ML: http://patchwork.ozlabs.org/patch/756249/

Comment 2 Jens Freimann 2017-05-04 12:35:08 UTC
Steps to Reproduce:
1. Start testpmd:
#> testpmd -l 0-3 -n 2 -m 4096 --vdev 'net_vhost0,iface=/tmp/sock0,queues=2' -- -i

2. Start QEMU:
#> qemu-system-x86_64 \
        -enable-kvm \
        -cpu host \
        -m 3072 \
        -smp 2 \
        -object memory-backend-file,id=mem,size=3072M,mem-path=/dev/hugepages,share=on \
        -numa node,memdev=mem \
        -mem-prealloc \
        -chardev socket,id=chr0,path=/tmp/sock0 \
        -netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=2 \
        -device virtio-net-pci,netdev=net0,mq=on,vectors=6 \
        <image file>
3. Wait for system to boot

Actual results: Guest hangs during boot; QEMU is waiting for a reply from DPDK.
Expected results: Guest starts with a functioning network device.
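The `vectors=6` value in the QEMU command above follows the common sizing rule for multiqueue virtio-net: 2N + 2 MSI-X vectors for N queue pairs (one per RX queue, one per TX queue, one for configuration changes, one for the control virtqueue). A minimal sketch of that arithmetic, with hypothetical function names:

```c
#include <stdbool.h>

/* Hypothetical helper: MSI-X vectors for virtio-net with N queue pairs,
 * using the usual 2N + 2 rule (N RX + N TX + config + control vq). */
int virtio_net_vectors(int queue_pairs)
{
    return 2 * queue_pairs + 2;
}

/* Hypothetical check: does a -device vectors= value cover the
 * queues= value given to -netdev? */
bool vectors_sufficient(int queue_pairs, int vectors)
{
    return vectors >= virtio_net_vectors(queue_pairs);
}
```

For `queues=2`, matching testpmd's `queues=2`, this gives 6, the value used in the reproducer.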

Comment 3 Jens Freimann 2017-05-09 08:14:29 UTC
https://patchwork.kernel.org/patch/9712283/

The patch has been reviewed upstream but not yet applied.

Comment 4 Jens Freimann 2017-05-09 14:38:54 UTC
The patch was only posted upstream, not yet internally, so I'm changing the status back to ASSIGNED.

Comment 5 Jens Freimann 2017-05-18 15:52:32 UTC
patch posted to rhvirt-patches

Comment 6 Miroslav Rezanina 2017-05-23 08:15:26 UTC
Fix included in qemu-kvm-rhev-2.9.0-6.el7

Comment 8 Jens Freimann 2017-05-24 09:11:59 UTC
Changing back to ASSIGNED because a follow-up patch fixing this one was posted upstream.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg452448.html

Let's wait for it to be applied upstream then apply both patches.

Comment 9 Jens Freimann 2017-06-01 13:52:14 UTC
Follow-on patch mentioned in comment #8 posted to rhvirt-patches

Comment 10 Miroslav Rezanina 2017-06-06 08:54:12 UTC
Fix included in qemu-kvm-rhev-2.9.0-8.el7

Comment 12 Pei Zhang 2017-06-17 12:18:06 UTC
==Reproduce==

Versions:
dpdk-17.02.1.tar.xz
qemu-kvm-rhev-2.8.0-6.el7.x86_64

Steps:
1. Boot testpmd with net_vhost0 using 2 queues

# /root/dpdk-stable-17.02.1/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
-l 2,4,6,8 \
-d /root/dpdk-stable-17.02.1/x86_64-native-linuxapp-gcc/lib/librte_pmd_vhost.so \
--vdev 'net_vhost0,iface=/tmp/sock0,queues=2' \
-n 4 \
--socket-mem 1024,1024 \
-- -i

2. Boot VM using this socket
# /usr/libexec/qemu-kvm \
-cpu host \
-m 3072 \
-smp 8 \
-object memory-backend-file,id=mem,size=3072M,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem \
-mem-prealloc \
-chardev socket,id=chr0,path=/tmp/sock0 \
-netdev vhost-user,id=net0,chardev=chr0,vhostforce,queues=2 \
-device virtio-net-pci,netdev=net0,mq=on,vectors=6 \
/mnt/nfv/rhel7.4_nonrt.qcow2 \
-monitor stdio \
-vnc :2


3. Both QEMU and the guest hang.

So this bug has been reproduced.

==Verification==
Versions:
dpdk-17.02.1.tar.xz
qemu-kvm-rhev-2.9.0-10.el7.x86_64

Steps:
1. Boot testpmd with net_vhost0 using 2 queues

2. Boot VM using this socket

3. Both QEMU and the VM work well.

So this bug has been fixed.

Thanks Maxime for helping me in irc to reproduce this bug.

Comment 13 Pei Zhang 2017-06-17 12:18:43 UTC
Moving this bug to VERIFIED per Comment 12.

Comment 15 errata-xmlrpc 2017-08-02 04:38:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392