Bug 1801081

Summary: "qemu-kvm: Failed to read from slave." shows when boot qemu vhost-user 2 queues over ovs2.13
Product: Red Hat Enterprise Linux 7
Reporter: Pei Zhang <pezhang>
Component: qemu-kvm
Assignee: Adrián Moreno <amorenoz>
Status: CLOSED WONTFIX
QA Contact: Pei Zhang <pezhang>
Severity: medium
Priority: medium
Version: 7.8
CC: aadam, amorenoz, chayang, jinzhao, juzhang, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clone Of: 1793327
Last Closed: 2020-05-17 12:14:07 UTC
Bug Depends On: 1793327, 1801542

Description Pei Zhang 2020-02-10 08:37:26 UTC
+++ This bug was initially created as a clone of Bug #1793327 +++

Description of problem:
Boot ovs2.13 with vhost-user 2 queues, then boot qemu with vhost-user 2 queues as well. The qemu terminal prints the failure message: "qemu-kvm: Failed to read from slave."

Version-Release number of selected component (if applicable):
3.10.0-1126.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7.x86_64
openvswitch2.13-2.13.0-0.20200121git2a4f006.el7fdp.x86_64
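
These can be confirmed on the host with a quick check (a sketch; the package names match the builds listed above):

# Host kernel and the relevant packages:
uname -r
rpm -q qemu-kvm-rhev openvswitch2.13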


How reproducible:
100%

Steps to Reproduce:
1. On the host, boot ovs with vhost-user 2 queues, for example with the sketch below.
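
A minimal OVS-DPDK sketch for this step (the bridge and port names, PMD CPU mask, and socket memory are assumptions; the socket paths match the qemu chardevs in step 2, and OVS acts as the vhost-user client because those chardevs are started with the server option):

# Enable DPDK in ovs-vswitchd (restart the service if DPDK was not already initialized).
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3c
ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem=1024
# Userspace bridge with two vhost-user client ports; qemu creates the server sockets.
# The queue count for vhost-user ports is negotiated from the qemu side (queues=2 in step 2).
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 vhostuser0 -- set Interface vhostuser0 \
type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-vsctl add-port ovsbr0 vhostuser1 -- set Interface vhostuser1 \
type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock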

2. Boot qemu with vhost-user 2 queues

/usr/libexec/qemu-kvm \
-name guest=rhel7.8 \
-machine q35,kernel_irqchip=split \
-cpu host \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel7.8.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on \
-monitor stdio \
-vnc :2 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=8589934592,host-nodes=0,policy=bind \
-numa node,memdev=ram-node0 \


3. Check the qemu terminal; the following errors are shown:

(qemu) qemu-kvm: Failed to read from slave.
qemu-kvm: Failed to read from slave.


Actual results:
The "Failed to read from slave." error is shown in the qemu terminal.

Expected results:
No error should be shown in the qemu terminal.

Additional info:
1. With vhost-user 1 queue, this issue cannot be reproduced (see the sketch after this list).
2. With openvswitch2.12-2.12.0-21.el7fdp.x86_64, this issue cannot be reproduced.
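
For item 1, the single-queue variant only changes the vhost-user netdev/device lines of the qemu command in step 2, roughly as below (mq=on is dropped and vectors=4 follows the usual 2*queues+2 rule of thumb; both are assumptions, not values from the original report):

-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=1,id=hostnet1 \
-device virtio-net-pci,vectors=4,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on \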

--- Additional comment from Adrián Moreno on 2020-01-22 00:55:43 HKT ---

Hi Pei,

When you say 2 queues, you mean multiqueue in one vhost device (so "queues=2"), right? The issue has nothing to do with having two vhost devices, right?

I have a feeling this is also one of the issues that triggers BZ 1788415. Anyhow, it's good that we have a separate BZ for it.

My bet is that the issue starts occurring when the following commit was introduced in DPDK:

commit 761d57651c51365354cefb624883fccf62aee67d
Author: Tiwei Bie <tiwei.bie>
Date:   Thu Sep 5 19:01:25 2019 +0800

    vhost: fix slave request fd leak
    
    We need to close the old slave request fd if any first
    before taking the new one.
    
    Fixes: 275c3f944730 ("vhost: support slave requests channel")
    Cc: stable
    
    Signed-off-by: Tiwei Bie <tiwei.bie>
    Reviewed-by: Maxime Coquelin <maxime.coquelin>

But that is not the issue. The problem is in qemu. When multiqueue is enabled, qemu opens one slave socket per queue pair. When that happens, DPDK closes the first socket (which is what generates the errors we see). Qemu should only open one slave channel, on the first virtqueue pair.
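
One way to see this from the OVS side is to watch the vhost-user messages DPDK logs while qemu starts (a hedged diagnostic; the log path and required vhost log level depend on the OVS build): with multiqueue enabled, VHOST_USER_SET_SLAVE_REQ_FD arrives once per queue pair on the same socket, and the qemu error appears when DPDK closes the previously registered slave fd.

# Watch DPDK's vhost-user message log in ovs-vswitchd while the guest boots.
tail -f /var/log/openvswitch/ovs-vswitchd.log | grep -i vhost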

--- Additional comment from Adrián Moreno on 2020-01-22 05:51:52 HKT ---

I've posted a patch upstream that should fix this: https://patchwork.ozlabs.org/patch/1226778/

Comment 2 Pei Zhang 2020-02-10 08:41:33 UTC
Hi Adrian,

This issue can also be reproduced with ovs2.13, so RHEL7 qemu-kvm-rhev also needs your patch. Thanks.

Best regards,

Pei