Bug 1801542

Summary: "qemu-kvm: Failed to read from slave." shows when boot qemu vhost-user 2 queues over dpdk 19.11[rhel8.2]
Product: Red Hat Enterprise Linux 8 Reporter: Pei Zhang <pezhang>
Component: qemu-kvm    Assignee: Adrián Moreno <amorenoz>
qemu-kvm sub component: Networking QA Contact: Pei Zhang <pezhang>
Status: CLOSED WONTFIX Docs Contact:
Severity: medium    
Priority: medium CC: aadam, ailan, amorenoz, chayang, jinzhao, juzhang, knoel, virt-maint
Version: 8.2    Flags: pm-rhel: mirror+
Target Milestone: rc   
Target Release: 8.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1793327 Environment:
Last Closed: 2020-08-15 18:18:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1793327    
Bug Blocks: 1801081    

Description Pei Zhang 2020-02-11 07:18:16 UTC
+++ This bug was initially created as a clone of Bug #1793327 +++

Description of problem:
Boot dpdk's testpmd with vhost-user 2 queues, then boot qemu with vhost-user 2 queues as well; the qemu terminal prints the failure message: "qemu-kvm: Failed to read from slave."

Version-Release number of selected component (if applicable):
4.18.0-171.el8.x86_64
qemu-kvm-2.12.0-97.module+el8.2.0+5545+14c6799f.x86_64
dpdk-19.11-1.el8.x86_64


How reproducible:
100%

Steps to Reproduce:
1. On the host, boot testpmd with vhost-user 2 queues

# /usr/bin/testpmd \
	-l 2,4,6,8,10,12,14,16,18 \
	--socket-mem 1024,1024 \
	-n 4 \
	-d /usr/lib64/librte_pmd_vhost.so  \
	--vdev 'net_vhost0,iface=/tmp/vhostuser0.sock,queues=2,client=1,iommu-support=1' \
	--vdev 'net_vhost1,iface=/tmp/vhostuser1.sock,queues=2,client=1,iommu-support=1' \
	-- \
	--portmask=f \
	-i \
	--rxd=512 --txd=512 \
	--rxq=2 --txq=2 \
	--nb-cores=8 \
	--forward-mode=io


2. Boot qemu with vhost-user 2 queues
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine q35,kernel_irqchip=split \
-cpu host \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=pci.3,addr=0x0 \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on \
-monitor stdio \
-vnc :2 \


3. Check the qemu terminal; the following errors are shown

(qemu) qemu-kvm: Failed to read from slave.
qemu-kvm: Failed to read from slave.

Actual results:
Error messages are shown in the qemu terminal.

Expected results:
There should be no errors shown in the qemu terminal.

Additional info:
1. With vhost-user and a single queue, this issue cannot be reproduced.
2. With dpdk-18.11.2-4.el8.x86_64, this issue cannot be reproduced.

--- Additional comment from Adrián Moreno on 2020-01-22 00:55:43 HKT ---

Hi Pei,

When you say 2 queues, you mean multiqueue in one vhost device (i.e., "queues=2"), right? The issue has nothing to do with having two vhost devices, right?

I have a feeling this is also one of the issues that triggers BZ 1788415. Anyhow, it's good that we have a separate BZ for it.

My bet is that the issue starts occurring when the following commit was introduced in DPDK:

commit 761d57651c51365354cefb624883fccf62aee67d
Author: Tiwei Bie <tiwei.bie>
Date:   Thu Sep 5 19:01:25 2019 +0800

    vhost: fix slave request fd leak
    
    We need to close the old slave request fd if any first
    before taking the new one.
    
    Fixes: 275c3f944730 ("vhost: support slave requests channel")
    Cc: stable
    
    Signed-off-by: Tiwei Bie <tiwei.bie>
    Reviewed-by: Maxime Coquelin <maxime.coquelin>

But that commit is not the issue itself. The problem is in qemu: when multiqueue is enabled, qemu opens one slave socket per queue pair. When that happens, DPDK closes the first socket (which is what generates the errors we see). Qemu should only open one slave channel, on the first virtqueue pair.
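
For illustration, a minimal standalone C sketch (not QEMU or DPDK code) of the suspected interaction: a socketpair() stands in for the slave request channel of the first queue pair, and closing the backend end, as DPDK does after receiving a replacement fd, makes the frontend's read() return 0, which is the condition reported as "Failed to read from slave.":

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

int main(void)
{
    int chan[2];
    char buf[64];
    ssize_t n;

    /* Slave request channel for queue pair 0:
     * the frontend (qemu) keeps chan[0], the backend (DPDK) keeps chan[1]. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0) {
        perror("socketpair");
        return 1;
    }

    /* The frontend sets up a second slave channel for queue pair 1; the
     * backend (per the fd-leak fix quoted above) closes the old request fd. */
    close(chan[1]);

    /* The frontend still polls the first channel; read() now returns 0 (EOF). */
    n = read(chan[0], buf, sizeof(buf));
    if (n <= 0)
        fprintf(stderr, "Failed to read from slave (read returned %zd)\n", n);

    close(chan[0]);
    return 0;
}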

--- Additional comment from Adrián Moreno on 2020-01-22 05:51:52 HKT ---

I've posted a patch upstream that should fix this: https://patchwork.ozlabs.org/patch/1226778/
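
For reference, a rough self-contained sketch of that fix direction as described in the previous comment (only the first virtqueue pair sets up the slave channel); the names below (vhost_dev_stub, vq_index, the *_stub helpers) are illustrative assumptions, not QEMU's actual code or the content of the patch:

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for per-queue-pair vhost state. */
struct vhost_dev_stub {
    int vq_index;            /* first virtqueue index handled by this device */
    bool slave_channel_set;  /* whether a slave request channel exists */
};

static int setup_slave_channel_stub(struct vhost_dev_stub *dev)
{
    /* Real code would create the socketpair and send the request fd here. */
    dev->slave_channel_set = true;
    printf("slave channel established (vq_index=%d)\n", dev->vq_index);
    return 0;
}

static int backend_init_stub(struct vhost_dev_stub *dev)
{
    /* Fix idea: only the device owning the first queue pair (vq_index 0)
     * sets up the slave request channel, so the backend never receives a
     * second request fd that makes it close the original one. */
    if (dev->vq_index != 0)
        return 0;
    return setup_slave_channel_stub(dev);
}

int main(void)
{
    struct vhost_dev_stub qp0 = { .vq_index = 0 };
    struct vhost_dev_stub qp1 = { .vq_index = 2 };   /* second queue pair */

    backend_init_stub(&qp0);   /* establishes the channel */
    backend_init_stub(&qp1);   /* skipped */
    return 0;
}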