Description of problem:
Boot dpdk's testpmd with vhost-user 2 queues, then boot qemu with vhost-user 2 queues as well; the qemu terminal prints the failure message: "qemu-kvm: Failed to read from slave."

Version-Release number of selected component (if applicable):
4.18.0-171.el8.x86_64
qemu-kvm-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
dpdk-19.11-1.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the host, boot testpmd with vhost-user 2 queues:

# /usr/bin/testpmd \
 -l 2,4,6,8,10,12,14,16,18 \
 --socket-mem 1024,1024 \
 -n 4 \
 -d /usr/lib64/librte_pmd_vhost.so \
 --vdev 'net_vhost0,iface=/tmp/vhostuser0.sock,queues=2,client=1,iommu-support=1' \
 --vdev 'net_vhost1,iface=/tmp/vhostuser1.sock,queues=2,client=1,iommu-support=1' \
 -- \
 --portmask=f \
 -i \
 --rxd=512 --txd=512 \
 --rxq=2 --txq=2 \
 --nb-cores=8 \
 --forward-mode=io

2. Boot qemu with vhost-user 2 queues:

/usr/libexec/qemu-kvm \
 -name guest=rhel8.2 \
 -machine q35,kernel_irqchip=split \
 -cpu host \
 -m 8192 \
 -smp 6,sockets=6,cores=1,threads=1 \
 -object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
 -numa node,memdev=mem -mem-prealloc \
 -device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
 -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
 -device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
 -device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
 -device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
 -blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.2.qcow2,node-name=my_file \
 -blockdev driver=qcow2,node-name=my,file=my_file \
 -device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
 -netdev tap,id=hostnet0 \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=pci.3,addr=0x0 \
 -chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
 -netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
 -device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on \
 -chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
 -netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
 -device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on \
 -monitor stdio \
 -vnc :2

3. Check the qemu terminal; the following errors are shown:

(qemu) qemu-kvm: Failed to read from slave.
qemu-kvm: Failed to read from slave.

Actual results:
Errors are shown in the qemu terminal.

Expected results:
No errors should be shown in the qemu terminal.

Additional info:
1. With vhost-user 1 queue, this issue cannot be reproduced.
2. With dpdk-18.11.2-4.el8.x86_64, this issue cannot be reproduced.
Hi Pei,

When you say 2 queues, you mean multiqueue in one vhost device (so "queues=2"), right? The issue has nothing to do with having two vhost devices, right?

I have a feeling this is also one of the issues that triggers BZ 1788415. Anyhow, it's good that we have a separate BZ for it.

My bet is that the issue started occurring when the following commit was introduced in DPDK:

commit 761d57651c51365354cefb624883fccf62aee67d
Author: Tiwei Bie <tiwei.bie>
Date:   Thu Sep 5 19:01:25 2019 +0800

    vhost: fix slave request fd leak

    We need to close the old slave request fd if any first
    before taking the new one.

    Fixes: 275c3f944730 ("vhost: support slave requests channel")
    Cc: stable
    Signed-off-by: Tiwei Bie <tiwei.bie>
    Reviewed-by: Maxime Coquelin <maxime.coquelin>

But that commit is not the issue. The problem is in qemu: when multiqueue is enabled, qemu opens one slave socket per queue pair. When that happens, DPDK closes the first socket (which is what generates the errors we see). Qemu should only open one slave channel, on the first vqueue pair.
I've posted a patch upstream that should fix this: https://patchwork.ozlabs.org/patch/1226778/
(In reply to Adrián Moreno from comment #1)
> Hi Pei,
>
> When you say 2 queues you mean multiqueue in one vhost device (so
> "queues=2"), right? The issue has nothing to do with having two vhost
> devices right?

Hi Adrian,

That's right, 2 queues means "queues=2"; it's multiqueue in one vhost device.

> I have a feeling this is also one of the issues that triggers BZ 1788415.
> Anyhow, it's good that we have a separate BZ for it.

I can re-test BZ 1788415 with this bug fix and verify whether it fixes it. Thank you.

Best regards,
Pei
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.
Verified with qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420.x86_64:

No errors are shown in the vhost-user 2 queues test scenarios.

Testcase: live_migration_nonrt_server_2Q_1G_iommu_ovs
=======================Stream Rate: 1Mpps=========================
No      Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
0       1Mpps        231       19434      0          541974
1       1Mpps        246       18475      0          556005
2       1Mpps        251       17997      0          572175
3       1Mpps        224       17850      0          515903
Max     1Mpps        251       19434      0          572175
Min     1Mpps        224       17850      0          515903
Mean    1Mpps        238       18439      0          546514
Median  1Mpps        238       18236      0          548989
Stdev   0            12.61     714.98     0.0        23848.21

Testcase: live_migration_nonrt_server_2Q_1G_iommu_cross_numa_pvp
No      Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
0       1Mpps        233       18876      0          529292
1       1Mpps        218       18484      0          503991
2       1Mpps        259       19135      0          584834
3       1Mpps        197       18441      0          455816
Max     1Mpps        259       19135      0          584834
Min     1Mpps        197       18441      0          455816
Mean    1Mpps        226       18734      0          518483
Median  1Mpps        225       18680      0          516641
Stdev   0            26.1      331.32     0.0        53716.73

Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.307340   21.30734

Testcase: pvp_performance_nonrt_server_2Q_iommu
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       20.833873   20.833863
0             64          1       20.833853   20.833863

==Testing details info==
Testcase: live_migration_nonrt_server_2Q_1G_iommu_ovs             PASS
Testcase: live_migration_nonrt_server_2Q_1G_iommu_cross_numa_pvp  PASS
Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu                 PASS
Testcase: pvp_performance_nonrt_server_2Q_iommu                   PASS

So this bug has been fixed. Will move to 'VERIFIED' once on_qa.
Moving to Verified per Comment 8.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137