Bug 1793327 - "qemu-kvm: Failed to read from slave." shows when boot qemu vhost-user 2 queues over dpdk 19.11[rhel8.2-av]
Summary: "qemu-kvm: Failed to read from slave." shows when boot qemu vhost-user 2 queu...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Adrián Moreno
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks: 1801081 1801542
 
Reported: 2020-01-21 06:57 UTC by Pei Zhang
Modified: 2020-11-17 17:47 UTC
CC List: 6 users

Fixed In Version: qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1801081 1801542 (view as bug list)
Environment:
Last Closed: 2020-11-17 17:46:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description Pei Zhang 2020-01-21 06:57:18 UTC
Description of problem:
Boot dpdk's testpmd with vhost-user devices using 2 queues, then boot qemu with vhost-user 2 queues as well; the qemu terminal prints the failure message: "qemu-kvm: Failed to read from slave."

Version-Release number of selected component (if applicable):
4.18.0-171.el8.x86_64
qemu-kvm-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
dpdk-19.11-1.el8.x86_64


How reproducible:
100%

Steps to Reproduce:
1. In host, boot testpmd with vhost-user 2 queues

# /usr/bin/testpmd \
	-l 2,4,6,8,10,12,14,16,18 \
	--socket-mem 1024,1024 \
	-n 4 \
	-d /usr/lib64/librte_pmd_vhost.so  \
	--vdev 'net_vhost0,iface=/tmp/vhostuser0.sock,queues=2,client=1,iommu-support=1' \
	--vdev 'net_vhost1,iface=/tmp/vhostuser1.sock,queues=2,client=1,iommu-support=1' \
	-- \
	--portmask=f \
	-i \
	--rxd=512 --txd=512 \
	--rxq=2 --txq=2 \
	--nb-cores=8 \
	--forward-mode=io


2. Boot qemu with vhost-user 2 queues
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine q35,kernel_irqchip=split \
-cpu host \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=pci.3,addr=0x0 \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on \
-monitor stdio \
-vnc :2


3. Check the qemu terminal; the following errors are shown:

(qemu) qemu-kvm: Failed to read from slave.
qemu-kvm: Failed to read from slave.

Actual results:
The error message is shown in the qemu terminal.

Expected results:
No errors should be shown in the qemu terminal.

Additional info:
1. With vhost-user 1 queue, this issue cannot be reproduced.
2. With dpdk-18.11.2-4.el8.x86_64, this issue cannot be reproduced.

Comment 1 Adrián Moreno 2020-01-21 16:55:43 UTC
Hi Pei,

When you say 2 queues, you mean multiqueue in one vhost device (so "queues=2"), right? The issue has nothing to do with having two vhost devices, right?

I have a feeling this is also one of the issues that triggers BZ 1788415. Anyhow, it's good that we have a separate BZ for it.

My bet is that the issue starts occurring when the following commit was introduced in DPDK:

commit 761d57651c51365354cefb624883fccf62aee67d
Author: Tiwei Bie <tiwei.bie>
Date:   Thu Sep 5 19:01:25 2019 +0800

    vhost: fix slave request fd leak
    
    We need to close the old slave request fd if any first
    before taking the new one.
    
    Fixes: 275c3f944730 ("vhost: support slave requests channel")
    Cc: stable
    
    Signed-off-by: Tiwei Bie <tiwei.bie>
    Reviewed-by: Maxime Coquelin <maxime.coquelin>
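
For context, a minimal sketch of what that commit does when DPDK receives a new slave request fd (the function and field names below are paraphrased stand-ins, not copied from the DPDK tree):

#include <unistd.h>

/* Hypothetical stand-in for the per-connection vhost device state;
 * the real DPDK structure (struct virtio_net) has many more fields. */
struct vhost_dev_state {
    int slave_req_fd;   /* -1 while no slave channel is open */
};

/* Sketch of the VHOST_USER_SET_SLAVE_REQ_FD handler.  Before the fix,
 * a second message from the master simply overwrote the stored fd and
 * leaked it; after the fix, the old fd is closed first.  Closing that
 * end of the first channel is what qemu then reports as
 * "Failed to read from slave." */
static int handle_set_slave_req_fd(struct vhost_dev_state *dev, int new_fd)
{
    if (dev->slave_req_fd >= 0)
        close(dev->slave_req_fd);   /* the fix: release the previous fd */

    dev->slave_req_fd = new_fd;
    return 0;
}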

But that commit is not the real issue; the problem is in qemu. When multiqueue is enabled, qemu opens one slave socket per queue pair. When that happens, DPDK closes the first socket, which generates the errors we see. Qemu should only open one slave channel, on the first virtqueue pair.
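
To illustrate the direction of that qemu-side fix, here is a sketch of the idea only (stand-in names, not the actual patch): with multiqueue, each queue pair is a separate vhost device sharing one vhost-user chardev, so the slave channel should be opened only for the device that owns virtqueue pair 0.

struct vhost_dev_sketch {
    int vq_index;   /* index of this device's first virtqueue */
};

/* Would send VHOST_USER_SET_SLAVE_REQ_FD over a fresh socketpair;
 * left empty in this sketch. */
static int setup_slave_channel(struct vhost_dev_sketch *dev)
{
    (void)dev;
    return 0;
}

static int vhost_user_init_sketch(struct vhost_dev_sketch *dev)
{
    if (dev->vq_index != 0) {
        /* Not the first queue pair: do not open another slave channel,
         * so the backend never has to close an earlier one. */
        return 0;
    }
    return setup_slave_channel(dev);
}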

Comment 2 Adrián Moreno 2020-01-21 21:51:52 UTC
I've posted a patch upstream that should fix this: https://patchwork.ozlabs.org/patch/1226778/

Comment 3 Pei Zhang 2020-01-22 02:52:17 UTC
(In reply to Adrián Moreno from comment #1)
> Hi Pei,
> 
> When you say 2 queues you mean multiqueue in one vhost device (so
> "queues=2"), right? The issue has nothing to do with having two vhost
> devices right?

Hi Adrian,

That's right, 2 queues means "queues=2"; it's multiqueue in one vhost device.

> 
> I have a feeling this is also one of the issues that triggers BZ 1788415.
> Anyhow, it's good that we have a separate BZ for it.

I can re-test BZ 1788415 with this bug fix and verify whether it fixes that issue as well.

Thank you.

Best regards,

Pei

Comment 4 Ademar Reis 2020-02-05 23:13:31 UTC
QEMU has been recently split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review the sub-component and change it if necessary the next time you review this BZ. Thanks.

Comment 8 Pei Zhang 2020-06-22 06:06:49 UTC
Verified with qemu-kvm-5.0.0-0.module+el8.3.0+6620+5d5e1420.x86_64:

No errors are shown in the vhost-user 2-queue test scenarios.

Testcase: live_migration_nonrt_server_2Q_1G_iommu_ovs
=======================Stream Rate: 1Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss moongen_Loss
0 1Mpps 231 19434 0 541974
1 1Mpps 246 18475 0 556005
2 1Mpps 251 17997 0 572175
3 1Mpps 224 17850 0 515903
Max 1Mpps 251 19434 0 572175
Min 1Mpps 224 17850 0 515903
Mean 1Mpps 238 18439 0 546514
Median 1Mpps 238 18236 0 548989
Stdev 0 12.61 714.98 0.0 23848.21


Testcase: live_migration_nonrt_server_2Q_1G_iommu_cross_numa_pvp
No Stream_Rate Downtime Totaltime Ping_Loss moongen_Loss
0 1Mpps 233 18876 0 529292
1 1Mpps 218 18484 0 503991
2 1Mpps 259 19135 0 584834
3 1Mpps 197 18441 0 455816
Max 1Mpps 259 19135 0 584834
Min 1Mpps 197 18441 0 455816
Mean 1Mpps 226 18734 0 518483
Median 1Mpps 225 18680 0 516641
Stdev 0 26.1 331.32 0.0 53716.73

Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu
Packets_loss Frame_Size Run_No Throughput Avg_Throughput
0 64 0 21.307340 21.30734

Testcase: pvp_performance_nonrt_server_2Q_iommu
Packets_loss Frame_Size Run_No Throughput Avg_Throughput
0 64 0 20.833873 20.833863
0 64 1 20.833853 20.833863

==Testing details info==
Testcase: live_migration_nonrt_server_2Q_1G_iommu_ovs
PASS

Testcase: live_migration_nonrt_server_2Q_1G_iommu_cross_numa_pvp
PASS

Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu
PASS

Testcase: pvp_performance_nonrt_server_2Q_iommu
PASS


So this bug has been fixed. Will move to 'VERIFIED' once the bug is ON_QA.

Comment 11 Pei Zhang 2020-07-02 09:41:40 UTC
Moving to VERIFIED per Comment 8.

Comment 14 errata-xmlrpc 2020-11-17 17:46:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

