Bug 2155173

Summary: [vhost-user] unable to start vhost net: 71: falling back on userspace
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm (sub component: Networking)
Version: 9.2
Reporter: Yanghang Liu <yanghliu>
Assignee: Laurent Vivier <lvivier>
QA Contact: Yanghang Liu <yanghliu>
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Regression, TestBlocker, Triaged
CC: chayang, coli, gkurz, hewang, jinzhao, juzhang, lvivier, maxime.coquelin, mhou, mst, tli, virt-maint, yanghliu
Target Milestone: rc
Hardware: x86_64
OS: All
Fixed In Version: qemu-kvm-7.2.0-8.el9
Last Closed: 2023-05-09 07:20:55 UTC

Description Yanghang Liu 2022-12-20 10:01:25 UTC
Description of problem:
A domain with vhost-user interface(s) sometimes hangs when starting, and qemu-kvm always throws the errors shown under "Actual results" below.

Version-Release number of selected component (if applicable):
host:
           5.14.0-212.el9.x86_64
           qemu-kvm-7.2.0-1.el9.x86_64
           tuned-2.19.0-1.el9.noarch
           libvirt-8.10.0-2.el9.x86_64
           python3-libvirt-8.10.0-1.el9.x86_64
           openvswitch2.17-2.17.0-58.el9fdp.x86_64
           dpdk-21.11.2-1.el9_1.x86_64
           edk2-ovmf-20220826gitba0e0e4c6a-2.el9.noarch
           seabios-bin-1.16.1-1.el9.noarch

How reproducible:
100%

Steps to Reproduce:
1. setup the nfv environment
# echo "isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11"  >> /etc/tuned/cpu-partitioning-variables.conf  
# tuned-adm profile cpu-partitioning
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G pci=realloc" --update-kernel=`grubby --default-kernel` 
# reboot
# cat /proc/cmdline
...
skew_tick=1 nohz=on nohz_full=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11 rcu_nocbs=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11 tuned.non_isolcpus=ffffffff,000002ab intel_pstate=disable nosoftlockup iommu=pt intel_iommu=on default_hugepagesz=1G pci=realloc
# echo 20 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 20 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
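(Optional check, not part of the original steps: confirm the 1 GiB pages were actually allocated on both NUMA nodes.)
# cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages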

2. start a dpdk-testpmd or openvswitch 
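For example, a vhost-user backend can be provided by dpdk-testpmd (a sketch only: the core list, memory channels and socket-mem values are assumptions, the socket path matches the first vhost-user chardev in the QEMU command below, and one --vdev argument is needed per socket):
# dpdk-testpmd -l 2,4,6 -n 4 --socket-mem 1024,1024 \
  --vdev 'net_vhost0,iface=/tmp/vhostuser0.sock,client=1,queues=2' \
  -- -i --rxq=2 --txq=2
testpmd> start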

3. start a domain with vhost-user interface(s)
/usr/libexec/qemu-kvm \
-name guest=rhel9.2,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-rhel9.2/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel9.2_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.2.0,usb=off,vmport=off,smm=on,kernel_irqchip=split,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu Skylake-Server-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rsba=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,tsc-deadline=on,pmu=off \
-global driver=cfi.pflash01,property=secure,value=on \
-m 8192 \
-overcommit mem-lock=on \
-smp 6,sockets=3,dies=1,cores=1,threads=2 \
-object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[0]}' \
-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages/libvirt/qemu/1-rhel9.2","share":true,"prealloc":true,"size":8589934592,"host-nodes":[0],"policy":"bind","prealloc-context":"tc-ram-node0"}' \
-numa node,nodeid=0,cpus=0-5,memdev=ram-node0 \
-uuid dc791d54-929d-11ed-bc6b-20040fec000c \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=22,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device '{"driver":"intel-iommu","id":"iommu0","intremap":"on","caching-mode":true,"device-iotlb":true}' \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-blockdev '{"driver":"file","filename":"/mnt/nfv//rhel9.2.qcow2","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","iommu_platform":true,"ats":true,"bus":"pci.2","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"}' \
-netdev '{"type":"tap","fd":"23","vhost":true,"vhostfd":"25","id":"hostnet0"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"netdev":"hostnet0","id":"net0","mac":"28:66:da:5f:dd:01","bus":"pci.1","addr":"0x0"}' \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server=on \
-netdev '{"type":"vhost-user","chardev":"charnet1","queues":2,"id":"hostnet1"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"mq":true,"vectors":6,"rx_queue_size":1024,"netdev":"hostnet1","id":"net1","mac":"18:66:da:5f:dd:02","bus":"pci.6","addr":"0x0"}' \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server=on \
-netdev '{"type":"vhost-user","chardev":"charnet2","queues":2,"id":"hostnet2"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"mq":true,"vectors":6,"rx_queue_size":1024,"netdev":"hostnet2","id":"net2","mac":"18:66:da:5f:dd:03","bus":"pci.7","addr":"0x0"}' \
-chardev socket,id=charnet3,path=/tmp/vhostuser2.sock,server=on \
-netdev '{"type":"vhost-user","chardev":"charnet3","id":"hostnet3"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"rx_queue_size":1024,"netdev":"hostnet3","id":"net3","mac":"18:66:da:5f:dd:04","bus":"pci.8","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=chrtpm,path=/run/libvirt/qemu/swtpm/1-rhel9.2-swtpm.sock \
-tpmdev emulator,id=tpm-tpm0,chardev=chrtpm \
-device '{"driver":"tpm-crb","tpmdev":"tpm-tpm0","id":"tpm0"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-device '{"driver":"virtio-balloon-pci","iommu_platform":true,"ats":true,"id":"balloon0","bus":"pci.4","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on



Actual results:
qemu-kvm throws the following errors:
2022-12-20T09:29:51.849295Z qemu-kvm: Received unexpected msg type. Expected 22 received 40
2022-12-20T09:29:51.849321Z qemu-kvm: Fail to update device iotlb
2022-12-20T09:29:51.849499Z qemu-kvm: Received unexpected msg type. Expected 40 received 22
2022-12-20T09:29:51.849598Z qemu-kvm: Received unexpected msg type. Expected 22 received 11
2022-12-20T09:29:51.849611Z qemu-kvm: Fail to update device iotlb
2022-12-20T09:29:51.849662Z qemu-kvm: Received unexpected msg type. Expected 11 received 22
2022-12-20T09:29:51.849677Z qemu-kvm: vhost VQ 1 ring restore failed: -71: Protocol error (71)
2022-12-20T09:29:51.849781Z qemu-kvm: Received unexpected msg type. Expected 22 received 11
2022-12-20T09:29:51.849794Z qemu-kvm: Fail to update device iotlb
2022-12-20T09:29:51.849844Z qemu-kvm: Received unexpected msg type. Expected 11 received 22
2022-12-20T09:29:51.849858Z qemu-kvm: vhost VQ 0 ring restore failed: -71: Protocol error (71)
2022-12-20T09:29:51.853130Z qemu-kvm: unable to start vhost net: 71: falling back on userspace virtio



Expected results:
No errors.

Additional info:
(1) This bug cannot be reproduced with qemu-kvm-7.1.0-7.el9.x86_64.

Comment 1 Laurent Vivier 2023-01-02 15:19:40 UTC
(In reply to Yanghang Liu from comment #0)
...
> The qemu-kvm throws the following error:
> 2022-12-20T09:29:51.849295Z qemu-kvm: Received unexpected msg type. Expected
> 22 received 40

22 is VHOST_USER_IOTLB_MSG
40 is VHOST_USER_GET_STATUS

> 2022-12-20T09:29:51.849321Z qemu-kvm: Fail to update device iotlb
> 2022-12-20T09:29:51.849499Z qemu-kvm: Received unexpected msg type. Expected
> 40 received 22
> 2022-12-20T09:29:51.849598Z qemu-kvm: Received unexpected msg type. Expected
> 22 received 11

11 is VHOST_USER_GET_VRING_BASE

> 2022-12-20T09:29:51.849611Z qemu-kvm: Fail to update device iotlb
> 2022-12-20T09:29:51.849662Z qemu-kvm: Received unexpected msg type. Expected
> 11 received 22

VHOST_USER_GET_STATUS is added by:

923b8921d210 ("vhost-user: Support vhost_dev_start") in QEMU v7.2

but it should not be enabled if DPDK doesn't support the VHOST_USER_PROTOCOL_F_STATUS feature.
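
For reference, these request codes correspond to the following members of QEMU's VhostUserRequest enum in hw/virtio/vhost-user.c (a shortened excerpt, with the numeric values written out explicitly for clarity and unrelated members elided):

    typedef enum VhostUserRequest {
        /* ... */
        VHOST_USER_GET_VRING_BASE = 11,   /* "received 11" in the log above */
        /* ... */
        VHOST_USER_IOTLB_MSG = 22,        /* "Expected 22" in the log above */
        /* ... */
        VHOST_USER_GET_STATUS = 40,       /* "received 40" in the log above */
        /* ... */
    } VhostUserRequest;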

Maxime, do you have any idea why QEMU and DPDK disagree on the protocol sequence?

Comment 2 Yanghang Liu 2023-01-04 05:42:47 UTC
QE's NFV automated tests are currently blocked by this bug.

Comment 9 Maxime Coquelin 2023-01-12 09:51:52 UTC
We only have a single thread on the DPDK side to handle vhost-user requests:
it reads a request, handles it, and replies to it, then reads the next one,
and so on. So I don't think the order of request replies can get mixed up on
the DPDK side.

Maybe there are two threads concurrently sending requests on the QEMU side?

What could be done is to place a breakpoint in QEMU where the unexpected reply is detected.
When it fires, dump the backtraces of the different threads to understand why multiple threads are sending requests.
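
A minimal sketch of that approach, assuming gdb is attached to the running qemu-kvm process with debug symbols and breaking in QEMU's vhost-user reply-handling path (process_message_reply() in hw/virtio/vhost-user.c) rather than exactly where the error is printed, so it stops on every reply and may need an extra condition:

    # gdb -p $(pidof qemu-kvm)
    (gdb) break process_message_reply
    (gdb) continue
    ... once the breakpoint hits around the time the mismatch is reported ...
    (gdb) thread apply all bt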

I don't have a setup right now to reproduce this issue quickly; if you have one ready, I can help get the backtraces.

Regards,
Maxime

Comment 10 Laurent Vivier 2023-01-12 15:08:24 UTC
Yanghang,

could you also provide the QEMU command line that is generated by libvirt and the boot parameters of the host kernel?

Thanks

Comment 11 Laurent Vivier 2023-01-12 15:48:24 UTC
(In reply to Maxime Coquelin from comment #9)
> We only have a single thread on DPDK side to handle Vhost-user requests,
> it will read a request, handle it and reply to it. Then it reads the
> next one, etc... So I don't think it is possible to mix request replies
> order on DPDK side.
> 
> Maybe there are two threads concurrently sending requests on QEMU side?
> 

Or perhaps the IOTLB function (VHOST_USER_IOTLB_MSG) could be called asynchronously with respect to the vhost_dev_start() function (VHOST_USER_GET_STATUS)?

Michael, any idea?

Thanks

Comment 12 Maxime Coquelin 2023-01-12 16:53:36 UTC
(In reply to Laurent Vivier from comment #11)
> (In reply to Maxime Coquelin from comment #9)
> > We only have a single thread on DPDK side to handle Vhost-user requests,
> > it will read a request, handle it and reply to it. Then it reads the
> > next one, etc... So I don't think it is possible to mix request replies
> > order on DPDK side.
> > 
> > Maybe there are two threads concurrently sending requests on QEMU side?
> > 
> 
> Or perhaps the iotlb function (VHOST_USER_IOTLB_MSG) could be called
> asynchronously to the vhost_dev_start() function (VHOST_USER_GET_STATUS)?
> 
> Michael, any idea?
> 
> Thanks

Yes, I think that's what happens.

The DPDK vhost library sends an IOTLB miss request (VHOST_USER_SLAVE_IOTLB_MSG)
to QEMU on the slave channel, which results in the thread handling it sending
a VHOST_USER_IOTLB_MSG with the resulting IOTLB entry on the master channel.
In parallel, another QEMU thread performs vhost_dev_start(), resulting in
two threads concurrently reading the master socket to get their replies.

We need a way to synchronize this, likely by introducing a lock that serializes
access to the socket between the vhost_user_write() and
process_message_reply() calls.
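
A minimal sketch of that idea, assuming QEMU's existing hw/virtio/vhost-user.c internals (vhost_user_write(), process_message_reply(), QemuMutex); it only illustrates the proposal above, not the fix that was eventually merged, which is the revert quoted in comment 24:

    /* Illustrative only: serialize request/reply pairs on the master socket so
     * that the slave-channel IOTLB handler and vhost_dev_start() cannot
     * interleave their reads. The helper name and the use of a single global
     * mutex (initialization omitted) are assumptions for the sketch. */
    static QemuMutex vhost_user_request_lock;

    static int vhost_user_send_and_wait(struct vhost_dev *dev, VhostUserMsg *msg)
    {
        int ret;

        qemu_mutex_lock(&vhost_user_request_lock);
        ret = vhost_user_write(dev, msg, NULL, 0);
        if (ret >= 0) {
            ret = process_message_reply(dev, msg);
        }
        qemu_mutex_unlock(&vhost_user_request_lock);
        return ret;
    }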

What do you think?

Comment 22 Maxime Coquelin 2023-01-19 20:01:58 UTC
*** Bug 2160718 has been marked as a duplicate of this bug. ***

Comment 23 Laurent Vivier 2023-01-23 13:02:10 UTC
*** Bug 2162729 has been marked as a duplicate of this bug. ***

Comment 24 Greg Kurz 2023-02-06 08:20:26 UTC
FYI the fix is now merged upstream as https://gitlab.com/qemu-project/qemu/-/commit/f340a59d5a852d75ae34555723694c7e8eafbd0c

commit f340a59d5a852d75ae34555723694c7e8eafbd0c
Author: Greg Kurz <groug>
Date:   Thu Jan 19 18:24:23 2023 +0100

    Revert "vhost-user: Monitor slave channel in vhost_user_read()"
    
    This reverts commit db8a3772e300c1a656331a92da0785d81667dc81.
    
    Motivation : this is breaking vhost-user with DPDK as reported in [0].
    
    Received unexpected msg type. Expected 22 received 40
    Fail to update device iotlb
    Received unexpected msg type. Expected 40 received 22
    Received unexpected msg type. Expected 22 received 11
    Fail to update device iotlb
    Received unexpected msg type. Expected 11 received 22
    vhost VQ 1 ring restore failed: -71: Protocol error (71)
    Received unexpected msg type. Expected 22 received 11
    Fail to update device iotlb
    Received unexpected msg type. Expected 11 received 22
    vhost VQ 0 ring restore failed: -71: Protocol error (71)
    unable to start vhost net: 71: falling back on userspace virtio
    
    The failing sequence that leads to the first error is :
    - QEMU sends a VHOST_USER_GET_STATUS (40) request to DPDK on the master
      socket
    - QEMU starts a nested event loop in order to wait for the
      VHOST_USER_GET_STATUS response and to be able to process messages from
      the slave channel
    - DPDK sends a couple of legitimate IOTLB miss messages on the slave
      channel
    - QEMU processes each IOTLB request and sends VHOST_USER_IOTLB_MSG (22)
      updates on the master socket
    - QEMU assumes to receive a response for the latest VHOST_USER_IOTLB_MSG
      but it gets the response for the VHOST_USER_GET_STATUS instead
    
    The subsequent errors have the same root cause : the nested event loop
    breaks the order by design. It lures QEMU to expect responses to the
    latest message sent on the master socket to arrive first.
    
    Since this was only needed for DAX enablement which is still not merged
    upstream, just drop the code for now. A working solution will have to
    be merged later on. Likely protect the master socket with a mutex
    and service the slave channel with a separate thread, as discussed with
    Maxime in the mail thread below.
    
    [0] https://lore.kernel.org/qemu-devel/43145ede-89dc-280e-b953-6a2b436de395@redhat.com/
    
    Reported-by: Yanghang Liu <yanghliu>
    Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2155173
    Signed-off-by: Greg Kurz <groug>
    Message-Id: <20230119172424.478268-2-groug>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
    Acked-by: Stefan Hajnoczi <stefanha>
    Acked-by: Maxime Coquelin <maxime.coquelin>

Comment 25 Laurent Vivier 2023-02-07 13:17:39 UTC
Yanghang, please set ITM. Thanks

Comment 27 Yanghang Liu 2023-02-13 06:46:57 UTC
Test environment:
qemu-kvm-7.2.0-8.el9.x86_64
tuned-2.19.0-1.el9.noarch
libvirt-9.0.0-4.el9.x86_64
python3-libvirt-9.0.0-1.el9.x86_64
openvswitch2.17-2.17.0-63.el9fdp.x86_64
dpdk-21.11.2-1.el9_1.x86_64
edk2-ovmf-20221207gitfff6d81270b5-5.el9.noarch
seabios-bin-1.16.1-1.el9.noarch


Test result: The domain with vhost-user interface(s) can be started without any qemu-kvm errors.

Comment 28 Yanan Fu 2023-02-13 08:55:35 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 31 Yanghang Liu 2023-02-14 08:57:46 UTC
Move this bug to VERIFIED based on the comment 27.

If new issues are found during the regression tests, QE will open new bugs to track them.

Comment 32 jingzhao 2023-02-21 06:13:00 UTC
*** Bug 2165278 has been marked as a duplicate of this bug. ***

Comment 34 errata-xmlrpc 2023-05-09 07:20:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162