Bug 2155173 - [vhost-user] unable to start vhost net: 71: falling back on userspace
Summary: [vhost-user] unable to start vhost net: 71: falling back on userspace
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.2
Hardware: x86_64
OS: All
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Yanghang Liu
URL:
Whiteboard:
Duplicates: 2160718, 2162729, 2165278
Depends On:
Blocks:
 
Reported: 2022-12-20 10:01 UTC by Yanghang Liu
Modified: 2023-05-09 07:48 UTC (History)
CC List: 13 users

Fixed In Version: qemu-kvm-7.2.0-8.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-09 07:20:55 UTC
Type: ---
Target Upstream Version:
Embargoed:


Attachments


Links
- Gitlab redhat/centos-stream/src qemu-kvm merge request 146: "Fix vhost-user with dpdk" (opened; last updated 2023-02-07 13:25:28 UTC)
- Red Hat Issue Tracker RHELPLAN-142947 (last updated 2022-12-20 10:01:45 UTC)
- Red Hat Product Errata RHSA-2023:2162 (last updated 2023-05-09 07:21:42 UTC)

Description Yanghang Liu 2022-12-20 10:01:25 UTC
Description of problem:
The domain with vhost-user interface(s) sometimes hangs when starting, and qemu-kvm always throws the error messages shown below.

Version-Release number of selected component (if applicable):
host:
           5.14.0-212.el9.x86_64
           qemu-kvm-7.2.0-1.el9.x86_64
           tuned-2.19.0-1.el9.noarch
           libvirt-8.10.0-2.el9.x86_64
           python3-libvirt-8.10.0-1.el9.x86_64
           openvswitch2.17-2.17.0-58.el9fdp.x86_64
           dpdk-21.11.2-1.el9_1.x86_64
           edk2-ovmf-20220826gitba0e0e4c6a-2.el9.noarch
           seabios-bin-1.16.1-1.el9.noarch

How reproducible:
100%

Steps to Reproduce:
1. setup the nfv environment
# echo "isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11"  >> /etc/tuned/cpu-partitioning-variables.conf  
# tuned-adm profile cpu-partitioning
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G pci=realloc" --update-kernel=`grubby --default-kernel` 
# reboot
# cat /proc/cmdline
...
skew_tick=1 nohz=on nohz_full=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11 rcu_nocbs=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11 tuned.non_isolcpus=ffffffff,000002ab intel_pstate=disable nosoftlockup iommu=pt intel_iommu=on default_hugepagesz=1G pci=realloc
# echo 20 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 20 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

2. start a dpdk-testpmd or openvswitch (a minimal testpmd sketch is shown after the QEMU command line below)

3. start a domain with vhost-user interface(s)
/usr/libexec/qemu-kvm \
-name guest=rhel9.2,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-rhel9.2/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/rhel9.2_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-rhel9.2.0,usb=off,vmport=off,smm=on,kernel_irqchip=split,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
-accel kvm \
-cpu Skylake-Server-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rsba=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,tsc-deadline=on,pmu=off \
-global driver=cfi.pflash01,property=secure,value=on \
-m 8192 \
-overcommit mem-lock=on \
-smp 6,sockets=3,dies=1,cores=1,threads=2 \
-object '{"qom-type":"thread-context","id":"tc-ram-node0","node-affinity":[0]}' \
-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages/libvirt/qemu/1-rhel9.2","share":true,"prealloc":true,"size":8589934592,"host-nodes":[0],"policy":"bind","prealloc-context":"tc-ram-node0"}' \
-numa node,nodeid=0,cpus=0-5,memdev=ram-node0 \
-uuid dc791d54-929d-11ed-bc6b-20040fec000c \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=22,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device '{"driver":"intel-iommu","id":"iommu0","intremap":"on","caching-mode":true,"device-iotlb":true}' \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \
-blockdev '{"driver":"file","filename":"/mnt/nfv//rhel9.2.qcow2","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","iommu_platform":true,"ats":true,"bus":"pci.2","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"}' \
-netdev '{"type":"tap","fd":"23","vhost":true,"vhostfd":"25","id":"hostnet0"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"netdev":"hostnet0","id":"net0","mac":"28:66:da:5f:dd:01","bus":"pci.1","addr":"0x0"}' \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server=on \
-netdev '{"type":"vhost-user","chardev":"charnet1","queues":2,"id":"hostnet1"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"mq":true,"vectors":6,"rx_queue_size":1024,"netdev":"hostnet1","id":"net1","mac":"18:66:da:5f:dd:02","bus":"pci.6","addr":"0x0"}' \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server=on \
-netdev '{"type":"vhost-user","chardev":"charnet2","queues":2,"id":"hostnet2"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"mq":true,"vectors":6,"rx_queue_size":1024,"netdev":"hostnet2","id":"net2","mac":"18:66:da:5f:dd:03","bus":"pci.7","addr":"0x0"}' \
-chardev socket,id=charnet3,path=/tmp/vhostuser2.sock,server=on \
-netdev '{"type":"vhost-user","chardev":"charnet3","id":"hostnet3"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"rx_queue_size":1024,"netdev":"hostnet3","id":"net3","mac":"18:66:da:5f:dd:04","bus":"pci.8","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-chardev socket,id=chrtpm,path=/run/libvirt/qemu/swtpm/1-rhel9.2-swtpm.sock \
-tpmdev emulator,id=tpm-tpm0,chardev=chrtpm \
-device '{"driver":"tpm-crb","tpmdev":"tpm-tpm0","id":"tpm0"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-device '{"driver":"virtio-balloon-pci","iommu_platform":true,"ats":true,"id":"balloon0","bus":"pci.4","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
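
For step 2, a minimal dpdk-testpmd invocation along these lines can serve as the vhost-user backend. The socket paths and per-socket queue counts mirror the QEMU command line above; the core list, memory sizes, and other options are illustrative assumptions, not the exact QE setup. Since QEMU opens the sockets with server=on, testpmd connects as a client:

# NOTE: sketch only -- core list, --socket-mem values and other options are assumptions;
# socket paths and queue counts match the QEMU -chardev/-netdev options above
dpdk-testpmd -l 2,4,6 --socket-mem 1024,1024 --no-pci \
    --vdev 'net_vhost0,iface=/tmp/vhostuser0.sock,client=1,queues=2' \
    --vdev 'net_vhost1,iface=/tmp/vhostuser1.sock,client=1,queues=2' \
    --vdev 'net_vhost2,iface=/tmp/vhostuser2.sock,client=1,queues=1' \
    -- -i

Inside the interactive prompt, "start" begins forwarding; an openvswitch instance with dpdk vhost-user ports is the alternative backend mentioned in the step.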



Actual results:
The qemu-kvm throws the following error:
2022-12-20T09:29:51.849295Z qemu-kvm: Received unexpected msg type. Expected 22 received 40
2022-12-20T09:29:51.849321Z qemu-kvm: Fail to update device iotlb
2022-12-20T09:29:51.849499Z qemu-kvm: Received unexpected msg type. Expected 40 received 22
2022-12-20T09:29:51.849598Z qemu-kvm: Received unexpected msg type. Expected 22 received 11
2022-12-20T09:29:51.849611Z qemu-kvm: Fail to update device iotlb
2022-12-20T09:29:51.849662Z qemu-kvm: Received unexpected msg type. Expected 11 received 22
2022-12-20T09:29:51.849677Z qemu-kvm: vhost VQ 1 ring restore failed: -71: Protocol error (71)
2022-12-20T09:29:51.849781Z qemu-kvm: Received unexpected msg type. Expected 22 received 11
2022-12-20T09:29:51.849794Z qemu-kvm: Fail to update device iotlb
2022-12-20T09:29:51.849844Z qemu-kvm: Received unexpected msg type. Expected 11 received 22
2022-12-20T09:29:51.849858Z qemu-kvm: vhost VQ 0 ring restore failed: -71: Protocol error (71)
2022-12-20T09:29:51.853130Z qemu-kvm: unable to start vhost net: 71: falling back on userspace virtio



Expected results:
No errors.

Additional info:
(1) this bug cannot be reproduced with qemu-kvm-7.1.0-7.el9.x86_64

Comment 1 Laurent Vivier 2023-01-02 15:19:40 UTC
(In reply to Yanghang Liu from comment #0)
...
> The qemu-kvm throws the following error:
> 2022-12-20T09:29:51.849295Z qemu-kvm: Received unexpected msg type. Expected
> 22 received 40

22 is VHOST_USER_IOTLB_MSG
40 is VHOST_USER_GET_STATUS

> 2022-12-20T09:29:51.849321Z qemu-kvm: Fail to update device iotlb
> 2022-12-20T09:29:51.849499Z qemu-kvm: Received unexpected msg type. Expected
> 40 received 22
> 2022-12-20T09:29:51.849598Z qemu-kvm: Received unexpected msg type. Expected
> 22 received 11

11 is VHOST_USER_GET_VRING_BASE

> 2022-12-20T09:29:51.849611Z qemu-kvm: Fail to update device iotlb
> 2022-12-20T09:29:51.849662Z qemu-kvm: Received unexpected msg type. Expected
> 11 received 22

VHOST_USER_GET_STATUS is added by:

923b8921d210 ("vhost-user: Support vhost_dev_start") in QEMU v7.2

but it should not be enabled if DPDK doesn't support the VHOST_USER_PROTOCOL_F_STATUS feature.

Maxime, do you have any idea why QEMU and DPDK disagree on the protocol sequence?
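
For reference, these numeric request IDs can be cross-checked against the VhostUserRequest enum in the QEMU sources; a quick way to list them (the checkout location is an assumption):

# NOTE: run from a qemu.git checkout; the enum is defined in hw/virtio/vhost-user.c
grep -n -A 45 'typedef enum VhostUserRequest' hw/virtio/vhost-user.c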

Comment 2 Yanghang Liu 2023-01-04 05:42:47 UTC
QE's NFV automated tests are currently blocked by this bug.

Comment 9 Maxime Coquelin 2023-01-12 09:51:52 UTC
We only have a single thread on the DPDK side to handle vhost-user requests:
it reads a request, handles it, and replies to it, then reads the next one,
and so on. So I don't think it is possible for the reply order to get mixed
up on the DPDK side.

Maybe there are two threads concurrently sending requests on the QEMU side?

What could be done is placing a breakpoint in QEMU when an unexpected reply is detected.
When it breaks, dump the backtraces of the different threads to understand why multiple threads are sending requests.

I don't have a setup right now to reproduce this issue quickly; if you have one ready, I can help get the backtraces.

Regards,
Maxime
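
A rough sketch of this debugging approach, assuming qemu-kvm debuginfo is installed; the break location (process_message_reply() in hw/virtio/vhost-user.c) is an assumption about where the unexpected reply is reported:

# NOTE: process selection pattern and breakpoint location are assumptions
gdb -p "$(pgrep -f 'qemu-kvm.*guest=rhel9.2' | head -n 1)" \
    -ex 'break process_message_reply' \
    -ex 'continue'
# once the breakpoint hits, dump all thread backtraces with:
#   (gdb) thread apply all bt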

Comment 10 Laurent Vivier 2023-01-12 15:08:24 UTC
Yanghang,

could you also provide the QEMU command line that is generated by libvirt and the boot parameters of the host kernel?

Thanks
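
For reference, both pieces of information can typically be collected as follows (the domain name rhel9.2 is taken from the reproducer above, and the default libvirt log path is assumed):

# QEMU command line as generated by libvirt (default per-domain log location)
grep -a 'qemu-kvm' /var/log/libvirt/qemu/rhel9.2.log | tail -n 1
# host kernel boot parameters
cat /proc/cmdline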

Comment 11 Laurent Vivier 2023-01-12 15:48:24 UTC
(In reply to Maxime Coquelin from comment #9)
> We only have a single thread on DPDK side to handle Vhost-user requests,
> it will read a request, handle it and reply to it. Then it reads the
> next one, etc... So I don't think it is possible to mix request replies
> order on DPDK side.
> 
> Maybe there are two threads concurrently sending requests on QEMU side?
> 

Or perhaps the iotlb function (VHOST_USER_IOTLB_MSG) could be called asynchronously to the vhost_dev_start() function (VHOST_USER_GET_STATUS)?

Michael, any idea?

Thanks

Comment 12 Maxime Coquelin 2023-01-12 16:53:36 UTC
(In reply to Laurent Vivier from comment #11)
> (In reply to Maxime Coquelin from comment #9)
> > We only have a single thread on DPDK side to handle Vhost-user requests,
> > it will read a request, handle it and reply to it. Then it reads the
> > next one, etc... So I don't think it is possible to mix request replies
> > order on DPDK side.
> > 
> > Maybe there are two threads concurrently sending requests on QEMU side?
> > 
> 
> Or perhaps the iotlb function (VHOST_USER_IOTLB_MSG) could be called
> asynchronously to the vhost_dev_start() function (VHOST_USER_GET_STATUS)?
> 
> Michael, any idea?
> 
> Thanks

Yes, I think that's what happens.

The DPDK Vhost library sends an IOTLB miss request (VHOST_USER_SLAVE_IOTLB_MSG)
to QEMU on the slave channel, which results in the thread handling it sending
a VHOST_USER_IOTLB_MSG with the resulting IOTLB entry on the master channel.
In parallel, another QEMU thread performs vhost_dev_start(), resulting in
two threads concurrently reading the master socket to get their replies.

We need a way to synchronize this, likely by introducing a lock to serialize
access to the socket between the vhost_user_write() and
process_message_reply() calls.

What do you think?

Comment 22 Maxime Coquelin 2023-01-19 20:01:58 UTC
*** Bug 2160718 has been marked as a duplicate of this bug. ***

Comment 23 Laurent Vivier 2023-01-23 13:02:10 UTC
*** Bug 2162729 has been marked as a duplicate of this bug. ***

Comment 24 Greg Kurz 2023-02-06 08:20:26 UTC
FYI the fix is now merged upstream as https://gitlab.com/qemu-project/qemu/-/commit/f340a59d5a852d75ae34555723694c7e8eafbd0c

commit f340a59d5a852d75ae34555723694c7e8eafbd0c
Author: Greg Kurz <groug>
Date:   Thu Jan 19 18:24:23 2023 +0100

    Revert "vhost-user: Monitor slave channel in vhost_user_read()"
    
    This reverts commit db8a3772e300c1a656331a92da0785d81667dc81.
    
    Motivation : this is breaking vhost-user with DPDK as reported in [0].
    
    Received unexpected msg type. Expected 22 received 40
    Fail to update device iotlb
    Received unexpected msg type. Expected 40 received 22
    Received unexpected msg type. Expected 22 received 11
    Fail to update device iotlb
    Received unexpected msg type. Expected 11 received 22
    vhost VQ 1 ring restore failed: -71: Protocol error (71)
    Received unexpected msg type. Expected 22 received 11
    Fail to update device iotlb
    Received unexpected msg type. Expected 11 received 22
    vhost VQ 0 ring restore failed: -71: Protocol error (71)
    unable to start vhost net: 71: falling back on userspace virtio
    
    The failing sequence that leads to the first error is :
    - QEMU sends a VHOST_USER_GET_STATUS (40) request to DPDK on the master
      socket
    - QEMU starts a nested event loop in order to wait for the
      VHOST_USER_GET_STATUS response and to be able to process messages from
      the slave channel
    - DPDK sends a couple of legitimate IOTLB miss messages on the slave
      channel
    - QEMU processes each IOTLB request and sends VHOST_USER_IOTLB_MSG (22)
      updates on the master socket
    - QEMU assumes to receive a response for the latest VHOST_USER_IOTLB_MSG
      but it gets the response for the VHOST_USER_GET_STATUS instead
    
    The subsequent errors have the same root cause : the nested event loop
    breaks the order by design. It lures QEMU to expect responses to the
    latest message sent on the master socket to arrive first.
    
    Since this was only needed for DAX enablement which is still not merged
    upstream, just drop the code for now. A working solution will have to
    be merged later on. Likely protect the master socket with a mutex
    and service the slave channel with a separate thread, as discussed with
    Maxime in the mail thread below.
    
    [0] https://lore.kernel.org/qemu-devel/43145ede-89dc-280e-b953-6a2b436de395@redhat.com/
    
    Reported-by: Yanghang Liu <yanghliu>
    Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2155173
    Signed-off-by: Greg Kurz <groug>
    Message-Id: <20230119172424.478268-2-groug>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
    Acked-by: Stefan Hajnoczi <stefanha>
    Acked-by: Maxime Coquelin <maxime.coquelin>
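
For completeness, whether a given tree or build carries this revert can be checked along these lines (a qemu.git checkout is assumed, and whether the downstream package changelog references this bug ID is also an assumption):

# upstream: list release tags containing the revert (hash from the commit above)
git tag --contains f340a59d5a852d75ae34555723694c7e8eafbd0c
# downstream: look for a reference to this bug in the package changelog
rpm -q --changelog qemu-kvm | grep -i 2155173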

Comment 25 Laurent Vivier 2023-02-07 13:17:39 UTC
Yanghang, please set ITM. Thanks

Comment 27 Yanghang Liu 2023-02-13 06:46:57 UTC
Test environment:
qemu-kvm-7.2.0-8.el9.x86_64
tuned-2.19.0-1.el9.noarch
libvirt-9.0.0-4.el9.x86_64
python3-libvirt-9.0.0-1.el9.x86_64
openvswitch2.17-2.17.0-63.el9fdp.x86_64
dpdk-21.11.2-1.el9_1.x86_64
edk2-ovmf-20221207gitfff6d81270b5-5.el9.noarch
seabios-bin-1.16.1-1.el9.noarch


Test result: The domain with vhost-user interface(s) can be started without any qemu-kvm errors.

Comment 28 Yanan Fu 2023-02-13 08:55:35 UTC
QE bot (pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 31 Yanghang Liu 2023-02-14 08:57:46 UTC
Moving this bug to VERIFIED based on comment 27.

If new issue(s) are found during the regression tests, QE will open new bugs to track them.

Comment 32 jingzhao 2023-02-21 06:13:00 UTC
*** Bug 2165278 has been marked as a duplicate of this bug. ***

Comment 34 errata-xmlrpc 2023-05-09 07:20:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2162

