Bug 2069038 - [vDPA+DPDK] Starting testpmd in VM will cause qemu crash when vdpa tool queues and VM queues mismatch
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On: 2070804
Blocks:
 
Reported: 2022-03-28 05:30 UTC by Pei Zhang
Modified: 2022-07-01 08:39 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-27 06:40:05 UTC
Type: ---
Target Upstream Version:
Embargoed:
Flags: pm-rhel: mirror+




Links:
Red Hat Issue Tracker RHELPLAN-116918 (last updated 2022-03-28 05:54:29 UTC)

Description Pei Zhang 2022-03-28 05:30:25 UTC
Description of problem:
Set up the vdpa device with 2 queue pairs (max_vqp 2), then boot the VM with a single queue; starting testpmd in the VM causes a qemu crash.

Version-Release number of selected component (if applicable):
5.14.0-70.5.1.el9_0.x86_64
qemu-kvm-6.2.0-11.el9_0.1.x86_64
libvirt-8.0.0-7.el9_0.x86_64
iproute-5.15.0-2.2.el9_0.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up vdpa with max_vqp 2 (load the modules, recreate one VF per PF in switchdev mode, create two vdpa devices with 2 queue pairs each, then set up the OVS bridges)
# modprobe vhost_vdpa
# modprobe mlx5_vdpa

# echo 0 > /sys/bus/pci/devices/0000\:3b\:00.0/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.0/virtfn*
# devlink dev eswitch set pci/0000:3b:00.0 mode switchdev
# echo 1 > /sys/bus/pci/devices/0000\:3b\:00.0/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.0/virtfn*
# echo 0000:3b:00.2 >/sys/bus/pci/drivers/mlx5_core/unbind
# devlink dev eswitch set pci/0000:3b:00.0 mode switchdev
# echo 0000:3b:00.2 >/sys/bus/pci/drivers/mlx5_core/bind

# echo 0 > /sys/bus/pci/devices/0000\:3b\:00.1/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.1/virtfn*
# devlink dev eswitch set pci/0000:3b:00.1 mode switchdev
# echo 1 > /sys/bus/pci/devices/0000\:3b\:00.1/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.1/virtfn*
# echo 0000:3b:01.2 >/sys/bus/pci/drivers/mlx5_core/unbind
# devlink dev eswitch set pci/0000:3b:00.1 mode switchdev
# echo 0000:3b:01.2 >/sys/bus/pci/drivers/mlx5_core/bind

# vdpa mgmtdev show | grep pci

# vdpa dev add name vdpa0 mgmtdev pci/0000:3b:00.2 mac 00:11:22:33:44:03 max_vqp 2
# vdpa dev add name vdpa1 mgmtdev pci/0000:3b:01.2 mac 00:11:22:33:44:04 max_vqp 2

# systemctl stop openvswitch
# systemctl start openvswitch
# /usr/bin/python3 /home/nfv-virt-rt-kvm/tools/dpdk-devbind.py --status
# /usr/bin/ovs-vsctl --if-exists del-br ovsbr0
# /usr/bin/ovs-vsctl add-br ovsbr0
# /usr/bin/ovs-vsctl add-port ovsbr0 eth0
# /usr/bin/ovs-vsctl add-port ovsbr0 enp59s0f0np0
# /usr/bin/python3 /home/nfv-virt-rt-kvm/tools/dpdk-devbind.py --status
# /usr/bin/ovs-vsctl --if-exists del-br ovsbr1
# /usr/bin/ovs-vsctl add-br ovsbr1
# /usr/bin/ovs-vsctl add-port ovsbr1 eth1
# /usr/bin/ovs-vsctl add-port ovsbr1 enp59s0f1np1
 
2. Boot the VM with single-queue vdpa interfaces (no <driver queues=.../> element, so each interface gets one queue pair, mismatching the max_vqp 2 setting above).
    <interface type='vdpa'>
      <mac address='00:11:22:33:44:03'/>
      <source dev='/dev/vhost-vdpa-0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vdpa'>
      <mac address='00:11:22:33:44:04'/>
      <source dev='/dev/vhost-vdpa-1'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>


3. In the VM, bind the virtio-net devices to vfio-pci and start testpmd; qemu crashes.
# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci

# dpdk-devbind.py --bind=vfio-pci 0000:06:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:07:00.0
# dpdk-testpmd \
	-l 1,2,3 \
	-n 4  \
	-d /usr/lib64/librte_net_virtio.so  \
	-- \
	--nb-cores=2 \
	-i \
	--disable-rss \
	--rxd=512 --txd=512 \
	--rxq=1 --txq=1

# virsh list --all
 Id   Name      State
--------------------------
 -    rhel9.0   shut off

# cat /var/log/libvirt/qemu/rhel9.0.log
....
qemu-kvm: ../hw/virtio/vhost-vdpa.c:561: int vhost_vdpa_get_vq_index(struct vhost_dev *, int): Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
2022-03-28 05:16:59.855+0000: shutting down, reason=crashed
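
For context, the assertion quoted in the log corresponds to a bounds check in QEMU's hw/virtio/vhost-vdpa.c. Below is a minimal sketch of that check, reconstructed from the assertion text above rather than copied verbatim from the source file:

    /* Sketch reconstructed from the assertion message in the log above;
     * assumes QEMU's struct vhost_dev. Not a verbatim copy of
     * hw/virtio/vhost-vdpa.c. */
    static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
    {
        /* idx must fall inside [vq_index, vq_index + nvqs), the virtqueue
         * range owned by this vhost device. When the vdpa device's queue
         * count (max_vqp 2) and the VM's single-queue configuration
         * disagree, idx can land outside that range and abort() QEMU. */
        assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
        return idx;
    }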


Actual results:
qemu crash

Expected results:
qemu should not crash.

Comment 1 Pei Zhang 2022-03-28 05:31:28 UTC
Additional info:
1. I would think this is a negative case: the vdpa tool queue setting and the VM vdpa queues don't match, and this mismatch is the root cause of the qemu crash. However, it's better to fix it, as qemu should not crash. I set medium priority; feel free to correct me if you disagree.

2. This issue can also be triggered directly at the qemu layer, without libvirt.

# ulimit -l unlimited
# /usr/libexec/qemu-kvm \
-name guest=rhel9.0,debug-threads=on \
-machine pc-q35-rhel9.0.0,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=split \
-accel kvm \
-cpu Skylake-Server-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rsba=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,tsc-deadline=on,pmu=off \
-m 8192 \
-overcommit mem-lock=on \
-smp 6,sockets=3,dies=1,cores=1,threads=2 \
-uuid 91e86dae-adf5-11ec-b911-20040fec000c \
-no-user-config \
-nodefaults \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-blockdev '{"driver":"file","filename":"/home/images_nfv-virt-rt-kvm/rhel9.0.qcow2","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,bus=pci.2,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:11,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \
-device VGA,bus=pci.4 \
-vnc :0  \
-device virtio-net-pci,mac=00:11:22:33:44:03,id=hostnet1,netdev=net1,bus=pci.6,addr=0x0  \
-netdev vhost-vdpa,id=net1,vhostdev=/dev/vhost-vdpa-0 \
-device virtio-net-pci,mac=00:11:22:33:44:04,id=hostnet2,netdev=net2,bus=pci.7,addr=0x0  \
-netdev vhost-vdpa,id=net2,vhostdev=/dev/vhost-vdpa-1

(qemu) qemu-kvm: vhost VQ 2 ring restore failed: -22: Invalid argument (22)
qemu-kvm: vhost VQ 2 ring restore failed: -22: Invalid argument (22)
qemu-kvm: ../hw/virtio/vhost-vdpa.c:561: int vhost_vdpa_get_vq_index(struct vhost_dev *, int): Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
qemu_1q.sh: line 38: 45819 Aborted                 (core dumped) 

3. We found this issue when testing vdpa 2-queue hot plug/unplug with libvirt. Libvirt bug 2068999 can trigger this issue.

Comment 3 jason wang 2022-03-28 06:19:35 UTC
I think we can fix this by failing the vhost-vDPA initialization if the queue numbers don't match.

Thanks
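
A minimal sketch of that idea, for illustration: validate the queue-pair count while setting up the vhost-vdpa backend and fail cleanly, instead of asserting later in vhost_vdpa_get_vq_index(). The function name and parameters below are hypothetical, not the actual patch; only error_setg() and Error are real QEMU error-reporting interfaces.

    #include <errno.h>
    #include <stdint.h>
    #include "qapi/error.h"   /* QEMU's Error / error_setg() API */

    /* Hypothetical check, for illustration only: reject a vhost-vdpa
     * backend whose queue-pair count differs from what QEMU was
     * configured with, so device init fails instead of QEMU aborting. */
    static int vhost_vdpa_validate_queue_pairs(uint16_t dev_queue_pairs,
                                               uint16_t cfg_queue_pairs,
                                               Error **errp)
    {
        if (dev_queue_pairs != cfg_queue_pairs) {
            error_setg(errp,
                       "vhost-vdpa: device has %u queue pairs, "
                       "but %u were configured",
                       (unsigned)dev_queue_pairs, (unsigned)cfg_queue_pairs);
            return -EINVAL;   /* fail initialization; QEMU keeps running */
        }
        return 0;
    }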

Comment 4 Laurent Vivier 2022-03-28 08:56:44 UTC
It looks like the same problem as described in BZ 2048060 for RHEL 8.6.

The problem is triggered by a kernel change, and the crash is an assert that reports the kernel problem.

Comment 5 lulu@redhat.com 2022-03-29 07:50:50 UTC
(In reply to Laurent Vivier from comment #4)
> It looks like to be the same problem as described in BZ 2048060 for RHEL 8.6
> 
> The problem is triggered by a kernel change, and the crash is an assert to
> report the kernel problem.

Hi Laurent, I don't think this is the same issue.
The root cause for BZ 2048060 is a mismatched mlx driver version.
After the MRs https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/1974
and https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/1567
were merged, BZ 2048060 no longer reproduces.

Thanks
Cindy

Comment 6 Laurent Vivier 2022-03-31 06:27:08 UTC
I think this series could fix the problem:

    [PATCH 0/7] vhost-vdpa multiqueue fixes
    https://patchew.org/QEMU/1648621997-22416-1-git-send-email-si-wei.liu@oracle.com/

I'm going to build a package to be tested.

Comment 7 Pei Zhang 2022-03-31 07:57:00 UTC
(In reply to Laurent Vivier from comment #6)
> I think this series could fix the problem:
> 
>     [PATCH 0/7] vhost-vdpa multiqueue fixes
>     https://patchew.org/QEMU/1648621997-22416-1-git-send-email-si-wei.liu@oracle.com/
> 
> I'm going to build a package to be tested.

Thank you Laurent for the patch info. I can test after you provide the qemu scratch build.

Best regards,

Pei

Comment 11 Laurent Vivier 2022-05-09 13:40:50 UTC
(In reply to Laurent Vivier from comment #6)
> I think this series could fix the problem:
> 
>     [PATCH 0/7] vhost-vdpa multiqueue fixes
>     https://patchew.org/QEMU/1648621997-22416-1-git-send-email-si-wei.liu@oracle.com/
> 


The upstream series has been updated to:

[PATCH v4 0/7] vhost-vdpa multiqueue fixes
https://patchew.org/QEMU/1651890498-24478-1-git-send-email-si-wei.liu@oracle.com/

Comment 14 Laurent Vivier 2022-05-30 14:15:28 UTC
(In reply to Laurent Vivier from comment #11)
> (In reply to Laurent Vivier from comment #6)
> > I think this series could fix the problem:
> > 
> >     [PATCH 0/7] vhost-vdpa multiqueue fixes
> >     https://patchew.org/QEMU/1648621997-22416-1-git-send-email-si-wei.liu@oracle.com/
> > 
> 
> 
> The upstream series has been updated to:
> 
> [PATCH v4 0/7] vhost-vdpa multiqueue fixes
> https://patchew.org/QEMU/1651890498-24478-1-git-send-email-si-wei.liu@oracle.com/

This series has been submitted by Jason to be merged into RHEL 9.1.0 to fix BZ 2070804.

https://gitlab.com/redhat/rhel/src/qemu-kvm/qemu-kvm/-/merge_requests/186

So once merged, we can verify it also fixes this BZ.

Comment 16 Laurent Vivier 2022-06-21 15:24:50 UTC
Pei,

Could you re-test with qemu-kvm-7.0.0-6.el9, which includes the fix for bug 2070804?

Thanks

Comment 17 Pei Zhang 2022-06-27 03:06:19 UTC
(In reply to Laurent Vivier from comment #16)
> Pei,
> 
> Could you re-test with qemu-kvm-7.0.0-6.el9, which includes the fix for bug 2070804?
> 
> Thanks

Hello Laurent,

This issue cannot be reproduced with qemu-kvm-7.0.0-6.el9.x86_64 anymore. There is no qemu crash after following the steps in the Description.

Can we close this bug as CurrentRelease? 

Thanks.

Best regards,

Pei

