Bug 2069038
| Summary: | [vDPA+DPDK] Starting testpmd in VM will cause qemu crash when vdpa tool queues and VM queues mismatch | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Pei Zhang <pezhang> |
| Component: | qemu-kvm | Assignee: | Laurent Vivier <lvivier> |
| qemu-kvm sub component: | Networking | QA Contact: | Pei Zhang <pezhang> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aadam, chayang, coli, eperezma, jasowang, jinzhao, juzhang, leiyang, lulu, lvivier, virt-maint, wquan, yanghliu |
| Version: | 9.0 | Keywords: | Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-06-27 06:40:05 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2070804 | ||
| Bug Blocks: | |||
Additional info:
1. I think this is a negative case: the vdpa tool queue setting and the VM vdpa queues don't match, and this mismatch is the root cause of the qemu crash. However, it is still worth fixing, as qemu should not crash. I set medium priority; feel free to correct me if you disagree.
2. This issue can also be triggered directly at the qemu layer:
# ulimit -l unlimited
# /usr/libexec/qemu-kvm \
-name guest=rhel9.0,debug-threads=on \
-machine pc-q35-rhel9.0.0,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=split \
-accel kvm \
-cpu Skylake-Server-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rsba=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,tsc-deadline=on,pmu=off \
-m 8192 \
-overcommit mem-lock=on \
-smp 6,sockets=3,dies=1,cores=1,threads=2 \
-uuid 91e86dae-adf5-11ec-b911-20040fec000c \
-no-user-config \
-nodefaults \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-blockdev '{"driver":"file","filename":"/home/images_nfv-virt-rt-kvm/rhel9.0.qcow2","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,bus=pci.2,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:11,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-monitor stdio \
-qmp tcp:0:5555,server,nowait \
-device VGA,bus=pci.4 \
-vnc :0 \
-device virtio-net-pci,mac=00:11:22:33:44:03,id=hostnet1,netdev=net1,bus=pci.6,addr=0x0 \
-netdev vhost-vdpa,id=net1,vhostdev=/dev/vhost-vdpa-0 \
-device virtio-net-pci,mac=00:11:22:33:44:04,id=hostnet2,netdev=net2,bus=pci.7,addr=0x0 \
-netdev vhost-vdpa,id=net2,vhostdev=/dev/vhost-vdpa-1 \
(qemu) qemu-kvm: vhost VQ 2 ring restore failed: -22: Invalid argument (22)
qemu-kvm: vhost VQ 2 ring restore failed: -22: Invalid argument (22)
qemu-kvm: ../hw/virtio/vhost-vdpa.c:561: int vhost_vdpa_get_vq_index(struct vhost_dev *, int): Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
qemu_1q.sh: line 38: 45819 Aborted (core dumped)
3. We found this issue while testing vdpa 2-queue hot plug/unplug with libvirt. Libvirt Bug 2068999 can trigger this issue.
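The assertion quoted in the log above is a plain range check on the virtqueue index. A minimal Python sketch of that condition follows. The `VhostDev` class is a hypothetical stand-in for the two `struct vhost_dev` fields named in the assertion, and the values `vq_index=0`, `nvqs=2` are assumptions chosen for illustration, not values read from the core dump.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VhostDev:
    """Hypothetical model of the two fields used by the assertion."""
    vq_index: int  # first virtqueue index owned by this vhost device
    nvqs: int      # number of virtqueues owned by this vhost device


def vq_index_in_range(dev: VhostDev, idx: int) -> bool:
    """Mirror of: idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs."""
    return dev.vq_index <= idx < dev.vq_index + dev.nvqs


# Assumed layout: one queue pair, i.e. virtqueues 0 (rx) and 1 (tx).
dev = VhostDev(vq_index=0, nvqs=2)
in_range = [idx for idx in range(4) if vq_index_in_range(dev, idx)]

if __name__ == "__main__":
    for idx in range(4):
        state = "ok" if vq_index_in_range(dev, idx) else "would trip the assertion"
        print(f"VQ {idx}: {state}")
```

Under these assumed values, index 2 (the index named in the "ring restore failed" messages) falls outside the range, which matches the shape of the failure reported here.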
I think we can fix this by failing the vhost-vDPA initialization if the number doesn't match.

Thanks

It looks to be the same problem as described in BZ 2048060 for RHEL 8.6.

The problem is triggered by a kernel change, and the crash is an assert to report the kernel problem.

(In reply to Laurent Vivier from comment #4)
> It looks to be the same problem as described in BZ 2048060 for RHEL 8.6.
>
> The problem is triggered by a kernel change, and the crash is an assert to
> report the kernel problem.

Hi Laurent,
I don't think this is the same issue. The root cause of BZ 2048060 is a mismatched version of the mlx driver.
After the following MRs were merged, BZ 2048060 no longer reproduces:
https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/1974
https://gitlab.com/redhat/rhel/src/kernel/rhel-8/-/merge_requests/1567
Thanks
Cindy

I think this series could fix the problem:
[PATCH 0/7] vhost-vdpa multiqueue fixes
https://patchew.org/QEMU/1648621997-22416-1-git-send-email-si-wei.liu@oracle.com/
I'm going to build a package to be tested.

(In reply to Laurent Vivier from comment #6)
> I think this series could fix the problem:
>
> [PATCH 0/7] vhost-vdpa multiqueue fixes
> https://patchew.org/QEMU/1648621997-22416-1-git-send-email-si-wei.liu@oracle.com/
>
> I'm going to build a package to be tested.

Thank you Laurent for the patch info. I can test after you provide the qemu scratch build.

Best regards,
Pei

(In reply to Laurent Vivier from comment #6)
> I think this series could fix the problem:
>
> [PATCH 0/7] vhost-vdpa multiqueue fixes
> https://patchew.org/QEMU/1648621997-22416-1-git-send-email-si-wei.liu@oracle.com/

The upstream series has been updated to:

[PATCH v4 0/7] vhost-vdpa multiqueue fixes
https://patchew.org/QEMU/1651890498-24478-1-git-send-email-si-wei.liu@oracle.com/

(In reply to Laurent Vivier from comment #11)
> (In reply to Laurent Vivier from comment #6)
> > I think this series could fix the problem:
> >
> > [PATCH 0/7] vhost-vdpa multiqueue fixes
> > https://patchew.org/QEMU/1648621997-22416-1-git-send-email-si-wei.liu@oracle.com/
>
> The upstream series has been updated to:
>
> [PATCH v4 0/7] vhost-vdpa multiqueue fixes
> https://patchew.org/QEMU/1651890498-24478-1-git-send-email-si-wei.liu@oracle.com/

This series has been submitted by Jason to be merged in RHEL 9.1.0 to fix BZ 2070804.

https://gitlab.com/redhat/rhel/src/qemu-kvm/qemu-kvm/-/merge_requests/186

So once merged, we can verify it also fixes this BZ.

Pei,

Could you re-test with qemu-kvm-7.0.0-6.el9, which includes the fix for bug 2070804?

Thanks

(In reply to Laurent Vivier from comment #16)
> Pei,
>
> Could you re-test with qemu-kvm-7.0.0-6.el9, which includes the fix for bug
> 2070804?
>
> Thanks

Hello Laurent,

This issue cannot be reproduced with qemu-kvm-7.0.0-6.el9.x86_64 any more. No qemu crash after following the steps in Description.

Can we close this bug as CurrentRelease?

Thanks.

Best regards,
Pei

Description of problem:
Setup vdpa queues as 2, then boot VM with single queue, starting testpmd in VM will cause qemu crash.

Version-Release number of selected component (if applicable):
5.14.0-70.5.1.el9_0.x86_64
qemu-kvm-6.2.0-11.el9_0.1.x86_64
libvirt-8.0.0-7.el9_0.x86_64
iproute-5.15.0-2.2.el9_0.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Setup vdpa with max_vqp 2
# modprobe vhost_vdpa
# modprobe mlx5_vdpa
# echo 0 > /sys/bus/pci/devices/0000\:3b\:00.0/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.0/virtfn*
# devlink dev eswitch set pci/0000:3b:00.0 mode switchdev
# echo 1 > /sys/bus/pci/devices/0000\:3b\:00.0/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.0/virtfn*
# echo 0000:3b:00.2 >/sys/bus/pci/drivers/mlx5_core/unbind
# devlink dev eswitch set pci/0000:3b:00.0 mode switchdev
# echo 0000:3b:00.2 >/sys/bus/pci/drivers/mlx5_core/bind
# echo 0 > /sys/bus/pci/devices/0000\:3b\:00.1/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.1/virtfn*
# devlink dev eswitch set pci/0000:3b:00.1 mode switchdev
# echo 1 > /sys/bus/pci/devices/0000\:3b\:00.1/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.1/virtfn*
# echo 0000:3b:01.2 >/sys/bus/pci/drivers/mlx5_core/unbind
# devlink dev eswitch set pci/0000:3b:00.1 mode switchdev
# echo 0000:3b:01.2 >/sys/bus/pci/drivers/mlx5_core/bind
# vdpa mgmtdev show | grep pci
# vdpa dev add name vdpa0 mgmtdev pci/0000:3b:00.2 mac 00:11:22:33:44:03 max_vqp 2
# vdpa dev add name vdpa1 mgmtdev pci/0000:3b:01.2 mac 00:11:22:33:44:04 max_vqp 2
# systemctl stop openvswitch
# systemctl start openvswitch
# /usr/bin/python3 /home/nfv-virt-rt-kvm/tools/dpdk-devbind.py --status
# /usr/bin/ovs-vsctl --if-exists del-br ovsbr0
# /usr/bin/ovs-vsctl add-br ovsbr0
# /usr/bin/ovs-vsctl add-port ovsbr0 eth0
# /usr/bin/ovs-vsctl add-port ovsbr0 enp59s0f0np0
# /usr/bin/python3 /home/nfv-virt-rt-kvm/tools/dpdk-devbind.py --status
# /usr/bin/ovs-vsctl --if-exists del-br ovsbr1
# /usr/bin/ovs-vsctl add-br ovsbr1
# /usr/bin/ovs-vsctl add-port ovsbr1 eth1
# /usr/bin/ovs-vsctl add-port ovsbr1 enp59s0f1np1

2. Boot VM with vdpa single queue.
<interface type='vdpa'>
  <mac address='00:11:22:33:44:03'/>
  <source dev='/dev/vhost-vdpa-0'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
</interface>
<interface type='vdpa'>
  <mac address='00:11:22:33:44:04'/>
  <source dev='/dev/vhost-vdpa-1'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</interface>

3. In VM, start testpmd. qemu crash.
# modprobe vfio enable_unsafe_noiommu_mode=Y
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:06:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:07:00.0
# dpdk-testpmd \
-l 1,2,3 \
-n 4 \
-d /usr/lib64/librte_net_virtio.so \
-- \
--nb-cores=2 \
-i \
--disable-rss \
--rxd=512 --txd=512 \
--rxq=1 --txq=1

# virsh list --all
 Id   Name      State
--------------------------
 -    rhel9.0   shut off

# cat /var/log/libvirt/qemu/rhel9.0.log
....
qemu-kvm: ../hw/virtio/vhost-vdpa.c:561: int vhost_vdpa_get_vq_index(struct vhost_dev *, int): Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
2022-03-28 05:16:59.855+0000: shutting down, reason=crashed

Actual results:
qemu crash

Expected results:
qemu should not crash.
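
One comment on this bug suggests failing vhost-vDPA initialization when the queue numbers do not match. The Python sketch below illustrates that idea as a host-side pre-flight check only; it is not the QEMU fix, which the thread attributes to the upstream multiqueue series. It assumes that a libvirt interface without a `<driver queues='N'/>` attribute requests a single queue pair, as in step 2, and the function names `vm_queue_pairs` and `check_queue_pairs` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Interface definition taken from step 2 of the report; it carries no
# <driver queues='N'/> element.
INTERFACE_XML = """
<interface type='vdpa'>
  <mac address='00:11:22:33:44:03'/>
  <source dev='/dev/vhost-vdpa-0'/>
  <model type='virtio'/>
</interface>
"""


def vm_queue_pairs(interface_xml: str) -> int:
    """Return the queue-pair count requested by a libvirt <interface>.

    Assumption: a missing <driver queues='N'/> attribute means one pair.
    """
    iface = ET.fromstring(interface_xml.strip())
    driver = iface.find("driver")
    queues = None if driver is None else driver.get("queues")
    return 1 if queues is None else int(queues)


def check_queue_pairs(device_max_vqp: int, interface_xml: str) -> int:
    """Raise ValueError when the VM and the vdpa device settings differ."""
    requested = vm_queue_pairs(interface_xml)
    if requested != device_max_vqp:
        raise ValueError(
            f"VM requests {requested} queue pair(s), "
            f"vdpa device was added with max_vqp {device_max_vqp}"
        )
    return requested


if __name__ == "__main__":
    try:
        # max_vqp 2 comes from the 'vdpa dev add' commands in step 1.
        check_queue_pairs(2, INTERFACE_XML)
    except ValueError as err:
        print(f"mismatch: {err}")
```

With the values from the reproduction steps (max_vqp 2 on the device, no queues attribute in the VM definition), the check reports a mismatch instead of letting the guest reach the point where qemu aborts.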