Bug 2083068

Summary: [vhost-vdpa][rhel9.1][edk2] Boot a uefi guest with mq vhost-vdpa device occurs qemu core dump
Product: Red Hat Enterprise Linux 9
Reporter: Lei Yang <leiyang>
Component: edk2
Assignee: Laurent Vivier <lvivier>
Status: CLOSED DUPLICATE
QA Contact: Lei Yang <leiyang>
Severity: high
Priority: high
Docs Contact:
Version: 9.1
CC: aadam, berrange, chayang, coli, jasowang, jinzhao, juzhang, kraxel, lulu, pbonzini, pezhang, virt-maint, wquan
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-10 09:25:32 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Lei Yang 2022-05-09 08:51:00 UTC
Description of problem:
Booting a UEFI guest with a multi-queue (mq) vhost-vdpa device causes a QEMU core dump.

Version-Release number of selected component (if applicable):
kernel-5.14.0-86.el9.x86_64
qemu-kvm-7.0.0-2.el9.x86_64
edk2-ovmf-20220221gitb24306f15d-1.el9.noarch
iproute-5.15.0-2.2.el9_0.x86_64

# flint -d 0000:3b:00.0 q
Image type:            FS4
FW Version:            22.33.1048
FW Release Date:       29.4.2022
Product Version:       22.33.1048
Rom Info:              type=UEFI version=14.26.17 cpu=AMD64,AARCH64
                       type=PXE version=3.6.502 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             b8cef603000a110c        4
Base MAC:              b8cef60a110c            4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000359
Security Attributes:   N/A

How reproducible:
100%

Steps to Reproduce:
1. Create the vdpa device (a quick verification sketch follows these commands)
# echo 0 > /sys/bus/pci/devices/0000\:3b\:00.0/sriov_numvfs
# modprobe vhost_vdpa
# modprobe mlx5_vdpa
# echo 1 > /sys/bus/pci/devices/0000\:3b\:00.0/sriov_numvfs
# readlink /sys/bus/pci/devices/0000:3b:00.0/virtfn*
../0000:3b:00.2
# echo 0000:3b:00.2 >/sys/bus/pci/drivers/mlx5_core/unbind
# devlink dev eswitch set pci/0000:3b:00.0 mode switchdev
# echo 0000:3b:00.2 >/sys/bus/pci/drivers/mlx5_core/bind
# vdpa mgmtdev show | grep pci
pci/0000:3b:00.2: 
# vdpa dev add name vdpa0 mgmtdev pci/0000:3b:00.2 mac 00:11:22:33:44:03  max_vqp 8
# ovs-vsctl add-br vdpa_bridge
# ovs-vsctl set Open_vSwitch . other_config:hw-offload="true"
# ovs-vsctl add-port vdpa_bridge enp59s0f0np0
# ovs-vsctl add-port vdpa_bridge eth0
# ip link set vdpa_bridge up
# ip addr add 192.168.10.10/24 dev vdpa_bridge
# dnsmasq --strict-order --bind-interfaces --listen-address 192.168.10.10 --dhcp-range 192.168.10.20,192.168.10.254 --dhcp-lease-max=253 --dhcp-no-override --pid-file=/tmp/dnsmasq.pid --log-facility=/tmp/dnsmasq.log
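
Before starting QEMU it can help to confirm that the vdpa device and its vhost-vdpa character device exist. This is a minimal check added for reference, not part of the original report; "vdpa0" is the device created above:

# vdpa dev show vdpa0            # should list vdpa0 under mgmtdev pci/0000:3b:00.2
# ls -l /dev/vhost-vdpa-*        # /dev/vhost-vdpa-0 is the vhostdev passed to QEMU in step 2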

2. Boot a UEFI guest with a multi-queue vhost-vdpa device (the multi-queue options are annotated after the command line)
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-blockdev node-name=file_ovmf_code,driver=file,filename=/usr/share/OVMF/OVMF_CODE.secboot.fd,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_ovmf_code,driver=raw,read-only=on,file=file_ovmf_code \
-blockdev node-name=file_ovmf_vars,driver=file,filename=/home/kvm_autotest_root/images/avocado-vt-vm1_rhel910-64-virtio-scsi.qcow2_VARS.fd,auto-read-only=on,discard=unmap \
-blockdev node-name=drive_ovmf_vars,driver=raw,read-only=off,file=file_ovmf_vars \
-machine q35,memory-backend=mem-machine_mem,pflash0=drive_ovmf_code,pflash1=drive_ovmf_vars \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 25600 \
-object memory-backend-ram,size=25600M,id=mem-machine_mem  \
-smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
-cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/rhel910-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:02,mq=on,vectors=18,bus=pcie-root-port-3,addr=0x0 \
-netdev vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=hostnet0,queues=8 \
-vnc :0  \
-rtc base=utc,clock=host,driftfix=slew  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
-monitor stdio \
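
For reference, the multi-queue pieces of the command line above fit together as follows. This is only an annotation of the existing options, not new configuration; the vector count follows the usual virtio-net multi-queue rule of thumb of 2 vectors per queue pair plus 2 for the config and control interrupts (2*8+2=18):

-netdev vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=hostnet0,queues=8    # queues=8 matches "max_vqp 8" used in "vdpa dev add" above
-device virtio-net-pci,netdev=hostnet0,id=net0,mq=on,vectors=18,...   # mq=on exposes multi-queue to the guest; vectors=18 = 2*8 + 2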

3. A QEMU core dump occurs at this point (a note on retrieving the core file follows the output)
# sh edk2.sh
QEMU 7.0.0 monitor - type 'help' for more information
(qemu) qemu-kvm: ../hw/virtio/vhost-vdpa.c:716: int vhost_vdpa_get_vq_index(struct vhost_dev *, int): Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.
edk2.sh: line 34: 61241 Aborted    
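
If the core file location is not obvious, on a systemd-based host the dump can usually be located and opened with coredumpctl. This assumes systemd-coredump is handling crashes, which the original log does not show; the direct gdb invocation below is equivalent:

# coredumpctl list qemu-kvm      # find the most recent qemu-kvm crash
# coredumpctl gdb qemu-kvm       # open it in gdb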

4. core info
# gdb /usr/libexec/qemu-kvm core.qemu-kvm.61241
......
#0  0x00007fa22be6257c in __pthread_kill_implementation () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fa222f0c640 (LWP 61246))]
(gdb) bt full
#0  0x00007fa22be6257c in __pthread_kill_implementation () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fa22be15d56 in raise () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fa22bde8833 in abort () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007fa22bde875b in __assert_fail_base.cold () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fa22be0ecd6 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#5  0x0000557833acc7a0 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:716
No locals.
#6  0x0000557833ac27e1 in vhost_virtqueue_mask (hdev=0x557835eb4900, vdev=<optimized out>, n=6, mask=<optimized out>)
    at ../hw/virtio/vhost.c:1550
        file = {index = 1, fd = 88}
        index = <optimized out>
        vvq = <optimized out>
        r = <optimized out>
#7  0x000055783397eb20 in virtio_pci_set_guest_notifier (d=<optimized out>, n=<optimized out>, assign=<optimized out>, 
    with_irqfd=<optimized out>) at ../hw/virtio/virtio-pci.c:975
        proxy = <optimized out>
        vdev = 0x55783731e6a0
        vdc = <optimized out>
        vq = <optimized out>
        notifier = 0x7fa2205b91a8
#8  0x000055783397b110 in virtio_pci_set_guest_notifiers (d=<optimized out>, nvqs=3, assign=<optimized out>)
    at ../hw/virtio/virtio-pci.c:1020
        proxy = <optimized out>
        vdev = 0x55783731e6a0
        k = 0x557835dab720
        with_irqfd = false
        n = 2
        r = <optimized out>
        notifiers_error = <optimized out>
#9  0x000055783391f443 in vhost_net_start (dev=0x55783731e6a0, ncs=<optimized out>, data_queue_pairs=<optimized out>, cvq=<optimized out>)
    at ../hw/net/vhost_net.c:361
        qbus = 0x55783731e618
--Type <RET> for more, q to quit, c to continue without paging--
        vbus = 0x55783731e618
        total_notifiers = 3
        k = 0x557835d21580
        index_end = <optimized out>
        nvhosts = 2
        n = 0x55783731e6a0
        i = <optimized out>
        peer = <optimized out>
        net = <optimized out>
        r = <optimized out>
        e = <optimized out>
        err = <optimized out>
#10 0x0000557833a91163 in virtio_net_set_status (vdev=<optimized out>, status=15 '\017') at ../hw/net/virtio-net.c:290
        n = <optimized out>
        i = <optimized out>
        q = <optimized out>
        queue_status = <optimized out>
#11 0x0000557833abb7c7 in virtio_set_status (vdev=0x55783731e6a0, val=15 '\017') at ../hw/virtio/virtio.c:1947
        k = 0x557835dab720
        ret = <optimized out>
#12 0x000055783397e5ce in virtio_pci_common_write (opaque=0x557837316300, addr=<optimized out>, val=15, size=<optimized out>)
    at ../hw/virtio/virtio-pci.c:1293
        proxy = 0x557837316300
        vdev = 0x55783731e6a0
#13 0x0000557833a3d9d9 in memory_region_dispatch_write (mr=0x557837316e10, addr=20, data=<optimized out>, op=<optimized out>, attrs=...)
    at ../softmmu/memory.c:554
        size = <optimized out>
#14 0x0000557833a46dd5 in flatview_write_continue (fv=<optimized out>, addr=34360786964, attrs=..., ptr=<optimized out>, len=1, addr1=61246, 
    l=1, mr=<optimized out>) at ../softmmu/physmem.c:2814
        buf = <optimized out>
        release_lock = true
        result = 0
        val = 6
        ram_ptr = <optimized out>
#15 0x0000557833a4ae19 in address_space_write (as=<optimized out>, addr=34360786964, attrs=..., 
    buf=0x7fa22be6257c <__pthread_kill_implementation+284>, len=1) at ../softmmu/physmem.c:2856
        result = 0
--Type <RET> for more, q to quit, c to continue without paging--
        fv = 0x7f9bd4561de0
#16 0x0000557833b6dfc0 in kvm_cpu_exec (cpu=<optimized out>) at ../softmmu/physmem.c:2962
        run = <optimized out>
        ret = <optimized out>
        run_ret = <optimized out>
#17 0x0000557833b702ea in kvm_vcpu_thread_fn (arg=0x557835eb9320) at ../accel/kvm/kvm-accel-ops.c:49
        r = <optimized out>
        cpu = <optimized out>
#18 0x0000557833da325a in qemu_thread_start (args=0x557835ec7f10) at ../util/qemu-thread-posix.c:556
        __clframe = {__cancel_routine = <optimized out>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = <synthetic pointer>}
        qemu_thread_args = 0x557835ec7f10
        start_routine = 0x557833b70170 <kvm_vcpu_thread_fn>
        arg = 0x557835eb9320
        r = <optimized out>
#19 0x00007fa22be60832 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#20 0x00007fa22be004c0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Actual results:
QEMU core dump

Expected results:
The guest boots successfully

Additional info:
1. This problem only occurs with a UEFI guest plus a multi-queue vhost-vdpa device; a single queue works well (a single-queue sketch is shown below).
2. A SeaBIOS guest works well with both multi-queue and single queue.
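
For comparison, a single-queue variant that boots successfully only needs the netdev/device pair changed. This is a sketch derived from the command line in step 2 (everything else unchanged), not a command line copied from the original report:

-netdev vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:02,bus=pcie-root-port-3,addr=0x0 \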

Comment 2 jason wang 2022-05-09 09:17:52 UTC
Looks like a duplicate of bz2069946.

Thanks

Comment 3 Lei Yang 2022-05-09 10:13:11 UTC
(In reply to jason wang from comment #2)
> Looks like a duplicate of bz2069946.
> 
> Thanks

Hello Jason

According to the test results, I think you're right; they have the same core dump info. I tried to test the scenario from bz2069946, and the core dump is as follows:

1. PXE boot a guest with a multi-queue vdpa device
# cat seabios.sh 
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 24576 \
-object memory-backend-ram,size=24576M,id=mem-machine_mem  \
-smp 16,maxcpus=16,cores=8,threads=1,dies=1,sockets=2  \
-cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/test.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-netdev vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=hostnet0,queues=8 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mq=on,vectors=18,mac=ce:19:60:6d:33:df,bus=pcie-root-port-3,addr=0x0 \
-vnc :0  \
-rtc base=utc,clock=host,driftfix=slew  \
-boot menu=on \
-enable-kvm \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5 \
-monitor stdio \

2. QEMU core dump
QEMU 7.0.0 monitor - type 'help' for more information
(qemu) qemu-kvm: ../hw/virtio/vhost-vdpa.c:716: int vhost_vdpa_get_vq_index(struct vhost_dev *, int): Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.

# gdb /usr/libexec/qemu-kvm core.qemu-kvm.63750
......
#0  0x00007f374b95957c in __pthread_kill_implementation () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f37429be640 (LWP 63755))]
(gdb) bt full
#0  0x00007f374b95957c in __pthread_kill_implementation () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f374b90cd56 in raise () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f374b8df833 in abort () from /lib64/libc.so.6
No symbol table info available.
#3  0x00007f374b8df75b in __assert_fail_base.cold () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007f374b905cd6 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#5  0x000055cd168697a0 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:716
No locals.
#6  0x000055cd1685f7e1 in vhost_virtqueue_mask (hdev=0x55cd18256c10, vdev=<optimized out>, n=6, mask=<optimized out>)
    at ../hw/virtio/vhost.c:1550
        file = {index = 1, fd = 86}
        index = <optimized out>
        vvq = <optimized out>
        r = <optimized out>
#7  0x000055cd1671bb20 in virtio_pci_set_guest_notifier (d=<optimized out>, n=<optimized out>, assign=<optimized out>, 
    with_irqfd=<optimized out>) at ../hw/virtio/virtio-pci.c:975
        proxy = <optimized out>
        vdev = 0x55cd19533af0
        vdc = <optimized out>
        vq = <optimized out>
        notifier = 0x7f373beaa1a8
#8  0x000055cd16718110 in virtio_pci_set_guest_notifiers (d=<optimized out>, nvqs=3, assign=<optimized out>)
    at ../hw/virtio/virtio-pci.c:1020
        proxy = <optimized out>
        vdev = 0x55cd19533af0
        k = 0x55cd1816ab20
        with_irqfd = false
        n = 2
        r = <optimized out>
        notifiers_error = <optimized out>
#9  0x000055cd166bc443 in vhost_net_start (dev=0x55cd19533af0, ncs=<optimized out>, data_queue_pairs=<optimized out>, cvq=<optimized out>)
    at ../hw/net/vhost_net.c:361
        qbus = 0x55cd19533a68
--Type <RET> for more, q to quit, c to continue without paging--
        vbus = 0x55cd19533a68
        total_notifiers = 3
        k = 0x55cd180df7f0
        index_end = <optimized out>
        nvhosts = 2
        n = 0x55cd19533af0
        i = <optimized out>
        peer = <optimized out>
        net = <optimized out>
        r = <optimized out>
        e = <optimized out>
        err = <optimized out>
#10 0x000055cd1682e163 in virtio_net_set_status (vdev=<optimized out>, status=15 '\017') at ../hw/net/virtio-net.c:290
        n = <optimized out>
        i = <optimized out>
        q = <optimized out>
        queue_status = <optimized out>
#11 0x000055cd168587c7 in virtio_set_status (vdev=0x55cd19533af0, val=15 '\017') at ../hw/virtio/virtio.c:1947
        k = 0x55cd1816ab20
        ret = <optimized out>
#12 0x000055cd1671b5ce in virtio_pci_common_write (opaque=0x55cd1952b750, addr=<optimized out>, val=15, size=<optimized out>)
    at ../hw/virtio/virtio-pci.c:1293
        proxy = 0x55cd1952b750
        vdev = 0x55cd19533af0
#13 0x000055cd167da9d9 in memory_region_dispatch_write (mr=0x55cd1952c260, addr=20, data=<optimized out>, op=<optimized out>, attrs=...)
    at ../softmmu/memory.c:554
        size = <optimized out>
#14 0x000055cd167e3dd5 in flatview_write_continue (fv=<optimized out>, addr=4246732820, attrs=..., ptr=<optimized out>, len=1, addr1=63755, 
    l=1, mr=<optimized out>) at ../softmmu/physmem.c:2814
        buf = <optimized out>
        release_lock = true
        result = 0
        val = 6
        ram_ptr = <optimized out>
#15 0x000055cd167e7e19 in address_space_write (as=<optimized out>, addr=4246732820, attrs=..., 
    buf=0x7f374b95957c <__pthread_kill_implementation+284>, len=1) at ../softmmu/physmem.c:2856
        result = 0
--Type <RET> for more, q to quit, c to continue without paging--
        fv = 0x7f31347130b0
#16 0x000055cd1690afc0 in kvm_cpu_exec (cpu=<optimized out>) at ../softmmu/physmem.c:2962
        run = <optimized out>
        ret = <optimized out>
        run_ret = <optimized out>
#17 0x000055cd1690d2ea in kvm_vcpu_thread_fn (arg=0x55cd1825b440) at ../accel/kvm/kvm-accel-ops.c:49
        r = <optimized out>
        cpu = <optimized out>
#18 0x000055cd16b4025a in qemu_thread_start (args=0x55cd1826adc0) at ../util/qemu-thread-posix.c:556
        __clframe = {__cancel_routine = <optimized out>, __cancel_arg = 0x0, __do_it = 1, __cancel_type = <synthetic pointer>}
        qemu_thread_args = 0x55cd1826adc0
        start_routine = 0x55cd1690d170 <kvm_vcpu_thread_fn>
        arg = 0x55cd1825b440
        r = <optimized out>
#19 0x00007f374b957832 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#20 0x00007f374b8f74c0 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Comment 4 Lei Yang 2022-05-09 10:20:29 UTC
Hello Jason

Could you please help review Bug 2082782? Maybe that bug is also a duplicate of bz2069946.

Thanks
Lei

Comment 5 jason wang 2022-05-10 07:30:56 UTC
(In reply to Lei Yang from comment #4)
> Hello Jason
> 
> Could you please help review Bug 2082782? Maybe that bug is also a
> duplicate of bz2069946.
> 
> Thanks
> Lei

Yes, I think it's another duplicate.

Thanks

Comment 7 Laurent Vivier 2022-05-10 09:25:32 UTC

*** This bug has been marked as a duplicate of bug 2070804 ***