Description of problem:
[ppc64le] QEMU dumps core when continuing a guest that has a network device attached with "disable-legacy=off,disable-modern=true"

Version-Release number of selected component (if applicable):
kernel-4.18.0-226.el8.ppc64le
qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le

How reproducible:
80% or so

Steps to Reproduce:
1. Boot up a guest with:
   /usr/libexec/qemu-kvm -S ... -device virtio-net-pci,mac=9a:4c:4d:4e:4f:60,id=idtniYmJ,vectors=4,netdev=idG7NvsN,disable-legacy=off,disable-modern=true ...
2. Continue ("cont") the guest at a very early stage.

Actual results:
QEMU dumps core.

Expected results:
No core dump; the guest continues normally.

Additional info:
Full command line:
/usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -machine pseries -nodefaults -device VGA,bus=pci.0,addr=0x2 -m 8192 -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 -cpu 'host' -chardev socket,id=qmp_id_qmpmonitor1,server,nowait,path=/tmp/t22 -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=chardev_serial0,server,nowait,path=/tmp/t11 -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,disable-legacy=off,disable-modern=true -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kar/vt_test_images/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on -vnc :0 -rtc base=utc,clock=host -boot menu=off,order=cdn,once=c,strict=off -enable-kvm -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6,disable-legacy=off,disable-modern=true -monitor stdio -device virtio-net-pci,mac=9a:4c:4d:4e:4f:60,id=idtniYmJ,vectors=4,netdev=idG7NvsN,disable-legacy=off,disable-modern=true -netdev tap,id=idG7NvsN,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
(gdb) bt
#0  0x00007ffff6e5163c in pthread_cond_wait@@GLIBC_2.17 () at /lib64/power9/libpthread.so.0
#1  0x0000000100681b80 in qemu_cond_wait_impl (cond=<optimized out>, mutex=<optimized out>, file=<optimized out>, line=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/util/qemu-thread-posix.c:174
#2  0x00000001002efa2c in qemu_wait_io_event (cpu=cpu@entry=0x1010f7dd0)
    at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1145
#3  0x00000001002f1fc8 in qemu_kvm_cpu_thread_fn (arg=0x1010f7dd0)
    at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1193
#4  0x00000001002f1fc8 in qemu_kvm_cpu_thread_fn (arg=0x1010f7dd0)
    at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1160
#5  0x00000001006814c0 in qemu_thread_start (args=0x10113e960)
    at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/util/qemu-thread-posix.c:521
#6  0x00007ffff6e487c8 in start_thread () at /lib64/power9/libpthread.so.0
#7  0x00007ffff6d60508 in clone () at /lib64/power9/libc.so.6
The issue wasn't reproduced on qemu-kvm-5.0.0-0.scrmod+el8.3.0+7066+6dd3ecaa.wrb200617.ppc64le, so it should be a regression. Thanks.
Also tried to reproduce on the x86 platform; a similar issue wasn't reproduced there.
kernel-4.18.0-227.el8.x86_64
qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.x86_64
Created attachment 1702326 [details] command
Additional information: if "disable-legacy=off,disable-modern=true" is removed, the issue can't be reproduced. Thanks.
Is this ppc specific, or does it also appear on x86 (with the same qemu version)?
(In reply to David Gibson from comment #6)
> Is this ppc specific, or does it also appear on x86 (with the same qemu
> version)?

Hi David,

Please have a look at comment 3: on x86, QEMU doesn't dump core, but it hangs instead.

Thanks
Min
Looks like a stack overflow because of an infinite recursive loop:

#0  0x00007ffff7abd3bc in g_hash_table_lookup () at /lib64/libglib-2.0.so.0
#1  0x000000010053c818 in type_table_lookup ()
#2  0x000000010053ec18 in object_class_dynamic_cast ()
#3  0x000000010053eefc in object_class_dynamic_cast_assert ()
#4  0x000000010029a438 in virtio_queue_enabled ()
#5  0x0000000100497a9c in virtio_pci_queue_enabled ()
...
#130902 0x000000010029a460 in virtio_queue_enabled ()
#130903 0x0000000100497a9c in virtio_pci_queue_enabled ()
#130904 0x000000010029a460 in virtio_queue_enabled ()
#130905 0x0000000100454a20 in vhost_net_start ()
#130906 0x00000001002775b0 in virtio_net_set_status ()
#130907 0x0000000100298540 in virtio_set_status ()
#130908 0x0000000100499e9c in virtio_pci_config_write ()
#130909 0x00000001002f89ec in memory_region_write_accessor ()
#130910 0x00000001002f5cc0 in access_with_adjusted_size ()
#130911 0x00000001002fbcbc in memory_region_dispatch_write ()
#130912 0x00000001001bfeb0 in address_space_stb ()
#130913 0x00000001002cb8f4 in h_logical_store ()
#130914 0x00000001002cf688 in spapr_hypercall ()
#130915 0x00000001003ad898 in kvm_arch_handle_exit ()
#130916 0x0000000100227f8c in kvm_cpu_exec ()
#130917 0x00000001002f1ff8 in qemu_kvm_cpu_thread_fn ()
#130918 0x00000001006814c0 in qemu_thread_start ()
#130919 0x00007ffff6e28878 in start_thread () at /lib64/libpthread.so.0
#130920 0x00007ffff6d332c8 in clone () at /lib64/libc.so.6

Reproduced with qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le, but also with upstream (4215d34132)
(In reply to Min Deng from comment #1)
> (gdb) bt
> #0  0x00007ffff6e5163c in pthread_cond_wait@@GLIBC_2.17 () at
> /lib64/power9/libpthread.so.0
> #1  0x0000000100681b80 in qemu_cond_wait_impl (cond=<optimized out>,
> mutex=<optimized out>, file=<optimized out>, line=<optimized out>)
>     at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/util/qemu-thread-posix.c:174
> #2  0x00000001002efa2c in qemu_wait_io_event (cpu=cpu@entry=0x1010f7dd0)
>     at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1145
> #3  0x00000001002f1fc8 in qemu_kvm_cpu_thread_fn (arg=0x1010f7dd0)
>     at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1193
> #4  0x00000001002f1fc8 in qemu_kvm_cpu_thread_fn (arg=0x1010f7dd0)
>     at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1160
> #5  0x00000001006814c0 in qemu_thread_start (args=0x10113e960)
>     at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/util/qemu-thread-posix.c:521
> #6  0x00007ffff6e487c8 in start_thread () at /lib64/power9/libpthread.so.0
> #7  0x00007ffff6d60508 in clone () at /lib64/power9/libc.so.6

I think your gdb has stopped because of the SIGUSR1 signal (which is expected), not because of the SIGSEGV. To get the real backtrace, you must do "handle SIGUSR1 nostop noprint" before running QEMU.
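For reference, a minimal gdb command sequence for this (only the "handle SIGUSR1 nostop noprint" part comes from the comment above; the rest is a generic sketch of attaching gdb to a crashing QEMU, not specific to this bug):

```
(gdb) handle SIGUSR1 nostop noprint
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt
```

This way gdb silently passes the SIGUSR1 signals QEMU uses internally for CPU-thread kicks, and only stops on the actual fault.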
The crash is triggered by the SLOF device scan:

Populating /vdevice methods
Populating /vdevice/vty@30000000
Populating /vdevice/nvram@71000000
Populating /pci@800000020000000
                     00 0000 (D) : 1af4 1000    virtio [ net ]
Bisected to:

commit f19bcdfedd53ee93412d535a842a89fa27cae7f2
Author: Jason Wang <jasowang>
Date:   Wed Jul 1 22:55:28 2020 +0800

    virtio-pci: implement queue_enabled method

    With version 1, we can detect whether a queue is enabled via
    queue_enabled.

    Signed-off-by: Jason Wang <jasowang>
    Signed-off-by: Cindy Lu <lulu>
    Message-Id: <20200701145538.22333-5-lulu>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
    Acked-by: Jason Wang <jasowang>

 hw/virtio/virtio-pci.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
I think the loop comes from:

b2a5f62a22 virtio-bus: introduce queue_enabled method

@@ -3286,6 +3286,12 @@ hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n)

 bool virtio_queue_enabled(VirtIODevice *vdev, int n)
 {
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+    if (k->queue_enabled) {
+        return k->queue_enabled(qbus->parent, n);
+    }
     return virtio_queue_get_desc_addr(vdev, n) != 0;
 }

and then

f19bcdfedd virtio-pci: implement queue_enabled method

+static bool virtio_pci_queue_enabled(DeviceState *d, int n)
+{
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+    VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+        return proxy->vqs[vdev->queue_sel].enabled;
+    }
+
+    return virtio_queue_enabled(vdev, n);
+}
...
+    k->queue_enabled = virtio_pci_queue_enabled;

So virtio_queue_enabled() calls virtio_pci_queue_enabled(), which calls virtio_queue_enabled() again... It should evaluate "virtio_queue_get_desc_addr(vdev, n) != 0" directly.
This patch fixes the problem for me:

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ada1101d07..0a85c17e91 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1116,7 +1116,7 @@ static bool virtio_pci_queue_enabled(DeviceState *d, int n)
         return proxy->vqs[vdev->queue_sel].enabled;
     }
 
-    return virtio_queue_enabled(vdev, n);
+    return virtio_queue_get_desc_addr(vdev, n) != 0;
 }
Fix sent upstream: [PATCH] virtio-pci: fix virtio_pci_queue_enabled() https://patchew.org/QEMU/20200727153319.43716-1-lvivier@redhat.com/
Fix committed upstream in 5.1.0-rc2: 0c9753ebda27 virtio-pci: fix virtio_pci_queue_enabled() https://github.com/qemu/qemu/commit/0c9753ebda274b0e618d7b4032bb2d83d27483ed
*** Bug 1860866 has been marked as a duplicate of this bug. ***
Verified this bug with qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901.ppc64le.

QEMU command:
# /usr/libexec/qemu-kvm -machine pseries -nodefaults -display none -vga none -nographic -m 8192 -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 -cpu 'host' -device virtio-net-pci,mac=9a:4c:4d:4e:4f:60,id=idtniYmJ,vectors=4,netdev=idG7NvsN,disable-legacy=off,disable-modern=on -netdev tap,id=idG7NvsN,vhost=on

"Segmentation fault (core dumped)" is no longer triggered, and the automated test case passed:
Host_RHEL.m8.u3.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.3.0.ppc64le.io-github-autotest-qemu.virtio_mode.with_netkvm.with_legacy: PASS (90.56 s)
Hi Lei,

Can you help verify this on x86 as well, since bz 1860866 is also mentioned as a duplicate?
(In reply to Yihuang Yu from comment #21)
> Hi Lei,
>
> Can you help try to verify on x86 as well? Because bz 1860866 also mentioned.

Hi, Yihuang

I tried testing with qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901.x86_64, and it works well on x86.

Best Regards
Lei Yang
Verified this bug based on comment 20 and comment 22.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:5137