Bug 1860314 - [ppc64le] QEMU dumps core when continuing a guest with a network device using "disable-legacy=off,disable-modern=true"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.3
Hardware: ppc64le
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.3
Assignee: Laurent Vivier
QA Contact: Yihuang Yu
URL:
Whiteboard:
Duplicates: 1860866 (view as bug list)
Depends On:
Blocks:
 
Reported: 2020-07-24 09:53 UTC by Min Deng
Modified: 2020-11-17 17:51 UTC (History)
14 users

Fixed In Version: qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-17 17:50:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
command (1.46 KB, text/plain)
2020-07-24 10:03 UTC, Min Deng

Description Min Deng 2020-07-24 09:53:47 UTC
Description of problem:
[ppc64le] QEMU dumps core when continuing a guest with a network device attached using "disable-legacy=off,disable-modern=true"

Version-Release number of selected component (if applicable):
kernel-4.18.0-226.el8.ppc64le
qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le

How reproducible:
80% or so

Steps to Reproduce:
1. Boot a guest with:
   /usr/libexec/qemu-kvm -S ... -device virtio-net-pci,mac=9a:4c:4d:4e:4f:60,id=idtniYmJ,vectors=4,netdev=idG7NvsN,disable-legacy=off,disable-modern=true ...
2. Issue "cont" at a very early stage of boot.

Actual results:
QEMU dumps core.

Expected results:
No core dump; the guest continues running normally.

Additional info:
/usr/libexec/qemu-kvm -S -name 'avocado-vt-vm1' -sandbox on -machine pseries -nodefaults -device VGA,bus=pci.0,addr=0x2 -m 8192 -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 -cpu 'host' -chardev socket,id=qmp_id_qmpmonitor1,server,nowait,path=/tmp/t22 -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,id=chardev_serial0,server,nowait,path=/tmp/t11 -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4,disable-legacy=off,disable-modern=true -blockdev node-name=file_image1,driver=file,aio=threads,filename=/home/kar/vt_test_images/rhel830-ppc64le-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off -blockdev node-name=drive_image1,driver=qcow2,cache.direct=on,cache.no-flush=off,file=file_image1 -device scsi-hd,id=image1,drive=drive_image1,write-cache=on -vnc :0 -rtc base=utc,clock=host -boot menu=off,order=cdn,once=c,strict=off -enable-kvm -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6,disable-legacy=off,disable-modern=true -monitor stdio -device virtio-net-pci,mac=9a:4c:4d:4e:4f:60,id=idtniYmJ,vectors=4,netdev=idG7NvsN,disable-legacy=off,disable-modern=true -netdev tap,id=idG7NvsN,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown

Comment 1 Min Deng 2020-07-24 09:55:19 UTC
(gdb) bt
#0  0x00007ffff6e5163c in pthread_cond_wait@@GLIBC_2.17 () at /lib64/power9/libpthread.so.0
#1  0x0000000100681b80 in qemu_cond_wait_impl (cond=<optimized out>, mutex=<optimized out>, file=<optimized out>, line=<optimized out>)
    at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/util/qemu-thread-posix.c:174
#2  0x00000001002efa2c in qemu_wait_io_event (cpu=cpu@entry=0x1010f7dd0) at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1145
#3  0x00000001002f1fc8 in qemu_kvm_cpu_thread_fn (arg=0x1010f7dd0) at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1193
#4  0x00000001002f1fc8 in qemu_kvm_cpu_thread_fn (arg=0x1010f7dd0) at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/softmmu/cpus.c:1160
#5  0x00000001006814c0 in qemu_thread_start (args=0x10113e960) at /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le/util/qemu-thread-posix.c:521
#6  0x00007ffff6e487c8 in start_thread () at /lib64/power9/libpthread.so.0
#7  0x00007ffff6d60508 in clone () at /lib64/power9/libc.so.6

Comment 2 Min Deng 2020-07-24 09:57:04 UTC
The issue wasn't reproduced with qemu-kvm-5.0.0-0.scrmod+el8.3.0+7066+6dd3ecaa.wrb200617.ppc64le,
so this appears to be a regression. Thanks.

Comment 3 Min Deng 2020-07-24 10:00:05 UTC
Also tried on the x86 platform; the same issue was not reproduced there.
kernel-4.18.0-227.el8.x86_64
qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.x86_64

Comment 4 Min Deng 2020-07-24 10:03:42 UTC
Created attachment 1702326 [details]
command

Comment 5 Min Deng 2020-07-27 02:17:03 UTC
Additional information:
If "disable-legacy=off,disable-modern=true" is removed from the command line, the issue cannot be reproduced. Thanks.

Comment 6 David Gibson 2020-07-27 03:20:58 UTC
Is this ppc specific, or does it also appear on x86 (with the same qemu version)?

Comment 7 Min Deng 2020-07-27 05:51:01 UTC
(In reply to David Gibson from comment #6)
> Is this ppc specific, or does it also appear on x86 (with the same qemu
> version)?

Hi David,
Please have a look at comment 3: on x86, QEMU doesn't dump core, but it hangs instead.
Thanks
Min

Comment 8 Laurent Vivier 2020-07-27 13:13:14 UTC
Looks like a stack overflow because of an infinite recursive loop:

#0  0x00007ffff7abd3bc in g_hash_table_lookup () at /lib64/libglib-2.0.so.0
#1  0x000000010053c818 in type_table_lookup ()
#2  0x000000010053ec18 in object_class_dynamic_cast ()
#3  0x000000010053eefc in object_class_dynamic_cast_assert ()
#4  0x000000010029a438 in virtio_queue_enabled ()
#5  0x0000000100497a9c in virtio_pci_queue_enabled ()
...
#130902 0x000000010029a460 in virtio_queue_enabled ()
#130903 0x0000000100497a9c in virtio_pci_queue_enabled ()
#130904 0x000000010029a460 in virtio_queue_enabled ()
#130905 0x0000000100454a20 in vhost_net_start ()
#130906 0x00000001002775b0 in virtio_net_set_status ()
#130907 0x0000000100298540 in virtio_set_status ()
#130908 0x0000000100499e9c in virtio_pci_config_write ()
#130909 0x00000001002f89ec in memory_region_write_accessor ()
#130910 0x00000001002f5cc0 in access_with_adjusted_size ()
#130911 0x00000001002fbcbc in memory_region_dispatch_write ()
#130912 0x00000001001bfeb0 in address_space_stb ()
#130913 0x00000001002cb8f4 in h_logical_store ()
#130914 0x00000001002cf688 in spapr_hypercall ()
#130915 0x00000001003ad898 in kvm_arch_handle_exit ()
#130916 0x0000000100227f8c in kvm_cpu_exec ()
#130917 0x00000001002f1ff8 in qemu_kvm_cpu_thread_fn ()
#130918 0x00000001006814c0 in qemu_thread_start ()
#130919 0x00007ffff6e28878 in start_thread () at /lib64/libpthread.so.0
#130920 0x00007ffff6d332c8 in clone () at /lib64/libc.so.6

Reproduced with qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.ppc64le, but also with upstream (4215d34132)

Comment 9 Laurent Vivier 2020-07-27 13:18:56 UTC
(In reply to Min Deng from comment #1)
> (gdb) bt
> #0  0x00007ffff6e5163c in pthread_cond_wait@@GLIBC_2.17 () at
> /lib64/power9/libpthread.so.0
> #1  0x0000000100681b80 in qemu_cond_wait_impl (cond=<optimized out>,
> mutex=<optimized out>, file=<optimized out>, line=<optimized out>)
>     at
> /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.
> ppc64le/util/qemu-thread-posix.c:174
> #2  0x00000001002efa2c in qemu_wait_io_event (cpu=cpu@entry=0x1010f7dd0) at
> /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.
> ppc64le/softmmu/cpus.c:1145
> #3  0x00000001002f1fc8 in qemu_kvm_cpu_thread_fn (arg=0x1010f7dd0) at
> /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.
> ppc64le/softmmu/cpus.c:1193
> #4  0x00000001002f1fc8 in qemu_kvm_cpu_thread_fn (arg=0x1010f7dd0) at
> /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.
> ppc64le/softmmu/cpus.c:1160
> #5  0x00000001006814c0 in qemu_thread_start (args=0x10113e960) at
> /usr/src/debug/qemu-kvm-5.1.0-0.scrmod+el8.3.0+7384+2e5aeafb.wrb200716.
> ppc64le/util/qemu-thread-posix.c:521
> #6  0x00007ffff6e487c8 in start_thread () at /lib64/power9/libpthread.so.0
> #7  0x00007ffff6d60508 in clone () at /lib64/power9/libc.so.6


I think your gdb stopped because of the SIGUSR1 signal (which is expected), not because of the SIGSEGV.

To get the real backtrace, run "handle SIGUSR1 nostop noprint" in gdb before running QEMU.
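For reference, the gdb setup above can also be done from the shell when launching QEMU under gdb. This is only a sketch: the qemu-kvm path and arguments are taken from this report (elided with "..."), and "pass" is included so QEMU still receives the SIGUSR1 signals it uses internally to kick vCPU threads:

```
# Ignore QEMU's internal SIGUSR1 use so gdb only stops on the real fault
gdb -q \
    -ex 'handle SIGUSR1 nostop noprint pass' \
    -ex run \
    --args /usr/libexec/qemu-kvm -S -machine pseries ...
```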

Comment 10 Laurent Vivier 2020-07-27 13:25:27 UTC
The crash is triggered by the SLOF device scan:

Populating /vdevice methods                                                     
Populating /vdevice/vty@30000000                                                
Populating /vdevice/nvram@71000000                                              
Populating /pci@800000020000000                                                 
                     00 0000 (D) : 1af4 1000    virtio [ net ]

Comment 11 Laurent Vivier 2020-07-27 13:39:39 UTC
Bisected to:

commit f19bcdfedd53ee93412d535a842a89fa27cae7f2
Author: Jason Wang <jasowang>
Date:   Wed Jul 1 22:55:28 2020 +0800

    virtio-pci: implement queue_enabled method
    
    With version 1, we can detect whether a queue is enabled via
    queue_enabled.
    
    Signed-off-by: Jason Wang <jasowang>
    Signed-off-by: Cindy Lu <lulu>
    Message-Id: <20200701145538.22333-5-lulu>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
    Acked-by: Jason Wang <jasowang>

 hw/virtio/virtio-pci.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comment 12 Laurent Vivier 2020-07-27 13:43:35 UTC
I think the loop comes from:

b2a5f62a22 virtio-bus: introduce queue_enabled method

@@ -3286,6 +3286,12 @@ hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n)
 
 bool virtio_queue_enabled(VirtIODevice *vdev, int n)
 {
+    BusState *qbus = qdev_get_parent_bus(DEVICE(vdev));
+    VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+
+    if (k->queue_enabled) {
+        return k->queue_enabled(qbus->parent, n);
+    }
     return virtio_queue_get_desc_addr(vdev, n) != 0;
 }
 

and then f19bcdfedd virtio-pci: implement queue_enabled method

+static bool virtio_pci_queue_enabled(DeviceState *d, int n)
+{
+    VirtIOPCIProxy *proxy = VIRTIO_PCI(d);
+    VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+
+    if (virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+        return proxy->vqs[vdev->queue_sel].enabled;
+    }
+
+    return virtio_queue_enabled(vdev, n);
+}
...
+    k->queue_enabled = virtio_pci_queue_enabled;

So virtio_queue_enabled() calls virtio_pci_queue_enabled(), which calls virtio_queue_enabled() again, and so on.

It should call "virtio_queue_get_desc_addr(vdev, n) != 0" directly instead.

Comment 13 Laurent Vivier 2020-07-27 13:47:04 UTC
This patch fixes the problem for me:

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index ada1101d07..0a85c17e91 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1116,7 +1116,7 @@ static bool virtio_pci_queue_enabled(DeviceState *d, int n)
         return proxy->vqs[vdev->queue_sel].enabled;
     }
 
-    return virtio_queue_enabled(vdev, n);
+    return virtio_queue_get_desc_addr(vdev, n) != 0;
 }

Comment 14 Laurent Vivier 2020-07-27 16:02:39 UTC
Fix sent upstream:

[PATCH] virtio-pci: fix virtio_pci_queue_enabled()
        https://patchew.org/QEMU/20200727153319.43716-1-lvivier@redhat.com/

Comment 15 Laurent Vivier 2020-07-28 08:39:20 UTC
Fix committed upstream in 5.1.0-rc2:

0c9753ebda27 virtio-pci: fix virtio_pci_queue_enabled()
             https://github.com/qemu/qemu/commit/0c9753ebda274b0e618d7b4032bb2d83d27483ed

Comment 16 Yumei Huang 2020-07-29 07:47:27 UTC
*** Bug 1860866 has been marked as a duplicate of this bug. ***

Comment 20 Yihuang Yu 2020-08-14 06:15:43 UTC
Verified this bug with qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901.ppc64le

QEMU command:
# /usr/libexec/qemu-kvm -machine pseries -nodefaults -display none -vga none -nographic -m 8192 -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 -cpu 'host' -device virtio-net-pci,mac=9a:4c:4d:4e:4f:60,id=idtniYmJ,vectors=4,netdev=idG7NvsN,disable-legacy=off,disable-modern=on -netdev tap,id=idG7NvsN,vhost=on

The "Segmentation fault (core dumped)" message is no longer triggered, and the automated test case passed.
Host_RHEL.m8.u3.product_av.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.8.3.0.ppc64le.io-github-autotest-qemu.virtio_mode.with_netkvm.with_legacy: PASS (90.56 s)

Comment 21 Yihuang Yu 2020-08-14 06:22:04 UTC
Hi Lei,

Can you help verify on x86 as well? Bug 1860866 mentions it too.

Comment 22 Lei Yang 2020-08-24 03:51:46 UTC
(In reply to Yihuang Yu from comment #21)
> Hi Lei,
> 
> Can you help verify on x86 as well? Bug 1860866 mentions it too.

Hi, Yihuang

I tested it with qemu-kvm-5.1.0-2.module+el8.3.0+7652+b30e6901.x86_64; it works well on x86.

Best Regards
Lei Yang.

Comment 23 Yihuang Yu 2020-08-24 08:07:19 UTC
Verified this bug based on comment 20 and comment 22.

Comment 26 errata-xmlrpc 2020-11-17 17:50:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137

