Bug 1733977

Summary: Qemu core dumped: /home/ngu/qemu/hw/intc/xics_kvm.c:321: ics_kvm_set_irq: Assertion `kernel_xics_fd != -1' failed
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: Gu Nini <ngu>
Component: qemu-kvm
Assignee: David Gibson <dgibson>
Status: CLOSED ERRATA
QA Contact: Zhenyu Zhang <zhenyzha>
Severity: high
Docs Contact:
Priority: high
Version: 8.1
CC: dgibson, lvivier, mdeng, micai, mrezanin, qzhang, virt-maint, xianwang, xuma, yihyu, zhenyzha
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: ppc64le
OS: Unspecified
Whiteboard:
Fixed In Version: qemu-kvm-4.1.0-2.module+el8.1.0+4012+8109dd4a
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-06 07:18:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  gdb_debug_info-07292019 (flags: none)

Description Gu Nini 2019-07-29 11:26:33 UTC
Created attachment 1594248 [details]
gdb_debug_info-07292019

Description of problem:
Start a guest with a spapr-vscsi system disk, then issue system_reset while the guest is writing to that disk (for example, while running "dd if=/dev/urandom of=test bs=64k count=10240"). QEMU aborts and dumps core:

[root@ibm-p9b-10 ngu]# ./vm3.sh
QEMU 4.0.92 monitor - type 'help' for more information
(qemu) 
(qemu) system_reset
(qemu) info usb
  Device 0.1, Port 1, Speed 480 Mb/s, Product QEMU USB Tablet, ID: usb-tablet1
(qemu) 
(qemu) 
(qemu) system_reset
(qemu) qemu-system-ppc64: /home/ngu/qemu/hw/intc/xics_kvm.c:321: ics_kvm_set_irq: Assertion `kernel_xics_fd != -1' failed.
./vm3.sh: line 25: 26865 Aborted                 (core dumped) /home/ngu/qemu/ppc64-softmmu/qemu-system-ppc64 -name 'avocado-vt-vm1' -machine pseries -nodefaults -device VGA,bus=pci.0,addr=0x2 -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_1,server,nowait -mon chardev=qmp_id_qmpmonitor1,mode=control -chardev socket,server,path=/var/tmp/avocado_1,nowait,id=chardev_serial0 -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 -device spapr-vscsi,id=scsi0,reg=0x2000 -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/ngu/rhel810-ppc64le-vscsi.qcow2 -device scsi-hd,id=image1,drive=drive_image1,bus=scsi0.0 -device virtio-net-pci,mac=9a:7e:de:21:d0:b1,id=idqnTxXL,netdev=id8zpVhB -netdev tap,id=id8zpVhB,vhost=on -m 2048 -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 -vnc :20 -rtc base=utc,clock=host -boot order=cdn,once=d,menu=off,strict=off -no-shutdown -enable-kvm -monitor stdio
[root@ibm-p9b-10 ngu]# [root@ibm-p9b-10 ngu]# 

Version-Release number of selected component (if applicable):
Host kernel: 4.18.0-122.el8.ppc64le
Guest kernel: 4.18.0-122.el8.ppc64le
Qemu: (upstream qemu4.1) QEMU emulator version 4.0.92 (v4.1.0-rc2-33-gfff3159900-dirty)

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with the following QEMU command line; the system disk is attached via spapr-vscsi:

/home/ngu/qemu/ppc64-softmmu/qemu-system-ppc64 \
    -name 'avocado-vt-vm1' \
    -machine pseries  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_1,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,server,path=/var/tmp/avocado_1,nowait,id=chardev_serial0 \
    -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device spapr-vscsi,id=scsi0,reg=0x2000 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/ngu/rhel810-ppc64le-vscsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=scsi0.0 \
    -device virtio-net-pci,mac=9a:7e:de:21:d0:b1,id=idqnTxXL,netdev=id8zpVhB  \
    -netdev tap,id=id8zpVhB,vhost=on \
    -m 2048  \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :20  \
    -rtc base=utc,clock=host  \
    -boot order=cdn,once=d,menu=off,strict=off  \
    -no-shutdown \
    -enable-kvm \
    -monitor stdio

2. After the guest boots, start a write workload on the system disk:
# dd if=/dev/urandom of=test bs=64k count=10240

3. While the write is in progress, issue system_reset from the HMP monitor:
(qemu) system_reset


Actual results:
QEMU aborts and dumps core.

Expected results:
No core dump.

Additional info:

Comment 1 Gu Nini 2019-07-29 11:29:34 UTC
I also tried with a virtio-scsi system disk; the problem does not reproduce there.

Comment 2 Laurent Vivier 2019-07-29 14:45:30 UTC
From attachment 1594248 [details]:

...
#4  0x0000000114a50f44 in ics_kvm_set_irq () at /home/ngu/qemu/hw/intc/xics_kvm.c:321

315 void ics_kvm_set_irq(ICSState *ics, int srcno, int val)
316 {
317     struct kvm_irq_level args;
318     int rc;
319 
320     /* The KVM XICS device should be in use */
321     assert(kernel_xics_fd != -1);
322 

#5  0x0000000114a4ed50 in ics_simple_set_irq () at /home/ngu/qemu/hw/intc/xics.c:486
#6  0x0000000114acfaec in spapr_irq_set_irq_xics () at /home/ngu/qemu/hw/ppc/spapr_irq.c:218
#7  0x0000000114ad0840 in spapr_irq_set_irq_dual () at /home/ngu/qemu/hw/ppc/spapr_irq.c:585
#8  0x0000000114c5db3c in qemu_set_irq () at hw/core/irq.c:44
#9  0x0000000114aba324 in qemu_irq_pulse () at /home/ngu/qemu/include/hw/irq.h:26
#10 spapr_vio_send_crq () at /home/ngu/qemu/hw/ppc/spapr_vio.c:298
#11 0x0000000114a6d3ac in vscsi_send_iu () at /home/ngu/qemu/hw/scsi/spapr_vscsi.c:199
#12 0x0000000114a6d574 in vscsi_send_rsp ()
#13 0x0000000114a6d82c in vscsi_command_complete () at /home/ngu/qemu/hw/scsi/spapr_vscsi.c:577
#14 0x0000000114d2e518 in scsi_req_complete () at hw/scsi/scsi-bus.c:1401
#15 0x0000000114d256e8 in scsi_write_do_fua () at hw/scsi/scsi-disk.c:259
#16 0x0000000114d27568 in scsi_write_complete () at hw/scsi/scsi-disk.c:534
#17 0x0000000114e863d8 in blk_aio_complete () at block/block-backend.c:1317
#18 0x0000000114f81ea8 in coroutine_trampoline () at util/coroutine-ucontext.c:115
#19 0x00007fffa5c7817c in makecontext () from /lib64/power9/libc.so.6

On reset, spapr_irq_reset_dual() (hw/ppc/spapr_irq.c) is called:

spapr_irq_reset_dual()
...
    /* Destroy all KVM devices */
    if (kvm_irqchip_in_kernel()) {
        xics_kvm_disconnect(spapr, &local_err);

And kernel_xics_fd is set to -1:

    /*
     * Only on P9 using the XICS-on XIVE KVM device:
     *
     * When the KVM device fd is closed, the device is destroyed and
     * removed from the list of devices of the VM. The VCPU presenters
     * are also detached from the device.
     */
    if (kernel_xics_fd != -1) {
        close(kernel_xics_fd);
        kernel_xics_fd = -1;
    }

So the problem happens because we reset the IRQ controller while it is still in use by the spapr-vscsi device: an I/O completion raises an interrupt after kernel_xics_fd has already been set to -1.
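
The sequence described in this comment can be sketched as a small stand-alone model. This is only an illustration, not QEMU code: every model_* name and the fd value 42 are made up for the example, and model_set_irq() returns false where the real ics_kvm_set_irq() would trip its assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's kernel_xics_fd; -1 means "no KVM XICS device". */
int model_xics_fd = -1;

/* Model of connecting the in-kernel XICS device (42 is an arbitrary fake fd). */
void model_xics_connect(void)
{
    assert(model_xics_fd == -1);   /* connecting twice is a bug in this model */
    model_xics_fd = 42;
}

/* Model of the disconnect path quoted above: forget the fd once it is closed. */
void model_xics_disconnect(void)
{
    if (model_xics_fd != -1) {
        model_xics_fd = -1;        /* the real code also calls close() here */
    }
}

/*
 * Model of ics_kvm_set_irq(): returns false in exactly the situation where
 * the real function would fail "assert(kernel_xics_fd != -1)".
 */
bool model_set_irq(int srcno, int val)
{
    (void)srcno;
    (void)val;
    return model_xics_fd != -1;
}

/*
 * Replays the failing sequence: the device is connected, the reset path
 * disconnects it, and then a late I/O completion tries to raise an irq.
 */
bool model_completion_during_reset(void)
{
    model_xics_connect();
    model_xics_disconnect();
    return model_set_irq(0, 1);
}
```

In this model, model_completion_during_reset() returns false, which corresponds to the abort seen in frame #4 of the backtrace.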

Comment 5 David Gibson 2019-08-13 02:30:34 UTC
I've reproduced this also with virtio-blk and no SMP.  This is a nasty looking regression :(.

Comment 6 David Gibson 2019-08-13 05:46:26 UTC
Ok, I think I see what's going on here.  In spapr_machine_reset() we:

1. reset the CAS vector
2. reset all devices
3. reset the irq subsystem


But (1) implicitly changes whether we're in xics or xive mode, since we determine that from the CAS state.  We don't properly set up the new state until (3) though.

In the meantime, (2) can temporarily drop the BQL (the big QEMU lock), which allows some irqs to be delivered. Delivery is attempted via XICS, but XICS is not set up properly at that point, which trips the assert().
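
To make the ordering problem concrete, here is a toy model of the three reset steps. It is a sketch under assumptions, not the QEMU implementation: the enum, struct, and function names are invented for illustration, and the model assumes that exactly one in-flight I/O completion raises an irq during the device reset step. The second ordering used in the usage note below (devices first) is only an illustrative alternative; this report does not say what the upstream fix actually changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three steps listed above, as hypothetical labels. */
enum reset_step { RESET_CAS, RESET_DEVICES, RESET_IRQ };

struct machine_model {
    bool irq_state_stale;  /* CAS was reset but the irq backend is not rebuilt yet */
    int  bad_deliveries;   /* irqs raised while the irq state was stale */
};

static void run_step(struct machine_model *m, enum reset_step step)
{
    switch (step) {
    case RESET_CAS:
        /* Resetting CAS implicitly changes the xics/xive mode. */
        m->irq_state_stale = true;
        break;
    case RESET_DEVICES:
        /* Assumption: one pending I/O completion raises an irq here. */
        if (m->irq_state_stale) {
            m->bad_deliveries++;
        }
        break;
    case RESET_IRQ:
        /* The irq subsystem is rebuilt to match the CAS state. */
        m->irq_state_stale = false;
        break;
    }
}

/* Runs the steps in the given order and counts irqs delivered in a stale state. */
int count_bad_deliveries(const enum reset_step *order, size_t n)
{
    struct machine_model m = { false, 0 };

    assert(order != NULL);
    for (size_t i = 0; i < n; i++) {
        run_step(&m, order[i]);
    }
    return m.bad_deliveries;
}
```

With the order from this comment (RESET_CAS, RESET_DEVICES, RESET_IRQ) the function reports one bad delivery. With the devices reset first (RESET_DEVICES, RESET_CAS, RESET_IRQ) it reports none, because no irq is raised between the CAS reset and the irq subsystem reset.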

Comment 7 David Gibson 2019-08-14 01:54:20 UTC
Ok, just squeezed an upstream fix in for this in -rc5.

Mirek, will we be getting -rc5 via rebase, or will we need to backport?

Comment 8 Miroslav Rezanina 2019-08-14 03:53:48 UTC
(In reply to David Gibson from comment #7)
> Ok, just squeezed an upstream fix in for this in -rc5.
> 
> Mirek, will we be getting -rc5 via rebase, or will we need to backport?

We will use rc4 for the build. The rc5 patches will be backported under BZ 1740692.

Comment 12 Zhenyu Zhang 2019-08-20 03:18:44 UTC
Version-Release number of selected component (if applicable):
Host kernel: 4.18.0-134.el8.ppc64le
Guest kernel: 4.18.0-128.el8.ppc64le
Qemu: qemu-kvm-4.1.0-4.module+el8.1.0+4020+16089f93

Steps to Reproduce:
1. Boot a guest with the following QEMU command line; the system disk is attached via spapr-vscsi:

/home/ngu/qemu/ppc64-softmmu/qemu-system-ppc64 \
    -name 'avocado-vt-vm1' \
    -machine pseries  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_1,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,server,path=/var/tmp/avocado_1,nowait,id=chardev_serial0 \
    -device spapr-vty,id=serial0,reg=0x30000000,chardev=chardev_serial0 \
    -device qemu-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -device spapr-vscsi,id=scsi0,reg=0x2000 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/ngu/rhel810-ppc64le-vscsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1,bus=scsi0.0 \
    -device virtio-net-pci,mac=9a:7e:de:21:d0:b1,id=idqnTxXL,netdev=id8zpVhB  \
    -netdev tap,id=id8zpVhB,vhost=on \
    -m 2048  \
    -smp 6,maxcpus=6,cores=3,threads=1,sockets=2 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :20  \
    -rtc base=utc,clock=host  \
    -boot order=cdn,once=d,menu=off,strict=off  \
    -no-shutdown \
    -enable-kvm \
    -monitor stdio

2. After the guest boots, start a write workload on the system disk:
# dd if=/dev/urandom of=test bs=64k count=10240

3. While the write is in progress, issue system_reset from the HMP monitor:
(qemu) system_reset
(qemu) 

4. Check /var/log/messages in the guest: normal.

Setting the status to VERIFIED.

Comment 14 errata-xmlrpc 2019-11-06 07:18:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723