Bug 1974620

Summary: Wrong panic property 'hv-panic' reply to qemu while guest os crash with panic device
Product: Red Hat Enterprise Linux 9 Reporter: yafu <yafu>
Component: kernel Assignee: Vitaly Kuznetsov <vkuznets>
kernel sub component: KVM QA Contact: leidwang <leidwang>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: jinzhao, juzhang, lcheng, leidwang, virt-maint
Version: 9.0 Keywords: Automation, Regression
Target Milestone: beta   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-14 10:16:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description yafu 2021-06-22 08:07:24 UTC
Description of problem:
A wrong panic property, 'hv-panic', is returned to qemu when the guest OS crashes with a panic device configured.

Version-Release number of selected component (if applicable):
5.13.0-0.rc4.33.el9.x86_64
qemu-kvm-6.0.0-5.el9.x86_64
libvirt-7.4.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1.Edit a guest with panic device:
#virsh edit vm1
...
<on_crash>coredump-restart</on_crash>
<devices>
...
<panic model='isa'>
   <address type='isa' iobase='0x505'/>
</panic>
</devices>
...


2.Start guest
#virsh start vm1
Domain 'vm1' started

3.Login guest os, stop kdump service and crash guest os:
(guest os)#systemctl stop kdump
(guest os)#echo c > /proc/sysrq-trigger

4.After guest os crash, check the guest os status:
# virsh domstate vm1 --reason
running (booted)

5.Check the libvirtd log:
2021-06-22 06:45:22.224+0000: 470195: info : qemuMonitorJSONIOProcessLine:241 : QEMU_MONITOR_RECV_REPLY: mon=0x7f3fe4003340 reply={"return": [{"name": "type", "type": "string"},...{"name": "hv-crash", "type": "bool"}...
2021-06-22 06:45:22.619+0000: 419924: info : qemuMonitorSend:958 : QEMU_MONITOR_SEND_MSG: mon=0x7f3fe4003340 msg={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-crash"},"id":"libvirt-288"}
2021-06-22 06:45:22.619+0000: 470195: info : qemuMonitorIOWrite:436 : QEMU_MONITOR_IO_WRITE: mon=0x7f3fe4003340 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-crash"},"id":"libvirt-288"}


Actual results:
A wrong panic property, 'hv-panic', is returned to qemu when the guest OS crashes with a panic device configured.

Expected results:
Should reply 'GUEST_PANICKED' to qemu

Additional info:
1.It works well with kernel-5.11.0-2.el9.

Comment 1 leidwang@redhat.com 2021-06-22 08:31:09 UTC
Hit this issue in my testing.
Host env:
Kernel-5.13.0-0.rc4.33.el9.x86_64
qemu-kvm-6.0.0-5.el9.x86_64

Guest env:
kernel-5.13.0-0.rc4.33.el9.x86_64

Steps to Reproduce:
1.boot up guest with pvpanic device
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 14336 \
-object memory-backend-ram,size=14336M,id=mem-machine_mem  \
-smp 24,maxcpus=24,cores=12,threads=1,dies=1,sockets=2  \
-cpu 'Skylake-Server',+kvm_pv_unhalt \
-chardev socket,id=qmp_id_qmp1,server=on,wait=off,path=/tmp/avocado_jrcpkvev/monitor-qmp1-20210622-030919-vu7naQQm  \
-mon chardev=qmp_id_qmp1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,server=on,wait=off,path=/tmp/avocado_jrcpkvev/monitor-catch_monitor-20210622-030919-vu7naQQm  \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idUEPpR9 \
-chardev socket,id=chardev_serial0,server=on,wait=off,path=/tmp/avocado_jrcpkvev/serial-serial0-20210622-030919-vu7naQQm \
-device isa-serial,id=serial0,chardev=chardev_serial0  \
-chardev socket,id=seabioslog_id_20210622-030919-vu7naQQm,path=/tmp/avocado_jrcpkvev/seabios-20210622-030919-vu7naQQm,server=on,wait=off \
-device isa-debugcon,chardev=seabioslog_id_20210622-030919-vu7naQQm,iobase=0x402 \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-net-pci,mac=9a:b9:ed:eb:a6:12,id=ides6ztL,netdev=idC13vK7,bus=pcie-root-port-3,addr=0x0  \
-netdev tap,id=idC13vK7,vhost=on  \
-qmp tcp:127.0.0.1:4445,server,nowait \
-monitor stdio \
-vnc :0  \
-rtc base=utc,clock=host,driftfix=slew  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-no-shutdown \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5

2.stop kdump service
3.trigger crash in guest
4.check QMP output.

Actual results:
No output in QMP

Expected results:
{"timestamp": {"seconds": 1624347802, "microseconds": 978480}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
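The check in step 4 can be automated. Below is a minimal Python sketch, assuming the QMP monitor sends one JSON message per line and that the capabilities handshake has already been done on the socket. The function name wait_for_qmp_event is made up for illustration and is not part of QEMU or libvirt.

```python
import json
import socket


def wait_for_qmp_event(sock, event_name, timeout=30.0):
    """Read line-delimited QMP messages until an event named event_name arrives.

    Returns the decoded event dict, or None if the peer closes the
    connection or a single read waits longer than `timeout` seconds.
    Hypothetical helper, written for this report only.
    """
    sock.settimeout(timeout)  # applies to each read, not to the whole wait
    reader = sock.makefile("r", encoding="utf-8")
    try:
        for line in reader:
            line = line.strip()
            if not line:
                continue
            message = json.loads(line)
            # Command replies carry "return" or "error"; events carry "event".
            if message.get("event") == event_name:
                return message
    except socket.timeout:
        return None
    return None


# Usage against the command line above (assumes -qmp tcp:127.0.0.1:4445):
#   sock = socket.create_connection(("127.0.0.1", 4445))
#   sock.sendall(b'{"execute": "qmp_capabilities"}\n')
#   event = wait_for_qmp_event(sock, "GUEST_PANICKED", timeout=60.0)
```

With the affected kernel the helper would be expected to return None after the guest crash, which matches the "No output in QMP" result above.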

Also tried this case with different kernel versions in the guest:
kernel-5.13.0-0.rc3.25.el9.x86_64  cannot get the correct event in QMP
kernel-5.13.0-0.rc2.19.el9.x86_64  cannot get the correct event in QMP
kernel-5.12.0-1.el9.x86_64         it works well

So this problem seems to be present from version 'kernel-5.13.0-0.rc2.19.el9.x86_64' onward.

Comment 2 John Ferlan 2021-07-08 16:31:50 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Comment 4 leidwang@redhat.com 2021-07-11 11:35:21 UTC
I tested this case with the latest RHEL 9 compose; it seems this issue has been fixed.
Env:
kernel-5.13.0-0.rc7.51.el9.x86_64
qemu-kvm-6.0.0-7.el9.x86_64
Guest:
kernel-5.13.0-0.rc7.51.el9.x86_64

Test steps are the same as in comment 1.

QMP output:
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1625823106, "microseconds": 327872}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
{"timestamp": {"seconds": 1625823106, "microseconds": 385322}, "event": "STOP"}
{"execute":"query-status"}
{"return": {"status": "guest-panicked", "singlestep": false, "running": false}}
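
For automated runs, the query-status reply above can be checked with a few lines of Python. This is only a sketch: guest_is_panicked is a made-up name, and it assumes the reply is passed in as a single JSON line exactly as printed by the monitor.

```python
import json


def guest_is_panicked(reply_line):
    """Return True if a QMP query-status reply reports a panicked, stopped guest.

    Hypothetical helper; expects one JSON reply line as shown above.
    """
    reply = json.loads(reply_line)
    result = reply.get("return")
    if not isinstance(result, dict):
        # Error replies or unrelated messages are treated as "not panicked".
        return False
    return result.get("status") == "guest-panicked" and result.get("running") is False
```

Checking both "status" and "running" guards against a guest that reported the panic but was resumed afterwards.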

I noticed that the pvpanic device has a new parameter, "events". I did not set this option on the qemu command line,
but it is added to the pvpanic device by default (events=3). Could someone explain what this parameter does? Thanks a lot.

(qemu) info qtree
          dev: pvpanic, id "idUEPpR9"
            ioport = 1285 (0x505)
            events = 3 (0x3)

Comment 5 Vitaly Kuznetsov 2021-07-14 08:35:52 UTC
(In reply to leidwang from comment #4)
> I tested this case with latest rhel9 compose, seems this issue has been
> fixed.
> Env:
> kernel-5.13.0-0.rc7.51.el9.x86_64
> qemu-kvm-6.0.0-7.el9.x86_64
> Guest:
> kernel-5.13.0-0.rc7.51.el9.x86_64

The issue was reported with the following env:

kernel-5.13.0-0.rc4.33.el9.x86_64
qemu-kvm-6.0.0-5.el9.x86_64

It would be interesting to know which component is responsible - qemu
or kernel as I'm unable to spot any relevant changes. The only guess
I have is that 'q35' machine type was updated.

> 
> Test steps is same as comment1
> 
> QMP output:
> {"execute":"qmp_capabilities"}
> {"return": {}}
> {"timestamp": {"seconds": 1625823106, "microseconds": 327872}, "event":
> "GUEST_PANICKED", "data": {"action": "pause"}}
> {"timestamp": {"seconds": 1625823106, "microseconds": 385322}, "event":
> "STOP"}
> {"execute":"query-status"}
> {"return": {"status": "guest-panicked", "singlestep": false, "running":
> false}}
> 
> I noticed that the pvpanic device has a new parameter "event". I did not set
> this option in the qemu command, 
> but this parameter is added by default to the pvpanic device (events=3).
> Could someone explain what this parameter does? Thanks a lot.
> 
> (qemu) info qtree
>           dev: pvpanic, id "idUEPpR9"
>             ioport = 1285 (0x505)
>             events = 3 (0x3)

'events' represents which events are supported and '3' (0b11) 
according to https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt
means that both

bit 0: a guest panic has happened and should be processed by the host
bit 1: a guest panic has happened and will be handled by the guest;
       the host should record it or report it, but should not affect
       the execution of the guest.

types are supported.
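
As an illustration of how the bitmask reads, here is a small Python sketch that decodes an 'events' value using the two bit meanings quoted above. The constant names, labels and the function decode_pvpanic_events are chosen for this example and are not taken from the QEMU sources.

```python
# Bit positions as described in docs/specs/pvpanic.txt (names are illustrative).
PVPANIC_BIT_HOST_HANDLED = 1 << 0   # panic should be processed by the host
PVPANIC_BIT_GUEST_HANDLED = 1 << 1  # panic is handled by the guest, host only records it

PVPANIC_EVENT_LABELS = [
    (PVPANIC_BIT_HOST_HANDLED, "host-handled panic"),
    (PVPANIC_BIT_GUEST_HANDLED, "guest-handled panic"),
]


def decode_pvpanic_events(events):
    """Return the labels of the event types set in a pvpanic 'events' bitmask."""
    return [label for bit, label in PVPANIC_EVENT_LABELS if events & bit]


# events = 3 (0x3) from 'info qtree' above has both bits set.
print(decode_pvpanic_events(3))
```

For events=3 this lists both types; a value of 1 would list only the host-handled one.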

Comment 6 leidwang@redhat.com 2021-07-14 09:04:36 UTC
(In reply to Vitaly Kuznetsov from comment #5)
> (In reply to leidwang from comment #4)
> > I tested this case with latest rhel9 compose, seems this issue has been
> > fixed.
> > Env:
> > kernel-5.13.0-0.rc7.51.el9.x86_64
> > qemu-kvm-6.0.0-7.el9.x86_64
> > Guest:
> > kernel-5.13.0-0.rc7.51.el9.x86_64
> 
> The issue was reported with the followin env:
> 
> kernel-5.13.0-0.rc4.33.el9.x86_64
> qemu-kvm-6.0.0-5.el9.x86_64
> 
> It would be interesting to know which component is responsible - qemu
> or kernel as I'm unable to spot any relevant changes. The only guess
> I have is that 'q35' machine type was updated.

Maybe it is caused by the kernel: with the same qemu version (comment 1),
it works fine with kernel-5.12.0-1.el9.x86_64.
> 
> > 
> > Test steps is same as comment1
> > 
> > QMP output:
> > {"execute":"qmp_capabilities"}
> > {"return": {}}
> > {"timestamp": {"seconds": 1625823106, "microseconds": 327872}, "event":
> > "GUEST_PANICKED", "data": {"action": "pause"}}
> > {"timestamp": {"seconds": 1625823106, "microseconds": 385322}, "event":
> > "STOP"}
> > {"execute":"query-status"}
> > {"return": {"status": "guest-panicked", "singlestep": false, "running":
> > false}}
> > 
> > I noticed that the pvpanic device has a new parameter "event". I did not set
> > this option in the qemu command, 
> > but this parameter is added by default to the pvpanic device (events=3).
> > Could someone explain what this parameter does? Thanks a lot.
> > 
> > (qemu) info qtree
> >           dev: pvpanic, id "idUEPpR9"
> >             ioport = 1285 (0x505)
> >             events = 3 (0x3)
> 
> 'events' represents which events are supported and '3' (0b11) 
> according to https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt
> means that both
> 
> bit 0: a guest panic has happened and should be processed by the host
> bit 1: a guest panic has happened and will be handled by the guest;
>        the host should record it or report it, but should not affect
>        the execution of the guest.
> 
> types are supported.
Thanks a lot.

Comment 7 Vitaly Kuznetsov 2021-07-14 10:16:59 UTC
I'm closing this as CLOSED/WORKSFORME as the issue seems to be gone. Please reopen if the issue reproduces again.