
Bug 1974620

Summary: Wrong panic property 'hv-panic' reply to qemu while guest os crash with panic device
Product: Red Hat Enterprise Linux 9
Reporter: yafu <yafu>
Component: kernel
Assignee: Vitaly Kuznetsov <vkuznets>
kernel sub component: KVM
QA Contact: leidwang <leidwang>
Status: CLOSED CURRENTRELEASE
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: jinzhao, juzhang, lcheng, leidwang, virt-maint
Version: 9.0
Keywords: Automation, Regression
Target Milestone: beta
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-07-14 10:16:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description yafu 2021-06-22 08:07:24 UTC
Description of problem:
Wrong panic property 'hv-panic' reply to qemu while guest os crash with panic device

Version-Release number of selected component (if applicable):
5.13.0-0.rc4.33.el9.x86_64
qemu-kvm-6.0.0-5.el9.x86_64
libvirt-7.4.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1.Edit a guest with panic device:
#virsh edit vm1
...
<on_crash>coredump-restart</on_crash>
<devices>
...
<panic model='isa'>
   <address type='isa' iobase='0x505'/>
</panic>
</devices>
...


2.Start guest
#virsh start vm1
Domain 'vm1' started

3.Log in to the guest os, stop the kdump service and crash the guest os:
(guest os)#systemctl stop kdump
(guest os)#echo c > /proc/sysrq-trigger

4.After the guest os crashes, check the guest os status:
# virsh domstate vm1 --reason
running (booted)

5.Check the libvirtd log:
2021-06-22 06:45:22.224+0000: 470195: info : qemuMonitorJSONIOProcessLine:241 : QEMU_MONITOR_RECV_REPLY: mon=0x7f3fe4003340 reply={"return": [{"name": "type", "type": "string"},...{"name": "hv-crash", "type": "bool"}...
2021-06-22 06:45:22.619+0000: 419924: info : qemuMonitorSend:958 : QEMU_MONITOR_SEND_MSG: mon=0x7f3fe4003340 msg={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-crash"},"id":"libvirt-288"}
2021-06-22 06:45:22.619+0000: 470195: info : qemuMonitorIOWrite:436 : QEMU_MONITOR_IO_WRITE: mon=0x7f3fe4003340 buf={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]","property":"hv-crash"},"id":"libvirt-288"}
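
To see which QOM property libvirt queried, the JSON payload can be pulled out of a QEMU_MONITOR_SEND_MSG log line. The following is only a minimal sketch: the helper name parse_send_msg and the regular expression are illustrative assumptions, written against the log format shown above rather than any libvirt API.

```python
import json
import re

# Assumption: the JSON payload follows "msg=" and runs to the end of the line,
# as in the QEMU_MONITOR_SEND_MSG line quoted in this report.
SEND_MSG_RE = re.compile(r"QEMU_MONITOR_SEND_MSG: mon=\S+ msg=(\{.*\})\s*$")


def parse_send_msg(log_line):
    """Return the QMP command sent to qemu as a dict, or None if the line does not match."""
    match = SEND_MSG_RE.search(log_line)
    if match is None:
        return None
    return json.loads(match.group(1))


# Log line copied from step 5 of this report.
line = (
    "2021-06-22 06:45:22.619+0000: 419924: info : qemuMonitorSend:958 : "
    "QEMU_MONITOR_SEND_MSG: mon=0x7f3fe4003340 "
    'msg={"execute":"qom-get","arguments":{"path":"/machine/unattached/device[0]",'
    '"property":"hv-crash"},"id":"libvirt-288"}'
)
cmd = parse_send_msg(line)
print(cmd["execute"], cmd["arguments"]["property"])
```

For the line above, the command is a qom-get on the 'hv-crash' property.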


Actual results:
Wrong panic property 'hv-panic' reply to qemu while guest os crash with panic device

Expected results:
Should reply 'GUEST_PANICKED' to qemu

Additional info:
1.It works well with kernel-5.11.0-2.el9.

Comment 1 leidwang@redhat.com 2021-06-22 08:31:09 UTC
Hit this issue in my test.
Host env:
Kernel-5.13.0-0.rc4.33.el9.x86_64
qemu-kvm-6.0.0-5.el9.x86_64

Guest env:
kernel-5.13.0-0.rc4.33.el9.x86_64

Steps to Reproduce:
1.boot up guest with pvpanic device
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox on  \
-machine q35,memory-backend=mem-machine_mem \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
-nodefaults \
-device VGA,bus=pcie.0,addr=0x2 \
-m 14336 \
-object memory-backend-ram,size=14336M,id=mem-machine_mem  \
-smp 24,maxcpus=24,cores=12,threads=1,dies=1,sockets=2  \
-cpu 'Skylake-Server',+kvm_pv_unhalt \
-chardev socket,id=qmp_id_qmp1,server=on,wait=off,path=/tmp/avocado_jrcpkvev/monitor-qmp1-20210622-030919-vu7naQQm  \
-mon chardev=qmp_id_qmp1,mode=control \
-chardev socket,id=qmp_id_catch_monitor,server=on,wait=off,path=/tmp/avocado_jrcpkvev/monitor-catch_monitor-20210622-030919-vu7naQQm  \
-mon chardev=qmp_id_catch_monitor,mode=control \
-device pvpanic,ioport=0x505,id=idUEPpR9 \
-chardev socket,id=chardev_serial0,server=on,wait=off,path=/tmp/avocado_jrcpkvev/serial-serial0-20210622-030919-vu7naQQm \
-device isa-serial,id=serial0,chardev=chardev_serial0  \
-chardev socket,id=seabioslog_id_20210622-030919-vu7naQQm,path=/tmp/avocado_jrcpkvev/seabios-20210622-030919-vu7naQQm,server=on,wait=off \
-device isa-debugcon,chardev=seabioslog_id_20210622-030919-vu7naQQm,iobase=0x402 \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2 \
-device qemu-xhci,id=usb1,bus=pcie-root-port-1,addr=0x0 \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-2,addr=0x0 \
-blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
-blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
-device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
-device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
-device virtio-net-pci,mac=9a:b9:ed:eb:a6:12,id=ides6ztL,netdev=idC13vK7,bus=pcie-root-port-3,addr=0x0  \
-netdev tap,id=idC13vK7,vhost=on  \
-qmp tcp:127.0.0.1:4445,server,nowait \
-monitor stdio \
-vnc :0  \
-rtc base=utc,clock=host,driftfix=slew  \
-boot menu=off,order=cdn,once=c,strict=off \
-enable-kvm \
-no-shutdown \
-device pcie-root-port,id=pcie_extra_root_port_0,multifunction=on,bus=pcie.0,addr=0x3,chassis=5

2.stop kdump service
3.trigger crash in guest
4.check QMP output.
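
The check in step 4 could be scripted roughly as follows. This is a sketch under assumptions: the helper name find_events is made up for illustration, and the input is a list of lines already captured from the QMP monitor (for example from the -qmp tcp:127.0.0.1:4445 socket), not a live connection. The sample event is the expected GUEST_PANICKED output from this report.

```python
import json


def find_events(qmp_lines, name):
    """Return every QMP event object whose "event" field equals name."""
    found = []
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything on the monitor that is not JSON
        if isinstance(msg, dict) and msg.get("event") == name:
            found.append(msg)
    return found


# Sample capture: a command reply followed by the event this report expects.
sample = [
    '{"return": {}}',
    '{"timestamp": {"seconds": 1624347802, "microseconds": 978480}, '
    '"event": "GUEST_PANICKED", "data": {"action": "pause"}}',
]
panics = find_events(sample, "GUEST_PANICKED")
if panics:
    print("guest panicked, action:", panics[0]["data"]["action"])
else:
    print("no GUEST_PANICKED event seen")
```

An empty result after the guest crash corresponds to the failure described under "Actual results".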

Actual results:
No output in QMP

Expected results:
{"timestamp": {"seconds": 1624347802, "microseconds": 978480}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}

I also tried this case with different kernel versions in the guest:
kernel-5.13.0-0.rc3.25.el9.x86_64  cannot get the correct event in QMP
kernel-5.13.0-0.rc2.19.el9.x86_64  cannot get the correct event in QMP
kernel-5.12.0-1.el9.x86_64         it works well

So this problem may have first appeared in version 'kernel-5.13.0-0.rc2.19.el9.x86_64'.

Comment 2 John Ferlan 2021-07-08 16:31:50 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Comment 4 leidwang@redhat.com 2021-07-11 11:35:21 UTC
I tested this case with the latest RHEL 9 compose, and it seems this issue has been fixed.
Env:
kernel-5.13.0-0.rc7.51.el9.x86_64
qemu-kvm-6.0.0-7.el9.x86_64
Guest:
kernel-5.13.0-0.rc7.51.el9.x86_64

Test steps are the same as in comment 1.

QMP output:
{"execute":"qmp_capabilities"}
{"return": {}}
{"timestamp": {"seconds": 1625823106, "microseconds": 327872}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
{"timestamp": {"seconds": 1625823106, "microseconds": 385322}, "event": "STOP"}
{"execute":"query-status"}
{"return": {"status": "guest-panicked", "singlestep": false, "running": false}}

I noticed that the pvpanic device has a new parameter "events". I did not set this option in the qemu command,
but it is added to the pvpanic device by default (events=3). Could someone explain what this parameter does? Thanks a lot.

(qemu) info qtree
          dev: pvpanic, id "idUEPpR9"
            ioport = 1285 (0x505)
            events = 3 (0x3)

Comment 5 Vitaly Kuznetsov 2021-07-14 08:35:52 UTC
(In reply to leidwang from comment #4)
> I tested this case with latest rhel9 compose, seems this issue has been
> fixed.
> Env:
> kernel-5.13.0-0.rc7.51.el9.x86_64
> qemu-kvm-6.0.0-7.el9.x86_64
> Guest:
> kernel-5.13.0-0.rc7.51.el9.x86_64

The issue was reported with the following env:

kernel-5.13.0-0.rc4.33.el9.x86_64
qemu-kvm-6.0.0-5.el9.x86_64

It would be interesting to know which component is responsible - qemu
or kernel as I'm unable to spot any relevant changes. The only guess
I have is that 'q35' machine type was updated.

> 
> Test steps is same as comment1
> 
> QMP output:
> {"execute":"qmp_capabilities"}
> {"return": {}}
> {"timestamp": {"seconds": 1625823106, "microseconds": 327872}, "event":
> "GUEST_PANICKED", "data": {"action": "pause"}}
> {"timestamp": {"seconds": 1625823106, "microseconds": 385322}, "event":
> "STOP"}
> {"execute":"query-status"}
> {"return": {"status": "guest-panicked", "singlestep": false, "running":
> false}}
> 
> I noticed that the pvpanic device has a new parameter "event". I did not set
> this option in the qemu command, 
> but this parameter is added by default to the pvpanic device (events=3).
> Could someone explain what this parameter does? Thanks a lot.
> 
> (qemu) info qtree
>           dev: pvpanic, id "idUEPpR9"
>             ioport = 1285 (0x505)
>             events = 3 (0x3)

'events' represents which events are supported and '3' (0b11) 
according to https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt
means that both

bit 0: a guest panic has happened and should be processed by the host
bit 1: a guest panic has happened and will be handled by the guest;
       the host should record it or report it, but should not affect
       the execution of the guest.

types are supported.
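
As an illustration of the bit layout described above, an 'events' value could be decoded like this. It is a sketch only: the table and the function name decode_pvpanic_events are hypothetical, and the bit descriptions are paraphrased from the spec text quoted in this comment.

```python
# Bit meanings paraphrased from the pvpanic spec excerpt in this comment.
PVPANIC_EVENT_BITS = {
    0: "guest panic has happened and should be processed by the host",
    1: "guest panic has happened and will be handled by the guest; "
       "host should only record or report it",
}


def decode_pvpanic_events(events):
    """Return the descriptions of all known bits set in a pvpanic 'events' value."""
    return [
        description
        for bit, description in sorted(PVPANIC_EVENT_BITS.items())
        if events & (1 << bit)
    ]


# events = 3 (0b11) is the value shown by "info qtree" in comment 4.
for description in decode_pvpanic_events(3):
    print(description)
```

With events=3 both bits are set, so both descriptions are listed; a value of 1 would list only the first.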

Comment 6 leidwang@redhat.com 2021-07-14 09:04:36 UTC
(In reply to Vitaly Kuznetsov from comment #5)
> (In reply to leidwang from comment #4)
> > I tested this case with latest rhel9 compose, seems this issue has been
> > fixed.
> > Env:
> > kernel-5.13.0-0.rc7.51.el9.x86_64
> > qemu-kvm-6.0.0-7.el9.x86_64
> > Guest:
> > kernel-5.13.0-0.rc7.51.el9.x86_64
> 
> The issue was reported with the followin env:
> 
> kernel-5.13.0-0.rc4.33.el9.x86_64
> qemu-kvm-6.0.0-5.el9.x86_64
> 
> It would be interesting to know which component is responsible - qemu
> or kernel as I'm unable to spot any relevant changes. The only guess
> I have is that 'q35' machine type was updated.

Maybe it is caused by the kernel, since I tested with the same qemu version (comment 1)
and it works fine with kernel-5.12.0-1.el9.x86_64.
> 
> > 
> > Test steps is same as comment1
> > 
> > QMP output:
> > {"execute":"qmp_capabilities"}
> > {"return": {}}
> > {"timestamp": {"seconds": 1625823106, "microseconds": 327872}, "event":
> > "GUEST_PANICKED", "data": {"action": "pause"}}
> > {"timestamp": {"seconds": 1625823106, "microseconds": 385322}, "event":
> > "STOP"}
> > {"execute":"query-status"}
> > {"return": {"status": "guest-panicked", "singlestep": false, "running":
> > false}}
> > 
> > I noticed that the pvpanic device has a new parameter "event". I did not set
> > this option in the qemu command, 
> > but this parameter is added by default to the pvpanic device (events=3).
> > Could someone explain what this parameter does? Thanks a lot.
> > 
> > (qemu) info qtree
> >           dev: pvpanic, id "idUEPpR9"
> >             ioport = 1285 (0x505)
> >             events = 3 (0x3)
> 
> 'events' represents which events are supported and '3' (0b11) 
> according to https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt
> means that both
> 
> bit 0: a guest panic has happened and should be processed by the host
> bit 1: a guest panic has happened and will be handled by the guest;
>        the host should record it or report it, but should not affect
>        the execution of the guest.
> 
> types are supported.
Thanks a lot.

Comment 7 Vitaly Kuznetsov 2021-07-14 10:16:59 UTC
I'm closing this as CLOSED/WORKSFORME as the issue seems to be gone. Please reopen if the issue reproduces again.