1458705 – pvdump: QMP reports "GUEST_PANICKED" event but HMP still shows VM running after guest crashed




Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	qemu-kvm-rhev
Sub Component:
Version:	7.4
Hardware:	ppc64le
OS:	Linux
Priority:	unspecified
Severity:	unspecified
Target Milestone:	rc
Target Release:	---
Assignee:	David Gibson
QA Contact:	yilzhang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-06-05 09:33 UTC by yilzhang
Modified:	2017-08-02 04:41 UTC
CC List:	9 users
Fixed In Version:	qemu-kvm-rhev-2.9.0-12.el7
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-08-02 04:41:00 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2017:2392	0	normal	SHIPPED_LIVE	Important: qemu-kvm-rhev security, bug fix, and enhancement update	2017-08-01 20:04:36 UTC

Description yilzhang 2017-06-05 09:33:18 UTC

Description of problem:
When testing pvdump, after I stop the kdump service and trigger a crash inside the guest, QMP reports a "GUEST_PANICKED" event but HMP still shows the VM as "running".

Version-Release number of selected component (if applicable):
HOST:
  kernel: 3.10.0-675.el7.ppc64le
  qemu:   qemu-kvm-rhev-2.9.0-7.el7.ppc64le
  SLOF:   SLOF-20170303-4.git66d250e.el7.noarch
GUEST:  kernel-3.10.0-675.el7.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Boot up guest
2. Connect to QMP
   # telnet $HostIP 9990
3. Check the HMP monitor status
   (qemu) info status
4. Stop kdump and trigger a crash in the guest
   # service kdump stop
   # echo c >/proc/sysrq-trigger
5. Check guest status with HMP and QMP
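Step 2 only opens the TCP socket; QMP also expects the client to send `qmp_capabilities` before it accepts other commands (the reporter adds this in comment 5). The Python sketch below automates steps 2 and 5, assuming the guest was started with `-qmp tcp:0:9990,server,nowait` as in the command line under "Additional info", and using the QMP `query-status` command as the counterpart of HMP `info status`. `qmp_query_status` is a hypothetical helper, not part of QEMU, and this is a minimal sketch rather than a complete QMP client (QEMU's source tree has a fuller Python QMP library).

```python
import json
import socket


def _read_message(reader):
    """Read one newline-terminated JSON message from the QMP stream."""
    line = reader.readline()
    if not line:
        raise ConnectionError("QMP connection closed")
    return json.loads(line)


def qmp_query_status(sock: socket.socket):
    """Negotiate QMP capabilities, then return (status, events).

    status is the 'return' value of query-status; events collects any
    asynchronous messages (e.g. GUEST_PANICKED) seen while waiting.
    """
    # Separate reader for incoming lines; commands go out via sendall().
    reader = sock.makefile("r", encoding="utf-8", newline="\n")
    events = []

    def execute(command):
        sock.sendall((json.dumps({"execute": command}) + "\n").encode("utf-8"))
        while True:
            msg = _read_message(reader)
            if "event" in msg:
                events.append(msg)        # async event: keep waiting for the reply
            elif "error" in msg:
                raise RuntimeError("QMP error: %s" % msg["error"])
            elif "return" in msg:
                return msg["return"]

    greeting = _read_message(reader)      # {"QMP": {"version": ..., "capabilities": ...}}
    if "QMP" not in greeting:
        raise RuntimeError("unexpected QMP greeting: %r" % (greeting,))
    execute("qmp_capabilities")           # leave capabilities-negotiation mode
    return execute("query-status"), events
```

For example, `with socket.create_connection(("localhost", 9990)) as s: print(qmp_query_status(s))` run after step 4 would be expected to show `"status": "running"` on an affected build, matching the actual results, and `"status": "guest-panicked"` once the fix is in.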


Actual results:
HMP: 
(qemu) info status
VM status: running
QMP:
{"timestamp": {"seconds": 1496653822, "microseconds": 102423}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}


Expected results:
HMP:
(qemu) info status
VM status: **paused (guest-panicked)**
QMP:
{"timestamp": {"seconds": 1496653822, "microseconds": 102423}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
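The difference between the actual and expected results can be checked mechanically. This minimal sketch (the function names are hypothetical) parses the HMP `VM status:` line and a `GUEST_PANICKED` event line, and treats them as consistent only when an event with action `pause` is accompanied by `paused (guest-panicked)`. It assumes the text formats shown in this report and deliberately handles only the `pause` action.

```python
import json
import re

# HMP "info status" prints e.g. "VM status: running"
# or "VM status: paused (guest-panicked)".
_HMP_STATUS = re.compile(r"VM status:\s*(\w+)(?:\s*\(([\w-]+)\))?")


def parse_hmp_status(text):
    """Return (state, reason) from HMP 'info status' output; reason may be None."""
    m = _HMP_STATUS.search(text)
    if m is None:
        raise ValueError("no 'VM status:' line found")
    return m.group(1), m.group(2)


def panic_state_consistent(qmp_event_line, hmp_output):
    """Check that HMP agrees with a GUEST_PANICKED event whose action is 'pause'."""
    event = json.loads(qmp_event_line)
    if event.get("event") != "GUEST_PANICKED":
        raise ValueError("not a GUEST_PANICKED event")
    action = event.get("data", {}).get("action")
    if action != "pause":
        raise ValueError("only the 'pause' action is handled here, got %r" % action)
    state, reason = parse_hmp_status(hmp_output)
    return state == "paused" and reason == "guest-panicked"
```

With the event line from this report, the actual HMP output (`VM status: running`) yields `False` and the expected output yields `True`.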


Additional info:
Qemu command line used to boot up guest:
/usr/libexec/qemu-kvm \
 -name yilzhang_vm \
 -smp 8,maxcpus=20,sockets=2,cores=2,threads=4 \
 -m 8192 \
 -serial unix:/tmp/ttyS0,server,nowait \
 -no-shutdown \
 -rtc base=localtime,clock=host \
 -boot menu=on \
 -monitor stdio \
 -vnc 0:90 \
 -qmp tcp:0:9990,server,nowait \
 -device usb-tablet,id=usb-table0 \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:84 \
 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
 -device virtio-scsi-pci,id=scsi0 \
 -drive file=rhel.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0

Comment 2 Qunfang Zhang 2017-06-05 10:14:08 UTC

Hi, Yilin

Is this a Power-specific bug?

Comment 3 hachen 2017-06-06 02:13:33 UTC

X86 doesn't have this issue.

Comment 4 Qunfang Zhang 2017-06-06 02:19:31 UTC

(In reply to hachen from comment #3)
> X86 doesn't have this issue.

Thanks Haotong for confirmation.

Comment 5 yilzhang 2017-06-06 03:44:57 UTC

(In reply to yilzhang from comment #0)
> Steps to Reproduce:
> 1. Boot up guest
> 2. Connect to QMP
>    # telnet $HostIP 9990
       { "execute": "qmp_capabilities" }
> 3. Check the HMP monitor status
>    (qemu) info status
> 4. Stop kdump, and trigger crash in guest
>    # service kdump stop
>    # echo c >/proc/sysrq-trigger
> 5. Check guest status with HMP and QMP

Comment 6 David Gibson 2017-06-07 04:07:30 UTC

I've reproduced this with both the package and upstream.

Indeed, it appears that although QEMU detects and reports the panic, it doesn't actually pause the VM. Since this is mostly handled in generic code, I'm not quite sure how we get a Power-specific bug here, but I'm investigating.
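The comment above separates two things that are easy to conflate: reporting the panic and acting on it. The toy model below is purely illustrative; `ToyVM`, `RunState` and their methods are hypothetical and do not reflect QEMU's actual code. It only shows how a client can receive `GUEST_PANICKED` with `"action": "pause"` while `info status` still says `running`: emitting the event and changing the run state are separate steps, and the first can happen without the second.

```python
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    GUEST_PANICKED = "guest-panicked"


class ToyVM:
    """Hypothetical model: an event log plus a run state, nothing more."""

    def __init__(self):
        self.state = RunState.RUNNING
        self.events = []

    def emit_event(self, name, **data):
        self.events.append({"event": name, "data": data})

    def report_panic_only(self):
        # Behaviour described in this report: the event is sent,
        # but nothing changes the run state.
        self.emit_event("GUEST_PANICKED", action="pause")

    def handle_panic(self):
        # Expected behaviour: announce the panic AND move the VM into the
        # guest-panicked state, so the "pause" action matches what HMP shows.
        self.emit_event("GUEST_PANICKED", action="pause")
        self.state = RunState.GUEST_PANICKED

    def info_status(self):
        if self.state is RunState.RUNNING:
            return "VM status: running"
        return "VM status: paused (%s)" % self.state.value
```

Calling `report_panic_only()` reproduces the actual results (event logged, status `running`); `handle_panic()` produces the expected `paused (guest-panicked)`.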

Comment 7 David Gibson 2017-06-07 04:38:02 UTC

Ok, I've located the problem and have written an upstream patch to post shortly.

Comment 8 David Gibson 2017-06-07 07:22:00 UTC

Upstream patch is posted.

I don't believe this is a regression, which means it may be too late to look at a backport to RHEL 7.4. We might have to wait until 7.5 (in which case we should get it via rebase).

Comment 9 David Gibson 2017-06-14 02:53:59 UTC

Scratch build incorporating this fix completed at:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13420539

Comment 10 Qunfang Zhang 2017-06-14 09:08:05 UTC

Hello, Martin

Could you provide us a green light for the pm_ack? Thanks.

Comment 11 Qunfang Zhang 2017-06-14 09:08:47 UTC

(In reply to Qunfang Zhang from comment #10)
> Hello, Martin
> 
> Could you provide us a green light for the pm_ack? Thanks.

Sorry, I mean the "blocker+" flag since pm_ack+ is already set.

Comment 12 Miroslav Rezanina 2017-06-20 06:03:01 UTC

Fix included in qemu-kvm-rhev-2.9.0-12.el7

Comment 14 yilzhang 2017-06-20 09:44:08 UTC

This bug has been verified on the PPC platform.


*********************  Bug reproduced on PPC platform:  *********************
Host:   kernel:   3.10.0-681.el7.ppc64le
        qemu-kvm-rhev-2.9.0-10.el7.ppc64le
        SLOF-20170303-4.git66d250e.el7.noarch
Guest:  3.10.0-681.el7.ppc64le

Steps to Reproduce: the same as in the bug report
Actual results:
HMP:   (qemu) info status
       VM status: running
QMP:   {"timestamp": {"seconds": 1497947877, "microseconds": 224213}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}



*********************  Bug verified on PPC platform  *********************
This bug is verified on the following version:
Host:    kernel: 3.10.0-681.el7.ppc64le
         qemu-kvm-rhev-2.9.0-12.el7.ppc64le
         SLOF-20170303-4.git66d250e.el7.noarch.rpm
Guest:   3.10.0-681.el7.ppc64le 


Steps: the same as in the bug report
Actual results:
HMP:   (qemu) info status
       VM status: paused (guest-panicked)
QMP:  {"timestamp": {"seconds": 1497951537, "microseconds": 652401}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}



So the result is as expected; this bug is fixed in qemu-kvm-rhev-2.9.0-12.el7.ppc64le.
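To check whether a host already carries the fix (Fixed In Version: qemu-kvm-rhev-2.9.0-12.el7), the installed package NVR can be compared against that build. The sketch below is a deliberately simplified comparison that assumes purely numeric version and release fields, as in the builds quoted in this report; `parse_nvr` and `has_fix` are hypothetical helpers and are not a substitute for RPM's own ordering rules (e.g. `rpmdev-vercmp` from rpmdevtools, or `rpm.labelCompare` in the rpm Python bindings).

```python
import re

# Simplified "name-version-release[.dist.arch]" parser: version is dotted
# digits, release is taken as its leading integer (e.g. "12" of "12.el7").
_NVR = re.compile(r"^(?P<name>.+)-(?P<version>[\d.]+)-(?P<release>\d+)(?P<rest>\..*)?$")


def parse_nvr(nvr):
    """Return (name, version_tuple, release_int) for a simple NVR string."""
    m = _NVR.match(nvr)
    if m is None:
        raise ValueError("unrecognised NVR: %r" % nvr)
    version = tuple(int(p) for p in m.group("version").split("."))
    return m.group("name"), version, int(m.group("release"))


def has_fix(installed, fixed_in="qemu-kvm-rhev-2.9.0-12.el7"):
    """True if the installed build is at or beyond the fixed-in build."""
    name, ver, rel = parse_nvr(installed)
    fname, fver, frel = parse_nvr(fixed_in)
    if name != fname:
        raise ValueError("package mismatch: %s vs %s" % (name, fname))
    return (ver, rel) >= (fver, frel)
```

Fed the output of `rpm -q qemu-kvm-rhev`, this reports the 2.9.0-7 and 2.9.0-10 builds used above as lacking the fix and 2.9.0-12 as carrying it.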

Comment 16 errata-xmlrpc 2017-08-02 04:41:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392
