Bug 1458705 - pvdump: QMP reports "GUEST_PANICKED" event but HMP still shows VM running after guest crashed
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: David Gibson
QA Contact: yilzhang
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
Reported: 2017-06-05 09:33 UTC by yilzhang
Modified: 2017-08-02 04:41 UTC (History)
Clone Of:
Last Closed: 2017-08-02 04:41:00 UTC


External Trackers
Tracker: Red Hat Product Errata
ID: RHSA-2017:2392
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: qemu-kvm-rhev security, bug fix, and enhancement update
Last Updated: 2017-08-01 20:04:36 UTC

Description yilzhang 2017-06-05 09:33:18 UTC
Description of problem:
When testing pvdump, after I stop the kdump service and trigger a crash inside the guest, QMP reports a "GUEST_PANICKED" event but HMP still shows the VM as "running".

Version-Release number of selected component (if applicable):
HOST:
  kernel: 3.10.0-675.el7.ppc64le
  qemu:   qemu-kvm-rhev-2.9.0-7.el7.ppc64le
  SLOF:   SLOF-20170303-4.git66d250e.el7.noarch
GUEST:  kernel-3.10.0-675.el7.ppc64le

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest
2. Connect to QMP
   # telnet $HostIP 9990
3. Check the HMP monitor status
   (qemu) info status
4. Stop kdump and trigger a crash in the guest
   # service kdump stop
   # echo c >/proc/sysrq-trigger
5. Check guest status with HMP and QMP
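
The QMP half of step 5 can also be scripted instead of typed into telnet. Below is a minimal sketch in Python, assuming QMP listens on TCP port 9990 as in the command line under "Additional info", and assuming the telnet session from step 2 has been closed first, since the monitor socket serves one client at a time. `qmp_capabilities` and `query-status` are standard QMP commands; the helper names `read_message`, `execute` and `qmp_query_status` are made up for this illustration.

```python
import json
import socket


def read_message(stream):
    """Read one newline-terminated QMP JSON message from a text stream."""
    line = stream.readline()
    if not line:
        raise ConnectionError("QMP connection closed")
    return json.loads(line)


def execute(stream, command):
    """Send a QMP command; return (reply payload, events seen while waiting)."""
    stream.write(json.dumps({"execute": command}) + "\n")
    stream.flush()
    events = []
    while True:
        msg = read_message(stream)
        if "event" in msg:
            # Asynchronous events such as GUEST_PANICKED may arrive
            # before the reply to our command.
            events.append(msg)
            continue
        if "error" in msg:
            raise RuntimeError(msg["error"])
        return msg["return"], events


def qmp_query_status(host, port=9990):
    """Return the run state string reported by query-status."""
    with socket.create_connection((host, port), timeout=10) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        read_message(stream)                 # greeting banner {"QMP": {...}}
        execute(stream, "qmp_capabilities")  # leave capabilities negotiation mode
        status, _events = execute(stream, "query-status")
        return status["status"]
```

Calling `qmp_query_status(host_ip)` before and after step 4 gives a run state string that can be compared with what `info status` prints in HMP.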


Actual results:
HMP: 
(qemu) info status
VM status: running
QMP:
{"timestamp": {"seconds": 1496653822, "microseconds": 102423}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}


Expected results:
HMP:
(qemu) info status
VM status: **paused (guest-panicked)**
QMP:
{"timestamp": {"seconds": 1496653822, "microseconds": 102423}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
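
The mismatch between the two monitors is mechanical to detect. The following is a rough sketch in Python, assuming (per the expected results above) that a GUEST_PANICKED event with action "pause" should leave the VM in the "guest-panicked" run state. The function names are hypothetical, and the HMP line format is taken from the output quoted in this report.

```python
import json


def expected_run_state(event):
    """Run state implied by a QMP event, or None if the event implies none."""
    data = event.get("data", {})
    if event.get("event") == "GUEST_PANICKED" and data.get("action") == "pause":
        return "guest-panicked"
    return None


def hmp_run_state(line):
    """Extract the run state from an HMP 'info status' output line."""
    prefix = "VM status: "
    text = line.strip()
    if not text.startswith(prefix):
        raise ValueError("not an 'info status' line: %r" % line)
    state = text[len(prefix):]
    # "paused (guest-panicked)" carries the detailed state in parentheses.
    if state.endswith(")") and "(" in state:
        return state[state.index("(") + 1:-1]
    return state


def is_consistent(event, hmp_line):
    """True if the HMP status agrees with what the QMP event implies."""
    expected = expected_run_state(event)
    return expected is None or hmp_run_state(hmp_line) == expected


if __name__ == "__main__":
    event = json.loads(
        '{"timestamp": {"seconds": 1496653822, "microseconds": 102423}, '
        '"event": "GUEST_PANICKED", "data": {"action": "pause"}}'
    )
    for line in ("VM status: running", "VM status: paused (guest-panicked)"):
        verdict = "consistent" if is_consistent(event, line) else "MISMATCH"
        print(line, "->", verdict)
```

With the event from this report, the "running" line is flagged as a mismatch and the "paused (guest-panicked)" line is accepted.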


Additional info:
QEMU command line used to boot the guest:
/usr/libexec/qemu-kvm \
 -name yilzhang_vm \
 -smp 8,maxcpus=20,sockets=2,cores=2,threads=4 \
 -m 8192 \
 -serial unix:/tmp/ttyS0,server,nowait \
 -no-shutdown \
 -rtc base=localtime,clock=host \
 -boot menu=on \
 -monitor stdio \
 -vnc 0:90 \
 -qmp tcp:0:9990,server,nowait \
 -device usb-tablet,id=usb-table0 \
 -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:84 \
 -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
 -device virtio-scsi-pci,id=scsi0 \
 -drive file=rhel.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,werror=stop,rerror=stop \
 -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0

Comment 2 Qunfang Zhang 2017-06-05 10:14:08 UTC
Hi, Yilin

Is this a Power-specific bug?

Comment 3 hachen 2017-06-06 02:13:33 UTC
X86 doesn't have this issue.

Comment 4 Qunfang Zhang 2017-06-06 02:19:31 UTC
(In reply to hachen from comment #3)
> X86 doesn't have this issue.

Thanks, Haotong, for the confirmation.

Comment 5 yilzhang 2017-06-06 03:44:57 UTC
(In reply to yilzhang from comment #0)
> Description of problem:
> When testing pvdump, after I stop kdump service and trigger a crash inside
> guest, QMP reports "GUEST_PANICKED" event but HMP still shows VM "running".
> 
> Version-Release number of selected component (if applicable):
> HOST:
>   kernel: 3.10.0-675.el7.ppc64le
>   qemu:   qemu-kvm-rhev-2.9.0-7.el7.ppc64le
>   SLOF:   SLOF-20170303-4.git66d250e.el7.noarch
> GUEST:  kernel-3.10.0-675.el7.ppc64le
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. Boot up guest
> 2. Connect to QMP
>    # telnet $HostIP 9990
       { "execute": "qmp_capabilities" }
> 3. Check the HMP monitor status
>    (qemu) info status
> 4. Stop kdump, and trigger crash in guest
>    # service kdump stop
>    # echo c >/proc/sysrq-trigger
> 5. Check guest status with HMP and QMP
> 
> 
> Actual results:
> HMP: 
> (qemu) info status
> VM status: running
> QMP:
> {"timestamp": {"seconds": 1496653822, "microseconds": 102423}, "event":
> "GUEST_PANICKED", "data": {"action": "pause"}}
> 
> 
> Expected results:
> HMP:
> (qemu) info status
> VM status: **paused (guest-panicked)**
> QMP:
> {"timestamp": {"seconds": 1496653822, "microseconds": 102423}, "event":
> "GUEST_PANICKED", "data": {"action": "pause"}}
> 
> 
> Additional info:
> Qemu command line used to boot up guest:
> /usr/libexec/qemu-kvm \
>  -name yilzhang_vm \
>  -smp 8,maxcpus=20,sockets=2,cores=2,threads=4 \
>  -m 8192 \
> -serial unix:/tmp/ttyS0,server,nowait \
> -no-shutdown \
>  -rtc base=localtime,clock=host \
>  -boot menu=on \
>  -monitor stdio \
>  -vnc 0:90 \
>  -qmp tcp:0:9990,server,nowait \
>  -device usb-tablet,id=usb-table0 \
>  -device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:84 \
>  -netdev
> tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
>  -device virtio-scsi-pci,id=scsi0 \
>  -drive
> file=rhel.qcow2,format=qcow2,id=drive_sysdisk,if=none,cache=none,aio=native,
> werror=stop,rerror=stop \
>  -device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0 \

Comment 6 David Gibson 2017-06-07 04:07:30 UTC
I've reproduced this with both the packaged build and upstream.

Indeed, it appears that although qemu detects and reports the panic, it doesn't actually pause the VM. Since this is mostly handled in generic code, I'm not quite sure how we get a Power-specific bug here, but I'm investigating.

Comment 7 David Gibson 2017-06-07 04:38:02 UTC
Ok, I've located the problem and have written an upstream patch to post shortly.

Comment 8 David Gibson 2017-06-07 07:22:00 UTC
Upstream patch is posted.

I don't believe this is a regression, which means it may be too late to look at a backport to RHEL 7.4. We might have to wait until 7.5 (in which case we should get it via rebase).

Comment 9 David Gibson 2017-06-14 02:53:59 UTC
Scratch build incorporating this fix completed at:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=13420539

Comment 10 Qunfang Zhang 2017-06-14 09:08:05 UTC
Hello, Martin

Could you provide us a green light for the pm_ack? Thanks.

Comment 11 Qunfang Zhang 2017-06-14 09:08:47 UTC
(In reply to Qunfang Zhang from comment #10)
> Hello, Martin
> 
> Could you provide us a green light for the pm_ack? Thanks.

Sorry, I mean the "blocker+" flag since pm_ack+ is already set.

Comment 12 Miroslav Rezanina 2017-06-20 06:03:01 UTC
Fix included in qemu-kvm-rhev-2.9.0-12.el7

Comment 14 yilzhang 2017-06-20 09:44:08 UTC
This bug has been verified on the PPC platform.


*********************  Bug reproduced on PPC platform:  *********************
Host:   kernel:   3.10.0-681.el7.ppc64le
        qemu-kvm-rhev-2.9.0-10.el7.ppc64le
        SLOF-20170303-4.git66d250e.el7.noarch
Guest:  3.10.0-681.el7.ppc64le

Steps to Reproduce: same as in the original report
Actual results:
HMP:   (qemu) info status
       VM status: running
QMP:   {"timestamp": {"seconds": 1497947877, "microseconds": 224213}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}



*********************  Bug verified on PPC platform:  *********************
This bug is verified on the following versions:
Host:    kernel: 3.10.0-681.el7.ppc64le
         qemu-kvm-rhev-2.9.0-12.el7.ppc64le
         SLOF-20170303-4.git66d250e.el7.noarch.rpm
Guest:   3.10.0-681.el7.ppc64le 


Steps: same as in the original report
Actual results:
HMP:   (qemu) info status
       VM status: paused (guest-panicked)
QMP:  {"timestamp": {"seconds": 1497951537, "microseconds": 652401}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}



The result is as expected, so this bug is fixed in qemu-kvm-rhev-2.9.0-12.el7.ppc64le.

Comment 16 errata-xmlrpc 2017-08-02 04:41:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392

