Bug 1898488

Summary: [virtio-win][pvpanic] Test the support of the PVPANIC_CRASHLOADED
Product: Red Hat Enterprise Linux 8 Reporter: Yvugenfi <yvugenfi>
Component: virtio-winAssignee: Yvugenfi <yvugenfi>
virtio-win sub component: virtio-win-prewhql QA Contact: Virtualization Bugs <virt-bugs>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: unspecified CC: ailan, coli, jinzhao, juzhang, leidwang, lijin, mdean
Version: 8.3Keywords: RFE, Triaged
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Windows   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1900966 (view as bug list) Environment:
Last Closed: 2021-11-09 18:52:45 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1900966    
Bug Blocks:    

Description Yvugenfi@redhat.com 2020-11-17 10:00:33 UTC
Description of problem:
There is a change in the upstream virtio-win driver to support PVPANIC_CRASHLOADED event by the pvpanic driver.
https://github.com/virtio-win/kvm-guest-drivers-windows/pull/516

We need to test the changes when downstream QEMU will support PVPANIC_CRASHLOADED event as well.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Yu Wang 2021-01-29 06:49:03 UTC
Hi,

I ran this case on 8.4.0, it shows "PVPANIC_CRASHLOADED" now,
And in comment#0 https://github.com/virtio-win/kvm-guest-drivers-windows/pull/516,
when the crash dump is enabled in Windows guest, PVPANIC_CRASHLOADED will be triggered before the crash dump. It indicates a crash dump file will be generated in the guest. Otherwise, PVPANIC_PANICKED will be emitted. This means the guest crashes without a crash dump file created.

So I have some question about this:
1 In my understanding, we get PVPANIC_PANICKED or PVPANIC_CRASHLOADED is right for this case.
It depends on whether we generating the dump file. Am I right?

But how to trigger no dump file (PVPANIC_PANICKED) situation to check PVPANIC_PANICKED event is right?

2 Can we verify this case now? And do you know whitch qemu version and virtio-win build start to support this function?

Thanks
Yu Wang

Comment 2 Yu Wang 2021-01-29 09:03:11 UTC
(In reply to Yu Wang from comment #1)
> Hi,

Hi Yan,

Could you help to answer the comment about this below?
> 
> I ran this case on 8.4.0, it shows "PVPANIC_CRASHLOADED" now,
> And in comment#0
> https://github.com/virtio-win/kvm-guest-drivers-windows/pull/516,
> when the crash dump is enabled in Windows guest, PVPANIC_CRASHLOADED will be
> triggered before the crash dump. It indicates a crash dump file will be
> generated in the guest. Otherwise, PVPANIC_PANICKED will be emitted. This
> means the guest crashes without a crash dump file created.
> 
> So I have some question about this:
> 1 In my understanding, we get PVPANIC_PANICKED or PVPANIC_CRASHLOADED is
> right for this case.
> It depends on whether we generating the dump file. Am I right?
> 
> But how to trigger no dump file (PVPANIC_PANICKED) situation to check
> PVPANIC_PANICKED event is right?
> 
> 2 Can we verify this case now? And do you know which 
> virtio-win build start to support this function?

change needinfo to Yan

> 
> Thanks
> Yu Wang

Comment 3 Yvugenfi@redhat.com 2021-02-01 09:15:39 UTC
(In reply to Yu Wang from comment #1)
> Hi,

Hello Yu Wang,

As far as I see QEMU 5.2 has the code to support PVPANIC_CRASHLOADED.


> 
> I ran this case on 8.4.0, it shows "PVPANIC_CRASHLOADED" now,
> And in comment#0
> https://github.com/virtio-win/kvm-guest-drivers-windows/pull/516,
> when the crash dump is enabled in Windows guest, PVPANIC_CRASHLOADED will be
> triggered before the crash dump. It indicates a crash dump file will be
> generated in the guest. Otherwise, PVPANIC_PANICKED will be emitted. This
> means the guest crashes without a crash dump file created.

There is different meaning for the flags. 
PVPANIC_PANICKED (bit 0) - guest might be rebooted.
PVPANIC_CRASHLOADED (bit 1) - guest should not be rebooted by QEMU\libvirt in order to provide the guest a chance to write the crash dump file. pvpanic documentation in QEMU: https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt:

Bit Definition
--------------
bit 0: a guest panic has happened and should be processed by the host
bit 1: a guest panic has happened and will be handled by the guest;
       the host should record it or report it, but should not affect
       the execution of the guest.

> 
> So I have some question about this:
> 1 In my understanding, we get PVPANIC_PANICKED or PVPANIC_CRASHLOADED is
> right for this case.
> It depends on whether we generating the dump file. Am I right?
> 
> But how to trigger no dump file (PVPANIC_PANICKED) situation to check
> PVPANIC_PANICKED event is right?
> 
> 2 Can we verify this case now? And do you know whitch qemu version and
> virtio-win build start to support this function?
> 
> Thanks
> Yu Wang


Best regards,
Yab.

Comment 4 Yvugenfi@redhat.com 2021-02-02 10:39:16 UTC
> 2 Can we verify this case now? And do you know which 
> virtio-win build start to support this function?

According to Vadim builds after 190.

Comment 5 Yu Wang 2021-02-04 06:53:08 UTC
(In reply to Yan Vugenfirer from comment #3)
> (In reply to Yu Wang from comment #1)
> > Hi,
> 
> Hello Yu Wang,
> 
> As far as I see QEMU 5.2 has the code to support PVPANIC_CRASHLOADED.
> 
> 
> > 
> > I ran this case on 8.4.0, it shows "PVPANIC_CRASHLOADED" now,
> > And in comment#0
> > https://github.com/virtio-win/kvm-guest-drivers-windows/pull/516,
> > when the crash dump is enabled in Windows guest, PVPANIC_CRASHLOADED will be
> > triggered before the crash dump. It indicates a crash dump file will be
> > generated in the guest. Otherwise, PVPANIC_PANICKED will be emitted. This
> > means the guest crashes without a crash dump file created.
> 
> There is different meaning for the flags. 
> PVPANIC_PANICKED (bit 0) - guest might be rebooted.
> PVPANIC_CRASHLOADED (bit 1) - guest should not be rebooted by QEMU\libvirt
> in order to provide the guest a chance to write the crash dump file. pvpanic
> documentation in QEMU:
> https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt:
> 
> Bit Definition
> --------------
> bit 0: a guest panic has happened and should be processed by the host

When "PVPANIC_PANICKED", guest will shutdown (not reboot) on my side. Is that right?

And I don't know how to trigger guest panic? I changed "write debugging information" to none

> bit 1: a guest panic has happened and will be handled by the guest;
>        the host should record it or report it, but should not affect
>        the execution of the guest.

Thanks for your explanation.

And I want to confirm with you the verify steps:

1 boot guest with pvpanic device
2 change the register HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl  set NMICrashDump DWORD value to 1.
3 {"execute":"inject-nmi"} in qmp
4 check if the dump file is generating and qmp event
 

Expected Result: 
1 if dump collecting to 100% in monitor and event "PVPANIC_CRASHLOADED" shows in qmp and guest will reboot/or not reboot
  (depends on guest settings)
2 if no dump collecting in monitor and "GUEST_PANICKED" shows in qmp and guest will shutdown (triggered by "write debugging information" to none)

Is the test step ok to verify this bug?

If yes, for all the pvpanic cases, we should change to the expected results right?
If no, how to improve the test steps/expected results?


Thanks
Yu Wang


> 
> > 
> > So I have some question about this:
> > 1 In my understanding, we get PVPANIC_PANICKED or PVPANIC_CRASHLOADED is
> > right for this case.
> > It depends on whether we generating the dump file. Am I right?
> > 
> > But how to trigger no dump file (PVPANIC_PANICKED) situation to check
> > PVPANIC_PANICKED event is right?
> > 
> > 2 Can we verify this case now? And do you know whitch qemu version and
> > virtio-win build start to support this function?
> > 
> > Thanks
> > Yu Wang
> 
> 
> Best regards,
> Yab.

Comment 6 Yvugenfi@redhat.com 2021-02-04 07:57:46 UTC
(In reply to Yu Wang from comment #5)
> (In reply to Yan Vugenfirer from comment #3)
> > (In reply to Yu Wang from comment #1)
> > > Hi,
> > 
> > Hello Yu Wang,
> > 
> > As far as I see QEMU 5.2 has the code to support PVPANIC_CRASHLOADED.
> > 
> > 
> > > 
> > > I ran this case on 8.4.0, it shows "PVPANIC_CRASHLOADED" now,
> > > And in comment#0
> > > https://github.com/virtio-win/kvm-guest-drivers-windows/pull/516,
> > > when the crash dump is enabled in Windows guest, PVPANIC_CRASHLOADED will be
> > > triggered before the crash dump. It indicates a crash dump file will be
> > > generated in the guest. Otherwise, PVPANIC_PANICKED will be emitted. This
> > > means the guest crashes without a crash dump file created.
> > 
> > There is different meaning for the flags. 
> > PVPANIC_PANICKED (bit 0) - guest might be rebooted.
> > PVPANIC_CRASHLOADED (bit 1) - guest should not be rebooted by QEMU\libvirt
> > in order to provide the guest a chance to write the crash dump file. pvpanic
> > documentation in QEMU:
> > https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt:
> > 
> > Bit Definition
> > --------------
> > bit 0: a guest panic has happened and should be processed by the host
> 
> When "PVPANIC_PANICKED", guest will shutdown (not reboot) on my side. Is
> that right?
> 
> And I don't know how to trigger guest panic? I changed "write debugging
> information" to none

There is a utility from OSR to crash Windows: https://www.osronline.com/article.cfm%5Earticle=153.htm
Another way is to issue NMI from qemu monitor.

> 
> > bit 1: a guest panic has happened and will be handled by the guest;
> >        the host should record it or report it, but should not affect
> >        the execution of the guest.
> 
> Thanks for your explanation.
> 
> And I want to confirm with you the verify steps:
> 
> 1 boot guest with pvpanic device
> 2 change the register
> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl  set
> NMICrashDump DWORD value to 1.
> 3 {"execute":"inject-nmi"} in qmp
> 4 check if the dump file is generating and qmp event
>  
> 
> Expected Result: 
> 1 if dump collecting to 100% in monitor and event "PVPANIC_CRASHLOADED"
> shows in qmp and guest will reboot/or not reboot
>   (depends on guest settings)
Yes.

> 2 if no dump collecting in monitor and "GUEST_PANICKED" shows in qmp and
> guest will shutdown (triggered by "write debugging information" to none)
> 
Yes.

> Is the test step ok to verify this bug?
> 
> If yes, for all the pvpanic cases, we should change to the expected results
> right?
Yes, new behavior is addes. So old test cases need to be adjusted.

> If no, how to improve the test steps/expected results?
> 
> 
> Thanks
> Yu Wang
> 
> 
> > 
> > > 
> > > So I have some question about this:
> > > 1 In my understanding, we get PVPANIC_PANICKED or PVPANIC_CRASHLOADED is
> > > right for this case.
> > > It depends on whether we generating the dump file. Am I right?
> > > 
> > > But how to trigger no dump file (PVPANIC_PANICKED) situation to check
> > > PVPANIC_PANICKED event is right?
> > > 
> > > 2 Can we verify this case now? And do you know whitch qemu version and
> > > virtio-win build start to support this function?
> > > 
> > > Thanks
> > > Yu Wang
> > 
> > 
> > Best regards,
> > Yab.

Comment 7 Yu Wang 2021-02-04 08:57:55 UTC
(In reply to Yan Vugenfirer from comment #6)

> > > There is different meaning for the flags. 
> > > PVPANIC_PANICKED (bit 0) - guest might be rebooted.
> > > PVPANIC_CRASHLOADED (bit 1) - guest should not be rebooted by QEMU\libvirt
> > > in order to provide the guest a chance to write the crash dump file. pvpanic
> > > documentation in QEMU:
> > > https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt:
> > > 
> > > Bit Definition
> > > --------------
> > > bit 0: a guest panic has happened and should be processed by the host
> > 
> > When "PVPANIC_PANICKED", guest will shutdown (not reboot) on my side. Is
> > that right?
> > 
> > And I don't know how to trigger guest panic? I changed "write debugging
> > information" to none
> 
> There is a utility from OSR to crash Windows:
> https://www.osronline.com/article.cfm%5Earticle=153.htm
> Another way is to issue NMI from qemu monitor.

Yes,I used nmi to trigger this, but, I can only trigger "PVPANIC_CRASHLOADED",
since it can always get the memory dump file after BSOD.
cannot trigger "GUEST_PANICKED" event. What should I test for GUEST_PANICKED ?
or only let it there(not do nothing)

For now, I changed "write debugging information" to none in guest to trigger 
GUEST_PANICKED. Is that valid and necessary?
> 
> > 
> > > bit 1: a guest panic has happened and will be handled by the guest;
> > >        the host should record it or report it, but should not affect
> > >        the execution of the guest.
> > 
> > Thanks for your explanation.
> > 
> > And I want to confirm with you the verify steps:
> > 
> > 1 boot guest with pvpanic device
> > 2 change the register
> > HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl  set
> > NMICrashDump DWORD value to 1.
> > 3 {"execute":"inject-nmi"} in qmp
> > 4 check if the dump file is generating and qmp event
> >  
> > 
> > Expected Result: 
> > 1 if dump collecting to 100% in monitor and event "PVPANIC_CRASHLOADED"
> > shows in qmp and guest will reboot/or not reboot
> >   (depends on guest settings)
> Yes.
> 
> > 2 if no dump collecting in monitor and "GUEST_PANICKED" shows in qmp and
> > guest will shutdown (triggered by "write debugging information" to none)
> > 
> Yes.
> 
> > Is the test step ok to verify this bug?
> > 
> > If yes, for all the pvpanic cases, we should change to the expected results
> > right?
> Yes, new behavior is addes. So old test cases need to be adjusted.
> 
> > If no, how to improve the test steps/expected results?
> > 
> > 
> > Thanks
> > Yu Wang
> >

Comment 8 Yvugenfi@redhat.com 2021-02-08 09:42:47 UTC
"-device pvpanic" command line option has a property "events". This is bit field to enable the events.
#define PVPANIC_PANICKED        (1 << PVPANIC_F_PANICKED)
#define PVPANIC_CRASHLOADED     (1 << PVPANIC_F_CRASHLOADED)

By default, it is PVPANIC_PANICKED | PVPANIC_CRASHLOADED.

By setting it to 1, you can enable only PVPANIC_F_PANICKED.
"-device pvpanic, events=1"

Comment 9 Yu Wang 2021-02-08 13:33:51 UTC
(In reply to Yan Vugenfirer from comment #8)
> "-device pvpanic" command line option has a property "events". This is bit
> field to enable the events.
> #define PVPANIC_PANICKED        (1 << PVPANIC_F_PANICKED)
> #define PVPANIC_CRASHLOADED     (1 << PVPANIC_F_CRASHLOADED)
> 
> By default, it is PVPANIC_PANICKED | PVPANIC_CRASHLOADED.
> 
> By setting it to 1, you can enable only PVPANIC_F_PANICKED.
> "-device pvpanic, events=1"

Thanks a lot,

It is very helpful for me. And I will verify this bug later.

Yu Wang

Comment 10 Yu Wang 2021-02-09 02:34:14 UTC
Tested with virtio-win-prewhql-194 

kernel-4.18.0-283.el8.x86_64
qemu-kvm-5.2.0-5.module+el8.4.0+9775+0937c167.x86_64
seabios-bin-1.14.0-1.module+el8.4.0+8855+a9e237a9.noarch
DISTRO=RHEL-8.4.0-20210206.n.0

Test steps:

1 boot guest with pvpanic device
2 change the register HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl  set NMICrashDump DWORD value to 1.
3 {"execute":"inject-nmi"} in qmp
4 check if the dump file is generating and qmp event

Note: The results are checked "Automatically restart" in Startup and Recovery in guests. If not check this, will not reset by guests.
For the senario3 will still shutdown.

results: 
1 test with no events value, "-device pvpanic,ioport=0x505,id=idG9gcTs"
get "GUEST_CRASHLOADED" and reboot after finish collecting Dump
{"timestamp": {"seconds": 1612834585, "microseconds": 594594}, "event": "GUEST_CRASHLOADED", "data": {"action": "run"}}
{"timestamp": {"seconds": 1612834591, "microseconds": 735188}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1612834591, "microseconds": 737683}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}

2 test with "events=0", "-device pvpanic,ioport=0x505,id=idG9gcTs,events=0"
"no event" and reboot after finish collecting Dump
{"timestamp": {"seconds": 1612835129, "microseconds": 399319}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1612835129, "microseconds": 401757}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}

3 test with "events=0", "-device pvpanic,ioport=0x505,id=idG9gcTs,events=1"
get "GUEST_PANICKED" and shutdown after finish collecting Dump
{"timestamp": {"seconds": 1612792284, "microseconds": 47336}, "event": "GUEST_PANICKED", "data": {"action": "pause"}}
{"timestamp": {"seconds": 1612792284, "microseconds": 47577}, "event": "GUEST_PANICKED", "data": {"action": "poweroff"}}
{"timestamp": {"seconds": 1612792284, "microseconds": 47694}, "event": "SHUTDOWN", "data": {"guest": true, "reason": "guest-panic"}}

4 test with "events=0", "-device pvpanic,ioport=0x505,id=idG9gcTs,events=2"
"no event" and reboot after finish collecting Dump
{"timestamp": {"seconds": 1612835612, "microseconds": 769451}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
{"timestamp": {"seconds": 1612835612, "microseconds": 775380}, "event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}


According to your information, "no events" , "events=0" and "events=1" is expected. 
But when settings "events=2", the same behaviors as "events=0", is that right? 


Thanks
Yu Wang

Comment 11 Yvugenfi@redhat.com 2021-02-09 09:27:23 UTC
Hi Yu,

Can you please provide your Github username? I want to add you to the discussion regarding the appropriate behavior.

From what I see in the code the driver only enables GUEST_CRASHLOADED if both flags are enabled:
if ((features & (PVPANIC_PANICKED | PVPANIC_CRASHLOADED))
                       == (PVPANIC_PANICKED | PVPANIC_CRASHLOADED))
    {
        bSupportCrashLoaded = TRUE;
        TraceEvents(TRACE_LEVEL_INFORMATION, DBG_POWER,
            "PVPANIC_PANICKED and PVPANIC_CRASHLOADED notification features are supported.");
    }
    else if ((features & PVPANIC_PANICKED) == PVPANIC_PANICKED)
    {
        TraceEvents(TRACE_LEVEL_INFORMATION, DBG_POWER,
            "PVPANIC_PANICKED notification feature is supported.");
    }


Best regards,
Yan.

Comment 18 Yu Wang 2021-02-18 08:56:34 UTC
According to Comment#17, change status to assign.

Thanks
Yu Wang

Comment 19 Yu Wang 2021-04-08 09:02:17 UTC
Tested with build197 on win10-64

Get the expected results as commit description:

    event   crashdump enabled        crashdump disabled
       1    PVPANIC_PANICKED         PVPANIC_PANICKED
       2    PVPANIC_CRASHLOADED      no event triggered
       3    PVPANIC_CRASHLOADED      PVPANIC_PANICKED

If setting event to other values, will cause driver failure in guests,
"Further restrictions of setting these values should be done in QEMU side 
to avoid the failure in Windows pvpanic driver.", so not related to 
this driver.


Test steps:

1 boot guest with pvpanic device (-device pvpanic,ioport=0x505,id=idG9gcTs,events=0)
2 disable/enable crashdump (wmic RECOVEROS set DebugInfoType = 0/2)
3 {"execute":"inject-nmi"} in qmp
4 check if the dump file is generating and qmp event

Above all, this bug has been fixed, I will change status to verified.

Thanks
Yu Wang

Comment 23 errata-xmlrpc 2021-11-09 18:52:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virtio-win bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2021:4341