Bug 1216281 - Guest shows a black screen after resuming a guest that was paused by the watchdog
Summary: Guest shows a black screen after resuming a guest that was paused by the watchdog
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Martin Kletzander
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-29 02:41 UTC by zhenfeng wang
Modified: 2016-11-03 18:16 UTC
CC List: 13 users

Fixed In Version: libvirt-2.0.0-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-03 18:16:41 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status        Summary                                                      Last Updated
Red Hat Product Errata  RHSA-2016:2577  0        normal    SHIPPED_LIVE  Moderate: libvirt security, bug fix, and enhancement update  2016-11-03 12:07:06 UTC

Description zhenfeng wang 2015-04-29 02:41:35 UTC
Description of problem:
The guest shows a black screen after resuming a guest that was paused by the watchdog.

Version-Release number:
qemu-kvm-rhev-2.2.0-9.el7.x86_64
kernel-3.10.0-242.el7.x86_64
libvirt-1.2.14-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Configure a guest with a guest agent and a watchdog:
# virsh dumpxml 7.2
--
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/7.2.org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel1'/>
      <address type='virtio-serial' controller='0' bus='0' port='2'/>
    </channel>
--
    <graphics type='spice' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
--
    <watchdog model='i6300esb' action='pause'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </watchdog>

2. Start the guest:
# virsh start 7.2

3. Log in to the guest and trigger the watchdog:
guest# echo 1 > /dev/watchdog

4. Wait one minute until the watchdog pauses the guest:
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 19    7.2                            paused

5. Shut down the guest. The guest cannot shut down because it is paused:
# virsh shutdown 7.2

6. Resume the guest. The guest shows a black screen:
# virsh resume 7.2

Actual results:
The guest shows a black screen after resuming.

Expected results:
The guest should not show a black screen.

Comment 2 zhenfeng wang 2015-04-29 04:51:53 UTC
Re-debugging this issue, I found it actually has nothing to do with the watchdog: we can reproduce it by shutting down the paused guest directly, without configuring a watchdog. A RHEL 7 guest with a GUI should be shut down in agent mode, not ACPI mode, because the guest OS ignores the ACPI event request; that is why we hit the issue in comment 0. The same issue is described in these RHEL 6 bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1147362#c7
https://bugzilla.redhat.com/show_bug.cgi?id=1113411#c0

I guess the two bugs above also explain the issue in comment 0, right?

Comment 3 Jaroslav Suchanek 2015-05-05 15:02:03 UTC
(In reply to zhenfeng wang from comment #2)
> Re-debugging this issue, I found it actually has nothing to do with the
> watchdog: we can reproduce it by shutting down the paused guest directly,
> without configuring a watchdog. A RHEL 7 guest with a GUI should be shut
> down in agent mode, not ACPI mode, because the guest OS ignores the ACPI
> event request; that is why we hit the issue in comment 0. The same issue
> is described in these RHEL 6 bugs:
> https://bugzilla.redhat.com/show_bug.cgi?id=1147362#c7
> https://bugzilla.redhat.com/show_bug.cgi?id=1113411#c0
> 
> I guess the two bugs above also explain the issue in comment 0, right?

That's right. Have you also tried to shut it down in guest agent mode?

Comment 4 zhenfeng wang 2015-05-13 01:40:27 UTC
Yes, I have tried it; the guest shuts down successfully in guest agent mode.

Comment 5 Jaroslav Suchanek 2015-05-28 14:44:28 UTC
Moving to qemu-kvm for further investigation, as there is no indication that libvirt is at fault.

Comment 6 Gerd Hoffmann 2015-09-08 08:35:48 UTC
Looks all fine here.

virsh shutdown --mode agent in "running" state -- works fine.

virsh shutdown --mode agent in "paused" state -- libvirt throws an error.  Which looks ok to me, given that it can't talk to the guest agent when the guest is paused.

virsh shutdown --mode acpi in "running" state -- suspends the guest.  Seems to be the default configuration of RHEL-7 guests.  Therefore OK.

virsh shutdown --mode acpi in "paused" state -- suspends the guest, but that is delayed until the guest restarts (it can't respond to the acpi event earlier).

In case --mode is not explicitly specified libvirt seems to pick "agent" in "running" state and "acpi" in "paused" state.  So "virsh suspend; virsh shutdown; virsh resume" is just a more complicated way to say "virsh shutdown --mode acpi".
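To make the fallback described above concrete, here is a minimal Python model of the pre-fix mode selection. It is a hypothetical sketch of the observed behaviour, not libvirt's code: the function name pick_shutdown_method and the plain string states are assumptions, and a guest agent channel is assumed to be configured, as in the XML from comment 0.

```python
from typing import Optional


def pick_shutdown_method(state: str, mode: Optional[str] = None) -> str:
    """Hypothetical model of pre-fix 'virsh shutdown' mode selection.

    Assumes a guest agent channel is configured. 'state' is the domain
    state as shown by 'virsh list' ("running" or "paused").
    """
    # The agent cannot answer while the guest's vCPUs are paused.
    agent_usable = state == "running"
    if mode == "agent":
        if not agent_usable:
            raise RuntimeError(
                "Requested operation is not valid: domain is not running")
        return "agent"
    if mode == "acpi":
        # Pre-fix: no state check, so the power-button event is queued
        # and only acted on once the guest runs again.
        return "acpi"
    if mode is None:
        # No explicit mode: prefer the agent, silently fall back to ACPI.
        return "agent" if agent_usable else "acpi"
    raise ValueError(f"unknown shutdown mode: {mode!r}")


# "virsh suspend; virsh shutdown; virsh resume" ends up using ACPI:
print(pick_shutdown_method("paused"))  # acpi
```

The model shows why the default invocation on a paused guest is equivalent to "virsh shutdown --mode acpi": the agent branch is simply unavailable.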

QE: can you confirm?

Comment 7 Gerd Hoffmann 2015-09-08 08:38:05 UTC
Jaroslav, see comment 6. Can you confirm that libvirt works as intended?

At the very least, trying to deliver ACPI events to a paused guest looks questionable to me...

Comment 8 Karen Noel 2015-09-15 22:37:31 UTC
(In reply to Gerd Hoffmann from comment #6)
> Looks all fine here.
> 
> virsh shutdown --mode agent in "running" state -- works fine.
> 
> virsh shutdown --mode agent in "paused" state -- libvirt throws an error. 
> Which looks ok to me, given that it can't talk to the guest agent when the
> guest is paused.
> 
> virsh shutdown --mode acpi in "running" state -- suspends the guest.  Seems
> to be the default configuration of RHEL-7 guests.  Therefore OK.
> 
> virsh shutdown --mode acpi in "paused" state -- suspends the guest, but that
> is delayed until the guest restarts (it can't respond to the acpi event
> earlier).

Gerd, if everything looks fine, then why the black screen?

I think libvirt is just doing what is requested.

> 
> In case --mode is not explicitly specified libvirt seems to pick "agent" in
> "running" state and "acpi" in "paused" state.  So "virsh suspend; virsh
> shutdown; virsh resume" is just a more complicated way to say "virsh
> shutdown --mode acpi".
> 
> QE: can you confirm?

If this isn't a regression, then please move this to 7.3.  Shutting down a paused guest doesn't seem to be a valid RHEV or OpenStack use case, but I'm not really sure.

Can libvirt QE ask RHEV QE whether this is valid? Thanks.

Comment 9 zhenfeng wang 2015-09-16 01:54:40 UTC
Hi Karen,
I will confirm this with RHEV QE and respond here later.

Comment 10 zhenfeng wang 2015-09-16 04:43:31 UTC
Sorry, I mistakenly cleared Gerd's needinfo from comment 7, so I am re-adding one here to track the question in comment 7.

Comment 11 zhenfeng wang 2015-09-16 05:08:40 UTC
Hi Gerd, Karen,
Sorry for the delayed response. Please check the following comments; I hope they are helpful. Thanks.

> 
> virsh shutdown --mode agent in "running" state -- works fine.
 
Yes, shutting down the guest in agent mode works well in the running state.

> 
> virsh shutdown --mode agent in "paused" state -- libvirt throws an error. 
> Which looks ok to me, given that it can't talk to the guest agent when the
> guest is paused.

Yes, libvirt throws an error when shutting down the guest in this scenario:
# virsh list
 Id    Name                           State
----------------------------------------------------
 3     vm1                            paused

# virsh shutdown vm1 --mode agent
error: Failed to shutdown domain vm1
error: Requested operation is not valid: domain is not running

> 
> virsh shutdown --mode acpi in "running" state -- suspends the guest.  Seems
> to be the default configuration of RHEL-7 guests.  Therefore OK.

When the guest is started in runlevel 5, shutting down the running guest in ACPI mode leaves it showing a black screen:
# virsh list
 Id    Name                           State
----------------------------------------------------
 8     vm1                            running

# virsh shutdown vm1 --mode acpi  =====> guest shows a black screen

When the guest is started in runlevel 3, it shuts down successfully in ACPI mode.

This issue is very similar to bug 1149534. According to comment 0 of that bug, the original reporter saw the guest screen lock when shutting down the guest. I'm not sure whether that is the same as the black screen; if it is, Jiri's explanation should account for our issue. Can you help confirm? Thanks.

https://bugzilla.redhat.com/show_bug.cgi?id=1071072#c3

> 
> virsh shutdown --mode acpi in "paused" state -- suspends the guest, but that
> is delayed until the guest restarts (it can't respond to the acpi event
> earlier).

This should be the same as the case above. The detailed results follow:

When the guest is started in runlevel 5, it shows a black screen on resume after being suspended and then shut down in ACPI mode:

# virsh suspend guest
# virsh shutdown guest --mode acpi
# virsh resume guest          =====> guest shows a black screen


When the guest is started in runlevel 3, the same suspend / ACPI-mode shutdown / resume sequence shuts the guest down successfully after the resume.


> 
> In case --mode is not explicitly specified libvirt seems to pick "agent" in
> "running" state and "acpi" in "paused" state.  So "virsh suspend; virsh
> shutdown; virsh resume" is just a more complicated way to say "virsh
> shutdown --mode acpi".

For the virsh shutdown command with RHEL 7 guests, the guest agent is a better choice than ACPI, because RHEL 7 is not configured to shut down automatically after receiving an ACPI power button event. Either the guest OS has to be configured not to ignore the event, or the QEMU guest agent must be used. This is already documented, as tracked in the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1149534#c20

Comment 12 Gerd Hoffmann 2015-09-16 06:17:09 UTC
> > virsh shutdown --mode acpi in "running" state -- suspends the guest.  Seems
> > to be the default configuration of RHEL-7 guests.  Therefore OK.
> 
> When the guest is started in runlevel 5, shutting down the running guest in
> ACPI mode leaves it showing a black screen:
> # virsh list
>  Id    Name                           State
> ----------------------------------------------------
>  8     vm1                            running
> 
> # virsh shutdown vm1 --mode acpi  =====> guest shows a black screen

Yes, that black screen is most likely suspend.  Hook up a serial console (you may also have to remove "quiet" from the kernel command line) and you'll see the kernel messages from the guest going into suspend mode in response to the ACPI event:

<quote>
[   39.045245] PM: Syncing filesystems ... done.
[   39.400158] Freezing user space processes ... (elapsed 0.003 seconds) done.
[   39.404347] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   39.407796] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[   40.009285] PM: suspend of devices complete after 601.724 msecs
[   40.013600] PM: late suspend of devices complete after 0.185 msecs
[   40.020392] PM: noirq suspend of devices complete after 2.809 msecs
</quote>

But with S3 support disabled (which is the default), the guest will not ask QEMU to enter S3, so QEMU doesn't send a QMP event, and therefore libvirt doesn't know that the guest is in pmsuspend state.
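The notification chain described above can be modelled in a few lines. The sketch below is hypothetical: the GuestModel class and its fields are illustrative, not QEMU or libvirt code. It only encodes the claim that libvirt learns about guest suspend through a QMP event (QEMU's SUSPEND event), which is emitted only when the guest actually asks to enter S3.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GuestModel:
    """Hypothetical model of the guest -> QEMU -> libvirt path."""
    s3_enabled: bool = False  # S3 support is off by default
    qmp_events: List[str] = field(default_factory=list)
    libvirt_state: str = "running"
    screen: str = "desktop"

    def acpi_power_button(self) -> None:
        # The guest reacts by suspending itself: filesystems synced,
        # tasks frozen, devices suspended (see the kernel log above),
        # so the display goes dark either way.
        self.screen = "black"
        if self.s3_enabled:
            # Only with S3 available does the guest ask QEMU to enter S3,
            # which QEMU reports with a QMP SUSPEND event.
            self.qmp_events.append("SUSPEND")
        self._libvirt_update()

    def _libvirt_update(self) -> None:
        # libvirt only learns about guest power management via QMP events.
        if "SUSPEND" in self.qmp_events:
            self.libvirt_state = "pmsuspended"


default = GuestModel()
default.acpi_power_button()
print(default.screen, default.libvirt_state)  # black running
```

With the default (S3 off), the model ends with a black screen while management still reports "running", which matches the mismatch seen in this bug. In libvirt domain XML, S3 exposure to the guest is controlled by the <pm><suspend-to-mem enabled='yes'/></pm> element; whether enabling it is appropriate is outside the scope of this sketch.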

Comment 13 Gerd Hoffmann 2015-09-18 13:36:20 UTC
So, to sum up things:

The black screen isn't a crash, but simply the guest being in pmsuspend mode.

Management doesn't know though because S3 support is off by default and therefore we don't get a notification from the guest.

We end up there because libvirt falls back to acpi shutdown.  Guest agent shutdown is not available because the guest is paused.

How to tackle this?

I think libvirt should be consistent and not allow any shutdown requests while the guest is paused, be it guest-agent, acpi or something else.  The guest can't react anyway while paused.  Jaroslav?

Another question is why the guest tries to pmsuspend itself in the first place even though S3 support is not available ...

[ also deferring to 7.3; this isn't a blocker ]

Comment 15 Gerd Hoffmann 2016-04-15 10:40:13 UTC
Ping Jaroslav.  This has been in needinfo for quite a while...

Comment 17 Jaroslav Suchanek 2016-04-19 07:32:39 UTC
(In reply to Gerd Hoffmann from comment #13)
> So, to sum up things:
> 
> The black screen isn't a crash, but simply the guest being in pmsuspend mode.
> 
> Management doesn't know though because S3 support is off by default and
> therefore we don't get a notification from the guest.
> 
> We end up there because libvirt falls back to acpi shutdown.  Guest agent
> shutdown is not available because the guest is paused.
> 
> How to tackle this?
> 
> I think libvirt should be consistent and not allow any shutdown requests
> while the guest is paused, be it guest-agent, acpi or something else.  The
> guest can't react anyway while paused.  Jaroslav?

I'm sorry for my late answer. The above makes sense. Assigning to Martin.

Comment 18 Martin Kletzander 2016-06-13 13:30:04 UTC
Patch proposed upstream:

https://www.redhat.com/archives/libvir-list/2016-June/msg00770.html

Comment 19 Martin Kletzander 2016-06-14 09:17:35 UTC
Fixed upstream with v1.3.5-245-gb842741ba467:
commit b842741ba467b97f3c1ac63ec9bc550edf037690
Author: Martin Kletzander <mkletzan>
Date:   Mon Jun 13 14:33:42 2016 +0200

    qemu: Allow ACPI shutdown only for running domains
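The commit title above implies a state check in front of every shutdown mode. Below is a hypothetical Python sketch of the resulting behaviour, not the libvirt C code. The error text mirrors the virsh output verified in comment 21, OperationInvalid is an illustrative stand-in for libvirt's error reporting, and a guest agent channel is assumed to be configured.

```python
from typing import Optional


class OperationInvalid(Exception):
    """Illustrative stand-in for libvirt's 'operation is not valid' error."""


def shutdown(state: str, mode: Optional[str] = None) -> str:
    """Hypothetical post-fix model of 'virsh shutdown [--mode agent|acpi]'.

    Returns the shutdown method that would be used. The real default path
    may still fall back from agent to ACPI for a running guest; that
    detail is omitted here.
    """
    # Reject every mode unless the domain is running, so a paused guest
    # never has an ACPI power-button event queued for later.
    if state != "running":
        raise OperationInvalid(
            "Requested operation is not valid: domain is not running")
    if mode in (None, "agent"):
        return "agent"
    if mode == "acpi":
        return "acpi"
    raise ValueError(f"unknown shutdown mode: {mode!r}")


for mode in (None, "agent", "acpi"):
    try:
        shutdown("paused", mode)
    except OperationInvalid as err:
        print(f"mode={mode}: {err}")
```

Checking the state once, before any mode is chosen, gives the consistent behaviour proposed in comment 13: no shutdown request of any kind is accepted while the guest is paused.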

Comment 21 Fangge Jin 2016-07-04 10:55:40 UTC
Reproduced this BZ with build libvirt-1.3.5-1.el7.x86_64.
Steps:
1. Start a RHEL 7 guest with a GUI:
# virsh start rhel7
Domain rhel7 started

2. # virsh suspend rhel7
Domain rhel7 suspended

3. # virsh shutdown rhel7
Domain rhel7 is being shutdown

4. # virsh resume rhel7
Domain rhel7 resumed

The guest shows a black screen after resuming.



Verification passed with build libvirt-2.0.0-1.el7.x86_64.
Steps:
1. Start a RHEL 7 guest with a GUI:
# virsh start rhel7
Domain rhel7 started

2. # virsh suspend rhel7
Domain rhel7 suspended

3. # virsh shutdown rhel7
error: Failed to shutdown domain rhel7
error: Requested operation is not valid: domain is not running

# virsh list
 Id    Name                           State
----------------------------------------------------
 3     rhel7                          paused

4. # virsh resume rhel7
Domain rhel7 resumed

The guest works well after resuming.

Comment 23 errata-xmlrpc 2016-11-03 18:16:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html

