Bug 1216281
Summary: | Guest show blackscreen after resume the guest which paused by watchdog | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | zhenfeng wang <zhwang> |
Component: | libvirt | Assignee: | Martin Kletzander <mkletzan> |
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 7.2 | CC: | dyuan, fjin, gsun, hhuang, huding, jsuchane, juzhang, knoel, kraxel, mzhan, rbalakri, virt-maint, xfu |
Target Milestone: | rc | Keywords: | Upstream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | libvirt-2.0.0-1.el7 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-11-03 18:16:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
zhenfeng wang
2015-04-29 02:41:35 UTC
Re-debugged this issue and found that it actually has no relationship with the watchdog: we can reproduce it by shutting down the paused guest directly, without configuring a watchdog. A RHEL 7 guest with a GUI should be shut down in agent mode, not in acpi mode, since the guest OS will ignore the ACPI event request; that is how we hit the issue in comment 0. The same issue is described in these RHEL 6 bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1147362#c7
https://bugzilla.redhat.com/show_bug.cgi?id=1113411#c0

I guess these two bugs can also explain the issue in comment 0, right?

(In reply to zhenfeng wang from comment #2)
> Re-debugged this issue and found that it actually has no relationship with
> the watchdog: we can reproduce it by shutting down the paused guest directly,
> without configuring a watchdog. A RHEL 7 guest with a GUI should be shut down
> in agent mode, not in acpi mode, since the guest OS will ignore the ACPI
> event request; that is how we hit the issue in comment 0. The same issue is
> described in these RHEL 6 bugs:
> https://bugzilla.redhat.com/show_bug.cgi?id=1147362#c7
> https://bugzilla.redhat.com/show_bug.cgi?id=1113411#c0
>
> I guess these two bugs can also explain the issue in comment 0, right?

That's right. Have you also tried shutting it down with guest mode?

Yes, I have tried it; the guest shuts down successfully with guest mode.

Moving to qemu-kvm for further investigation as there is no indication of libvirt's fault.

Looks all fine here.

virsh shutdown --mode agent in "running" state -- works fine.

virsh shutdown --mode agent in "paused" state -- libvirt throws an error. Which looks ok to me, given that it can't talk to the guest agent when the guest is paused.

virsh shutdown --mode acpi in "running" state -- suspends the guest. Seems to be the default configuration of RHEL-7 guests. Therefore OK.
virsh shutdown --mode acpi in "paused" state -- suspends the guest, but that is delayed until the guest restarts (it can't respond to the acpi event earlier).

In case --mode is not explicitly specified libvirt seems to pick "agent" in "running" state and "acpi" in "paused" state. So "virsh suspend; virsh shutdown; virsh resume" is just a more complicated way to say "virsh shutdown --mode acpi".

QE: can you confirm?

Jaroslav: see comment 6: can you confirm libvirt works as intended? At least trying to deliver acpi events to a paused guest looks questionable to me ...

(In reply to Gerd Hoffmann from comment #6)
> Looks all fine here.
>
> virsh shutdown --mode agent in "running" state -- works fine.
>
> virsh shutdown --mode agent in "paused" state -- libvirt throws an error.
> Which looks ok to me, given that it can't talk to the guest agent when the
> guest is paused.
>
> virsh shutdown --mode acpi in "running" state -- suspends the guest. Seems
> to be the default configuration of RHEL-7 guests. Therefore OK.
>
> virsh shutdown --mode acpi in "paused" state -- suspends the guest, but that
> is delayed until the guest restarts (it can't respond to the acpi event
> earlier).

Gerd,
If everything looks fine, then why the BSOD? I think libvirt is just doing what is requested.

> In case --mode is not explicitly specified libvirt seems to pick "agent" in
> "running" state and "acpi" in "paused" state. So "virsh suspend; virsh
> shutdown; virsh resume" is just a more complicated way to say "virsh
> shutdown --mode acpi".
>
> QE: can you confirm?

If this isn't a regression, then please move this to 7.3. Shutting down a paused guest doesn't seem to be a valid RHEV or OpenStack use case. But, I'm not really sure. Can libvirt QE ask RHEV QE if this is valid?

Thanks.
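The mode selection described in comment 6 can be sketched as a small decision function. This is only a toy model of the behaviour observed in this thread before the fix, not libvirt's code; `DomainState` and `effective_shutdown_mode` are hypothetical names, and the sketch assumes a guest agent is configured for the domain.

```python
from enum import Enum


class DomainState(Enum):
    RUNNING = "running"
    PAUSED = "paused"


def effective_shutdown_mode(state, requested=None):
    """Toy model of the pre-fix behaviour observed in this thread (not libvirt code)."""
    if requested == "agent" and state is DomainState.PAUSED:
        # The guest agent cannot be reached while the guest is paused.
        raise RuntimeError("Requested operation is not valid: domain is not running")
    if requested is not None:
        # An explicit --mode is used as given.
        return requested
    # No --mode given: agent when running, ACPI when paused (as observed above).
    return "agent" if state is DomainState.RUNNING else "acpi"
```

Under this model, a plain shutdown request against a paused domain ends up in the same place as an explicit acpi-mode request, which matches the observation that the suspend/shutdown/resume sequence behaves like `virsh shutdown --mode acpi`.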
Hi Karen,
I will confirm it with RHEV QE and respond here later.

Sorry, I mistakenly cleared Gerd's needinfo in comment 7, so I am re-adding one here to track the issue in comment 7.

Hi Gerd, Karen,
Sorry for the delayed response. Please check the following comments; I hope they are helpful. Thanks.

> virsh shutdown --mode agent in "running" state -- works fine.

Yes, shutting down the guest with agent mode in running state works well.

> virsh shutdown --mode agent in "paused" state -- libvirt throws an error.
> Which looks ok to me, given that it can't talk to the guest agent when the
> guest is paused.

Yes, libvirt throws an error when shutting down the guest in this scenario:
# virsh list
Id Name State
----------------------------------------------------
3 vm1 paused

# virsh shutdown vm1 --mode agent
error: Failed to shutdown domain vm1
error: Requested operation is not valid: domain is not running

> virsh shutdown --mode acpi in "running" state -- suspends the guest. Seems
> to be the default configuration of RHEL-7 guests. Therefore OK.

The guest shows a black screen when a running guest started at runlevel 5 is shut down with acpi mode:
# virsh list
Id Name State
----------------------------------------------------
8 vm1 running

# virsh shutdown vm1 --mode acpi =====> guest shows black screen

The guest shuts down successfully with acpi mode when it was started at runlevel 3.

This issue is very similar to bug 1149534. As described in comment 0 of that bug, the original reporter hit a guest screen lock while shutting down the guest. I'm not sure whether that is the same as the black screen; if yes, then Jiri's explanation should resolve our issue. Can you help confirm it? Thanks.
https://bugzilla.redhat.com/show_bug.cgi?id=1071072#c3

> virsh shutdown --mode acpi in "paused" state -- suspends the guest, but that
> is delayed until the guest restarts (it can't respond to the acpi event
> earlier).
This should be the same as the case above; the detailed results follow.

With the guest started at runlevel 5, the guest shows a black screen after it is resumed following a suspend and an acpi-mode shutdown:
# virsh suspend guest
# virsh shutdown guest --mode acpi
# virsh resume guest =====> guest shows black screen

With the guest started at runlevel 3, the guest shuts down successfully with acpi mode after the same suspend/shutdown/resume sequence.

> In case --mode is not explicitly specified libvirt seems to pick "agent" in
> "running" state and "acpi" in "paused" state. So "virsh suspend; virsh
> shutdown; virsh resume" is just a more complicated way to say "virsh
> shutdown --mode acpi".

For the virsh shutdown command on RHEL 7, using the guest agent is a better choice than acpi, since RHEL-7 is not configured to automatically shut down after receiving an ACPI power button event. So either the guest OS has to be configured not to ignore it, or you need to make use of a QEMU guest agent to solve this issue. This is already recorded in the doc in the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1149534#c20

> > virsh shutdown --mode acpi in "running" state -- suspends the guest. Seems
> > to be the default configuration of RHEL-7 guests. Therefore OK.
>
> The guest shows a black screen when a running guest started at runlevel 5
> is shut down with acpi mode:
> # virsh list
> Id Name State
> ----------------------------------------------------
> 8 vm1 running
>
> # virsh shutdown vm1 --mode acpi =====> guest shows black screen
Yes, that black screen is suspend most likely. Hook up a serial console, possibly you also have to remove "quiet" from the kernel command line, then you'll see the kernel messages from the guest going into suspend mode as response to the acpi event:
<quote>
[ 39.045245] PM: Syncing filesystems ... done.
[ 39.400158] Freezing user space processes ... (elapsed 0.003 seconds) done.
[ 39.404347] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 39.407796] sd 2:0:0:0: [sda] Synchronizing SCSI cache
[ 40.009285] PM: suspend of devices complete after 601.724 msecs
[ 40.013600] PM: late suspend of devices complete after 0.185 msecs
[ 40.020392] PM: noirq suspend of devices complete after 2.809 msecs
</quote>
But with S3 support disabled (which is the default) the guest will not ask qemu to enter S3, so qemu wouldn't send a QMP event and therefore libvirt doesn't know the guest is in pmsuspend state.
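Since management gets no notification in this configuration, one way to confirm the suspend is to scan the captured serial console text for the kernel messages quoted above. The following is a minimal sketch under that assumption; `guest_entered_suspend` is a hypothetical helper, and the exact message wording may differ between kernel versions.

```python
import re

# Markers taken from the console excerpt quoted above. Kernel message wording
# can change between versions, so these patterns are assumptions.
SUSPEND_PATTERNS = [
    re.compile(r"PM: Syncing filesystems"),
    re.compile(r"Freezing user space processes"),
    re.compile(r"PM: suspend of devices complete"),
]


def guest_entered_suspend(console_log):
    """Return True if every suspend marker appears in the captured console text."""
    return all(pattern.search(console_log) for pattern in SUSPEND_PATTERNS)
```

The function only inspects text that has already been captured from the serial console; how that text is collected is left out of the sketch.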
So, to sum up things:

The black screen isn't a crash, but simply the guest being in pmsuspend mode.

Management doesn't know though because S3 support is off by default and therefore we don't get a notification from the guest.

We end up there because libvirt falls back to acpi shutdown. guest agent shutdown is not available because the guest is paused.

How to tackle this?

I think libvirt should be consistent and not allow any shutdown requests while the guest is paused, be it guest-agent, acpi or something else. The guest can't react anyway while paused. Jaroslav?

Another question is why the guest tries to pmsuspend itself in the first place even though S3 support is not available ...

[ also deferring to 7.3, this isn't a blocker ]

Ping Jaroslav. This has been in needinfo for quite a while ...

(In reply to Gerd Hoffmann from comment #13)
> So, to sum up things:
>
> The black screen isn't a crash, but simply the guest being in pmsuspend mode.
>
> Management doesn't know though because S3 support is off by default and
> therefore we don't get a notification from the guest.
>
> We end up there because libvirt falls back to acpi shutdown. guest agent
> shutdown is not available because the guest is paused.
>
> How to tackle this?
>
> I think libvirt should be consistent and not allow any shutdown requests
> while the guest is paused, be it guest-agent, acpi or something else. The
> guest can't react anyway while paused. Jaroslav?

I'm sorry for my late answer. The above makes sense. Assigning to Martin.

Patch proposed upstream:
https://www.redhat.com/archives/libvir-list/2016-June/msg00770.html

Fixed upstream with v1.3.5-245-gb842741ba467:

commit b842741ba467b97f3c1ac63ec9bc550edf037690
Author: Martin Kletzander <mkletzan>
Date: Mon Jun 13 14:33:42 2016 +0200

qemu: Allow ACPI shutdown only for running domains

Reproduced this BZ on build libvirt-1.3.5-1.el7.x86_64.

Steps:
1. Start a RHEL7 guest with GUI
# virsh start rhel7
Domain rhel7 started

2. # virsh suspend rhel7
Domain rhel7 suspended

3. # virsh shutdown rhel7
Domain rhel7 is being shutdown

4. # virsh resume rhel7
Domain rhel7 resumed

Guest shows black screen after resuming.

Verification passed on build libvirt-2.0.0-1.el7.x86_64.

Steps:
1. Start a RHEL7 guest with GUI
# virsh start rhel7
Domain rhel7 started

2. # virsh suspend rhel7
Domain rhel7 suspended

3. # virsh shutdown rhel7
error: Failed to shutdown domain rhel7
error: Requested operation is not valid: domain is not running

# virsh list
Id Name State
----------------------------------------------------
3 rhel7 paused

4. # virsh resume rhel7
Domain rhel7 resumed

The guest works well after resuming.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html
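The difference between the reproduction and verification runs can be illustrated with a toy state machine. This is a sketch under stated assumptions, not libvirt's implementation: `ToyDomain`, `replay`, and the `reject_shutdown_when_paused` switch are hypothetical names that stand in for the behaviour before and after the fix.

```python
class ToyDomain:
    """Toy model of a domain; not libvirt's implementation."""

    def __init__(self, reject_shutdown_when_paused):
        self.state = "running"
        self.reject_shutdown_when_paused = reject_shutdown_when_paused
        self.queued_power_button = False
        self.guest_saw_power_button = False

    def suspend(self):
        self.state = "paused"

    def shutdown_acpi(self):
        if self.state != "running":
            if self.reject_shutdown_when_paused:
                # Behaviour after the fix: refuse the request outright.
                raise RuntimeError(
                    "Requested operation is not valid: domain is not running")
            # Behaviour before the fix: the event is sent, but a paused
            # guest cannot react to it until it runs again.
            self.queued_power_button = True
            return
        self.guest_saw_power_button = True

    def resume(self):
        self.state = "running"
        if self.queued_power_button:
            # The delayed ACPI event finally reaches the guest.
            self.queued_power_button = False
            self.guest_saw_power_button = True


def replay(domain):
    """Run suspend -> shutdown -> resume; report whether the guest saw the event."""
    domain.suspend()
    try:
        domain.shutdown_acpi()
    except RuntimeError:
        pass
    domain.resume()
    return domain.guest_saw_power_button
```

With `reject_shutdown_when_paused=False` the guest receives the power button event on resume, which in this thread led to the guest suspending itself and showing a black screen; with `True` the request is refused and the guest resumes untouched.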