Bug 744077

Summary: default ACPI behavior makes it impossible to cleanly shutdown F16 guest from the host
Product: [Fedora] Fedora
Reporter: Eric Blake <eblake>
Component: gnome-settings-daemon
Assignee: Richard Hughes <rhughes>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 18
CC: awilliam, bnocera, eblake, hughsient, kchamart, kparal, mclasen, mishu, rhughes, rjones, rstrode, tflink
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: RejectedBlocker RejectedNTH
Doc Type: Bug Fix
Clone Of: 741375
Last Closed: 2014-02-05 11:50:35 UTC
Bug Depends On: 741375    
Bug Blocks: 959017    

Description Eric Blake 2011-10-06 22:08:31 UTC
Cloning to gnome-settings-daemon (or perhaps we should move it to get acpid installed by default in F16), to track that the overall issue is not solved, while letting the parent bug 741375 be repurposed to just the seabios fixes that are good.

+++ This bug was initially created as a clone of Bug #741375 +++

Description of problem:
When using libvirt to manage a guest, the preferred method for requesting guest shutdown from the host is the virDomainShutdown API (exposed as the 'shutdown' option in virt-manager, or as 'virsh shutdown domain' from the shell, etc.).  However, this command works by triggering an ACPI interrupt in the guest.  The default behavior of F16 on an ACPI interrupt is to enter S3 suspend mode, but this is pointless because qemu-kvm immediately re-wakes the system.  The default with F14 was to pop up the interactive restart/shutdown/cancel box with a 60-second timeout that defaulted to shutdown; thus virDomainShutdown in the host causes an F14 guest to shut down cleanly, but has no effect on an F16 guest.

Version-Release number of selected component (if applicable):
gnome-power-manager-3.1.92-1.fc16.x86_64
libvirt-0.9.6-1.fc16.x86_64
qemu-kvm-0.15.0-4.fc16.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install F16 on bare metal, including libvirt and qemu-kvm
2. Create a default-install F16 guest
3. Try to shut down the guest from the host, such as by using 'virsh shutdown domain' in the host
  
Actual results:
In the guest, the eth0 connection bounces, which is evidence that the ACPI signal was received and that the OS tried to go into S3 but immediately resumed operation.

Expected results:
In the guest, the interactive shutdown box should appear.

Additional info:
Note that the guest itself can trigger shutdown: since KVM does not yet support 3D graphics, the default GNOME session in the guest runs in fallback mode, where clicking on the user name in the top right, then 'Shutdown...', pops up the interactive box, and that interactive box can indeed trigger a shutdown.  However, this is guest-initiated, not host-initiated.

Based on these beta blocker requirements from https://fedoraproject.org/wiki/Fedora_16_Beta_Release_Criteria:

14. The release must boot successfully as a virtual guest in a situation where the virtual host is running the same release (using Fedora's current preferred virtualization technology) 
21. All release-blocking desktops' offered mechanisms (if any) for shutting down, logging out and rebooting must work 

I argue that this is a beta blocker bug, since virDomainShutdown is the preferred and only offered mechanism for cleanly shutting down a guest from the host, and that this is a case of F16 as a self-hosted guest not obeying all the release requirements.

--- Additional comment from hughsient on 2011-09-26 12:01:02 MDT ---

(In reply to comment #0)
> The default with F14 was to pop up the interactive
> restart/shutdown/cancel box with a 60-second timeout that defaulted
> to shutdown; thus virDomainShutdown in the host causes an F14 guest to
> shut down cleanly, but has no effect on an F16 guest.

So you have to wait 60 seconds for the guest to shutdown? That sounds like it's using a hack that used to work in F14 (by coincidence) that no longer works in F16, as the defaults have changed to something specified by UX designers.

> I argue that this is a beta blocker bug, since virDomainShutdown is the
> preferred and only offered mechanism for cleanly shutting down a guest from the
> host...

Then virDomainShutdown is buggy. You can't just expect to "inject" a power button press and hope that the guest shuts down in an ordered way.

If the user changes the behavior of the shutdown button in F14 or F15 to anything other than the default, then it's going to break there too. 

Really the power manager should be taught that it's running as a VM guest and do something sane (where sane is discussed by the UX people). If you provide some sample code in C and the g-s-d maintainers agree then this is probably the best course of action.

Richard.

--- Additional comment from eblake on 2011-09-26 12:18:10 MDT ---

(In reply to comment #1)
> So you have to wait 60 seconds for the guest to shutdown?

Yes, for out-of-the-box defaults in F14.

> That sounds like it's
> using a hack that used to work in F14 (by coincidence) that no longer works in
> F16, as the defaults have changed to something specified by UX designers.
> 
> > I argue that this is a beta blocker bug, since virDomainShutdown is the
> > preferred and only offered mechanism for cleanly shutting down a guest from the
> > host...
> 
> Then virDomainShutdown is buggy. You can't just expect to "inject" a power
> button press and hope that the guest shuts down in an ordered way.

virDomainShutdown has _always_ been specified as relying on guest cooperation.  The same is true for Windows guests: if the guest is not configured to react to ACPI, then shutdown won't work (there have been numerous complaints about how Windows defaults to treating ACPI as a shutdown request if someone is logged in, but ignoring it when done on the initial login screen, all of which have been marked as not a libvirt bug; see, for example, bug 738553).  In other words, the problem in this bug report is that the default ACPI reaction has changed from something that worked for a clean shutdown to something that now doesn't work, not that ACPI is unreliable as a shutdown mechanism in the first place.

Libvirt _does_ have virDomainDestroy to forcefully shut down a guest, but this is not clean from the guest's perspective (it is the same as yanking the power cord).

There is also talk of adding a guest agent, where the shutdown request can be sent via the agent rather than by overloading ACPI events, but the guest agent is apparently not mature enough yet for default inclusion in F16.

I have no problem with a change that would make default ACPI behavior depend on whether F16 is running as host (use the UX designer's new behavior) or guest (recognize that this is a guest, and therefore S3 is useless, and therefore shutdown is the only thing that makes sense, whether the shutdown is interactive after 60 seconds or instantaneous like the current S3 is instantaneous).

In fact, it may even be worth cloning this against qemu-kvm, to state that qemu should NOT be exposing S3 capabilities to the guest, so that the guest will no longer try to treat ACPI as an S3 request.  But _something_ needs to be done to make the default out-of-the-box behavior nicer.

> 
> If the user changes the behavior of the shutdown button in F14 or F15 to
> anything other than the default, then it's going to break there too. 

Yes, but then that's no longer the default.  The beta-blocker requirement is about sane out-of-the-box defaults, not what happens after the user configures things.  And since libvirt already documents that virDomainShutdown is best-effort and requires guest cooperation (whether via ACPI or via a guest agent command), the host must be prepared for guests that have reconfigured ACPI behavior.  But that doesn't change the question of starting from sane defaults.
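
As an illustration of what "best-effort" means for a host-side caller, here is a minimal sketch in Python that wraps the virsh commands named in this report. The function names, the timeout values, and the "clean"/"forced"/"ignored" labels are assumptions made for this example, not libvirt API; it also assumes a persistent domain, so that 'virsh domstate' keeps answering after the guest powers off.

```python
import subprocess
import time


def virsh(*args):
    """Run a virsh subcommand and return its stripped stdout."""
    result = subprocess.run(
        ["virsh", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def shutdown_guest(domain, timeout=90.0, poll=2.0,
                   run=virsh, sleep=time.sleep, force=False):
    """Request a guest shutdown and report whether the guest cooperated.

    Returns "clean" if the domain reached "shut off" within the timeout,
    "forced" if it did not and force=True (virsh destroy was used), and
    "ignored" if it did not and force=False (guest left running).
    """
    run("shutdown", domain)          # best-effort request (ACPI by default)
    waited = 0.0
    while waited < timeout:
        if run("domstate", domain) == "shut off":
            return "clean"
        sleep(poll)
        waited += poll
    if force:
        run("destroy", domain)       # equivalent to yanking the power cord
        return "forced"
    return "ignored"
```

The timeout is deliberately longer than the 60-second interactive dialog mentioned above, so a guest showing that dialog would still count as a clean shutdown.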

> 
> Really the power manager should be taught that it's running as a VM guest and
> do something sane (where sane is discussed by the UX people). If you provide
> some sample code in C and the g-s-d maintainers agree then this is probably the
> best course of action.

The 'virt-what' package gives a rough approximation of whether F16 is running as a VM guest.  Also, as I mentioned, it might be possible to teach qemu-kvm to quit advertising S3 to guests, at least until a future Fedora has a guest agent incorporated by default into guests.

--- Additional comment from awilliam on 2011-09-26 12:20:06 MDT ---

Voting -1 on blocker, this doesn't really hit any of our criteria. The relevant criteria are "The release must boot successfully as a virtual guest in a situation where the virtual host is running the same release (using Fedora's current preferred virtualization technology)" and (arguably) "All release-blocking desktops' offered mechanisms (if any) for shutting down, logging out and rebooting must work" but, hey, it does boot and run, and you can indeed shut down sanely from within the guest. 'Injecting' a shutdown action into the guest from the host is kind of extra credit stuff, to me; it clearly doesn't meet our current criteria and I'm comfortable with not adding a criterion for this.

--- Additional comment from awilliam on 2011-09-26 12:23:29 MDT ---

"Also, as I mentioned, it might be possible to teach qemu-kvm to quit
advertising S3 to guests"

this seems a correct solution, BTW. GNOME recognizes the system's advertised suspend capabilities: if it doesn't advertise suspend capability it offers a Shut Down... option in the menu rather than Suspend and, I expect, powers down on a power button press (rather than suspending). If qemu-kvm is not capable of suspending safely it should not advertise suspend capabilities. GNOME doesn't appear to be doing anything wrong here.

--- Additional comment from tflink on 2011-09-26 12:29:43 MDT ---

I'm also -1 beta blocker on this.

It doesn't directly hit our criteria (for the reasons listed in comment 3) and it could be fixed by updates later on.

--- Additional comment from awilliam on 2011-09-26 16:58:16 MDT ---

Since we have two -1s on this I felt okay with requesting RC3 compose with it still open, but I'll wait for another vote before declaring it rejected.

--- Additional comment from kparal on 2011-09-27 04:39:44 MDT ---

This is related to my bug 704467.

(In reply to comment #4)
> "Also, as I mentioned, it might be possible to teach qemu-kvm to quit
> advertising S3 to guests"
> 
> this seems a correct solution, BTW. GNOME recognizes the system's advertised
> suspend capabilities: if it doesn't advertise suspend capability it offers a
> Shut Down... option in the menu rather than Suspend and, I expect, powers down on
> a power button press (rather than suspending). If qemu-kvm is not capable of
> suspending safely it should not advertise suspend capabilities. GNOME doesn't
> appear to be doing anything wrong here.

This sounds like a good way to solve it. -1 blocker from me, +1 NTH (beta/final).

--- Additional comment from tflink on 2011-09-27 10:27:01 MDT ---

Okay, we've got -3 blocker so I'm moving it to rejected and proposing it as a final NTH.

--- Additional comment from hughsient on 2011-09-27 10:32:42 MDT ---

(In reply to comment #2)
> The 'virt-what' package is a rough approximation of whether F16 is running as a
> VM guest.

Is there anything in C, and runnable by an unprivileged user that will tell us we're in a VM?

--- Additional comment from eblake on 2011-09-27 11:09:53 MDT ---

(In reply to comment #9)
> (In reply to comment #2)
> > The 'virt-what' package is a rough approximation of whether F16 is running as a
> > VM guest.
> 
> Is there anything in C, and runnable by an unprivileged user that will tell us
> we're in a VM?

I know that there has been work on virt-what to make it work non-privileged, but not sure if it meets the needs just yet.

--- Additional comment from hughsient on 2011-09-27 11:20:19 MDT ---

(In reply to comment #10)
> I know that there has been work on virt-what to make it work non-privileged,
> but not sure if it meets the needs just yet.

There's no timer device or anything specific to kvm I can use? I'm not particularly worried about anything that's not the fedora default personally.

Richard

--- Additional comment from rjones on 2011-09-27 11:27:46 MDT ---

(In reply to comment #10)
> (In reply to comment #9)
> > Is there anything in C, and runnable by an unprivileged user that will tell us
> > we're in a VM?
> 
> I know that there has been work on virt-what to make it work non-privileged,
> but not sure if it meets the needs just yet.

There is a BZ to make virt-what work as non-root:

https://bugzilla.redhat.com/show_bug.cgi?id=719611

(In reply to comment #11)
> There's no timer device or anything specific to kvm I can use? I'm not
> particularly worried about anything that's not the fedora default personally.

What you might want to do in the meantime is take a look at
the virt-what sources.  It's just a shell script!  Maybe
there will be some ideas in it you can use:

http://git.annexia.org/?p=virt-what.git;a=blob;f=virt-what.in;hb=HEAD
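
For illustration only, here is a minimal Python sketch of two checks that need no root privileges on x86 Linux: the "hypervisor" CPU flag in /proc/cpuinfo and the world-readable DMI strings under /sys/class/dmi/id. It is not a port of virt-what; the function names and the hint list are assumptions made for this example, and the hint list is not exhaustive.

```python
from pathlib import Path

# Substrings that commonly show up in DMI vendor/product strings of
# virtual hardware. Illustrative only, not an exhaustive list.
VM_DMI_HINTS = ("qemu", "kvm", "bochs", "vmware", "virtualbox", "xen")


def cpu_reports_hypervisor(cpuinfo_text):
    """True if any 'flags' line of /proc/cpuinfo lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


def dmi_looks_virtual(*dmi_strings):
    """True if any DMI string contains one of the known VM hints."""
    joined = " ".join(dmi_strings).lower()
    return any(hint in joined for hint in VM_DMI_HINTS)


def _read(path):
    """Read a small text file, returning '' if it is missing or unreadable."""
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return ""


def running_in_vm():
    """Best-effort guess, usable by an unprivileged user."""
    if cpu_reports_hypervisor(_read("/proc/cpuinfo")):
        return True
    return dmi_looks_virtual(
        _read("/sys/class/dmi/id/sys_vendor"),
        _read("/sys/class/dmi/id/product_name"),
    )
```

Both checks are heuristics: a hypervisor can hide the CPU flag, and DMI strings can be overridden, so a False result does not prove the system is bare metal.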

--- Additional comment from awilliam on 2011-09-30 14:11:14 MDT ---

Discussed at 2011-09-30 NTH review meeting. We can't see anything in particular that makes this issue NTH; it could be fixed quite well with a post-release update. We suppose if it wasn't fixed at release but was fixed post-release that would mean it would hit F16 live images in VMs, but still, it just seems too trivial to break a freeze for. Note there is a non-frozen period for a few weeks between Beta and Final freeze where this fix could be committed.

--- Additional comment from eblake on 2011-10-01 12:10:19 MDT ---

I finally figured out how to change this behavior from the default, but it is quite hidden.  Even a change to the GUIs (whether control-center or gnome-tweak-tool) to expose this would be a welcome help.

Install dconf-editor, then find org.gnome.settings-daemon.plugins.power, and change button-power from 'suspend' to either 'interactive' or 'shutdown'.
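
The same key can also be changed without a GUI through the gsettings command-line tool. The sketch below is a small Python wrapper around it; the helper names are hypothetical, and the list of accepted values is limited to the three named in this comment (the schema may define more).

```python
import subprocess

SCHEMA = "org.gnome.settings-daemon.plugins.power"
KEY = "button-power"
# Only the values mentioned in this report; the schema may allow others.
KNOWN_ACTIONS = ("suspend", "interactive", "shutdown")


def power_button_command(action):
    """Build the gsettings argv that sets the power-button action."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unexpected action {action!r}")
    # The value is passed as a quoted GVariant string literal.
    return ["gsettings", "set", SCHEMA, KEY, f"'{action}'"]


def set_power_button_action(action):
    """Apply the setting for the current user (run inside the guest session)."""
    subprocess.run(power_button_command(action), check=True)
```

Calling set_power_button_action("interactive") inside the guest's desktop session should have the same effect as the dconf-editor change described above.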

--- Additional comment from eblake on 2011-10-03 09:03:15 MDT ---

See also bug 736522 for teaching qemu/seabios how to quit advertising S3 to guests.

--- Additional comment from updates on 2011-10-05 20:35:00 MDT ---

seabios-0.6.2-3.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/seabios-0.6.2-3.fc16

--- Additional comment from kparal on 2011-10-06 00:57:32 MDT ---

Can somebody please describe in detail what changed in seabios (what is the new expected default behavior) so that we can test it properly?

--- Additional comment from eblake on 2011-10-06 14:09:04 MDT ---

The seabios change makes it so that guests no longer see S3 advertised.  In your guest, run 'pm-is-supported --suspend' before and after the seabios upgrade; exit status will be 0 (supported) pre-upgrade, and 1 (absent) post-upgrade.
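
A tester could script that before/after check roughly as follows. This is a sketch that assumes pm-utils (which provides pm-is-supported) is installed in the guest; the function name and the injectable 'run' parameter are made up for this example.

```python
import subprocess


def suspend_advertised(run=subprocess.run):
    """True if 'pm-is-supported --suspend' exits with status 0.

    Per the description above: exit status 0 means S3 suspend is
    advertised to the guest, 1 means it is not.
    """
    return run(["pm-is-supported", "--suspend"]).returncode == 0


if __name__ == "__main__":
    print("S3 advertised" if suspend_advertised() else "S3 not advertised")
```

Run it in the guest once before and once after the seabios update; the result should flip from advertised to not advertised.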

However, while the seabios change is good and working, the overall bug is still present.  It now looks as though, when the F16 guest has no S3 support but org.gnome.settings-daemon.plugins.power.button-power is still 'suspend', the guest completely ignores ACPI.  But this was also without 'acpid' installed.  Maybe that means that we ALSO need to install acpid by default, and automatically enable it if we detect that we are in a VM?  Or is there still something needed in gnome-power that recognizes ACPI power without S3 as a reason to shut down instead?

--- Additional comment from updates on 2011-10-06 15:22:42 MDT ---

Package seabios-0.6.2-3.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing seabios-0.6.2-3.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-13894
then log in and leave karma (feedback).

Comment 1 Richard Hughes 2011-10-07 09:41:34 UTC
(In reply to comment #0)
> However, while the seabios change is good and working, the overall bug is still
> present.  It now looks as though, when the F16 guest has no S3 support but
> org.gnome.settings-daemon.plugins.power.button-power is still 'suspend',
> the guest completely ignores ACPI.

So, to summarize: if the BIOS indicates it can't do S3, and "org.gnome.settings-daemon.plugins.power.button-power" indicates 'suspend', then nothing is done? In that case we should probably fall back to "interactive", which is a saner thing to do than doing nothing. If this sums up the bug accurately, can you create an upstream gnome.org bug against gnome-settings-daemon (power plugin) and I'll fix it there.
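
The proposed fallback amounts to a small decision rule. The Python sketch below only illustrates that rule; it is not gnome-settings-daemon code, and the function and parameter names are assumptions made for this example.

```python
def effective_power_button_action(configured, can_suspend):
    """Pick the action to perform on a power-button press.

    configured  -- value of button-power, e.g. 'suspend', 'interactive', 'shutdown'
    can_suspend -- whether the platform advertises S3 suspend
    """
    if configured == "suspend" and not can_suspend:
        # Suspend is impossible, so ask the user instead of doing nothing.
        return "interactive"
    return configured
```

With this rule, a default-configured guest whose BIOS no longer advertises S3 would show the interactive dialog on a power-button press, and any other configured action would be left untouched.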

> But this was also without 'acpid'
> installed.  Maybe that means that we ALSO need to install acpid by default, and
> automatically enable it if we detect that we are in a VM?

No, we *don't* want acpid at all.

Richard.

Comment 2 Fedora End Of Life 2013-01-16 17:11:34 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 3 Eric Blake 2013-01-16 18:51:36 UTC
Moving to F18, as the defaults are still awkward.  However, solving bug 886705 may render this bug irrelevant, as the use of a guest agent instead of ACPI to cause guest shutdowns is conceptually cleaner, and then we don't have to care about default behavior on ACPI.

Comment 4 Eric Blake 2013-05-08 22:36:12 UTC
*** Bug 961142 has been marked as a duplicate of this bug. ***

Comment 5 Adam Williamson 2013-05-08 23:45:56 UTC
Note that 886705 was only 'fixed' for live images. DVD / netinst installs do not get qemu-guest-agent at present.

Comment 6 Fedora End Of Life 2013-12-21 08:29:26 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 7 Fedora End Of Life 2014-02-05 11:50:40 UTC
Fedora 18 changed to end-of-life (EOL) status on 2014-01-14. Fedora 18 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.