Bug 1223541 - can't wake up VM that goes into freeze/suspend-to-idle state
Summary: can't wake up VM that goes into freeze/suspend-to-idle state
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Fedora
Classification: Fedora
Component: qemu
Version: 23
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Fedora Virtualization Maintainers
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-05-20 20:01 UTC by Giulio 'juliuxpigface'
Modified: 2016-05-02 20:26 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2016-05-02 20:26:56 UTC
Type: Bug
Embargoed:


Attachments
# virsh dumpxml fedora-22 > fedora-22.xml (3.79 KB, text/html)
2015-05-22 20:04 UTC, Giulio 'juliuxpigface'
# cat /var/log/libvirt/qemu/fedora-22.log (84.83 KB, text/plain)
2015-05-22 20:05 UTC, Giulio 'juliuxpigface'

Description Giulio 'juliuxpigface' 2015-05-20 20:01:14 UTC
Description of problem:
Suspending to ram makes the guest unresponsive.

Version-Release number of selected component (if applicable):
qemu-kvm-2.3.0-4.fc22.x86_64
qemu-system-x86-2.3.0-4.fc22.x86_64
virt-manager-1.2.0-1.fc22.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Clean boot an installed Fedora 22 guest.
2. Suspend the guest to ram.

Actual results:
1. The virtual screen remains black and the guest cannot be woken up.
2. A forced poweroff is necessary.

Expected results:
1. The user should be able to resume the guest.

Comment 1 Cole Robinson 2015-05-20 22:06:24 UTC
I just tried with f22 host and latest f22 VM, seems to work fine here.

Please provide:

- sudo virsh dumpxml $vmname
- /var/log/libvirt/qemu/$vmname.log

Comment 2 Giulio 'juliuxpigface' 2015-05-22 20:04:42 UTC
Created attachment 1028918
# virsh dumpxml fedora-22 > fedora-22.xml

Comment 3 Giulio 'juliuxpigface' 2015-05-22 20:05:42 UTC
Created attachment 1028919
# cat /var/log/libvirt/qemu/fedora-22.log

Comment 4 Cole Robinson 2015-05-26 00:12:40 UTC
Your guest has this:

  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>

Which means 'tell the bios not to allow s3/s4 support'. This is the default set by gnome-boxes and virt-manager because people don't like that the shutdown button suspends their VMs.

Historically I thought gnome-shell wouldn't advertise the suspend option if the bios didn't advertise support, but indeed that doesn't seem to be the case now. So reassigning to gnome-shell

FWIW if you switch the ='no' to ='yes' above, the guest will resume correctly (that's what I was using in my test case).
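
For illustration, here is a minimal Python sketch of the change described above: flipping both <pm> entries in a dumped domain XML from 'no' to 'yes'. The function name set_pm and the trimmed SAMPLE document are hypothetical and only serve the example; in practice the definition is usually changed through `virsh edit`, so treat this as a sketch of the XML change, not a recommended workflow.

```python
import xml.etree.ElementTree as ET


def set_pm(domain_xml: str, enabled: bool) -> str:
    """Return the domain XML with suspend-to-mem/disk set to 'yes' or 'no'.

    set_pm is a hypothetical helper, not part of libvirt.
    """
    root = ET.fromstring(domain_xml)
    pm = root.find("pm")
    if pm is None:
        # No <pm> block yet: create one under <domain>.
        pm = ET.SubElement(root, "pm")
    value = "yes" if enabled else "no"
    for tag in ("suspend-to-mem", "suspend-to-disk"):
        node = pm.find(tag)
        if node is None:
            node = ET.SubElement(pm, tag)
        node.set("enabled", value)
    return ET.tostring(root, encoding="unicode")


# Trimmed, illustrative domain definition (a real dumpxml is much longer).
SAMPLE = """<domain type='kvm'>
  <name>fedora-22</name>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
</domain>"""

if __name__ == "__main__":
    print(set_pm(SAMPLE, True))
```

The helper only rewrites the two attributes and leaves the rest of the document alone; the guest would still need to be redefined and restarted for the firmware to see the new setting.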

Comment 5 Giulio 'juliuxpigface' 2015-05-26 18:52:28 UTC
I've updated my /etc/libvirt/qemu/fedora-22.xml file with your suggestions. Thank you for this trick! I confirm it fixes the issue for me too.

Perhaps Gnome was the only one which historically had this feature, but, just to clarify... This happens not only with Gnome, but also with every spin I've tested recently. For instance, KDE doesn't hide the suspend button, which leaves the VM stuck.

Comment 6 Cole Robinson 2015-05-26 20:36:10 UTC
I guess it could also be a regression anywhere from qemu to seabios to guest kernel.

Can gnome-shell guys confirm that shell still tries to detect if s3/s4 is supported? If so, how do you perform the check?

(randomly setting needinfo on Florian since I've seen him triage bugs here before. sorry for punishing a good deed :) )

Comment 7 Aleksandar Kostadinov 2015-05-27 12:02:03 UTC
I do not trigger suspend in the VMs. I just close the laptop lid and it is suspending the host OS.

Comment 8 Cole Robinson 2015-05-27 13:46:11 UTC
(In reply to Aleksandar Kostadinov from comment #7)
> I do not trigger suspend in the VMs. I just close the laptop lid and it is
> suspending the host OS.

I incorrectly pointed Aleksandar at this bug, so this comment should be ignored; it's about a separate issue.

Comment 9 Kamil Páral 2015-05-27 14:57:37 UTC
Not sure if this has anything to do with gnome-shell. I can still press the power button in the virt-manager UI, which will send the signal to any OS.

This is my experience from F21 host with F22 guest:
* I have suspend-to-mem and suspend-to-disk disabled by default. If I boot fully updated F22 and either use gnome-shell interface to suspend (press Alt and power off icon changes to suspend icon), or hit the power button in virt-manager UI, the screen goes black and never resumes again, no matter what I do. Hard power off is necessary.

* If I enable suspend-to-ram and suspend-to-disk, and boot the same F22 VM, both ways (gnome-shell and virt-manager's power button) seem to suspend the machine correctly. It even changes icon in virt-manager VM overview window to a "paused" state for that VM. However, the machine automatically resumes after ~ 2 seconds. I haven't found a way to prevent it. So while it does not freeze as in the first case, it still doesn't work properly.


Then I have tried a F21 guest:
* With suspend-to-* disabled by default, if I hit power button in virt-manager UI, the VM shuts down correctly, as with executing 'poweroff' inside of it. When I suspend using gnome-shell interface, the VM freezes, as with F22.

* With suspend-to-* enabled, if I hit power button in virt-manager UI, the VM again shuts down correctly, as in the previous case. If I use gnome-shell interface, the VM suspends and wakes up after 2 seconds.


So, there seems to be a change in gnome-shell action triggered after receiving a power off button signal between F21 and F22. But there doesn't seem to be any change when we actually try to suspend the machine.

(Btw, in the old days, if I hit a power button in virt-manager UI, gnome-shell displayed a poweroff/reboot/cancel dialog. That was a great behavior.)

Do you want me to attach logs (which ones?), or create a separate bug, or something else (wait for gnome-shell devs response)?

Comment 10 Florian Müllner 2015-06-11 15:42:45 UTC
(In reply to Cole Robinson from comment #6)
> Can gnome-shell guys confirm that shell still tries to detect if s3/s4 is
> supported? If so, how do you perform the check?

The suspend button in the system menu(*) is indeed supposed to be hidden when suspend is unavailable. We rely on systemd's logind for both checking for suspend support and the actual suspending.


(*) which is the only way to suspend from gnome-shell itself, keybindings and lid close are handled elsewhere

Comment 11 Kamil Páral 2015-06-12 08:24:45 UTC
Hello Florian,
how does GNOME decide what to do when the power button is pressed?

Comment 12 Florian Müllner 2015-06-12 15:05:14 UTC
That's handled by gnome-settings-daemon[0], which calls the logind method corresponding to the action configured by the 'button-power' setting in org.gnome.settings-daemon.plugins.power.

[0] https://git.gnome.org/browse/gnome-settings-daemon/tree/plugins/media-keys/gsd-media-keys-manager.c#n1804

Comment 13 Cole Robinson 2015-09-21 20:42:25 UTC
Okay I poked into this some more. There's some confusing bits here.

By default, VMs created by virt-manager and boxes tell QEMU to turn off S3/S4. This seems to be working as expected; the bios stops advertising explicit support for S3/S4 to the guest.

However the linux kernel will always advertise hibernation support ('disk' shows up in /sys/power/state) since there's an impl that uses regular machine shutdown rather than S4. The VM will show [shutdown] as the selected option in /sys/power/disk. So hibernation is always advertised by systemd. (This hibernation doesn't actually seem to work with a VM, but the machine just shuts down and doesn't hang so it's not that bad).

Proper suspend-to-ram _is_ correctly disabled (notice 'mem' is not present in /sys/power/state)... however systemd by default considers the 'freeze' state as a suspend impl. freeze aka suspend-to-idle is independent of the bios AFAICT and is always available.

So no matter what we specify in the VM config, systemd is always going to advertise that suspend and hibernate are available.

The root issue for this bug is that VMs seem to hang when they go into the freeze/suspend-to-idle state, or at least I can't find a way to wake it up... It's easy to reproduce this by doing: echo freeze > /sys/power/state
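
The checks described in this comment can be sketched in Python. This is a minimal illustration, not systemd's actual code: it parses the contents of /sys/power/state and /sys/power/disk and picks the first state from a candidate list. The candidate order ("mem", "standby", "freeze") is an assumption about systemd's defaults at the time, and all function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Sequence

# Assumption: systemd's default candidate states for "suspend", in order.
SUSPEND_CANDIDATES = ("mem", "standby", "freeze")


def parse_states(text: str) -> List[str]:
    """Split the whitespace-separated contents of /sys/power/state."""
    return text.split()


def selected_disk_mode(text: str) -> Optional[str]:
    """Return the bracketed (currently selected) entry of /sys/power/disk."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def suspend_state_used(states: Sequence[str],
                       candidates: Sequence[str] = SUSPEND_CANDIDATES
                       ) -> Optional[str]:
    """Return the first candidate the kernel advertises, or None."""
    for candidate in candidates:
        if candidate in states:
            return candidate
    return None


def read_or_none(path: str) -> Optional[str]:
    """Read a sysfs file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    state_text = read_or_none("/sys/power/state")
    disk_text = read_or_none("/sys/power/disk")
    if state_text is not None:
        states = parse_states(state_text)
        print("advertised states:", states)
        print("suspend would use:", suspend_state_used(states))
    if disk_text is not None:
        print("hibernate mode:", selected_disk_mode(disk_text))
```

Under these assumptions, a guest whose /sys/power/state lists only 'freeze' and 'disk' still ends up with a usable suspend candidate ('freeze'), which matches the behavior described above.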

Comment 14 Laszlo Ersek 2015-09-28 20:33:45 UTC
I found some docs about "freeze" in
"Documentation/power/suspend-and-interrupts.txt". Apparently "some
interrupt" should be injected. Upstream kernel patches leading up to
27f3d18630cd7fbb03b62bd78a74303cb8c88069 seem relevant too, but I have no
clue why this freeze / suspend-to-idle thing was a good idea in the first
place.

I don't really understand the change in behavior BTW; the "freeze" suspend
state dates back to Feb 2013 (commit
7e73c5ae6e7991a6c01f6d096ff8afaef4458c36) -- note: see "6. an interrupt
fires." there! -- and that kernel commit is part of RHEL-7 too.

I guess systemd may have changed? Hm...

http://www.freedesktop.org/software/systemd/man/systemd-sleep.conf.html

According to the above documentation, and the
"systemd/src/shared/sleep-config.c" file, the candidate states for sleep can
be restricted with this config file. This doesn't seem to be a new feature
in systemd; but maybe fedora changed the config file?

In any case, I can reproduce this with a Fedora 21 guest. In the guest:

# echo freeze > /sys/power/state

the console confirms "Entering freeze sleep", and I cannot wake up the
guest, not even with

# virsh qemu-monitor-command GUEST_NAME --hmp system_wakeup

I guess Fedora should prevent systemd from considering the freeze state in
virtual machines by default. Even the original kernel commit
7e73c5ae6e7991a6c01f6d096ff8afaef4458c36 states:

> Known Issue:
> The wakeup of this new PM_SUSPEND_FREEZE state may behave differently
> from the previous suspend state.
> Take ACPI platform for example, there are some GPEs that only enabled
> when the system is in sleep state, to wake the system back from
> S3/S4.
> But we are not touching these GPEs during transition to
> PM_SUSPEND_FREEZE.
> This means we may lose some wake event.
> But on the other hand, as we do not disable all the Interrupts during
> PM_SUSPEND_FREEZE, we may get some extra "wakeup" Interrupts, that are
> not available for S3/S4.

Why thank you, we "may" lose some wake event (yes we do), and we "may" get
some extra wakeup interrupts (no we don't).

This is *utterly* unsafe even for physical hardware platforms -- if a user
enters "freeze", what guarantees that he'll be able to get out of it? The
current kernel code enables "freeze" unconditionally.

I guess on most phys hw platforms it is masked because S3 / S4 are always
present.

So I think this could be a kernel bug actually, but for now, Fedora should
ignore "freeze" for suspend options by default, at least in virtual
machines.
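
As a sketch of the workaround suggested above, the following Python helper renders a sleep.conf fragment that leaves "freeze" out of the suspend candidates. The [Sleep] section and the SuspendState= option come from the systemd-sleep.conf documentation linked earlier; the helper name render_sleep_conf and the choice of "mem standby" as the remaining states are assumptions for illustration. Where the fragment should be installed is deliberately left out.

```python
from typing import Sequence


def render_sleep_conf(suspend_states: Sequence[str]) -> str:
    """Render a [Sleep] fragment restricting systemd's suspend states.

    render_sleep_conf is a hypothetical helper. It refuses to emit
    "freeze", since excluding that state is the point of the fragment.
    """
    if not suspend_states:
        raise ValueError("at least one suspend state is required")
    if "freeze" in suspend_states:
        raise ValueError("'freeze' must not be listed")
    return "[Sleep]\nSuspendState=" + " ".join(suspend_states) + "\n"


if __name__ == "__main__":
    # Assumed replacement list: keep S3-style states, drop freeze.
    print(render_sleep_conf(["mem", "standby"]), end="")
```

With such a fragment in place, a guest whose firmware advertises neither 'mem' nor 'standby' should have no suspend candidate left, so suspend should no longer be offered.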

Comment 15 Laszlo Ersek 2015-09-28 20:41:33 UTC
Wow actually the following command *does* bring the guest out of freeze:

# virsh qemu-monitor-command GUEST_NAME --hmp system_powerdown

Equivalently, clicking Shut Down on the Virt Manager window works too.

(Inspired by acpi_button_notify() in "drivers/acpi/button.c" in the kernel.)
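
For scripting this from the host, a minimal Python sketch that wraps the same virsh invocation is shown below. The helper names monitor_command and wake_from_freeze are hypothetical; the virsh arguments are exactly those used in the command above. The runner is injectable so the command can be inspected without a running guest.

```python
import subprocess
from typing import Callable, List


def monitor_command(guest: str, hmp: str) -> List[str]:
    """Build the virsh argv for sending an HMP command to a guest."""
    return ["virsh", "qemu-monitor-command", guest, "--hmp", hmp]


def wake_from_freeze(guest: str, run: Callable = subprocess.run):
    """Send system_powerdown, which was observed to wake a frozen guest.

    `run` defaults to subprocess.run; pass a stub to inspect the command
    without executing virsh.
    """
    return run(monitor_command(guest, "system_powerdown"), check=True)


if __name__ == "__main__":
    # Only prints the command; call wake_from_freeze("GUEST_NAME") to run it.
    print(" ".join(monitor_command("GUEST_NAME", "system_powerdown")))
```

Note that system_powerdown is the same event as pressing the power button, so what the guest does after waking depends on how it is configured to handle that button.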

Comment 16 Cole Robinson 2016-05-02 20:26:56 UTC
Since, as Laszlo discovered, the way to break out of this 'freeze' state is to hit the power button, this isn't really a bug on the qemu side. If anything, I'd say systemd shouldn't advertise 'freeze' as a suspend impl since the semantics are quite different (at least on fedora), but that's a separate issue.

