Bug 878966 - virsh setmem then dompmsuspend to disk will hang forever
Summary: virsh setmem then dompmsuspend to disk will hang forever
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: John Ferlan
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 912287 975376
Reported: 2012-11-21 16:41 UTC by Luiz Capitulino
Modified: 2016-04-27 02:25 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 872420
Environment:
Last Closed: 2016-03-24 00:57:46 UTC
Embargoed:



Description Luiz Capitulino 2012-11-21 16:41:26 UTC
Libvirt and/or virsh are unable to detect stale QMP/qemu-ga responses. Most of the time such stale responses are caused by bugs or malfunctioning components further down the management stack (i.e., outside of libvirt's scope). However, being able to detect stale responses would make libvirt and/or virsh more robust.

As an example, consider the (summarized) bug below, which would not affect the end user if libvirt and/or virsh were able to detect the missing response.

+++ This bug was initially created as a clone of Bug #872420 +++

Description of problem:
virsh setmem then dompmsuspend to disk will hang forever

Version-Release number of selected component (if applicable):
libvirt-0.10.2-6.el6.x86_64
qemu-guest-agent-0.12.1.2-2.333

How reproducible:
80%

Steps to Reproduce:
Prepare a guest with 4 GB of memory and start it.

[root@zhpeng ~]# virsh dompmsuspend aaa --target disk
Domain aaa successfully suspended               --------------> no problem
[root@zhpeng ~]# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     aaa                            shut off

[root@zhpeng ~]# virsh start aaa
Domain aaa started

[root@zhpeng ~]# virsh dompmsuspend aaa --target disk
Domain aaa successfully suspended                  -----------> no problem
[root@zhpeng ~]# virsh start aaa
Domain aaa started

[root@zhpeng ~]# virsh setmem --live aaa 2048000

[root@zhpeng ~]# virsh dompmsuspend aaa --target disk      ------------> it hangs forever

--- Additional comment from Luiz Capitulino on 2012-11-21 11:23:22 EST ---

The root cause of this problem is that pm-hibernate in RHEL6.4 does not return a failure exit code when suspending fails. It does in Fedora though, so only RHEL is affected.

Here's a quick reproducer:

1. Start a qemu VM with 2 GB of RAM and RHEL6.4 as a guest (comment 10 has a command-line example)

2. As soon as the guest has booted, change to qemu's monitor and run:

(qemu) balloon 700

3. Then log into the system and check that hibernate will fail:

# echo disk > /sys/power/state
bash: echo: write error: Cannot allocate memory

4. Then try it with pm-hibernate

# pm-hibernate
# echo $?
0

On F16 pm-hibernate successfully detects the error and returns 128.
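
The difference between the two behaviours can be sketched as follows. This is only an illustration in Python, not pm-utils code: `request_hibernate` and its `state_path` parameter are hypothetical names, and the sketch assumes that a refused write to /sys/power/state (as in step 3) surfaces as an OSError. The point is simply that the failure is turned into a non-zero exit code instead of being swallowed.

```python
import sys


def request_hibernate(state_path="/sys/power/state"):
    """Hypothetical helper: ask the kernel to hibernate and return an exit code.

    Returns 0 if the write was accepted and 1 if it was refused.
    """
    try:
        # The write may only fail when the buffer is flushed on close,
        # so the whole "with" block sits inside the try.
        with open(state_path, "w") as state_file:
            state_file.write("disk")
    except OSError as err:
        # Report the failure and propagate it to the caller as non-zero.
        print(f"hibernate failed: {err}", file=sys.stderr)
        return 1
    return 0
```

A caller such as a guest agent could then tell a refused suspend apart from a successful one by checking the returned code.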

Some additional comments:

1. qemu-ga doesn't hang. Actually, it's acting as expected: pm-hibernate reports success, so qemu-ga assumes that suspending succeeded and doesn't emit a success response (see last paragraph of comment 15 for more details)

2. libvirt and/or virsh are also buggy, as they should have a timeout to detect stale responses (will clone this bz for libvirt)

3. As a workaround, you could remove the pm-utils package (however, having pm-utils installed is *strongly* recommended for regular use)

Comment 2 Jiri Denemark 2012-11-26 13:48:16 UTC
libvirt doesn't like to be in the business of inventing hard-coded
timeouts. And because of that, we will either have to introduce a new
API for PM suspend which supports specifying a timeout or provide a
way of cancelling APIs waiting for a reply from guest agent. Neither
of these can be done in 6.4.
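
To make the first option concrete, here is a minimal sketch of a call whose timeout is chosen by the caller rather than hard-coded. It is written in Python for illustration and is not libvirt code: `call_with_timeout` and `ReplyTimeout` are hypothetical names, and `func` stands in for any blocking request to the guest agent. Note that the sketch only stops waiting; it does not cancel the outstanding request, which is the second option mentioned above.

```python
import threading


class ReplyTimeout(Exception):
    """Raised when no reply arrives within the caller-supplied timeout."""


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a worker thread and wait at most timeout_s seconds."""
    result = {}

    def worker():
        try:
            result["value"] = func(*args)
        except Exception as err:  # hand the error back to the caller
            result["error"] = err

    # A daemon thread, so a request that never returns cannot block exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        # The request is still pending; it is abandoned, not cancelled.
        raise ReplyTimeout(f"no reply within {timeout_s} s")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

With something like this, a hung suspend request would end in an error after the interval the user asked for instead of blocking forever.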

Comment 3 Luiz Capitulino 2012-11-27 11:36:22 UTC
Agreed this is not for 6.4. It's really something for the future.

Comment 9 Cole Robinson 2016-03-24 00:57:46 UTC
This seems like the type of thing that is just going to sit dormant forever until there's another real issue we are hit by, so closing as DEFERRED

