Bug 892791 - Libvirt does not follow RESUME qemu monitor events. VMs remain in "paused" state forever.
Status: CLOSED UPSTREAM
Alias: None
Product: Virtualization Tools
Classification: Community
Component: libvirt
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Libvirt Maintainers
Blocks: 894085 896225
Reported: 2013-01-07 21:15 UTC by Andres Lagar-Cavilla
Modified: 2013-01-16 21:22 UTC (History)
Last Closed: 2013-01-09 16:17:05 UTC

Description Andres Lagar-Cavilla 2013-01-07 21:15:35 UTC
Description of problem:
If a qemu/KVM VM is paused by manually issuing the "stop" command on a monitor, libvirtd's view of the VM transitions to "paused". This is because libvirtd listens for "STOP" events on the JSON monitor. However, libvirt does not listen for RESUME events on any monitor. So, when the VM is resumed by manually issuing "cont", libvirt's internal state remains "paused" even though the VM is running.

Libvirt keeps its internal view of the VM state in sync for its own operations such as migration. But without listening for RESUME events, it cannot cope correctly with third parties that issue stop commands (such as GDB, or software that opens another QMP monitor).
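
The effect can be modelled with a small sketch. This is not libvirt code: the class name, the handler table, and the timestamp values are made up for illustration. Only the QMP event names STOP and RESUME come from the report. A tracker that maps STOP to "paused" but has no entry for RESUME stays "paused" after the guest is resumed.

```python
import json


class DomainStateTracker:
    """Toy model of a manager mirroring a guest's run state from QMP events.

    Hypothetical illustration only; this is not how libvirt is structured.
    """

    def __init__(self, handle_resume):
        self.state = "running"
        # Map QMP event names to the state they imply.
        self.handlers = {"STOP": "paused"}
        if handle_resume:
            self.handlers["RESUME"] = "running"

    def feed(self, line):
        """Consume one JSON message from the monitor and update the state."""
        msg = json.loads(line)
        new_state = self.handlers.get(msg.get("event"))
        if new_state is not None:
            self.state = new_state
        return self.state


# Events as QEMU would emit them for "stop" followed by "cont".
# The timestamp values are illustrative.
EVENTS = [
    '{"event": "STOP", "timestamp": {"seconds": 1357593335, "microseconds": 0}}',
    '{"event": "RESUME", "timestamp": {"seconds": 1357593340, "microseconds": 0}}',
]


def replay(handle_resume):
    """Replay the stop/cont event pair and return the final tracked state."""
    tracker = DomainStateTracker(handle_resume)
    for line in EVENTS:
        tracker.feed(line)
    return tracker.state


if __name__ == "__main__":
    print("without RESUME handler:", replay(handle_resume=False))
    print("with RESUME handler:   ", replay(handle_resume=True))
```

Without the RESUME entry the RESUME message is silently ignored, which matches the behaviour described above.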

Version-Release number of selected component (if applicable):
Verified on the git master branch, 0.9.8, and 0.9.13.

How reproducible:
Simply issue a stop command in a monitor not owned by libvirt.

Steps to Reproduce:
1. Boot a qemu/KVM domain.
2. Issue "stop" on a monitor.
3. Issue "cont" on a monitor.
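
For steps 2 and 3, assuming the monitor in question is a QMP (JSON) monitor, the messages to send could be built as in the sketch below. The helper names are hypothetical. The `qmp_capabilities` command is included because a QMP session must negotiate capabilities before it accepts other commands. How the monitor socket is exposed depends on how qemu was started, so the transport is left out.

```python
import json


def qmp_command(name, **arguments):
    """Serialize one QMP command as a JSON string (hypothetical helper)."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


def repro_sequence():
    """Messages for the reproduction: negotiate, pause, then resume."""
    return [
        qmp_command("qmp_capabilities"),  # required before other commands
        qmp_command("stop"),              # step 2: pause the guest
        qmp_command("cont"),              # step 3: resume the guest
    ]


if __name__ == "__main__":
    for line in repro_sequence():
        print(line)
```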
  
Actual results:
Both virsh list and virsh domstate show the VM as paused.

Client software like OpenStack will tag the VM as paused.

Expected results:
The state of the VM should be "running", and virsh list and similar commands should reflect that.

Additional info:
Not listening for RESUME events is in itself a completeness problem. While additional monitors are not customary in libvirt-managed VMs, there are many reasonable use cases for them. One example is GDB debugging, but more generally any software that wishes to perform introspection on VMs (and thus temporarily stops a VM to read register or other state).

Comment 1 Eric Blake 2013-01-07 21:56:26 UTC
Issuing commands on a monitor behind libvirt's back is unsupported.  Once libvirt is managing a domain, then you should use libvirt, rather than the monitor, for all state changes of that guest.

Comment 2 Andres Lagar-Cavilla 2013-01-08 16:33:31 UTC
Eric,

I have misled you by overstating the importance of the "other" monitor.

This is an issue in itself: the library's handling of monitor callbacks is incomplete.

I will illustrate with a simple example:
# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000020              running

# virsh qemu-monitor-command 1 '{"execute":"stop"}'
{"return":{},"id":"libvirt-10"}

# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000020              paused

# virsh qemu-monitor-command 1 '{"execute":"cont"}'
{"return":{},"id":"libvirt-11"}

# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-00000020              paused
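
A check like the one above can be scripted. The sketch below parses `virsh list` output in the table layout shown in this comment; the function name is hypothetical, and it assumes domain names contain no whitespace.

```python
def parse_virsh_list(output):
    """Return {domain name: state} from `virsh list` table output.

    Assumes the three-column layout shown above and domain names
    without whitespace.
    """
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID ("-" for inactive domains);
        # the header and the dashed separator line do not.
        if len(fields) >= 3 and (fields[0].isdigit() or fields[0] == "-"):
            # The state may be more than one word (e.g. "shut off").
            states[fields[1]] = " ".join(fields[2:])
    return states


SAMPLE = """\
 Id    Name                           State
----------------------------------------------------
 1     instance-00000020              paused
"""

if __name__ == "__main__":
    print(parse_virsh_list(SAMPLE))
```

Running this against the output captured after "cont" would still report "paused" for instance-00000020, which is the mismatch this bug describes.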


As an additional example, if I attach GDB to qemu and start single-stepping, libvirt will drop dozens of RESUME events and be mightily confused.

Hope this helps in clarifying.

Andres

Comment 3 Andres Lagar-Cavilla 2013-01-08 16:34:38 UTC
Patch sent to the list:
https://www.redhat.com/archives/libvir-list/2013-January/msg00381.html

Thanks
Andres

Comment 4 Andres Lagar-Cavilla 2013-01-09 15:21:59 UTC
Merged in git master
http://libvirt.org/git/?p=libvirt.git;a=commit;h=aedfcce33e4c2f266668a39fd655574fe34f1265

