Bug 754128

Summary: Shutting Down VM changes its state to "Pause" for 10sec
Product: Red Hat Enterprise Linux 6 Reporter: Ortal <ogvura>
Component: libvirtAssignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: yeylon <yeylon>
Severity: high Docs Contact:
Priority: medium    
Version: 6.1CC: acathrow, ajia, dallan, danken, hateya, jdenemar, msvoboda, mzhan, rwu, srevivo, ydu, zhpeng
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.9.9-0rc1.el6 Doc Type: Bug Fix
Doc Text:
When shutting down, a virtual machine had changed its status from the "Up" state to the "Paused" state before it was shutdown. The "Paused" state represented the state when the guest had been already stopped, but QEMU was flushing its internal buffers and was waiting for libvirt to kill it. But this state change confused users so this update adds respective events and modifies libvirt to use the "shutdown" state. A virtual machine now moves from the "Up" to "Powering Down" and then to "Down" state.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-06-20 06:36:44 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 770959, 773650, 773651, 773677, 773696    
Attachments:
Description Flags
logs.tar none

Description Ortal 2011-11-15 13:45:55 UTC
Description of problem:

LIBVIRT [libvirt-0.9.4-21.el6.x86_64] - Shutting Down VM changes its state to "Pause" for 10sec

Version-Release number of selected component (if applicable):
RHEVM Version ic148
Host:
RHEL6.2 with updates
[root@master-vds13 log]# rpm -qa | grep vdsm
vdsm-4.9-110.el6.x86_64
vdsm-cli-4.9-110.el6.x86_64
[root@master-vds13 log]# rpm -qa | grep libvirt
libvirt-python-0.9.4-21.el6.x86_64
libvirt-0.9.4-21.el6.x86_64
libvirt-client-0.9.4-21.el6.x86_64
[root@master-vds13 log]# rpm -qa | grep qemu
gpxe-roms-qemu-0.9.7-6.9.el6.noarch
qemu-kvm-debuginfo-0.12.1.2-2.207.el6.x86_64
qemu-kvm-0.12.1.2-2.207.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.207.el6.x86_64
qemu-img-0.12.1.2-2.207.el6.x86_64

How reproducible:
Always

Steps to Reproduce:

1. Open RHEVM Admin Portal and attempt to shut down gracefully the VM.
2. The VM change status from "Up" to "preparing to down"
3. The VM status moved to "Paused" for 10 secs
4. The VM shutdown.

  
Actual results:

The VM shouldn't move to Pause state, is there a change you did???

Thread-45490::DEBUG::2011-11-14 10:33:42,441::task::495::TaskManager.Task::(_debug) Task 60c078f2-0117-4d34-bd2b-020066d0097f: moving from state preparing -> state finished
Thread-45490::DEBUG::2011-11-14 10:33:42,442::resourceManager::786::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-45490::DEBUG::2011-11-14 10:33:42,442::resourceManager::821::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-45490::DEBUG::2011-11-14 10:33:42,442::task::495::TaskManager.Task::(_debug) Task 60c078f2-0117-4d34-bd2b-020066d0097f: ref 0 aborting False
Thread-45490::INFO::2011-11-14 10:33:42,443::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: repoStats, Return response: {'status': {'message': 'OK', 'code': 0}, u'257425d7-ad6c-4569-89f6-45b52
e6d0a4f': {'delay': '0.0251710414886', 'lastCheck': 1321259613.923393, 'valid': True, 'code': 0}, u'8c6c9c77-4525-454b-91c0-f02bd06511c5': {'delay': '0.0909280776978', 'lastCheck': 1321259621.8530221, 'valid': Tru
e, 'code': 0}, u'36141546-c0ce-4d05-ab43-67e4b0c93e1f': {'delay': '0.0122480392456', 'lastCheck': 1321259613.923548, 'valid': True, 'code': 0}, u'ab800c64-3caa-4c28-b61f-c3073d487c50': {'delay': '0.00139307975769'
, 'lastCheck': 1321259621.2020209, 'valid': True, 'code': 0}}
libvirtEventLoop::DEBUG::2011-11-14 10:33:42,696::libvirtvm::1369::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`a3e50c9e-9518-4198-9ec9-dbe0a5ff14bd`::event Suspended detail 0 opaque None
Thread-45008::ERROR::2011-11-14 10:33:43,284::guestIF::246::vm.Vm::(run) vmId=`a3e50c9e-9518-4198-9ec9-dbe0a5ff14bd`::Run exception: None
Traceback (most recent call last):
  File "/usr/share/vdsm/guestIF.py", line 238, in run
    line = self._readLine()
  File "/usr/share/vdsm/guestIF.py", line 214, in _readLine
    self._readBuffer()
  File "/usr/share/vdsm/guestIF.py", line 200, in _readBuffer
    s = self._sock.recv(self.READSIZE)

Expected results:
The VM shouldn't move to "Pause" state during Shut Down process.

Additional info:

Attached relevant logs: libvirt, qemu, and vdsm

Comment 1 Ortal 2011-11-15 13:47:47 UTC
Created attachment 533769 [details]
logs.tar

Comment 3 Michal Privoznik 2011-11-15 14:41:31 UTC
I don't think this is a bug. One needs to distinguish basically three states on guest shutdown: guest runing; guest OS halted but qemu preparing to be killed (e.g. flushing its internal caches); qemu halted;

When transitioning from the 1st do the 2nd state, libvirt emits an event {PAUSED, VIR_DOMAIN_PAUSED_SHUTTING_DOWN}. This in fact reflects reality where guest OS is not responding anymore, because we are performing shutdown. Please note, that this state can last very long, e.g. when qemu wants to flush data on stale NFS. We definitely can not show VM as running during this stage.

However, if there is wider consensus this is a bug, I think vdsm could change reporting of state to RHEV;

Comment 5 zhpeng 2011-11-28 09:41:32 UTC
I can reproduce it with:
kernel-2.6.32-221.el6.x86_64

libvirt-0.9.4-23.el6.x86_64
libvirt-client-0.9.4-23.el6.x86_64
libvirt-python-0.9.4-23.el6.x86_64

qemu-img-0.12.1.2-2.210.el6.x86_64
qemu-kvm-0.12.1.2-2.210.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.210.el6.x86_64
gpxe-roms-qemu-0.9.7-6.9.el6.noarch

vdsm-4.9-110.el6.x86_64
vdsm-cli-4.9-110.el6.x86_64

the same log result

Comment 6 Jiri Denemark 2011-12-01 08:01:57 UTC
*** Bug 758648 has been marked as a duplicate of this bug. ***

Comment 7 Jiri Denemark 2011-12-01 08:08:57 UTC
We agreed that we should actually use our (currently unused by qemu driver) "shutdown" state instead of "paused" and issue the event accordingly. With this change, states will transition as follows:

running -> shutdown -> shutoff

I'll send patches upstream soon.

Comment 9 Jiri Denemark 2011-12-05 15:07:13 UTC
This is now fixed upstream by v0.9.8-rc2-5-g38527c9:

commit 38527c9ae0d398a996e1e3c0897a104e3ffd1256
Author: Jiri Denemark <jdenemar>
Date:   Wed Nov 30 15:31:45 2011 +0100

    qemu: Rework handling of shutdown event
    
    When QEMU guest finishes its shutdown sequence, qemu stops virtual CPUs
    and when started with -no-shutdown waits for us to kill it using
    SGITERM. Since QEMU is flushing its internal buffers, some time may pass
    before QEMU actually dies. We mistakenly used "paused" state (and
    events) for this which is quite confusing since users may see a domain
    going to pause while they expect it to shutdown. Since we already have
    "shutdown" state with "the domain is being shut down" semantics, we
    should use it for this state.
    
    However, the state didn't have a corresponding event so I created one
    and called its detail as VIR_DOMAIN_EVENT_SHUTDOWN_FINISHED (guest OS
    finished its shutdown sequence) with the intent to add
    VIR_DOMAIN_EVENT_SHUTDOWN_STARTED in the future if we have a
    sufficiently capable guest agent that can notify us when guest OS starts
    to shutdown.

Comment 10 Jiri Denemark 2011-12-05 15:10:45 UTC
Note, that vdsm/rhev will need to be adapted to this change. The "shutdown" state is in libvirt for ages but it hasn't been used by qemu driver until now so there's a chance vdsm doesn't handle it. And even if it can cope with this state, it will certainly need to be modified to handle the new shutdown event.

Comment 11 Dan Kenigsberg 2011-12-16 11:12:27 UTC
Since this bug has visible effects on end rhev-3.0 users, I would like to request backporting this to 6.2.z. However,

(In reply to comment #10)
> Note, that vdsm/rhev will need to be adapted to this change. The "shutdown"
> state is in libvirt for ages but it hasn't been used by qemu driver until now
> so there's a chance vdsm doesn't handle it. And even if it can cope with this
> state, it will certainly need to be modified to handle the new shutdown event.

changing vdsm to cope with this change in an async z-stream may require some thinking. An unexpected event is of no problem to Vdsm. The only problem I see is if Vdsm is restarted and finds a Vm in shutdown state - since it is not VIR_DOMAIN_RUNNING, we would still report the Vm as Paused.

Comment 13 yanbing du 2011-12-31 05:29:14 UTC
Test with libvirt-0.9.9-0rc1.el6.x86_64, Bug Verified.
Shut down a running guest in RHEVM, the guest's status will transition as follows:
Up -> Powering Down -> Down.

Comment 14 Miroslav Svoboda 2012-01-17 12:59:24 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When shutting down, a virtual machine had changed its status from the "Up" state to the "Paused" state before it was shutdown. The "Paused" state represented the state when the guest had been already stopped, but QEMU was flushing its internal buffers and was waiting for libvirt to kill it. But this state change confused users so this update adds respective events and modifies libvirt to use the "shutdown" state. A virtual machine now moves from the "Up" to "Powering Down" and then to "Down" state.

Comment 16 errata-xmlrpc 2012-06-20 06:36:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0748.html