Bug 894085

Summary: libvirt: vm pauses after live storage migration
Product: Red Hat Enterprise Linux 6
Reporter: Dafna Ron <dron>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: unspecified
Version: 6.4
CC: abaron, acathrow, dallan, dyasny, dyuan, eblake, fsimonce, hateya, iheim, mzhan, pbonzini, rwu, tlavigne, weizhan, whuang, ykaul, zhpeng
Target Milestone: rc
Keywords: TestBlocker
Hardware: x86_64
OS: Linux
Fixed In Version: libvirt-0.10.2-16.el6
Doc Type: Bug Fix
Clones: 896225
Last Closed: 2013-02-21 07:29:41 UTC
Type: Bug
Bug Depends On: 892791
Bug Blocks: 896225, 902691
Attachments: logs (attachment 676411)

Description Dafna Ron 2013-01-10 16:35:07 UTC
Created attachment 676411 [details]
logs

Description of problem:

I live migrated a VM's disk with RHEV-M, and after the migration finished, the VM paused.
Looking at the libvirt log, I could see this:

"The command drive-mirror has not been found" and "The command block-job-complete has not been found"

Version-Release number of selected component (if applicable):

sf3
vdsm-4.10.2-3.0.el6ev.x86_64
libvirt-0.10.2-14.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64

How reproducible:

100%

Steps to Reproduce:
1. Run a VM.
2. Live migrate its disk.
  
Actual results:

The VM pauses after the live migration is reported as finished.

Expected results:

The VM should not pause.

Additional info: logs

2013-01-10 14:49:02.340+0000: 14883: debug : qemuMonitorIOProcess:353 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7e9800d000 buf={"id": "libvirt-50", "error": {"class": "CommandNotFound", "desc": "The command drive-mirror has not been found", "data": {"name": "drive-mirror"}}}
2013-01-10 14:49:42.674+0000: 14883: debug : qemuMonitorIOProcess:353 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7e9800d000 buf={"id": "libvirt-86", "error": {"class": "CommandNotFound", "desc": "The command block-job-complete has not been found", "data": {"name": "block-job-complete"}}}

Comment 1 Eric Blake 2013-01-10 16:45:03 UTC
(In reply to comment #0)
> Actual results:
>
> The VM pauses after the live migration is reported as finished.

I'm not seeing this behavior in the logs; right after the guest finished the block job, libvirt resumed the guest:

2013-01-10 14:49:42.675+0000: 14883: debug : qemuMonitorIOWrite:461 : QEMU_MONITOR_IO_WRITE: mon=0x7f7e9800d000 buf={"execute":"__com.redhat_drive-reopen","arguments":{"device":"drive-virtio-disk0","new-image-file":"/rhev/data-center/afcde1c5-6022-4077-ab06-2beed7e5e404/a45fadeb-2db0-4201-9635-0c7ed6c1fd22/images/3906bf7c-8a7c-45fb-9b38-7b33e63c6817/b5f0db2f-386d-4009-8983-4338202e4bd1","format":"qcow2"},"id":"libvirt-87"}
...
2013-01-10 14:49:42.915+0000: 14883: debug : qemuMonitorIOProcess:353 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7e9800d000 buf={"return": {}, "id": "libvirt-87"}
...
2013-01-10 14:49:42.916+0000: 14887: debug : qemuMonitorSend:903 : QEMU_MONITOR_SEND_MSG: mon=0x7f7e9800d000 msg={"execute":"cont","id":"libvirt-88"}

> 2013-01-10 14:49:02.340+0000: ... "The command drive-mirror has not been found" ...
> 2013-01-10 14:49:42.674+0000: ... "The command block-job-complete has not been found" ...

These two are red herrings - they are expected errors.  Libvirt always tries the upstream spelling first, then falls back to the downstream RHEL spellings, __com.redhat_drive-mirror and __com.redhat_drive-reopen.  They have no bearing on this bug report.
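
For illustration, the probe-and-fallback looks roughly like this (a minimal Python sketch of logic that really lives in libvirt's C monitor code; send_qmp is a hypothetical stand-in for the QMP transport):

# Minimal sketch of the upstream-then-downstream command fallback.
# send_qmp() is a hypothetical helper standing in for the QMP transport.
def mirror_with_fallback(send_qmp, args):
    reply = send_qmp("drive-mirror", args)            # upstream name first
    if reply.get("error", {}).get("class") == "CommandNotFound":
        # Expected on RHEL 6 qemu-kvm; retry the downstream spelling.
        reply = send_qmp("__com.redhat_drive-mirror", args)
    return reply

The CommandNotFound replies in the log are just that first probe failing before the fallback succeeds.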

Comment 2 Eric Blake 2013-01-10 16:48:24 UTC
I wonder if the real issue is that libvirt is not issuing an event at the right time; further IRC conversation says:
# virsh -r list
 Id    Name                           State
----------------------------------------------------
 2     TEST7                          running
# vdsClient -s 0 list table
454d8222-99f4-4251-9cac-d6817c10a02d 23704 TEST7 Paused

so VDSM doesn't know that the guest resumed, even though libvirt does.  I'm now looking at the code where libvirt pauses the guest around the snapshot, to see if there is a missing event.
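
For reference, any client can watch the lifecycle events libvirt emits.  A minimal standalone listener using the libvirt-python bindings (an illustration, not VDSM's actual code) would make the missing RESUMED notification visible:

import libvirt

def lifecycle_cb(conn, dom, event, detail, opaque):
    names = {libvirt.VIR_DOMAIN_EVENT_SUSPENDED: "SUSPENDED",
             libvirt.VIR_DOMAIN_EVENT_RESUMED: "RESUMED"}
    print("%s: %s" % (dom.name(), names.get(event, "event %d" % event)))

libvirt.virEventRegisterDefaultImpl()   # must run before opening the connection
conn = libvirt.openReadOnly("qemu:///system")
conn.domainEventRegisterAny(None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                            lifecycle_cb, None)
while True:                             # dispatch events as they arrive
    libvirt.virEventRunDefaultImpl()

If the bug is what I suspect, this listener would print SUSPENDED when the block job pauses the guest but never print a matching RESUMED.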

Comment 3 Eric Blake 2013-01-10 16:58:15 UTC
If I'm right about it just being a case of libvirt sending a pause event but not a resume event, then you can use the workaround of

virsh "suspend TEST7; resume TEST7"

to force libvirt to emit events that let vdsm get back in sync.  Also, there was a recent upstream patch concerned with emitting events for RESUME from qemu; maybe that will help in this case:

commit aedfcce33e4c2f266668a39fd655574fe34f1265
Author: Andres Lagar-Cavilla <andres>
Date:   Mon Jan 7 16:25:01 2013 -0500

    Add RESUME event listener to qemu monitor.
    
    Perform all the appropriate plumbing.
    
    When qemu/KVM VMs are paused manually through a monitor not-owned by
    libvirt, libvirt will think of them as "paused" even after they are
    resumed and effectively running. With this patch the discrepancy goes
    away.
    
    This is meant to address bug 892791.
    
    Signed-off-by: Andres Lagar-Cavilla <andres>

Comment 5 Huang Wenlong 2013-01-11 06:50:13 UTC
I can reproduce this bug with:
libvirt-0.10.2-14.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64
vdsm-4.10.2-1.1.el6.x86_64


After the disk live migration finishes, the VM is paused.

Comment 6 Paolo Bonzini 2013-01-14 11:07:31 UTC
Is it _really_ paused, or just marked as such?

Eric, do we need a qemu-kvm BZ for the event of comment 3?

Comment 7 Eric Blake 2013-01-14 14:10:01 UTC
(In reply to comment #6)
> Is it _really_ paused, or just marked as such?
> 
> Eric, do we need a qemu-kvm BZ for the event of comment 3?

The guest is running from both qemu's and libvirt's perspective - the only entity that still thinks the guest is paused is VDSM.  But the bug is in libvirt, for not giving VDSM enough events to track state correctly.

Comment 9 Eric Blake 2013-01-15 16:45:53 UTC
I'm currently testing this patch, and will move the bug into POST if it works:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-January/msg00197.html

Comment 11 Dafna Ron 2013-01-15 17:30:24 UTC
I tested Eric's build on my setup with iSCSI storage and the migration succeeded; the VM did not pause after the migration.

Comment 12 Eric Blake 2013-01-15 21:58:17 UTC
Moving to POST based on comment 11, and the following analysis:

At first, I was wondering if we have a bigger problem here.  Right now, the only reason VDSM sees either event is that the event is emitted by a helper thread before the lock is returned to the main thread.  This feels like a race window, and I wanted to be sure the race is consistently won by the qemu event handling.  The potential race occurs in at least these places:

qemuDomainCoreDump (virsh dump --live)
qemuDomainBlockPivot (virsh blockjob --pivot)
qemuDomainSnapshotCreateActiveInternal (virsh snapshot-create-as --live --memspec filename)

In all of those functions, there is no explicit event issued for either the call to qemuProcessStopCPUs (suspending the domain) or qemuProcessStartCPUs (resuming the domain), except that when startCPUs fails, an explicit SUSPENDED_API_ERROR event is issued.  So, if you see any event at all, it was from this sequence of code:

main thread calls virDomainObjGetState, sees domain is running
main thread calls qemuProcessStopCPUs
main thread drops lock with qemuDomainObjEnterMonitorAsync
main thread calls qemuMonitorStopCPUs
helper thread gets response from monitor, and response includes the event
helper thread calls qemuProcessHandleStop
helper thread requests lock with virDomainObjLock
helper thread calls virDomainObjGetState, sees domain is still running
helper thread changes libvirt state to paused with virDomainObjSetState
helper thread raises the SUSPENDED_PAUSED event
helper thread drops lock with virDomainObjUnlock
main thread regains lock with qemuDomainObjExitMonitorWithDriver
main thread changes libvirt state to paused with virDomainObjSetState

(similarly with qemuProcessStartCPUs and qemuProcessHandleResume)

As long as we can guarantee that qemu emits the event _prior_ to the reply acknowledging the successful pause, the event will ALWAYS be emitted by the helper thread.  But if qemu could ever complete the qemuMonitorStopCPUs action prior to emitting the event, then it would be possible to have this alternate sequence of events, modeled in the sketch after the sequence:

main thread calls virDomainObjGetState, sees domain is running
main thread calls qemuProcessStopCPUs
main thread drops lock with qemuDomainObjEnterMonitorAsync
main thread calls qemuMonitorStopCPUs
helper thread gets response from monitor, event has not been seen yet
main thread regains lock with qemuDomainObjExitMonitorWithDriver
main thread changes libvirt state to paused with virDomainObjSetState
helper thread finally sees an event
helper thread calls qemuProcessHandleStop
helper thread requests lock with virDomainObjLock
helper thread calls virDomainObjGetState, sees domain is already paused
helper thread discards the event
helper thread drops lock with virDomainObjUnlock
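
To make the difference between the two interleavings concrete, here is a minimal Python model of the guard in the event handler (a hypothetical simplification; the real logic is C in qemuProcessHandleStop):

import threading

class Vm(object):
    def __init__(self):
        self.lock = threading.Lock()    # plays the role of virDomainObjLock
        self.state = "running"

def handle_stop_event(vm, emit):
    # Only the first thread to observe the pause flips the state and
    # raises the event; a later observer finds the state already changed.
    with vm.lock:
        if vm.state == "running":
            vm.state = "paused"
            emit("SUSPENDED_PAUSED")    # the notification VDSM depends on
        # else: the main thread already set "paused"; the event is
        # discarded and no client ever hears about the pause.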

But it looks like we are lucky that the second result does not happen - this is what I see in the logs coming from qemu:

2013-01-10 14:49:05.463+0000: 14883: debug : qemuMonitorIOWrite:461 : QEMU_MONITOR_IO_WRITE: mon=0x7f7e9800d000 buf={"execute":"stop","id":"libvirt-54"}
 len=38 ret=38 errno=11
...
2013-01-10 14:49:05.471+0000: 14883: debug : qemuMonitorIOProcess:353 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7e9800d000 buf={"timestamp": {"seconds": 1357829345, "microseconds": 470765}, "event": "STOP"}
{"return": {}, "id": "libvirt-54"}
 len=117

That is, we are always seeing the event emitted prior to returning control to the main thread, so we don't have to worry about the race in RHEL.  It may still be worth worrying about upstream, if qemu is ever changed to emit the event after the return for an individual command, but for now, I'm confident that we have solved the problem for RHEL, and am moving this to POST.

Comment 14 Huang Wenlong 2013-01-17 05:52:26 UTC
Verified this bug with:
libvirt-0.10.2-16.el6.x86_64
vdsm-4.10.2-2.0.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.352.el6.x86_64


1) Prepare two NFS storage domains, A and B.

2) Start a guest in A.

3) In the Disks tab, "move" the guest's image from A to B.

4) After the disk live migration, the guest is still running, not paused.

Comment 15 errata-xmlrpc 2013-02-21 07:29:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0276.html