Bug 1081275

Summary: If qemu dies after migration, vdsm doesn't clean up the vm
Product: Red Hat Enterprise Virtualization Manager
Reporter: David Gibson <dgibson>
Component: vdsm
Assignee: Vinzenz Feenstra [evilissimo] <vfeenstr>
Status: CLOSED DUPLICATE
QA Contact: meital avital <mavital>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.2.0
CC: bazulay, dgibson, iheim, lpeer, michal.skrivanek, ofrenkel, vfeenstr, yeylon
Target Milestone: ---   
Target Release: 3.3.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-04-02 00:13:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                        Flags
VDSM log                           none
libvirtd log                       none
qemu log for the VM in question    none

Description David Gibson 2014-03-26 23:54:19 UTC
Description of problem:

In a customer case, qemu appears to have died unexpectedly after a successful outgoing migration, before RHEV/vdsm could clean up the "stub" qemu instance.

vdsm then gets an error when it attempts to remove the "stub" VM.  A "Down" entry for the stub VM remains in vdsm's list indefinitely, preventing further migrations.

Version-Release number of selected component (if applicable):

vdsm-4.10.2-24.1.el6ev.x86_64
libvirt-0.10.2-18.el6_4.9.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.7.x86_64

How reproducible:

Unknown.

Steps to Reproduce:

These steps aren't tested, but should work in theory.

1. Modify qemu to abort() after completing an outgoing migration
2. With the modified qemu on the source hypervisor, migrate a VM in RHEV.

Actual results:

a) RHEV won't permit the VM to be migrated again
b) On the migration source host, an entry still shows in vdsm for the VM.  e.g.

# vdsClient -s 0 getAllVmStates
[...]
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        Status = Down
        hash = 7231255206245563831
        exitMessage = Migration succeeded
        timeOffset = 0
        exitCode = 0
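
To spot leftover entries like the one above, the getAllVmStates text could be scanned for VMs that are Down with exitMessage "Migration succeeded". This is only a rough sketch: the function name find_stale_migrated_vms is made up for illustration, and the parsing assumes the output layout is exactly as shown above (an unindented VM ID followed by indented "key = value" lines).

```python
def find_stale_migrated_vms(output):
    """Return IDs of VMs reported as Down with exitMessage 'Migration succeeded'.

    Hypothetical helper; assumes the text layout shown in this report.
    """
    vms = {}
    current = None
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blanks, the shell prompt line and the "[...]" elision marker.
        if not stripped or stripped.startswith("#") or stripped == "[...]":
            continue
        if not line[0].isspace():
            # An unindented line starts a new VM record (the VM ID).
            current = stripped
            vms[current] = {}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            vms[current][key.strip()] = value.strip()
    return [
        vm_id
        for vm_id, fields in vms.items()
        if fields.get("Status") == "Down"
        and fields.get("exitMessage") == "Migration succeeded"
    ]


SAMPLE = """\
# vdsClient -s 0 getAllVmStates
[...]
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        Status = Down
        hash = 7231255206245563831
        exitMessage = Migration succeeded
        timeOffset = 0
        exitCode = 0
"""

stale = find_stale_migrated_vms(SAMPLE)
```

Any ID returned here is a candidate stub entry that was not cleaned up on the source host.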

Expected results:

An error is still logged, due to the genuine problem with qemu, but the stub VM entry is still cleaned up and further migrations are permitted.
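
The expected behaviour could be sketched roughly as below. This is not vdsm's actual code: StubVmTable, its methods and the destroy callback are hypothetical names used only to illustrate the idea of logging the failure while still dropping the entry.

```python
import logging

log = logging.getLogger("stub_cleanup_sketch")


class StubVmTable:
    """Hypothetical stand-in for vdsm's in-memory VM list; not vdsm's real API."""

    def __init__(self):
        self._vms = {}

    def add(self, vm_id, destroy_fn):
        # destroy_fn is an assumed callback that tears down the qemu instance.
        self._vms[vm_id] = destroy_fn

    def remove(self, vm_id):
        destroy_fn = self._vms[vm_id]
        try:
            destroy_fn()
        except Exception:
            # The qemu failure is genuine, so it is still logged...
            log.exception("destroying stub VM %s failed; removing entry anyway", vm_id)
        finally:
            # ...but the entry is dropped either way, so later migrations are not blocked.
            self._vms.pop(vm_id, None)

    def __contains__(self, vm_id):
        return vm_id in self._vms


def qemu_already_dead():
    raise RuntimeError("qemu process is gone")


table = StubVmTable()
table.add("stub-vm", qemu_already_dead)
table.remove("stub-vm")
```

After remove() returns, "stub-vm" is no longer in the table even though the destroy callback raised.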

Additional info:

Comment 1 Omer Frenkel 2014-03-27 07:13:35 UTC
Please attach the vdsm, libvirt and qemu logs.

Comment 2 David Gibson 2014-03-28 03:58:08 UTC
Created attachment 879723
VDSM log

Here's the vdsm.log file; the interesting events happen around 2014-03-19 13:51:12,329.

Comment 3 David Gibson 2014-03-28 03:59:00 UTC
Created attachment 879724
libvirtd log

This is the libvirtd log.  (The rotation is the one that matches the interesting events in vdsm.log)

Comment 4 David Gibson 2014-03-28 03:59:38 UTC
Created attachment 879725
qemu log for the VM in question

Attaching qemu log.  Very little here, unfortunately :(

Comment 9 David Gibson 2014-04-02 00:13:22 UTC
Works for me.  I've attached the other bug to my case.

*** This bug has been marked as a duplicate of bug 985770 ***