Description of problem:
In a customer case, qemu appears to have died unexpectedly after a (successful) outgoing migration, before RHEV/vdsm could clean up the "stub" qemu instance. vdsm then gets an error when it attempts to remove the "stub" VM, and a "Down" entry for the stub VM remains in vdsm's list indefinitely, preventing further migrations.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-24.1.el6ev.x86_64
libvirt-0.10.2-18.el6_4.9.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.7.x86_64

How reproducible:
Unknown.

Steps to Reproduce:
These steps aren't tested, but should work in theory.
1. Modify qemu to abort() after completing an outgoing migration.
2. With the modified qemu on the source hypervisor, migrate a VM in RHEV.

Actual results:
a) RHEV won't permit the VM to be migrated again.
b) On the migration source host, an entry still shows in vdsm for the VM, e.g.:

# vdsClient -s 0 getAllVmStates
[...]
XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        Status = Down
        hash = 7231255206245563831
        exitMessage = Migration succeeded
        timeOffset = 0
        exitCode = 0

Expected results:
An error is still logged (the qemu death is a genuine problem), but the stub VM entry is cleaned up and further migrations are permitted.

Additional info:
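To illustrate the expected behaviour, here is a minimal sketch (not vdsm's actual code) of how the source-side teardown could tolerate a qemu that has already exited, using the libvirt Python bindings. The function name destroy_stub is an illustrative assumption; the API calls (lookupByUUIDString, destroy, get_error_code, VIR_ERR_NO_DOMAIN) are real libvirt-python interfaces.

import libvirt

def destroy_stub(conn, vm_uuid):
    """Tear down the source-side stub domain after a successful
    outgoing migration, treating an already-gone domain as success."""
    try:
        dom = conn.lookupByUUIDString(vm_uuid)
        dom.destroy()
    except libvirt.libvirtError as e:
        if e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
            # qemu died and libvirt already reaped the domain; there is
            # nothing left to remove, so log the error and treat the
            # cleanup as done rather than leaving the VM stuck in Down.
            return
        raise

The key point is that VIR_ERR_NO_DOMAIN during post-migration teardown should be logged but not block removal of the stub entry.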
Please attach the vdsm, libvirt and qemu logs.
Created attachment 879723 [details]
VDSM log

Here's the vdsm.log file; the interesting events happen around 2014-03-19 13:51:12,329.
Created attachment 879724 [details]
libvirtd log

This is the libvirtd log. (The rotation is the one that matches the interesting events in vdsm.log.)
Created attachment 879725 [details]
qemu log for the VM in question

Attaching qemu log. Very little here, unfortunately :(
Works for me. I've attached the other bug to my case.

*** This bug has been marked as a duplicate of bug 985770 ***