the current releaseVm() procedure does: try: self._dom.destroyFlags( libvirt.VIR_DOMAIN_DESTROY_GRACEFUL) except libvirt.libvirtError as e: if (e.get_error_code() == libvirt.VIR_ERR_OPERATION_FAILED): self.log.warn("Failed to destroy VM '%s' " "gracefully", self.conf['vmId']) time.sleep(30) self._dom.destroy() The sleep time can cause problems since the VDSM destroy() API call is synchronous and here we can wait for too long for VM to die…causing a timeouts on the caller side. Moreover libvirt does it's own waiting on its own (10s TERM + 5s KILL, hardcoded) Also we should think about what to do better/different when the destroy() fails. We should probably really have something like "unknown" state in VDSM
Taking the bug. Let's starting by the biggest and worst offender, this time.sleep(30)
Verified in vdsm-4.16.0-42.git3bfad86.el6.x86_64 (oVirt 3.5 beta2). Verification steps: 1. Rename qemu-kvm binary to qemu-kvm-real and replace qemu-kvm by a shell wrapper which ignores SIGTERM signals (see attachment). 2. Run a VM on the host. 3. Power off the VM, wait until is down and check vdsm.log for how long it took. Result: It takes now 25 seconds from the first sent SIGTERM till the final SIGKILL. From vdsm.log: 2014-08-04 15:08:05.822+0000: 2264: debug : virDomainDestroyFlags:2238 : dom=0x7f3940066f30, (VM: name=simple-vm, uuid=8171a0ff-7fce-4df3-ae88-b3cac2ddc87a), flags=1 2014-08-04 15:08:05.822+0000: 2264: debug : qemuProcessKill:4160 : vm=simple-vm pid=27686 flags=0 2014-08-04 15:08:20.831+0000: 2264: warning : qemuProcessKill:4224 : Timed out waiting after SIGTERM to process 27686 2014-08-04 15:08:20.831+0000: 2264: error : qemuDomainDestroyFlags:2094 : operation failed: failed to kill qemu process with SIGTERM 2014-08-04 15:08:20.831+0000: 2264: debug : virDomainFree:2281 : dom=0x7f3940066f30, (VM: name=simple-vm, uuid=8171a0ff-7fce-4df3-ae88-b3cac2ddc87a) 2014-08-04 15:08:20.833+0000: 2261: debug : virDomainDestroy:2172 : dom=0x7f391c05e890, (VM: name=simple-vm, uuid=8171a0ff-7fce-4df3-ae88-b3cac2ddc87a) 2014-08-04 15:08:20.833+0000: 2261: debug : qemuProcessKill:4160 : vm=simple-vm pid=27686 flags=1 2014-08-04 15:08:30.840+0000: 2261: warning : qemuProcessKill:4192 : Timed out waiting after SIGTERM to process 27686, sending SIGKILL
Created attachment 923929 [details] qemu-kvm shell wrapper
oVirt 3.5 has been released and should include the fix for this issue.