Bug 1091389
| Summary: | releaseVm() takes too much time when VM doesn't want to die | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Retired] oVirt | Reporter: | Michal Skrivanek <michal.skrivanek> | ||||
| Component: | vdsm | Assignee: | Francesco Romani <fromani> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Pavel Novotny <pnovotny> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 3.4 | CC: | amureini, bazulay, bugs, gklein, iheim, jentrena, mavital, mgoldboi, mkalinin, rbalakri, rhodain, s.kieske, yeylon | ||||
| Target Milestone: | --- | Keywords: | Triaged | ||||
| Target Release: | 3.5.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | virt | ||||||
| Fixed In Version: | v4.16.0 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-10-17 12:20:40 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1081962 | ||||||
| Attachments: |
|
||||||
Taking the bug. Let's starting by the biggest and worst offender, this time.sleep(30) Verified in vdsm-4.16.0-42.git3bfad86.el6.x86_64 (oVirt 3.5 beta2). Verification steps: 1. Rename qemu-kvm binary to qemu-kvm-real and replace qemu-kvm by a shell wrapper which ignores SIGTERM signals (see attachment). 2. Run a VM on the host. 3. Power off the VM, wait until is down and check vdsm.log for how long it took. Result: It takes now 25 seconds from the first sent SIGTERM till the final SIGKILL. From vdsm.log: 2014-08-04 15:08:05.822+0000: 2264: debug : virDomainDestroyFlags:2238 : dom=0x7f3940066f30, (VM: name=simple-vm, uuid=8171a0ff-7fce-4df3-ae88-b3cac2ddc87a), flags=1 2014-08-04 15:08:05.822+0000: 2264: debug : qemuProcessKill:4160 : vm=simple-vm pid=27686 flags=0 2014-08-04 15:08:20.831+0000: 2264: warning : qemuProcessKill:4224 : Timed out waiting after SIGTERM to process 27686 2014-08-04 15:08:20.831+0000: 2264: error : qemuDomainDestroyFlags:2094 : operation failed: failed to kill qemu process with SIGTERM 2014-08-04 15:08:20.831+0000: 2264: debug : virDomainFree:2281 : dom=0x7f3940066f30, (VM: name=simple-vm, uuid=8171a0ff-7fce-4df3-ae88-b3cac2ddc87a) 2014-08-04 15:08:20.833+0000: 2261: debug : virDomainDestroy:2172 : dom=0x7f391c05e890, (VM: name=simple-vm, uuid=8171a0ff-7fce-4df3-ae88-b3cac2ddc87a) 2014-08-04 15:08:20.833+0000: 2261: debug : qemuProcessKill:4160 : vm=simple-vm pid=27686 flags=1 2014-08-04 15:08:30.840+0000: 2261: warning : qemuProcessKill:4192 : Timed out waiting after SIGTERM to process 27686, sending SIGKILL Created attachment 923929 [details]
qemu-kvm shell wrapper
oVirt 3.5 has been released and should include the fix for this issue. |
the current releaseVm() procedure does: try: self._dom.destroyFlags( libvirt.VIR_DOMAIN_DESTROY_GRACEFUL) except libvirt.libvirtError as e: if (e.get_error_code() == libvirt.VIR_ERR_OPERATION_FAILED): self.log.warn("Failed to destroy VM '%s' " "gracefully", self.conf['vmId']) time.sleep(30) self._dom.destroy() The sleep time can cause problems since the VDSM destroy() API call is synchronous and here we can wait for too long for VM to dieā¦causing a timeouts on the caller side. Moreover libvirt does it's own waiting on its own (10s TERM + 5s KILL, hardcoded) Also we should think about what to do better/different when the destroy() fails. We should probably really have something like "unknown" state in VDSM