| Summary: | [vdsm] vdsm reports vm as Not Responding while qemu process dead. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | David Naori <dnaori> | ||||
| Component: | vdsm | Assignee: | Igor Lvovsky <ilvovsky> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | David Naori <dnaori> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.1 | CC: | abaron, bazulay, dnaori, hateya, iheim, lpeer, mgoldboi, syeghiay, ykaul | ||||
| Target Milestone: | rc | Keywords: | Regression | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | vdsm-4.9-59.el6 | Doc Type: | Bug Fix | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-08-19 15:27:37 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
The real reason the VM is stuck in the 'Powering Down' state is the destroy failure, and that behavior is correct: we can't switch the VM state to 'Down' before we release all of its resources. So if you issue the destroy command again, it will succeed. The additional problem here is that we need to do the same thing (release the VM resources) when we get the onQemuDeath event.

Tested on vdsm-4.9-58.el6:
libvirtEventLoop::INFO::2011-04-10 13:17:16,363::vm::995::vm.Vm::(_onQemuDeath) vmId=`cbc17805-d25c-469c-aa76-53237d4d6df2`::underlying process disconnected
libvirtEventLoop::INFO::2011-04-10 13:17:16,440::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: teardownVolume, args: ( sdUUID=b9750b78-f531-4161-ac3e-fe1805297861 spUUID=cc174bff-13d3-4ff4-a5cb-8ce6d019629e imgUUID=294a7fc5-d61d-4006-aa8b-3c24ae9d762f volUUID=f1b12b58-215a-49bc-9530-41908f668770 rw=False)
libvirtEventLoop::INFO::2011-04-10 13:17:19,660::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: teardownVolume, Return response: {'status': {'message': 'OK', 'code': 0}}
Teardown is executed after _onQemuDeath, but destroy is not called.
[root@camel-vdsb tmp]# vdsClient -s 0 list table
cbc17805-d25c-469c-aa76-53237d4d6df2 5294 LIBVIRT-TEST-001 Powering down*
The VM is still in the Powering down* state.
Expected result: destroy should be called after onQemuDeath and the volume teardown (even if destroy was already called before).
Tested a patch:
vm.py:
def _onQemuDeath(self):
    self.log.info('underlying process disconnected')
    # Try to release the VM resources first; if that fails, stay stuck in
    # the 'Powering Down' state
    response = self.releaseVm()
    if not response['status']['code']:
        if self.user_destroy:
            self.setDownStatus(NORMAL, "User shut down")
        else:
            self.setDownStatus(ERROR,
                               "Lost connection with kvm process")
Which works.
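
For anyone who wants to check the control flow of the patch without a running vdsm, here is a minimal sketch. FakeVm, its release_code argument, the status strings, and the numeric values of NORMAL and ERROR are all hypothetical stand-ins, not vdsm code; only the body of _onQemuDeath follows the patch above.

```python
# Placeholder exit codes; the real values in vdsm are not shown in this report.
NORMAL, ERROR = 0, 1


class FakeVm:
    """Hypothetical stand-in for vdsm's Vm, just enough to drive the handler."""

    def __init__(self, release_code, user_destroy=False):
        self.release_code = release_code  # status code releaseVm() will report
        self.user_destroy = user_destroy  # True if the user asked for destroy
        self.status = "Powering down"
        self.exit_code = None
        self.exit_message = None

    def releaseVm(self):
        # Mimics the response shape seen in the logs: {'status': {'code': ...}}
        return {'status': {'code': self.release_code}}

    def setDownStatus(self, code, message):
        self.status = "Down"
        self.exit_code = code
        self.exit_message = message

    def _onQemuDeath(self):
        # Same logic as the patch: release resources first, and only move to
        # Down if the release succeeded (status code 0).
        response = self.releaseVm()
        if not response['status']['code']:
            if self.user_destroy:
                self.setDownStatus(NORMAL, "User shut down")
            else:
                self.setDownStatus(ERROR, "Lost connection with kvm process")


if __name__ == "__main__":
    for vm in (FakeVm(0), FakeVm(0, user_destroy=True), FakeVm(1)):
        vm._onQemuDeath()
        print(vm.status, vm.exit_message)
```

In this sketch a failed release (non-zero code) leaves the VM in 'Powering down', which matches the reasoning that the state must not change to 'Down' before the resources are released.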
Verified on vdsm-4.9-59.el6:
libvirtEventLoop::DEBUG::2011-04-12 19:26:34,422::vm::1760::vm.Vm::(setDownStatus) vmId=`f3bbd496-3931-43a7-b967-857853fd0e71`::Changed state to Down: Lost connection with kvm process
Created attachment 487070
vdsm-logs

Description of problem:
When vdsm tries to operate on a VM while the qemu process is not responding, libvirt fails with:
TimeoutError: Timed out during operation: cannot acquire state change lock.
After the operation fails, vdsm gets events from libvirt and does not change the VM status.

libvirtEventLoop::DEBUG::2011-03-23 17:03:44,521::libvirtvm::1043::vm.Vm::(_onLibvirtLifecycleEvent) vmId=`52902746-580f-4e0c-826a-0bc42ec6d970`::event Stopped detail 5 opaque None
libvirtEventLoop::INFO::2011-03-23 17:03:44,521::vm::1014::vm.Vm::(_onQemuDeath) vmId=`52902746-580f-4e0c-826a-0bc42ec6d970`::underlying process disconnected

vdsClient -s 0 list table
52902746-580f-4e0c-826a-0bc42ec6d970 31111 LIBVIRT-NFS-0-07 Powering down*

Version-Release number of selected component (if applicable):
vdsm-4.9-55.el6.x86_64
libvirt-0.8.7-13.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Run a VM.
2. kill -19 qemu-process, then wait 60 seconds until vdsm reports the VM as Up*.
3. vdsClient -s 0 destroy vm-uuid (the operation will fail with the above error).
4. kill -9 qemu-process

Actual results:
The VM is reported as not responding while it is actually dead.

Additional info:
vdsm log attached
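
For reference, steps 2 and 4 of the reproduction rely on two signals: on Linux x86_64, signal 19 is SIGSTOP (the process is frozen but still exists) and signal 9 is SIGKILL (the process is terminated, even while stopped). The sketch below shows that sequence on a throwaway child process. It is only an illustration: the child is a sleeping Python interpreter, not qemu, and the function name stop_then_kill is made up for this example.

```python
import signal
import subprocess
import sys


def stop_then_kill():
    """Freeze a child with SIGSTOP, then kill it with SIGKILL.

    Returns (status_while_stopped, final_exit_status).
    """
    # Stand-in child process; in the report this role is played by qemu.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        # Step 2 in the report: kill -19 (SIGSTOP on Linux x86_64).
        child.send_signal(signal.SIGSTOP)
        # A stopped process has not exited, so poll() reports no exit status.
        status_while_stopped = child.poll()
        # Step 4 in the report: kill -9 (SIGKILL) works on a stopped process.
        child.send_signal(signal.SIGKILL)
        final_exit_status = child.wait(timeout=10)
    finally:
        # Make sure the child never outlives the example.
        if child.poll() is None:
            child.kill()
            child.wait()
    return status_while_stopped, final_exit_status


if __name__ == "__main__":
    print(stop_then_kill())
```

The stopped child reports no exit status, which is consistent with a paused qemu still looking alive to its manager until the later SIGKILL actually ends the process.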