Created attachment 556975
Libvirt, machine, vdsm, engine logs

Description of problem:
Cannot resume a paused VM after storage problems.

My setup was as follows: two hosts and two iSCSI domains, each on a different server. Two VMs were running on the SPM host, and storage access to the master SD, on which the VMs had their volumes, was blocked on both hosts. After a new master SD was elected and the VMs were paused, the former SPM went to the Non Operational state and a new SPM was elected. I unblocked the storage connections on both hosts and then activated the Non Operational host. Then I activated the Inactive storage domain and attempted to run both machines. Only one machine started running; the second one fails on acquiring the guest CPU lock. A qemu process is running on the host for both VMs.

Version-Release number of selected component (if applicable):
vdsm-4.9.3.1-0.fc16.x86_64
libvirt-client-0.9.6-4.fc16.x86_64
qemu-kvm-0.15.1-3.fc16.x86_64
libvirt-0.9.6-4.fc16.x86_64
qemu-kvm-tools-0.15.1-3.fc16.x86_64
libvirt-python-0.9.6-4.fc16.x86_64
ovirt-engine-3.0.0_0001-1.2.fc16.x86_64

How reproducible:
5 out of 5 times, but only on one VM

Steps to Reproduce:
1. See Description
Actual results:
The VM cannot be resumed.

Expected results:
The VM is resumed.

Additional info:
I also noticed this line in the vdsm log after the VM was paused:

Thread-16::WARNING::2012-01-23 14:03:22,941::libvirtvm::1180::vm.Vm::(_readPauseCode) vmId=`b28ff353-258f-4cb9-9a05-8822a9c3b218`::_readPauseCode unsupported by libvirt vm

Thread-310::DEBUG::2012-01-23 14:16:11,326::clientIF::76::vds::(wrapper) [10.34.63.26]::call cont with ('b28ff353-258f-4cb9-9a05-8822a9c3b218',) {}
Thread-310::ERROR::2012-01-23 14:17:17,354::clientIF::90::vds::(wrapper) Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 80, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/clientIF.py", line 499, in cont
    return v.cont()
  File "/usr/share/vdsm/vm.py", line 773, in cont
    self._acquireCpuLockWithTimeout()
  File "/usr/share/vdsm/vm.py", line 769, in _acquireCpuLockWithTimeout
    timeout)
RuntimeError: waiting more that 66s for _guestCpuLock
I failed to reproduce it using only one host.
Could it be that the lock is not released because qemu is not responding? What is the domain state in virsh? Does the domain respond to suspend/resume? Does the problem persist after vdsm is restarted?
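A minimal Python sketch of the hypothesis in the question above: if a thread holding the guest CPU lock is stuck on a call to a qemu process that does not respond, a later cont request cannot take the lock and times out with a RuntimeError, which is the same kind of error shown in the traceback. All names here are hypothetical and this is not vdsm's implementation; the 0.5 s timeout stands in for the 66 s in the log, and `Lock.acquire(timeout=...)` assumes Python 3.2 or later.

```python
import threading

# Hypothetical illustration only: names mirror the vdsm traceback, but this is
# not vdsm's code. The timeout is shortened from the 66s seen in the log.
CPU_LOCK_TIMEOUT = 0.5  # seconds


def acquire_cpu_lock_with_timeout(lock, timeout):
    """Acquire lock, or raise RuntimeError if it is not free within timeout."""
    if not lock.acquire(timeout=timeout):
        raise RuntimeError("waiting more than %ss for _guestCpuLock" % timeout)


def stuck_holder(lock, acquired, unblock):
    """Hold the lock while 'blocked', e.g. on a call to an unresponsive qemu."""
    with lock:
        acquired.set()
        unblock.wait()  # blocks until unblock is set, keeping the lock held


def try_cont(lock, timeout):
    """Simplified resume request: needs the lock, then releases it."""
    try:
        acquire_cpu_lock_with_timeout(lock, timeout)
    except RuntimeError as err:
        return "failed: %s" % err
    try:
        return "resumed"
    finally:
        lock.release()


guest_cpu_lock = threading.Lock()
acquired = threading.Event()
unblock = threading.Event()
holder = threading.Thread(target=stuck_holder,
                          args=(guest_cpu_lock, acquired, unblock))
holder.start()
acquired.wait()  # make sure the holder owns the lock before trying to resume

first_attempt = try_cont(guest_cpu_lock, CPU_LOCK_TIMEOUT)
print(first_attempt)   # failed: waiting more than 0.5s for _guestCpuLock

unblock.set()          # the blocked call finally returns, releasing the lock
holder.join()

second_attempt = try_cont(guest_cpu_lock, CPU_LOCK_TIMEOUT)
print(second_attempt)  # resumed
```

While the holder is blocked, every resume attempt fails with the timeout; once the holder returns and releases the lock, the same request succeeds.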
I cannot reproduce it anymore (tried 6 times). Closing as WORKSFORME for now.