Bug 784002 - Guest cpu lock is not unlocked in case guest was paused due to storage problems
Summary: Guest cpu lock is not unlocked in case guest was paused due to storage problems
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Dan Kenigsberg
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-01-23 14:18 UTC by Jakub Libosvar
Modified: 2016-04-18 06:43 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-24 13:49:12 UTC
oVirt Team: ---


Attachments
Libvirt, machine, vdsm, engine logs (258.97 KB, application/x-gzip)
2012-01-23 14:18 UTC, Jakub Libosvar

Description Jakub Libosvar 2012-01-23 14:18:39 UTC
Created attachment 556975 [details]
Libvirt, machine, vdsm, engine logs

Description of problem:
Cannot resume a paused VM after storage problems. My setup was as follows:
two hosts and two iSCSI storage domains, each on a different server. Two VMs were running on the SPM host, and on both hosts I blocked the storage connection to the master SD that held the VMs' volumes. After a new master SD was elected and the VMs were paused, the former SPM went to Non Operational state and a new SPM was elected. I unblocked the storage connections on both hosts and then activated the Non Operational host. Then I activated the Inactive storage domain and attempted to run both machines. Only one machine started running; the second one fails on acquiring the guest CPU lock. There is a running qemu process for both VMs on the host.


Version-Release number of selected component (if applicable):
vdsm-4.9.3.1-0.fc16.x86_64
libvirt-client-0.9.6-4.fc16.x86_64
qemu-kvm-0.15.1-3.fc16.x86_64
libvirt-0.9.6-4.fc16.x86_64
qemu-kvm-tools-0.15.1-3.fc16.x86_64
libvirt-python-0.9.6-4.fc16.x86_64
ovirt-engine-3.0.0_0001-1.2.fc16.x86_64

How reproducible:
5 out of 5 times, but only one of the two VMs is affected

Steps to Reproduce:
1. See Description
2.
3.
  
Actual results:
VM cannot resume

Expected results:
VM is resumed

Additional info:

I also noticed this line in the vdsm log after the VM was paused:
Thread-16::WARNING::2012-01-23 14:03:22,941::libvirtvm::1180::vm.Vm::(_readPauseCode) vmId=`b28ff353-258f-4cb9-9a05-8822a9c3b218`::_readPauseCode unsupported by libvirt vm

Thread-310::DEBUG::2012-01-23 14:16:11,326::clientIF::76::vds::(wrapper) [10.34.63.26]::call cont with ('b28ff353-258f-4cb9-9a05-8822a9c3b218',) {}
Thread-310::ERROR::2012-01-23 14:17:17,354::clientIF::90::vds::(wrapper) Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 80, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/clientIF.py", line 499, in cont
    return v.cont()
  File "/usr/share/vdsm/vm.py", line 773, in cont
    self._acquireCpuLockWithTimeout()
  File "/usr/share/vdsm/vm.py", line 769, in _acquireCpuLockWithTimeout
    timeout)
RuntimeError: waiting more that 66s for _guestCpuLock
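
For reference, a minimal sketch (hypothetical code, not the actual vdsm implementation) of a timeout-bounded lock acquisition matching the behaviour the traceback suggests: if the flow that paused the guest never releases _guestCpuLock, cont() can only fail with this RuntimeError.

import threading
import time

class GuestCpuLockSketch(object):
    # Sketch only: names mirror the traceback above, the logic is assumed.
    def __init__(self, timeout=66):
        self._guestCpuLock = threading.Lock()
        self._timeout = timeout

    def _acquireCpuLockWithTimeout(self):
        deadline = time.time() + self._timeout
        # Poll for the lock until the deadline; if the pause-handling
        # path still holds it, give up with the error seen in the log.
        while time.time() < deadline:
            if self._guestCpuLock.acquire(False):
                return
            time.sleep(0.1)
        raise RuntimeError('waiting more that %ss for _guestCpuLock'
                           % self._timeout)

    def cont(self):
        self._acquireCpuLockWithTimeout()
        try:
            pass  # resume the guest CPUs here
        finally:
            self._guestCpuLock.release()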

Comment 1 Jakub Libosvar 2012-01-23 14:38:41 UTC
I failed to reproduce this using only one host.

Comment 2 Dan Kenigsberg 2012-01-23 15:40:41 UTC
Could it be that the lock is not released due to qemu not responding?
What is the domain state in virsh? Does it respond to suspend/resume?
Does the problem remain after vdsm is restarted?
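
(A quick way to answer the virsh questions above is through the libvirt Python bindings, since libvirt-python is installed per the version list. The UUID is the one from the vdsm log; everything else here is an illustrative assumption, not part of this bug.)

import libvirt

conn = libvirt.open('qemu:///system')
dom = conn.lookupByUUIDString('b28ff353-258f-4cb9-9a05-8822a9c3b218')

# state() returns (state, reason), e.g. VIR_DOMAIN_PAUSED plus the
# reason libvirt recorded for the pause.
state, reason = dom.state()
print('state=%s reason=%s' % (state, reason))

if state == libvirt.VIR_DOMAIN_PAUSED:
    dom.resume()   # equivalent of `virsh resume <domain>`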

Comment 3 Jakub Libosvar 2012-01-24 13:49:12 UTC
I cannot reproduce this anymore (tried 6 times). Closing as WORKSFORME for now.

