Created attachment 645019
vdsm log

[Storage] Extending a storage domain fails with a timeout if other storage tasks are running in parallel. This might be critical during live storage migration.

Scenario:
1. Live-migrate disks from one storage domain to another.
2. Extend the storage domain while the disks are being moved between domains.

Once the disk move starts, the extend operation is queued and fails after 2 minutes (the lock request times out after 120 seconds, see the log below). In the live disk storage migration case, both tasks fail: the storage domain extension and the disk move.

The full vdsm log is attached.

Thread-15779::DEBUG::2012-11-14 17:56:37,839::resourceManager::705::ResourceManager.Owner::(acquire) 2bf82660-6468-4195-b712-33dccf9ae9a1: request for 'Storage.6093c92d-8817-4199-a2dc-9c04b07344b3' timed out after '120.000000' seconds
Thread-15779::ERROR::2012-11-14 17:56:37,840::task::853::TaskManager.Task::(_setError) Task=`2bf82660-6468-4195-b712-33dccf9ae9a1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 631, in extendStorageDomain
    vars.task.getExclusiveLock(STORAGE, sdUUID)
  File "/usr/share/vdsm/storage/task.py", line 1301, in getExclusiveLock
    self.resOwner.acquire(namespace, resName, resourceManager.LockType.exclusive, timeout)
  File "/usr/share/vdsm/storage/resourceManager.py", line 706, in acquire
    raise se.ResourceTimeout()
ResourceTimeout: Resource timeout: ()
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
Looking at this again it looks to me like it has nothing to do with the tasks, just a locking issue. extendStorageDomain takes an exclusive lock on the domain and live storage migration (syncImage) takes a shared lock. Can probably just change the lock to shared (I can't think of a reason for it to be exclusive). Fede?
(In reply to comment #8)
> Looking at this again it looks to me like it has nothing to do with the
> tasks, just a locking issue.
> extendStorageDomain takes an exclusive lock on the domain and live storage
> migration (syncImage) takes a shared lock. Can probably just change the
> lock to shared (I can't think of a reason for it to be exclusive).
> Fede?

You are correct. In this specific case the call was moveImage (probably during the copy of a template), but yes, it's a locking issue.
I think we can use a shared lock for extendStorageDomain as you suggested.

Thread-15692::INFO::2012-11-14 17:53:04,265::logUtils::37::dispatcher::(wrapper) Run and protect: moveImage(spUUID='e59ad86c-2e41-11e2-98f9-df8494d70a03', srcDomUUID='328fe2ec-4849-4e64-92d8-333f9ff0b09b', dstDomUUID='6093c92d-8817-4199-a2dc-9c04b07344b3', imgUUID='1630a362-5a55-47fb-86c1-a2af432d437a', vmUUID='', op=2, postZero='true', force='false')
[...]
Thread-15692::DEBUG::2012-11-14 17:53:04,793::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.6093c92d-8817-4199-a2dc-9c04b07344b3`ReqID=`e0d1d872-2eb7-4d60-931c-6f1b29dd318c`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '485' at 'registerResource'
Thread-15692::DEBUG::2012-11-14 17:53:04,793::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.6093c92d-8817-4199-a2dc-9c04b07344b3' for lock type 'shared'
Thread-15692::DEBUG::2012-11-14 17:53:04,794::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.6093c92d-8817-4199-a2dc-9c04b07344b3' is free. Now locking as 'shared' (1 active user)
[...]
Thread-15779::DEBUG::2012-11-14 17:56:37,839::resourceManager::705::ResourceManager.Owner::(acquire) 2bf82660-6468-4195-b712-33dccf9ae9a1: request for 'Storage.6093c92d-8817-4199-a2dc-9c04b07344b3' timed out after '120.000000' seconds
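
To illustrate the contention described above, here is a minimal, self-contained sketch of a shared/exclusive lock with a timeout. It is not vdsm's resourceManager: the class SharedExclusiveLock, the helper try_extend, and the 0.2-second timeout are hypothetical names and values chosen for illustration, and the toy lock does not model request queueing. It only shows why an exclusive request times out while a long-running task holds the domain in shared mode, and why a shared request would be granted in the same situation.

```python
import threading
import time


class ResourceTimeout(Exception):
    """Raised when a lock request is not granted within the timeout."""


class SharedExclusiveLock:
    """Toy shared/exclusive lock (an illustration, NOT vdsm's resourceManager)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared_holders = 0
        self._exclusive_held = False

    def acquire(self, exclusive, timeout):
        deadline = time.monotonic() + timeout
        with self._cond:
            # An exclusive request must wait for every holder to leave;
            # a shared request only has to wait for an exclusive holder.
            while self._exclusive_held or (exclusive and self._shared_holders > 0):
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise ResourceTimeout()
                self._cond.wait(remaining)
            if exclusive:
                self._exclusive_held = True
            else:
                self._shared_holders += 1

    def release(self, exclusive):
        with self._cond:
            if exclusive:
                self._exclusive_held = False
            else:
                self._shared_holders -= 1
            self._cond.notify_all()


def try_extend(lock, exclusive, timeout=0.2):
    """Hypothetical stand-in for the extend step: True if the lock was granted."""
    try:
        lock.acquire(exclusive, timeout)
    except ResourceTimeout:
        return False
    lock.release(exclusive)
    return True


domain_lock = SharedExclusiveLock()

# A long-running task (moveImage in the log above) holds the domain in shared mode.
domain_lock.acquire(exclusive=False, timeout=0.2)

exclusive_result = try_extend(domain_lock, exclusive=True)   # reported behaviour: times out
shared_result = try_extend(domain_lock, exclusive=False)     # proposed behaviour: granted

domain_lock.release(exclusive=False)
print(exclusive_result, shared_result)
```

In the real log the timeout is 120 seconds rather than 0.2, so the extend request waits for two minutes behind moveImage before failing with ResourceTimeout.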
commit 0b56650267094de24d32d209ca3b5bd02d19685a
Author: Federico Simoncelli <fsimonce>
Date:   Fri Dec 28 10:12:26 2012 -0500

    domain: use shared lock for extendStorageDomain

    The extendStorageDomain command shouldn't hold an exclusive lock on the
    storage domain since it can be executed also during other long tasks
    (e.g.: moveImage).

http://gerrit.ovirt.org/#/c/10451/
Verified on sf10 with vdsm-4.10.2-11.0.el6ev.x86_64 and libvirt-0.10.2-18.el6_4.eblake.2.x86_64.
3.2 has been released