Bug 876663 - [Storage] Cannot Extend storage domain while moving an image
Summary: [Storage] Cannot Extend storage domain while moving an image
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Federico Simoncelli
QA Contact: Dafna Ron
URL:
Whiteboard: storage
Depends On:
Blocks: 917401
Reported: 2012-11-14 16:51 UTC by Leonid Natapov
Modified: 2016-02-10 19:35 UTC (History)

Fixed In Version: vdsm-4.10.2-10.0.el6ev
Doc Type: Bug Fix
Doc Text:
Previously, it was impossible to extend a block storage domain (add more space) while another operation was in progress, for example moving an image from one storage domain to another. Now the storage domain can be extended even while long-running tasks are in progress.
Clone Of:
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
vdsm log (494.56 KB, application/octet-stream)
2012-11-14 16:51 UTC, Leonid Natapov


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 877752 | 0 | high | CLOSED | engine: .CannotGetJdbcConnectionException on extend domain command | 2021-02-22 00:41:40 UTC
oVirt gerrit | 10451 | 0 | None | None | None | 2020-05-19 14:33:55 UTC

Internal Links: 877752

Description Leonid Natapov 2012-11-14 16:51:22 UTC
Created attachment 645019 [details]
vdsm log

[Storage] Extending a storage domain fails with a timeout if other storage tasks are running in parallel. This might be critical when we do live storage migration.

Scenario:
Live migration of disks from one storage domain to another.
Extend the storage domain while disks are being moved between domains.
When we start moving a disk, the operation goes to the queue and fails after 2 minutes. So, in the case of live disk storage migration, both tasks fail: extend storage domain and move disk.

Full vdsm log attached.

Thread-15779::DEBUG::2012-11-14 17:56:37,839::resourceManager::705::ResourceManager.Owner::(acquire) 2bf82660-6468-4195-b712-33dccf9ae9a1: request for 'Storage.6093c92d-8817-4199-a2dc-9c04b07344b3' timed out after '120.000000' seconds
Thread-15779::ERROR::2012-11-14 17:56:37,840::task::853::TaskManager.Task::(_setError) Task=`2bf82660-6468-4195-b712-33dccf9ae9a1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 631, in extendStorageDomain
    vars.task.getExclusiveLock(STORAGE, sdUUID)
  File "/usr/share/vdsm/storage/task.py", line 1301, in getExclusiveLock
    self.resOwner.acquire(namespace, resName, resourceManager.LockType.exclusive, timeout)
  File "/usr/share/vdsm/storage/resourceManager.py", line 706, in acquire
    raise se.ResourceTimeout()
ResourceTimeout: Resource timeout: ()

Comment 7 RHEL Program Management 2012-12-14 07:52:18 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 8 Ayal Baron 2012-12-26 10:18:28 UTC
Looking at this again it looks to me like it has nothing to do with the tasks, just a locking issue.
extendStorageDomain takes an exclusive lock on the domain and live storage migration (syncImage) takes a shared lock.  Can probably just change the lock to shared (I can't think of a reason for it to be exclusive).
Fede?
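
To illustrate the conflict described above, here is a minimal sketch of a shared/exclusive lock with a timeout. It is not vdsm's resourceManager: the SharedExclusiveLock class, its methods, and the short timeouts are hypothetical and chosen only to show why an exclusive request waits behind a shared holder while a second shared request does not.

```python
import threading
import time


class SharedExclusiveLock:
    """Toy shared/exclusive lock with a timeout (illustration only, not vdsm code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared_holders = 0
        self._exclusive_held = False

    def acquire(self, exclusive, timeout):
        """Return True if the lock was granted within `timeout` seconds."""
        deadline = time.monotonic() + timeout
        with self._cond:
            while True:
                if exclusive:
                    # Exclusive needs the resource completely free.
                    free = not self._exclusive_held and self._shared_holders == 0
                else:
                    # Shared only conflicts with an exclusive holder.
                    free = not self._exclusive_held
                if free:
                    break
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                self._cond.wait(remaining)
            if exclusive:
                self._exclusive_held = True
            else:
                self._shared_holders += 1
            return True

    def release(self, exclusive):
        with self._cond:
            if exclusive:
                self._exclusive_held = False
            else:
                self._shared_holders -= 1
            self._cond.notify_all()


domain_lock = SharedExclusiveLock()

# A long-running task (standing in for the image move) holds the domain as shared.
assert domain_lock.acquire(exclusive=False, timeout=0.1)

# An exclusive request (what extendStorageDomain does in the traceback) times out.
exclusive_granted = domain_lock.acquire(exclusive=True, timeout=0.2)

# A shared request (the change suggested here) is granted while the move is running.
shared_granted = domain_lock.acquire(exclusive=False, timeout=0.2)

print(exclusive_granted, shared_granted)  # False True
```

In the log the same thing happens with a 120 second timeout instead of 0.2 seconds.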

Comment 9 Federico Simoncelli 2012-12-28 10:59:54 UTC
(In reply to comment #8)
> Looking at this again it looks to me like it has nothing to do with the
> tasks, just a locking issue.
> extendStorageDomain takes an exclusive lock on the domain and live storage
> migration (syncImage) takes a shared lock.  Can probably just change the
> lock to shared (I can't think of a reason for it to be exclusive).
> Fede?

You are correct. In this specific case the call was moveImage (probably during the copy of a template), but yes, it's a locking issue.

I think we can use a shared lock for extendStorageDomain as you suggested.

Thread-15692::INFO::2012-11-14 17:53:04,265::logUtils::37::dispatcher::(wrapper) Run and protect: moveImage(spUUID='e59ad86c-2e41-11e2-98f9-df8494d70a03', srcDomUUID='328fe2ec-4849-4e64-92d8-333f9ff0b09b', dstDomUUID='6093c92d-8817-4199-a2dc-9c04b07344b3', imgUUID='1630a362-5a55-47fb-86c1-a2af432d437a', vmUUID='', op=2, postZero='true', force='false')

[...]

Thread-15692::DEBUG::2012-11-14 17:53:04,793::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.6093c92d-8817-4199-a2dc-9c04b07344b3`ReqID=`e0d1d872-2eb7-4d60-931c-6f1b29dd318c`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '485' at 'registerResource'
Thread-15692::DEBUG::2012-11-14 17:53:04,793::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.6093c92d-8817-4199-a2dc-9c04b07344b3' for lock type 'shared'
Thread-15692::DEBUG::2012-11-14 17:53:04,794::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.6093c92d-8817-4199-a2dc-9c04b07344b3' is free. Now locking as 'shared' (1 active user)

[...]

Thread-15779::DEBUG::2012-11-14 17:56:37,839::resourceManager::705::ResourceManager.Owner::(acquire) 2bf82660-6468-4195-b712-33dccf9ae9a1: request for 'Storage.6093c92d-8817-4199-a2dc-9c04b07344b3' timed out after '120.000000' seconds
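
The timeline implied by these log lines can be worked out with a few lines of Python. This is only a sketch: the two timestamps and the 120 second timeout are copied from the log above, and it assumes the timeout message was logged at the moment the wait expired. The variable names are made up for the example.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S,%f"

# Taken from the log: moveImage locks the domain as 'shared'.
shared_locked_at = datetime.strptime("2012-11-14 17:53:04,794", FMT)
# Taken from the log: the exclusive request gives up.
timed_out_at = datetime.strptime("2012-11-14 17:56:37,839", FMT)
timeout = timedelta(seconds=120)

# Assumption: the timeout line is logged right when the 120 s wait expires.
exclusive_requested_at = timed_out_at - timeout
gap_after_shared_lock = exclusive_requested_at - shared_locked_at

print(exclusive_requested_at.time())          # 17:54:37.839000
print(gap_after_shared_lock.total_seconds())  # 93.045
```

Under that assumption, the extend request arrived about 93 seconds after moveImage took the shared lock and waited its full 120 seconds without the resource becoming free.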

Comment 10 Federico Simoncelli 2013-01-02 15:37:21 UTC
commit 0b56650267094de24d32d209ca3b5bd02d19685a
Author: Federico Simoncelli <fsimonce>
Date:   Fri Dec 28 10:12:26 2012 -0500

    domain: use shared lock for extendStorageDomain
    
    The extendStorageDomain command shouldn't hold an exclusive lock on
    the storage domain since it can be executed also during other long
    tasks (e.g.: moveImage).

http://gerrit.ovirt.org/#/c/10451/

Comment 13 Dafna Ron 2013-03-13 13:45:17 UTC
verified on sf10 with vdsm-4.10.2-11.0.el6ev.x86_64 libvirt-0.10.2-18.el6_4.eblake.2.x86_64

Comment 14 Itamar Heim 2013-06-11 09:26:22 UTC
3.2 has been released

Comment 15 Itamar Heim 2013-06-11 09:30:38 UTC
3.2 has been released

Comment 16 Itamar Heim 2013-06-11 09:46:19 UTC
3.2 has been released

