Bug 795770

Summary: [ovirt] [vdsm] deactivateStorageDomain will fail when there is no access to metadata
Product: [Retired] oVirt
Component: ovirt-engine-core
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Version: unspecified
Target Release: 3.3.4
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Doc Type: Bug Fix
oVirt Team: Storage
Reporter: Haim <hateya>
Assignee: lpeer <lpeer>
CC: abaron, acathrow, amureini, bazulay, ewarszaw, hateya, iheim, mgoldboi, yeylon, ykaul
Last Closed: 2012-12-12 07:32:54 UTC

Description Haim 2012-02-21 14:00:33 UTC
Description of problem:

- in case something horrible happens and vdsm no longer has access to the domain metadata, deactivateStorageDomain will fail.

This is a serious problem: the backend will send deactivateStorageDomain, vdsm will return an exception, and the domain will stay up for good.

Thread-1482::INFO::2012-02-21 15:58:10,521::logUtils::37::dispatcher::(wrapper) Run and protect: deactivateStorageDomain(sdUUID='c37e94b2-b130-49b4-a7aa-b30e8a372878', spUUID='711a080f-4702-450c-9f5f-bf54f1e99383', msdUUID='00000000-0000-0000-0000-000000000000', masterVersion=1, options=None)
Thread-1482::DEBUG::2012-02-21 15:58:10,522::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.711a080f-4702-450c-9f5f-bf54f1e99383`ReqID=`aee37fdf-4fa0-46c3-af16-668147c2da2f`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '901' at 'deactivateStorageDomain'
Thread-1482::DEBUG::2012-02-21 15:58:10,523::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.711a080f-4702-450c-9f5f-bf54f1e99383' for lock type 'exclusive'
Thread-1482::DEBUG::2012-02-21 15:58:10,524::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.711a080f-4702-450c-9f5f-bf54f1e99383' is free. Now locking as 'exclusive' (1 active user)
Thread-1482::DEBUG::2012-02-21 15:58:10,525::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.711a080f-4702-450c-9f5f-bf54f1e99383`ReqID=`aee37fdf-4fa0-46c3-af16-668147c2da2f`::Granted request
Thread-1482::DEBUG::2012-02-21 15:58:10,525::task::817::TaskManager.Task::(resourceAcquired) Task=`268e3ca2-3550-4664-9dd4-aaecc9d51f90`::_resourcesAcquired: Storage.711a080f-4702-450c-9f5f-bf54f1e99383 (exclusive)
Thread-1482::DEBUG::2012-02-21 15:58:10,526::task::978::TaskManager.Task::(_decref) Task=`268e3ca2-3550-4664-9dd4-aaecc9d51f90`::ref 1 aborting False
Thread-1482::DEBUG::2012-02-21 15:58:10,527::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.c37e94b2-b130-49b4-a7aa-b30e8a372878`ReqID=`0d69a60d-5850-488b-9234-c86a1953a3b7`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '902' at 'deactivateStorageDomain'
Thread-1482::DEBUG::2012-02-21 15:58:10,527::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.c37e94b2-b130-49b4-a7aa-b30e8a372878' for lock type 'exclusive'
Thread-1482::DEBUG::2012-02-21 15:58:10,528::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.c37e94b2-b130-49b4-a7aa-b30e8a372878' is free. Now locking as 'exclusive' (1 active user)
Thread-1482::DEBUG::2012-02-21 15:58:10,529::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.c37e94b2-b130-49b4-a7aa-b30e8a372878`ReqID=`0d69a60d-5850-488b-9234-c86a1953a3b7`::Granted request
Thread-1482::DEBUG::2012-02-21 15:58:10,529::task::817::TaskManager.Task::(resourceAcquired) Task=`268e3ca2-3550-4664-9dd4-aaecc9d51f90`::_resourcesAcquired: Storage.c37e94b2-b130-49b4-a7aa-b30e8a372878 (exclusive)
Thread-1482::DEBUG::2012-02-21 15:58:10,530::task::978::TaskManager.Task::(_decref) Task=`268e3ca2-3550-4664-9dd4-aaecc9d51f90`::ref 1 aborting False
Thread-1482::INFO::2012-02-21 15:58:10,531::sp::1020::Storage.StoragePool::(deactivateSD) sdUUID=c37e94b2-b130-49b4-a7aa-b30e8a372878 spUUID=711a080f-4702-450c-9f5f-bf54f1e99383 newMsdUUID=00000000-0000-0000-0000-000000000000
Thread-1482::INFO::2012-02-21 15:58:10,531::fileSD::193::Storage.StorageDomain::(validate) sdUUID=c37e94b2-b130-49b4-a7aa-b30e8a372878
Thread-1482::DEBUG::2012-02-21 15:58:10,532::persistentDict::216::Storage.PersistentDict::(refresh) read lines (FileMetadataRW)=[]
Thread-1482::WARNING::2012-02-21 15:58:10,533::persistentDict::238::Storage.PersistentDict::(refresh) data has no embedded checksum - trust it as it is
Thread-1482::ERROR::2012-02-21 15:58:10,533::task::853::TaskManager.Task::(_setError) Task=`268e3ca2-3550-4664-9dd4-aaecc9d51f90`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 904, in deactivateStorageDomain
    pool.deactivateSD(sdUUID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 80, in wrapper
    return f(*args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1051, in deactivateSD
    elif dom.isBackup():
  File "/usr/share/vdsm/storage/sd.py", line 706, in isBackup
    return self.getMetaParam(DMDK_CLASS) == BACKUP_DOMAIN
  File "/usr/share/vdsm/storage/sd.py", line 660, in getMetaParam
    return self._metadata[key]
  File "/usr/share/vdsm/storage/persistentDict.py", line 75, in __getitem__
    return dec(self._dict[key])
  File "/usr/share/vdsm/storage/persistentDict.py", line 185, in __getitem__
    raise KeyError(key)
KeyError: 'CLASS'
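
For reference, the failure mechanism is easy to see in isolation: the metadata file read back zero lines (the "read lines (FileMetadataRW)=[]" entry above), so the metadata dict is empty and the DMDK_CLASS lookup surfaces as a bare KeyError. A minimal stand-alone sketch of that chain (the classes below are simplified stand-ins for sd.py/persistentDict.py, not actual vdsm code, and the BACKUP_DOMAIN value is a placeholder):

    # Simplified stand-ins for the vdsm classes in the traceback above;
    # illustrative only, not the real sd.py/persistentDict.py code.
    DMDK_CLASS = 'CLASS'
    BACKUP_DOMAIN = 'Backup'

    class EmptyMetadata:
        """Mimics PersistentDict after refresh() read zero metadata lines."""
        def __init__(self):
            self._dict = {}  # 'read lines (FileMetadataRW)=[]' leaves nothing

        def __getitem__(self, key):
            return self._dict[key]  # raises a bare KeyError, as in the log

    class Domain:
        def __init__(self):
            self._metadata = EmptyMetadata()

        def getMetaParam(self, key):
            return self._metadata[key]

        def isBackup(self):
            return self.getMetaParam(DMDK_CLASS) == BACKUP_DOMAIN

    try:
        Domain().isBackup()
    except KeyError as e:
        # This is the point where deactivateStorageDomain aborts.
        print('deactivateStorageDomain would abort with KeyError:', e)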

Comment 1 Ayal Baron 2012-02-21 14:16:44 UTC
Active/Inactive are engine-internal states, and the way this should be fixed is by the engine never calling vdsm to deactivate the domain and just deactivating it on the engine side.

This has already been discussed with Omer and Livnat. Moving to engine (not sure there isn't another bug on this already).
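
For illustration, the flow proposed here would look roughly like this (Python-style pseudocode; the engine itself is Java, and every name below is hypothetical):

    # Hypothetical sketch of an engine-side-only deactivation, per comment 1.
    def deactivate_storage_domain(db, pool_id, domain_id):
        # Active/Inactive live only in the engine database, so flip the
        # status directly instead of calling vdsm's deactivateStorageDomain;
        # a host that cannot read the domain metadata can no longer block us.
        db.set_storage_domain_status(pool_id, domain_id, 'Inactive')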

Comment 2 Haim 2012-02-21 15:11:45 UTC
(In reply to comment #1)
> Active/Inactive are engine-internal states, and the way this should be fixed
> is by the engine never calling vdsm to deactivate the domain and just
> deactivating it on the engine side.
> 
> This has already been discussed with Omer and Livnat. Moving to engine (not
> sure there isn't another bug on this already).

We have a bug on it for RHEVM (726957) - anyhow, I remember Eduardo fixed something similar (95cf43b694c8e18699e9fcb1a5ab71363d2deb8d), and this is another permutation of the problem (NFS ISO domain).
I don't see why vdsm should fail so brutally - we can handle it.
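
One hedged sketch of what "handling it" inside vdsm could look like - translating the bare KeyError into a deliberate decision during deactivation (illustrative only; the helper name and the real vdsm signatures and exception types are assumptions):

    # Illustrative guard only, not actual vdsm code.
    def is_backup_safe(dom, log):
        """Like dom.isBackup(), but never lets missing metadata escalate
        into an unhandled KeyError while we are trying to deactivate."""
        try:
            return dom.isBackup()
        except KeyError:
            # Metadata unreadable/empty (the case in this bug): assume a
            # regular data domain so the deactivation can still proceed.
            log.warning("domain %s: metadata unreadable, assuming non-backup",
                        dom.sdUUID)
            return False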

Comment 3 Ayal Baron 2012-02-21 20:53:12 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > Active/Inactive are engine-internal states, and the way this should be
> > fixed is by the engine never calling vdsm to deactivate the domain and
> > just deactivating it on the engine side.
> > 
> > This has already been discussed with Omer and Livnat. Moving to engine
> > (not sure there isn't another bug on this already).
> 
> We have a bug on it for RHEVM (726957) - anyhow, I remember Eduardo fixed
> something similar (95cf43b694c8e18699e9fcb1a5ab71363d2deb8d), and this is
> another permutation of the problem (NFS ISO domain).
> I don't see why vdsm should fail so brutally - we can handle it.

Because the right way to solve this is to remove the 'functionality' from vdsm altogether, as it makes no sense as it is, and spending time on fixing something that is destined for removal in the near future is wasted effort.

Comment 4 Itamar Heim 2012-12-12 07:32:54 UTC
Closing old bugs. If this issue is still relevant/important in the current version, please re-open the bug.