Red Hat Bugzilla – Bug 994582
[vdsm] cannot activate/detach an ISO domain after first detachment failed
Last modified: 2016-02-10 11:56:37 EST
Created attachment 783953
Description of problem:
After vdsm crashed during detachStorageDomain, vdsm is unable to activate, detach, or remove the domain.
Version-Release number of selected component (if applicable):
Depends on the phase during which vdsm crashed in detachStorageDomain.
Steps to Reproduce:
On a data center (block, with one host in the cluster in my case) with a connected storage pool and an ISO domain (local in my case):
- detach the ISO domain from the pool and stop vdsm right afterwards
- start vdsm and wait for the host to become SPM
- try to activate/detach the domain
vdsm fails to perform those actions:
Thread-427::ERROR::2013-08-07 17:11:32,361::task::850::TaskManager.Task::(_setError) Task=`ee8d1d86-54d7-48ee-9f4d-a8b52ab2890f`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 857, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 783, in detachStorageDomain
File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
return f(self, *args, **kwargs)
File "/usr/share/vdsm/storage/sp.py", line 1048, in detachSD
File "/usr/share/vdsm/storage/sp.py", line 515, in validateAttachedDomain
raise se.StorageDomainNotInPool(self.spUUID, dom.sdUUID)
StorageDomainNotInPool: Storage domain not in pool: 'domain=8ccbd167-a48c-4afd-ab3f-a08f69492486, pool=072c2d76-8886-47ab-a1f9-d97f834115af'
After a failure in detachStorageDomain, vdsm should roll back or roll forward.
The domain itself has been detached (the domain MD has been updated), so activateStorageDomain naturally cannot succeed.
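The StorageDomainNotInPool error in the traceback comes from a membership check: the domain's own metadata no longer lists the pool, so validation fails before the operation starts. A rough sketch of that check (illustrative names and message format, not the actual vdsm code):

```python
class StorageDomainNotInPool(Exception):
    """The domain's metadata does not reference the given pool."""


class Domain:
    """Toy stand-in for a storage domain; only the pool list matters here."""
    def __init__(self, sd_uuid, pools):
        self.sdUUID = sd_uuid
        self._pools = pools

    def getPools(self):
        return self._pools


def validate_attached_domain(sp_uuid, dom):
    # Step 1 of the failed detach already removed the pool reference from
    # the domain metadata, so this check raises even though the pool's
    # master domain still lists the domain.
    if sp_uuid not in dom.getPools():
        raise StorageDomainNotInPool(
            "domain=%s, pool=%s" % (dom.sdUUID, sp_uuid))


# The domain from the bug, after the interrupted detach:
dom = Domain("8ccbd167-a48c-4afd-ab3f-a08f69492486", pools=[])
try:
    validate_attached_domain("072c2d76-8886-47ab-a1f9-d97f834115af", dom)
except StorageDomainNotInPool as e:
    print(e)
```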
However, a full detach consists of 2 operations:
1. update the domain metadata (remove pool=...)
2. update the master domain
To update the master domain in this state you need to call forcedDetachSD.
The SPM cannot decide to do this on its own, as the intent is not necessarily clear (vs. attach, for example).
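A minimal model of those two operations and the crash window between them (hypothetical names and metadata layout; the real vdsm code differs):

```python
def detach_sd(domain_md, master_md, sd_uuid, crash_after_step1=False):
    """Toy two-step detach, modeling the window the bug describes."""
    # Step 1: remove the pool reference from the domain's own metadata.
    domain_md.pop("POOL_UUID", None)
    if crash_after_step1:
        return  # simulate vdsm dying between the two updates
    # Step 2: remove the domain from the master domain's domain list.
    master_md["DOMAINS"].remove(sd_uuid)


# Simulate the crash from the bug report:
domain_md = {"POOL_UUID": "072c2d76-8886-47ab-a1f9-d97f834115af"}
master_md = {"DOMAINS": ["8ccbd167-a48c-4afd-ab3f-a08f69492486"]}
detach_sd(domain_md, master_md, "8ccbd167-a48c-4afd-ab3f-a08f69492486",
          crash_after_step1=True)

# The two records now disagree: the domain says "detached" while the
# master domain still lists it, so both activate and a plain detach fail.
print("POOL_UUID" in domain_md)                                        # False
print("8ccbd167-a48c-4afd-ab3f-a08f69492486" in master_md["DOMAINS"])  # True
```

In this state only a forced path that skips the domain-side validation (forcedDetachSD) can complete step 2.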
This is one of those problems that would disappear once we no longer have a pool.
Anyway, if this can be fixed, it's only on the engine side.
Fede, shouldn't the memory-based pool backend take care of this one too?
(In reply to Allon Mureinik from comment #2)
> Fede, shouldn't the memory-based pool backend take care of this one too?
Yes, this will be automatically fixed by the memory-based pool backend.
(In reply to Federico Simoncelli from comment #3)
> (In reply to Allon Mureinik from comment #2)
> > Fede, shouldn't the memory-based pool backend take care of this one too?
> Yes, this will be automatically fixed by the memory-based pool backend.
According to this statement, the fix for bug 1058022 should have solved this.
Moving to MODIFIED.
After a failure of vdsm during detachment of an ISO domain, when vdsm starts again and becomes SPM, detaching the ISO domain again succeeds.
vdsm gets the status of the domains in the pool while reconnecting to it. If the domain had moved to detached, it does not appear in the domainsMap returned by connectStoragePool; if it was not detached, it appears as: '3e85ed9c-16d8-4e76-89cc-d533bcd41b79': 'attached'
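A rough sketch of that observed behavior (hypothetical helper; the real connectStoragePool logic differs): only domains whose metadata still references the pool appear in the map.

```python
def build_domains_map(pool_uuid, domains):
    """Toy domainsMap builder: domains is {sd_uuid: [pool_uuids]}.

    A fully detached domain never appears in the map; a domain whose
    detach did not run is reported as 'attached'.
    """
    return {sd_uuid: "attached"
            for sd_uuid, pools in domains.items()
            if pool_uuid in pools}


pool = "072c2d76-8886-47ab-a1f9-d97f834115af"
domains = {
    "3e85ed9c-16d8-4e76-89cc-d533bcd41b79": [pool],  # detach never ran
    "8ccbd167-a48c-4afd-ab3f-a08f69492486": [],      # detach completed
}
print(build_domains_map(pool, domains))
# {'3e85ed9c-16d8-4e76-89cc-d533bcd41b79': 'attached'}
```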
Re-attaching the domain to the pool again succeeds.
Verified using upstream ovirt-3.5 RC1.1
RHEV-M 3.5.0 has been released, closing this bug.