Red Hat Bugzilla – Bug 994582
[vdsm] cannot activate/detach an ISO domain after first detachment failed
Last modified: 2016-02-10 11:56:37 EST
Created attachment 783953
Description of problem:
After vdsm crashed during detachStorageDomain, vdsm is unable to activate, detach, or remove the domain.
Version-Release number of selected component (if applicable):
Depends on the phase during which vdsm crashed in detachStorageDomain.
Steps to Reproduce:
On a data center (block, with one host in the cluster in my case) with a connected storage pool and an ISO domain (local in my case):
- detach the ISO domain from the pool and stop vdsm right afterwards
- start vdsm and wait for the host to become SPM
- try to activate/detach the domain
vdsm fails to perform those actions:
Thread-427::ERROR::2013-08-07 17:11:32,361::task::850::TaskManager.Task::(_setError) Task=`ee8d1d86-54d7-48ee-9f4d-a8b52ab2890f`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 857, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 783, in detachStorageDomain
File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
return f(self, *args, **kwargs)
File "/usr/share/vdsm/storage/sp.py", line 1048, in detachSD
File "/usr/share/vdsm/storage/sp.py", line 515, in validateAttachedDomain
raise se.StorageDomainNotInPool(self.spUUID, dom.sdUUID)
StorageDomainNotInPool: Storage domain not in pool: 'domain=8ccbd167-a48c-4afd-ab3f-a08f69492486, pool=072c2d76-8886-47ab-a1f9-d97f834115af'
After a failure in detachStorageDomain, vdsm should roll back or roll forward.
The domain itself has been detached (the domain MD has been updated), so activateStorageDomain naturally cannot succeed.
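The StorageDomainNotInPool error in the traceback comes from a membership check: the domain's own metadata no longer lists the pool, so validation fails before the operation starts. A rough sketch of that check (illustrative names and message format, not the actual vdsm code):

```python
class StorageDomainNotInPool(Exception):
    """The domain's metadata does not reference the given pool."""


class Domain:
    """Toy stand-in for a storage domain; only the pool list matters here."""
    def __init__(self, sd_uuid, pools):
        self.sdUUID = sd_uuid
        self._pools = pools

    def getPools(self):
        return self._pools


def validate_attached_domain(sp_uuid, dom):
    # Step 1 of the failed detach already removed the pool reference from
    # the domain metadata, so this check raises even though the pool's
    # master domain still lists the domain.
    if sp_uuid not in dom.getPools():
        raise StorageDomainNotInPool(
            "domain=%s, pool=%s" % (dom.sdUUID, sp_uuid))


# The domain from the bug, after the interrupted detach:
dom = Domain("8ccbd167-a48c-4afd-ab3f-a08f69492486", pools=[])
try:
    validate_attached_domain("072c2d76-8886-47ab-a1f9-d97f834115af", dom)
except StorageDomainNotInPool as e:
    print(e)
```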
However, a full detach consists of 2 operations:
1. update the domain metadata (remove pool=...)
2. update the master domain
To update the master domain in this state you need to call forcedDetachSD.
The SPM cannot decide to do this on its own, as the intent is not necessarily clear (vs. attach, for example).
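A minimal model of those two operations and the crash window between them (hypothetical names and metadata layout; the real vdsm code differs):

```python
def detach_sd(domain_md, master_md, sd_uuid, crash_after_step1=False):
    """Toy two-step detach, modeling the window the bug describes."""
    # Step 1: remove the pool reference from the domain's own metadata.
    domain_md.pop("POOL_UUID", None)
    if crash_after_step1:
        return  # simulate vdsm dying between the two updates
    # Step 2: remove the domain from the master domain's domain list.
    master_md["DOMAINS"].remove(sd_uuid)


# Simulate the crash from the bug report:
domain_md = {"POOL_UUID": "072c2d76-8886-47ab-a1f9-d97f834115af"}
master_md = {"DOMAINS": ["8ccbd167-a48c-4afd-ab3f-a08f69492486"]}
detach_sd(domain_md, master_md, "8ccbd167-a48c-4afd-ab3f-a08f69492486",
          crash_after_step1=True)

# The two records now disagree: the domain says "detached" while the
# master domain still lists it, so both activate and a plain detach fail.
print("POOL_UUID" in domain_md)                                        # False
print("8ccbd167-a48c-4afd-ab3f-a08f69492486" in master_md["DOMAINS"])  # True
```

In this state only a forced path that skips the domain-side validation (forcedDetachSD) can complete step 2.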
This is one of those problems that would disappear once we no longer have a pool.
Anyway, if this can be fixed, it's only on the engine side.
Fede, shouldn't the memory-based pool backend take care of this one too?
(In reply to Allon Mureinik from comment #2)
> Fede, shouldn't the memory-based pool backend take care of this one too?
Yes, this will be automatically fixed by the memory-based pool backend.
(In reply to Federico Simoncelli from comment #3)
> (In reply to Allon Mureinik from comment #2)
> > Fede, shouldn't the memory-based pool backend take care of this one too?
> Yes, this will be automatically fixed by the memory-based pool backend.
According to this statement, the fix for bug 1058022 should have solved this.
Moving to MODIFIED.
After a failure of vdsm during detachment of an ISO domain, when vdsm starts again and becomes SPM, detaching the ISO domain again succeeds.
vdsm gets the status of the domains in the pool while reconnecting to it. If the domain had moved to detached, it does not appear in the domainsMap returned by connectStoragePool; if it was not detached, it appears as: '3e85ed9c-16d8-4e76-89cc-d533bcd41b79': 'attached'
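A rough sketch of that observed behavior (hypothetical helper; the real connectStoragePool logic differs): only domains whose metadata still references the pool appear in the map.

```python
def build_domains_map(pool_uuid, domains):
    """Toy domainsMap builder: domains is {sd_uuid: [pool_uuids]}.

    A fully detached domain never appears in the map; a domain whose
    detach did not run is reported as 'attached'.
    """
    return {sd_uuid: "attached"
            for sd_uuid, pools in domains.items()
            if pool_uuid in pools}


pool = "072c2d76-8886-47ab-a1f9-d97f834115af"
domains = {
    "3e85ed9c-16d8-4e76-89cc-d533bcd41b79": [pool],  # detach never ran
    "8ccbd167-a48c-4afd-ab3f-a08f69492486": [],      # detach completed
}
print(build_domains_map(pool, domains))
# {'3e85ed9c-16d8-4e76-89cc-d533bcd41b79': 'attached'}
```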
Re-attaching the domain to the pool again succeeds.
Verified using upstream ovirt-3.5 RC1.1
RHEV-M 3.5.0 has been released, closing this bug.