Bug 994582 - [vdsm] cannot activate/detach an ISO domain after first detachment failed
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.3.0
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assigned To: Federico Simoncelli
QA Contact: Elad
Whiteboard: storage
Depends On:
Blocks: 974849 rhev3.5beta 1156165
Reported: 2013-08-07 10:24 EDT by Elad
Modified: 2016-02-10 11:56 EST
CC List: 11 users

See Also:
Fixed In Version: vt1.3
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-16 14:10:34 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
logs (2.01 MB, application/x-gzip)
2013-08-07 10:24 EDT, Elad

Description Elad 2013-08-07 10:24:28 EDT
Created attachment 783953
logs

Description of problem:
After vdsm crashed during detachStorageDomain, vdsm is unable to activate/detach/remove the domain

Version-Release number of selected component (if applicable):
vdsm-4.12.0-rc3.13.git06ed3cc.el6ev.x86_64

How reproducible:
Depends on the phase at which vdsm crashed during detachStorageDomain.

Steps to Reproduce:
On a data center (block storage, with one host in the cluster in my case) with a connected storage pool and an ISO domain (local in my case):
- detach the ISO domain from the pool and stop vdsm right after
- start vdsm and wait for the host to take SPM
- try to activate/detach the domain

Actual results:
vdsm fails to perform those actions:

Thread-427::ERROR::2013-08-07 17:11:32,361::task::850::TaskManager.Task::(_setError) Task=`ee8d1d86-54d7-48ee-9f4d-a8b52ab2890f`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 783, in detachStorageDomain
    pool.detachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1048, in detachSD
    self.validateAttachedDomain(dom)
  File "/usr/share/vdsm/storage/sp.py", line 515, in validateAttachedDomain
    raise se.StorageDomainNotInPool(self.spUUID, dom.sdUUID)
StorageDomainNotInPool: Storage domain not in pool: 'domain=8ccbd167-a48c-4afd-ab3f-a08f69492486, pool=072c2d76-8886-47ab-a1f9-d97f834115af'
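
For context, the check that raises this error is StoragePool.validateAttachedDomain (sp.py line 515 in the traceback): the pool's UUID must still be listed in the domain's own metadata. A minimal standalone model of that check, illustrative only and not the actual vdsm code (the dictionary stands in for the domain metadata):

# Illustrative model only (not vdsm code): after the interrupted detach the
# domain metadata no longer lists the pool, so the membership check fails.
class StorageDomainNotInPool(Exception):
    pass

def validate_attached_domain(sp_uuid, domain_md):
    # 'pools' stands in for the pool UUIDs recorded in the domain metadata;
    # it is already empty after the first half of the detach (see comment 1 below).
    if sp_uuid not in domain_md['pools']:
        raise StorageDomainNotInPool(sp_uuid, domain_md['sdUUID'])

# With the state from this bug, a call like
#   validate_attached_domain('072c2d76-...', {'sdUUID': '8ccbd167-...', 'pools': []})
# raises StorageDomainNotInPool, matching the traceback above.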



Expected results:
After a failure in detachStorageDomain, vdsm should roll back or roll forward.

Additional info:
logs
Comment 1 Ayal Baron 2013-09-01 09:22:56 EDT
The domain itself has been detached (the domain MD has been updated), so activateStorageDomain naturally cannot succeed.
However, a full detach consists of 2 operations:
1. update the domain metadata (remove pool=...)
2. update the master domain
To update the master domain in this state you need to call forcedDetachSD.
The SPM cannot decide to do this on its own, as it is not necessarily clear that this is the intent (vs. attach, for example).
This is one of those problems that would disappear once we no longer have a pool.
Anyway, if this can be fixed, it's only on the engine side.
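
To make the failure window concrete, here is a rough standalone sketch of the two update steps and of a forcedDetachSD-style recovery as described above (illustrative names and data structures, not the actual sp.py implementation; it reuses the toy StorageDomainNotInPool class from the sketch under the traceback):

# Illustrative sketch (not vdsm code) of the two-step detach and the forced recovery.
def detach_sd(pool_md, domain_md):
    # normal path: refuses to touch a domain that no longer lists the pool
    if pool_md['spUUID'] not in domain_md['pools']:
        raise StorageDomainNotInPool(pool_md['spUUID'], domain_md['sdUUID'])
    domain_md['pools'].remove(pool_md['spUUID'])        # 1. update domain metadata
    # a vdsm crash at this point leaves pool_md still listing the domain (this bug)
    pool_md['domains'].pop(domain_md['sdUUID'], None)   # 2. update the master domain

def forced_detach_sd(pool_md, domain_md):
    # skips the membership check and only fixes the pool side, which is why it is
    # the only way to finish a half-done detach in this state
    pool_md['domains'].pop(domain_md['sdUUID'], None)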
Comment 2 Allon Mureinik 2014-05-07 05:00:31 EDT
Fede, shouldn't the memory-based pool backend take care of this one too?
Comment 3 Federico Simoncelli 2014-06-26 04:18:35 EDT
(In reply to Allon Mureinik from comment #2)
> Fede, shouldn't the memory-based pool backend take care of this one too?

Yes, this will be automatically fixed by the memory based pool backend.
Comment 4 Allon Mureinik 2014-06-26 05:55:56 EDT
(In reply to Federico Simoncelli from comment #3)
> (In reply to Allon Mureinik from comment #2)
> > Fede, shouldn't the memory-based pool backend take care of this one too?
> 
> Yes, this will be automatically fixed by the memory based pool backend.

According to this statement, the fix for bug 1058022 should have solved this.
Moving to MODIFIED.
Comment 5 Elad 2014-08-26 08:10:12 EDT
After a failure of vdsm during detachment of an ISO domain, when vdsm starts again and takes SPM, detaching the ISO domain again succeeds.

vdsm gets the status of the domains in the pool while it's connecting to the pool again. If the domain was already moved to detached, it doesn't appear in the domainsMap in connectStoragePool; if it wasn't detached, it appears as: '3e85ed9c-16d8-4e76-89cc-d533bcd41b79': 'attached'
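
In other words (an illustration with hypothetical map contents; only the 'attached' entry format is taken from the line above), the per-domain status map that vdsm sees on reconnect simply has no entry for the already-detached domain:

# Hypothetical domainsMap as seen by connectStoragePool after the restart.
# A domain whose detach already completed on the domain side has no entry,
# so nothing blocks re-attaching it later.
domains_map_after_restart = {
    '3e85ed9c-16d8-4e76-89cc-d533bcd41b79': 'attached',  # domain that was never detached
    # the half-detached ISO domain (8ccbd167-...) is simply absent
}

def domain_known_to_pool(domains_map, sd_uuid):
    return sd_uuid in domains_map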

Re-attaching the domain to the pool again succeeds.

Verified using upstream ovirt-3.5 RC1.1
Comment 6 Allon Mureinik 2015-02-16 14:10:34 EST
RHEV-M 3.5.0 has been released, closing this bug.
