Bug 806952 - ovirt-engine-backend: we allow to remove DC when there is more than one domain attached
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
x86_64 Linux
Priority: high, Severity: high
: ---
: 3.1.0
Assigned To: Laszlo Hornyak
Dafna Ron
Keywords: Regression, Reopened
Depends On:
Reported: 2012-03-26 11:14 EDT by Dafna Ron
Modified: 2016-02-10 11:57 EST
10 users

See Also:
Fixed In Version: si20
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-12-04 15:02:49 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
logs (1.65 MB, application/x-gzip)
2012-03-26 11:39 EDT, Dafna Ron
reproduced again (63.67 KB, application/x-xz)
2012-08-02 08:34 EDT, Dafna Ron

Description Dafna Ron 2012-03-26 11:14:23 EDT
Description of problem:

Remove DC should only be allowed when the master domain is the only attached domain (since the master cannot be detached without removing the pool).
We can currently remove the DC while more than one domain is attached.
I hit several races in which the remove fails in the backend but not in vds, which causes a "domain not in pool" error in vds.

I am setting this bug to urgent because once I created a new DC and attached the removed domains, HSM gets a wrong master or version error and becomes non-operational (yes, HSM, not SPM).

The SPM remains active and connected to the pool and all domains.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create and attach several iSCSI domains to a two-host cluster.
2. Put all domains in maintenance and remove the DC.
3. Create a second DC and reattach all the domains to the new DC.
Actual results:

We get a wrong master domain error from the HSM only, and the host becomes non-operational.

Expected results:

We should not be allowed to remove the DC while any domain other than the master is attached.

Additional info: full logs attached. 
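
The missing guard in the expected behavior can be sketched as a pre-condition on the remove-DC action. This is a hypothetical illustration; the function name and the domain-list shape below are assumptions, not the actual engine canDoAction code:

```python
# Sketch of the backend validation this bug asks for: removing a data
# center should be refused while any non-master storage domain is still
# attached (the master alone cannot be detached without removing the pool).

def can_remove_data_center(attached_domains):
    """attached_domains: list of (domain_id, is_master) tuples.

    Returns (allowed, reason). 'reason' is None when the remove is allowed.
    """
    non_master = [d for d, is_master in attached_domains if not is_master]
    if non_master:
        # Refuse: every non-master domain must be detached first.
        return False, "Detach non-master storage domains before removing the DC"
    return True, None
```

With only the master attached the remove is allowed; as soon as a second domain is attached, the check fails instead of letting the remove race against vds.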

2012-03-26 16:55:02,486 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-22) [4c105e7c] Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         324
mMessage                      Wrong Master domain or its version: 'SD=4b1984e6-feb6-4629-9e13-affda9c9ca41, pool=e4269e26-f1ac-4dfb-8407-8e191d736a05'

Thread-942::ERROR::2012-03-26 16:47:50,783::task::853::TaskManager.Task::(_setError) Task=`7345bbab-5a07-4072-8069-c5df986ed3d9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 813, in connectStoragePool
    return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID, masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 855, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 641, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1107, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1451, in getMasterDomain
    raise se.StoragePoolWrongMaster(self.spUUID, msdUUID)
StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=4b1984e6-feb6-4629-9e13-affda9c9ca41, pool=e4269e26-f1ac-4dfb-8407-8e191d736a05'
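
For context, the check behind this error compares the master domain UUID and version that the engine passes on connectStoragePool against what the pool metadata on storage actually records. A simplified sketch follows; the metadata dict shape and function name are assumptions for illustration, not the real sp.py code:

```python
# Minimal sketch of the wrong-master check, assuming pool metadata is
# available as a dict with the recorded master UUID and master version.

class StoragePoolWrongMaster(Exception):
    def __init__(self, spUUID, msdUUID):
        super().__init__(
            "Wrong Master domain or its version: 'SD=%s, pool=%s'"
            % (msdUUID, spUUID))

def get_master_domain(pool_metadata, spUUID, msdUUID, masterVersion):
    # Reject the connect if the caller's idea of the master domain or its
    # version disagrees with the metadata recorded for the pool.
    if (pool_metadata["master"] != msdUUID
            or pool_metadata["version"] != masterVersion):
        raise StoragePoolWrongMaster(spUUID, msdUUID)
    return msdUUID
```

In this bug, the stale pool metadata left behind by the incomplete remove is what makes the HSM's connect disagree with the engine's parameters.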
Comment 1 Dafna Ron 2012-03-26 11:39:04 EDT
Created attachment 572782 [details]
Comment 4 Laszlo Hornyak 2012-07-12 07:07:57 EDT
"works for me"

I could not reproduce the issue with the latest git versions of both vdsm and engine. I used two iSCSI storage domains. (How many is "several", exactly?)
Comment 5 Laszlo Hornyak 2012-07-17 07:23:07 EDT
Dafna, I could not reproduce this issue so far, can you help? Thx.
Comment 6 Dafna Ron 2012-08-02 08:34:12 EDT
Created attachment 601939 [details]
reproduced again

I reproduced this on si12 again.
* Make sure you are working with the correct packages and with RHEL hosts only.
* Make sure that you have iSCSI domains.
* Make sure that you have no objects under any of the domains (lean setup).
* Make sure that you have only one host in the Up state.
Comment 7 Laszlo Hornyak 2012-08-22 09:29:31 EDT
Trying to reproduce: after attempting to remove the DC, it failed to detach the master domain and the DC went back to the Up state.
Comment 8 Laszlo Hornyak 2012-08-24 10:34:21 EDT
- The DC could not be removed with a simple 'remove': the master domain got stuck in the Locked state and the DC went back to the 'Up' state, with all domains deactivated and only the master Locked. This might be a bug.
- With force remove, removing the DC worked fine. After that, attaching the storage generated some errors, but in the end it managed to start up.

2012-08-24 16:22:23,671 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (http-- [38c457c] starting spm on vds dev-164, storage pool iscsi02, prevId -1, LVER -1
2012-08-24 16:22:23,673 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] START, SpmStartVDSCommand(vdsId = 73261da6-edf4-11e1-955d-b3bda0fd89ee, storagePoolId = 8fa4ef1a-c514-47be-bfe1-004e09cdec23, prevId=-1, prevLVER=-1, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log id: 6fa7ed3
2012-08-24 16:22:23,681 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] spmStart polling started: taskId = 064081b6-0603-44c6-b4dd-44b23eda8ba9
2012-08-24 16:22:24,693 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-- [38c457c] Failed in HSMGetTaskStatusVDS method
2012-08-24 16:22:24,695 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-- [38c457c] Error code GeneralException and error message VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = 'wait' is an invalid keyword argument for this function
2012-08-24 16:22:24,697 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] spmStart polling ended: taskId = 064081b6-0603-44c6-b4dd-44b23eda8ba9 task status = finished
2012-08-24 16:22:24,698 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = 'wait' is an invalid keyword argument for this function
2012-08-24 16:22:24,707 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] spmStart polling ended. spm status: Free
2012-08-24 16:22:24,710 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (http-- [38c457c] START, HSMClearTaskVDSCommand(vdsId = 73261da6-edf4-11e1-955d-b3bda0fd89ee, taskId=064081b6-0603-44c6-b4dd-44b23eda8ba9), log id: 5eb97f68
Comment 9 Laszlo Hornyak 2012-08-24 10:59:48 EDT
Related vdsm log:
Thread-2101::DEBUG::2012-08-24 16:52:03,767::taskManager::96::TaskManager::(getTaskStatus) Return. Response: {'code': 100, 'message': u"'wait' is an invalid keyword argument for this function", 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': 'e481fe99-0a11-4281-b316-a65ccae29528'}
Thread-2101::INFO::2012-08-24 16:52:03,767::logUtils::39::dispatcher::(wrapper) Run and protect: getTaskStatus, Return response: {'taskStatus': {'code': 100, 'message': u"'wait' is an invalid keyword argument for this function", 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': 'e481fe99-0a11-4281-b316-a65ccae29528'}}
Thread-2101::DEBUG::2012-08-24 16:52:03,767::task::1151::TaskManager.Task::(prepare) Task=`de9dbdf4-8105-418c-a094-b1446e1f4508`::finished: {'taskStatus': {'code': 100, 'message': u"'wait' is an invalid keyword argument for this function", 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': 'e481fe99-0a11-4281-b316-a65ccae29528'}}
Thread-2101::DEBUG::2012-08-24 16:52:03,767::task::568::TaskManager.Task::(_updateState) Task=`de9dbdf4-8105-418c-a094-b1446e1f4508`::moving from state preparing -> state finished
Thread-2101::DEBUG::2012-08-24 16:52:03,767::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-2101::DEBUG::2012-08-24 16:52:03,767::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Comment 10 Laszlo Hornyak 2012-08-24 11:41:20 EDT
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 307, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 250, in startSpm
  File "/usr/share/vdsm/storage/sd.py", line 427, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/safelease.py", line 170, in acquireHostId
    self._sdUUID, hostId, self._idsPath, wait=True):
TypeError: 'wait' is an invalid keyword argument for this function
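
The traceback above is a plain keyword-argument mismatch: safelease.py calls acquireHostId with wait=True, but the callee's signature does not accept that keyword. The failure mode is easy to reproduce in isolation; the function below is illustrative, not the actual vdsm code:

```python
# Sketch of the signature mismatch behind Comment 10's traceback: the
# caller passes wait=True, but the target function has no 'wait' parameter.

def acquire_host_id(sd_uuid, host_id, ids_path):
    # Older-style signature without a 'wait' keyword.
    return True

def caller():
    try:
        # Mirrors the safelease.py call site that adds wait=True.
        acquire_host_id("sd-uuid", 1, "/rhev/ids", wait=True)
    except TypeError as e:
        # Python reports the unexpected keyword, as seen in the vdsm log.
        return str(e)
    return None
```

So the SPM-start failure in Comment 8 is a version skew between the caller and callee, independent of the original remove-DC validation bug.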
Comment 11 Laszlo Hornyak 2012-08-27 11:34:16 EDT
Comment 15 Dafna Ron 2012-10-14 08:52:16 EDT
Verified on si20.
