Created attachment 639392 [details]
logs

Description of problem:
I removed 10 domains concurrently, and 3 of them failed on formatStorageDomain.
It seems the domains were in fact removed: although the engine reported a communication error with the host and rolled back the domains, trying to remove them again fails in vdsm:

Thread-8243::INFO::2012-11-06 15:37:06,592::task::1157::TaskManager.Task::(prepare) Task=`24c09012-01f8-4f19-85a0-5007af87a8e1`::aborting: Task is aborted: u'Failed reload: e32df40d-9c2a-4bd3-8cdd-eb9b917311a8' - code 100

vgs shows no domain:

[root@gold-vdsc ~]# vgs e32df40d-9c2a-4bd3-8cdd-eb9b917311a8
  Volume group "e32df40d-9c2a-4bd3-8cdd-eb9b917311a8" not found

Version-Release number of selected component (if applicable):
vdsm-4.9.6-41.0.el6_3.x86_64
si24

How reproducible:
100%

Steps to Reproduce:
1. In a two-host cluster, create/attach/detach 10 iSCSI domains
2. Remove all the domains concurrently

Actual results:
We report an error, on timeout, when removing some of the domains.

Expected results:
We should not fail.

Additional info:
The logs look like the task dies:

Thread-8238::ERROR::2012-11-06 15:37:03,582::dispatcher::69::Storage.Dispatcher.Protect::(run) Failed reload: c85eac8b-7802-4bc9-b963-beb37f75a963
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
    raise self.error
AttributeError: Failed reload: c85eac8b-7802-4bc9-b963-beb37f75a963
Thread-8245::DEBUG::2012-11-06 15:37:03,616::BindingXMLRPC::171::vds::(wrapper) [10.35.97.65]
Thread-8245::DEBUG::2012-11-06 15:37:03,617::task::588::TaskManager.Task::(_updateState) Task=`db14bcc4-cef8-44f4-9c6d-f9fbe8625abd`::moving from state init -> state preparing

Trying to remove again gives a second error:

Thread-8243::ERROR::2012-11-06 15:37:06,590::task::853::TaskManager.Task::(_setError) Task=`24c09012-01f8-4f19-85a0-5007af87a8e1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2328, in formatStorageDomain
    if not misc.parseBool(autoDetach) and sd.getPools():
  File "/usr/share/vdsm/storage/sd.py", line 371, in getPools
    pools = self.getMetaParam(key=DMDK_POOLS)
  File "/usr/share/vdsm/storage/sd.py", line 689, in getMetaParam
    return self._metadata[key]
  File "/usr/share/vdsm/storage/persistentDict.py", line 85, in __getitem__
    return dec(self._dict[key])
  File "/usr/share/vdsm/storage/persistentDict.py", line 193, in __getitem__
    with self._accessWrapper():
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/share/vdsm/storage/persistentDict.py", line 147, in _accessWrapper
    self.refresh()
  File "/usr/share/vdsm/storage/persistentDict.py", line 224, in refresh
    lines = self._metaRW.readlines()
  File "/usr/share/vdsm/storage/blockSD.py", line 186, in readlines
    for tag in vg.tags:
  File "/usr/share/vdsm/storage/lvm.py", line 68, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: e32df40d-9c2a-4bd3-8cdd-eb9b917311a8
Thread-8243::DEBUG::2012-11-06 15:37:06,591::task::872::TaskManager.Task::(_run) Task=`24c09012-01f8-4f19-85a0-5007af87a8e1`::Task._run: 24c09012-01f8-4f19-85a0-5007af87a8e1 ('e32df40d-9c2a-4bd3-8cdd-eb9b917311a8', False) {} failed - stopping task

A grep for formatStorageDomain shows that Thread-8238 has no return logged:

Thread-8238::INFO::2012-11-06 15:36:52,908::logUtils::37::dispatcher::(wrapper) Run and protect: formatStorageDomain(sdUUID='c85eac8b-7802-4bc9-b963-beb37f75a963', autoDetach=False, options=None)
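
The last frames of the "Failed reload" traceback (`for tag in vg.tags` in blockSD.py, then `__getattr__` in lvm.py) suggest that the cached VG object is a placeholder that raises on attribute access once a reload has failed. Below is a minimal sketch of that pattern, not the vdsm code: `UnreloadableStub` and `read_tags` are hypothetical names, and only the error message format is taken from the log. It may help explain why an already-removed VG surfaces as `AttributeError: Failed reload: <name>` instead of a "domain does not exist" error.

```python
class UnreloadableStub(object):
    """Hypothetical placeholder for a cached object whose reload failed."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # __getattr__ is only consulted when normal lookup fails, so
        # self.name still works, while any other attribute (for example
        # .tags) raises with the same message format seen in the log.
        raise AttributeError("Failed reload: %s" % self.name)


def read_tags(vg):
    """Hypothetical reader that iterates over the VG tags."""
    return [tag for tag in vg.tags]


if __name__ == "__main__":
    stub = UnreloadableStub("e32df40d-9c2a-4bd3-8cdd-eb9b917311a8")
    try:
        read_tags(stub)
    except AttributeError as e:
        print(e)
```

Under this assumption, a caller that wants the second removal attempt to fail cleanly would have to check that the VG exists before reading its metadata, rather than rely on the attribute access.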
Does this happen with fewer domains? 3? 5?
(In reply to comment #2)
> Does this happen with fewer domains? 3? 5?

I tested with 3 and there was no problem.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
This requires changing the API to be asynchronous.
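
One way to read "async" here: the call returns a task ID immediately, the work runs in the background, and the caller polls for the outcome instead of holding a connection open until it times out. Below is a minimal sketch using only the Python standard library. `AsyncTaskRunner`, `submit`, `status`, `wait`, and `slow_format` are all hypothetical names for illustration, not the vdsm or engine API.

```python
import threading
import time
import uuid


class AsyncTaskRunner(object):
    """Hypothetical runner: submit() returns at once, status() is polled."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}    # task id -> {"state", "result", "error"}
        self._threads = {}  # task id -> worker thread

    def submit(self, func, *args, **kwargs):
        task_id = str(uuid.uuid4())
        worker = threading.Thread(
            target=self._run, args=(task_id, func, args, kwargs))
        worker.daemon = True
        with self._lock:
            self._tasks[task_id] = {
                "state": "running", "result": None, "error": None}
            self._threads[task_id] = worker
        worker.start()
        return task_id

    def _run(self, task_id, func, args, kwargs):
        try:
            update = {"state": "finished",
                      "result": func(*args, **kwargs), "error": None}
        except Exception as e:
            # Record the failure so a later poll can report it.
            update = {"state": "failed", "result": None, "error": str(e)}
        with self._lock:
            self._tasks[task_id] = update

    def status(self, task_id):
        with self._lock:
            return dict(self._tasks[task_id])

    def wait(self, task_id, timeout=None):
        # Block for at most `timeout` seconds, then report the state.
        self._threads[task_id].join(timeout)
        return self.status(task_id)


def slow_format(sd_uuid):
    """Hypothetical stand-in for a long-running storage operation."""
    time.sleep(0.1)
    return "formatted %s" % sd_uuid


if __name__ == "__main__":
    runner = AsyncTaskRunner()
    ids = [runner.submit(slow_format, "domain-%d" % i) for i in range(10)]
    for task_id in ids:
        print(runner.wait(task_id, timeout=5))
```

With this shape, a slow operation no longer looks like a communication error to the caller: a timeout while polling leaves the task in the "running" state, so the caller can keep polling rather than roll back.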