Bug 873701 (scale)

Summary: [RFE] Change formatStorageDomain verb to be async
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Allon Mureinik <amureini>
Status: CLOSED WONTFIX
QA Contact: yeylon <yeylon>
Severity: low
Docs Contact:
Priority: high
Version: unspecified
CC: amureini, bazulay, bsettle, iheim, lpeer, scohen, srevivo, yeylon
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Flags: scohen: needinfo+, sherold: Triaged+
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-17 08:04:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1080372, 1185830
Bug Blocks:
Attachments:
Description  Flags
logs         none

Description Dafna Ron 2012-11-06 14:15:24 UTC
Created attachment 639392 [details]
logs

Description of problem:

I removed 10 domains concurrently; 3 of them failed on formatStorageDomain.
It seems the domains were actually removed, though: the engine reported a communication error with the host and rolled back the domains, but if we try to remove them again, the removal fails in vdsm:

Thread-8243::INFO::2012-11-06 15:37:06,592::task::1157::TaskManager.Task::(prepare) Task=`24c09012-01f8-4f19-85a0-5007af87a8e1`::aborting: Task is aborted: u'Failed reload: e32df40d-9c2a-4bd3-8cdd-eb9b917311a8' - code 100

vgs shows that the domain's volume group no longer exists:

[root@gold-vdsc ~]# vgs e32df40d-9c2a-4bd3-8cdd-eb9b917311a8
  Volume group "e32df40d-9c2a-4bd3-8cdd-eb9b917311a8" not found

Version-Release number of selected component (if applicable):

vdsm-4.9.6-41.0.el6_3.x86_64
si24

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, create, attach, and detach 10 iSCSI domains
2. Remove all the domains concurrently
  
Actual results:

Removal of some of the domains is reported as failed because of a timeout.

Expected results:

Removal should not fail.

Additional info: logs


It looks like the task dies:

Thread-8238::ERROR::2012-11-06 15:37:03,582::dispatcher::69::Storage.Dispatcher.Protect::(run) Failed reload: c85eac8b-7802-4bc9-b963-beb37f75a963
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/dispatcher.py", line 61, in run
    result = ctask.prepare(self.func, *args, **kwargs)
  File "/usr/share/vdsm/storage/task.py", line 1164, in prepare
    raise self.error
AttributeError: Failed reload: c85eac8b-7802-4bc9-b963-beb37f75a963
Thread-8245::DEBUG::2012-11-06 15:37:03,616::BindingXMLRPC::171::vds::(wrapper) [10.35.97.65]
Thread-8245::DEBUG::2012-11-06 15:37:03,617::task::588::TaskManager.Task::(_updateState) Task=`db14bcc4-cef8-44f4-9c6d-f9fbe8625abd`::moving from state init -> state preparing


Trying to remove the domain again gives a second error:

Thread-8243::ERROR::2012-11-06 15:37:06,590::task::853::TaskManager.Task::(_setError) Task=`24c09012-01f8-4f19-85a0-5007af87a8e1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2328, in formatStorageDomain
    if not misc.parseBool(autoDetach) and sd.getPools():
  File "/usr/share/vdsm/storage/sd.py", line 371, in getPools
    pools = self.getMetaParam(key=DMDK_POOLS)
  File "/usr/share/vdsm/storage/sd.py", line 689, in getMetaParam
    return self._metadata[key]
  File "/usr/share/vdsm/storage/persistentDict.py", line 85, in __getitem__
    return dec(self._dict[key])
  File "/usr/share/vdsm/storage/persistentDict.py", line 193, in __getitem__
    with self._accessWrapper():
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
    return self.gen.next()
  File "/usr/share/vdsm/storage/persistentDict.py", line 147, in _accessWrapper
    self.refresh()
  File "/usr/share/vdsm/storage/persistentDict.py", line 224, in refresh
    lines = self._metaRW.readlines()
  File "/usr/share/vdsm/storage/blockSD.py", line 186, in readlines
    for tag in vg.tags:
  File "/usr/share/vdsm/storage/lvm.py", line 68, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: e32df40d-9c2a-4bd3-8cdd-eb9b917311a8
Thread-8243::DEBUG::2012-11-06 15:37:06,591::task::872::TaskManager.Task::(_run) Task=`24c09012-01f8-4f19-85a0-5007af87a8e1`::Task._run: 24c09012-01f8-4f19-85a0-5007af87a8e1 ('e32df40d-9c2a-4bd3-8cdd-eb9b917311a8', False) {} failed - stopping task
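
The second traceback ends in the `__getattr__` of lvm.py, which suggests that the cached VG object is a placeholder that raises on any attribute access once its reload has failed. The sketch below illustrates that pattern only; `UnreadableVG` and the helper functions are hypothetical names, not vdsm's actual code. It shows why the retry surfaces as an AttributeError instead of a clear "domain does not exist" error:

```python
class UnreadableVG(object):
    """Hypothetical stand-in for a cached VG entry whose reload failed."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # __getattr__ is only consulted for attributes that normal lookup
        # cannot find, so self.name still works while vg.tags does not.
        raise AttributeError("Failed reload: %s" % self.name)


def read_metadata_tags(vg):
    # Same shape as the failing loop in the traceback: for tag in vg.tags
    return [tag for tag in vg.tags]


def describe_failure(vg):
    """Return the error message seen by the caller, or None on success."""
    try:
        read_metadata_tags(vg)
    except AttributeError as e:
        return str(e)
    return None
```

Calling `describe_failure(UnreadableVG("e32df40d-9c2a-4bd3-8cdd-eb9b917311a8"))` returns the same "Failed reload" message that appears in the log, even though the underlying cause is simply that the volume group is gone.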

Comment 1 Dafna Ron 2012-11-07 09:00:14 UTC
Grepping the log for formatStorageDomain shows that Thread-8238 never logged a return:


Thread-8238::INFO::2012-11-06 15:36:52,908::logUtils::37::dispatcher::(wrapper) Run and protect: formatStorageDomain(sdUUID='c85eac8b-7802-4bc9-b963-beb37f75a963', autoDetach=False, options=None)
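
The same check can be scripted when there are many threads to go through. This is a minimal sketch: the call pattern follows the log line above, while the shape of the "Return response" line is an assumption about how returns are logged, and `calls_without_return` is a hypothetical helper name.

```python
import re

# The call pattern matches the excerpt above. The return pattern is an
# assumption about the log format and may need adjusting.
CALL_RE = re.compile(
    r"^(Thread-\d+)::.*Run and protect: formatStorageDomain\(sdUUID='([^']+)'")
RETURN_RE = re.compile(
    r"^(Thread-\d+)::.*Run and protect: formatStorageDomain, Return response")


def calls_without_return(lines):
    """Return {thread: sdUUID} for calls that never logged a return."""
    pending = {}
    for line in lines:
        m = CALL_RE.match(line)
        if m:
            pending[m.group(1)] = m.group(2)
            continue
        m = RETURN_RE.match(line)
        if m:
            # The thread finished its call; it is no longer pending.
            pending.pop(m.group(1), None)
    return pending
```

Passing the lines of vdsm.log to `calls_without_return` leaves only the threads whose formatStorageDomain call was started but never completed.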

Comment 2 Ayal Baron 2012-11-25 10:56:29 UTC
Does this happen with fewer domains? 3? 5?

Comment 3 Dafna Ron 2012-11-25 12:14:06 UTC
(In reply to comment #2)
> Does this happen with fewer domains? 3? 5?

I tested with 3 and there was no problem.

Comment 4 RHEL Program Management 2012-12-14 07:52:34 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 5 Ayal Baron 2012-12-26 09:57:21 UTC
This requires changing the formatStorageDomain API to be asynchronous.
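
As a rough illustration of what an asynchronous verb means for the caller, the sketch below returns a task ID immediately and lets the caller poll for the outcome, so a slow format no longer has to finish within a single request timeout. All names here (`AsyncVerbRunner`, `submit`, `status`, `wait`) are hypothetical, and a Python 3 thread pool stands in for vdsm's own task framework.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncVerbRunner(object):
    """Hypothetical helper: run a slow verb in the background by task ID."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._tasks = {}

    def submit(self, verb, *args, **kwargs):
        # Return at once with an ID; the work continues in the pool.
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = self._pool.submit(verb, *args, **kwargs)
        return task_id

    def status(self, task_id):
        future = self._tasks[task_id]
        if not future.done():
            return {"state": "running"}
        error = future.exception()
        if error is not None:
            return {"state": "failed", "message": str(error)}
        return {"state": "finished", "result": future.result()}

    def wait(self, task_id, timeout=None):
        # Block until the task ends; raises concurrent.futures.TimeoutError
        # if the timeout expires first. The task keeps running in that case.
        self._tasks[task_id].exception(timeout=timeout)
        return self.status(task_id)
```

With this shape, a caller that gives up waiting can still query `status` later and learn whether the domain was actually formatted, instead of rolling back an operation that in fact succeeded.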