Bug 720257

Summary: VDSM: When no connection between vdsm and storage domain teardownVolume failed
Product: Red Hat Enterprise Linux 6
Reporter: Evgeniy German <egerman>
Component: vdsm
Assignee: Saggi Mizrahi <smizrahi>
Status: CLOSED ERRATA
QA Contact: Evgeniy German <egerman>
Severity: urgent
Priority: high
Version: 6.1
CC: abaron, bazulay, danken, dpaikov, ewarszaw, iheim, oramraz, tdosek, ykaul
Target Milestone: rc
Keywords: Regression, TestBlocker
Hardware: x86_64
OS: All
Whiteboard: storage
Fixed In Version: vdsm-4.9-85
Doc Type: Bug Fix
Last Closed: 2011-12-06 07:30:54 UTC
Bug Blocks: 696976
Attachments: vdsm log

Description Evgeniy German 2011-07-11 08:44:57 UTC
Created attachment 512151 [details]
vdsm log

Description of problem:
teardownVolume should work even when there is no connection to the storage domain. As a result, when it fails there is no way to remove the VM.
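
For illustration only: the traceback in the "Actual results" section below shows public_teardownVolume aborting as soon as SDF.produce() raises StorageDomainDoesNotExist. The sketch below shows the kind of best-effort behaviour being asked for; the wrapper function, its produce_domain parameter and the stand-in exception class are hypothetical, and this is not the actual patch referenced in comment 3.

# Minimal sketch only -- NOT the actual vdsm patch (see comment 3).
# SDF.produce(), getVolumeClass() and StorageDomainDoesNotExist are names
# taken from the traceback below; the wrapper and the stand-in exception
# class here are hypothetical.

import logging

log = logging.getLogger("Storage.HSM")


class StorageDomainDoesNotExist(Exception):
    """Stand-in for vdsm's storage exception of the same name."""


def teardown_volume_best_effort(produce_domain, sdUUID, imgUUID, volUUID):
    """Tear down a volume, treating an unknown/unreachable storage domain
    as 'nothing to tear down' instead of a hard failure, so that the
    surrounding VM-removal flow can still complete."""
    try:
        dom = produce_domain(sdUUID)   # e.g. SDF.produce(sdUUID)
    except StorageDomainDoesNotExist:
        log.warning("Domain %s is not available; skipping teardown of "
                    "volume %s", sdUUID, volUUID)
        return
    # Normal path, as seen in the traceback: resolve the volume class for
    # this domain type and continue with the regular teardown.
    volclass = dom.getVolumeClass()
    log.debug("Tearing down image %s volume %s with %s",
              imgUUID, volUUID, volclass)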

Version-Release number of selected component (if applicable):
RHEVM: ic129
vdsm: vdsm-4.9-79.el6.x86_64
libvirt: libvirt-0.9.2-1.el6.x86_64

Setup:
Two iSCSI data domains
At least one VM with a disk

Steps to Reproduce:
1) Block the connection to all iSCSI storage domains for more than 5 minutes (one way to block the traffic is sketched after these steps)
2) Unblock the connection
3) Activate all storage domains
4) Try to remove the VM
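
The report does not say how the connection was blocked, so the following is only an assumed illustration of steps 1-2: it drops outgoing traffic to the default iSCSI target port (3260/tcp) with iptables, holds the block for more than five minutes, and then removes the rule again. It must be run as root; adjust the port or use your own fencing mechanism if it differs.

# Assumed reproduction helper -- the original test setup may have blocked
# connectivity differently.  Requires root and iptables on the host.

import subprocess
import time

ISCSI_PORT = "3260"  # default iSCSI target port (assumption)
RULE = ["OUTPUT", "-p", "tcp", "--dport", ISCSI_PORT, "-j", "DROP"]


def block_iscsi(minutes=6):
    """Block all outgoing iSCSI traffic for `minutes`, then unblock."""
    subprocess.check_call(["iptables", "-A"] + RULE)      # step 1: block
    try:
        time.sleep(minutes * 60)                          # keep blocked > 5 min
    finally:
        subprocess.check_call(["iptables", "-D"] + RULE)  # step 2: unblock


if __name__ == "__main__":
    block_iscsi()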
  
Actual results:
Thread-386::INFO::2011-07-11 09:13:41,066::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: teardownVolume, args: ( sdUUID=197e4a67-6b85-4033-b486-e1319e2ea411 spUUID=5f3919f2-6777-43fb-af1e-c11d28359766 imgUUID=89861256-012e-4e68-ada0-1476ef1d773e volUUID=8b80496f-11e2-4a73-975c-b40059f8b2c8)
Thread-386::DEBUG::2011-07-11 09:13:41,067::task::492::TaskManager.Task::(_debug) Task 18185c35-693f-40d0-b3fe-66a98651cfcf: moving from state init -> state preparing
Thread-386::DEBUG::2011-07-11 09:13:41,067::resourceManager::154::ResourceManager.Request::(__init__) ResName=`Storage.197e4a67-6b85-4033-b486-e1319e2ea411`ReqID=`288e79e9-656f-44f5-bd1b-b1a87a03a59b`::Request was made in '/usr/share/vdsm/storage/hsm.py' line '1701' at 'public_teardownVolume'
Thread-386::DEBUG::2011-07-11 09:13:41,068::resourceManager::467::ResourceManager::(registerResource) Trying to register resource 'Storage.197e4a67-6b85-4033-b486-e1319e2ea411' for lock type 'shared'
Thread-386::DEBUG::2011-07-11 09:13:41,068::resourceManager::508::ResourceManager::(registerResource) Resource 'Storage.197e4a67-6b85-4033-b486-e1319e2ea411' is free. Now locking as 'shared' (1 active user)
Thread-386::DEBUG::2011-07-11 09:13:41,068::resourceManager::191::ResourceManager.Request::(grant) ResName=`Storage.197e4a67-6b85-4033-b486-e1319e2ea411`ReqID=`288e79e9-656f-44f5-bd1b-b1a87a03a59b`::Granted request
Thread-386::DEBUG::2011-07-11 09:13:41,069::task::492::TaskManager.Task::(_debug) Task 18185c35-693f-40d0-b3fe-66a98651cfcf: _resourcesAcquired: Storage.197e4a67-6b85-4033-b486-e1319e2ea411 (shared)
Thread-386::DEBUG::2011-07-11 09:13:41,069::task::492::TaskManager.Task::(_debug) Task 18185c35-693f-40d0-b3fe-66a98651cfcf: ref 1 aborting False
Thread-386::DEBUG::2011-07-11 09:13:41,069::lvm::412::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-386::DEBUG::2011-07-11 09:13:41,070::lvm::359::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 filter = [ \\"a%/dev/mapper/1987653345|/dev/mapper/19876544%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags 197e4a67-6b85-4033-b486-e1319e2ea411' (cwd None)
Thread-386::DEBUG::2011-07-11 09:13:41,217::lvm::359::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  /dev/mapper/19876544: read failed after 0 of 4096 at 53687025664: Input/output error\n  /dev/mapper/19876544: read failed after 0 of 4096 at 53687083008: Input/output error\n  /dev/mapper/19876544: read failed after 0 of 4096 at 0: Input/output error\n  /dev/mapper/19876544: read failed after 0 of 4096 at 4096: Input/output error\n  Volume group "197e4a67-6b85-4033-b486-e1319e2ea411" not found\n'; <rc> = 5
Thread-386::WARNING::2011-07-11 09:13:41,221::lvm::416::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  /dev/mapper/19876544: read failed after 0 of 4096 at 53687025664: Input/output error', '  /dev/mapper/19876544: read failed after 0 of 4096 at 53687083008: Input/output error', '  /dev/mapper/19876544: read failed after 0 of 4096 at 0: Input/output error', '  /dev/mapper/19876544: read failed after 0 of 4096 at 4096: Input/output error', '  Volume group "197e4a67-6b85-4033-b486-e1319e2ea411" not found']
Thread-386::DEBUG::2011-07-11 09:13:41,222::lvm::439::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-386::ERROR::2011-07-11 09:13:41,222::task::865::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1702, in public_teardownVolume
    volclass = SDF.produce(sdUUID).getVolumeClass()
  File "/usr/share/vdsm/storage/sdf.py", line 30, in produce
    newSD = cls.__sdc.lookup(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 83, in lookup
    dom = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 107, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('197e4a67-6b85-4033-b486-e1319e2ea411',)
Thread-386::DEBUG::2011-07-11 09:13:41,223::task::492::TaskManager.Task::(_debug) Task 18185c35-693f-40d0-b3fe-66a98651cfcf: Task._run: 18185c35-693f-40d0-b3fe-66a98651cfcf ('197e4a67-6b85-4033-b486-e1319e2ea411', '5f3919f2-6777-43fb-af1e-c11d28359766', '89861256-012e-4e68-ada0-1476ef1d773e', '8b80496f-11e2-4a73-975c-b40059f8b2c8') {} failed - stopping task


Expected results:
After activation the host should reconnect to all storage domains, and the VM removal should succeed.

Additional info:
1) The same scenario works under older versions of vdsm.
2) All storage negative tests use a very similar scenario, so this issue is a test blocker for us.

Comment 3 Saggi Mizrahi 2011-07-19 12:02:54 UTC
http://gerrit.usersys.redhat.com/724

Comment 5 Tomas Dosek 2011-07-29 11:21:59 UTC
Verified on vdsm-4.9-86 - the above-described scenario no longer reproduces.

Comment 6 errata-xmlrpc 2011-12-06 07:30:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html