Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 853933

Summary: [engine-core] unable to detach block storage domain in partial state (engine fence SPM on each attempt)
Product: Red Hat Enterprise Virtualization Manager
Reporter: vvyazmin <vvyazmin>
Component: ovirt-engine
Assignee: Ayal Baron <abaron>
Status: CLOSED WONTFIX
QA Contact: Haim <hateya>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.1.0
CC: abaron, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, yeylon, ykaul
Target Milestone: ---
Target Release: 3.1.0
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-03-12 09:36:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
## Logs vdsm, rhevm (flags: none)

Description vvyazmin@redhat.com 2012-09-03 10:52:31 UTC
Created attachment 609322 [details]
## Logs vdsm, rhevm

Description of problem:
Failed to detach a corrupted storage domain (SD).

Version-Release number of selected component (if applicable):
RHEVM 3.1 - SI16

RHEVM: rhevm-3.1.0-14.el6ev.noarch
VDSM: vdsm-4.9.6-31.0.el6_3.x86_64
LIBVIRT: libvirt-0.9.10-21.el6_3.4.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.295.el6_3.1.x86_64
SANLOCK: sanlock-2.3-3.el6_3.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an iSCSI DC with 2 hosts.
2. Create the first (Master) Storage Domain, SD-01, on Storage Server A (SS-A).
3. Create a second Storage Domain, SD-02, from 2 LUNs: the first LUN on SS-A, the second on SS-B.
4. Activate SD-02.
5. Disconnect/block the target connection to SS-B.
* On both hosts, I block the connection with iptables (iptables -A OUTPUT -s 10.35.64.10 -j DROP && iptables -A OUTPUT -d 10.35.64.10 -j DROP && iptables -L)
6. “Maintenance” and then “Detach” the second SD, SD-02.
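Step 5 above can be sketched as a small helper that builds the iptables rules applied on both hosts (illustrative only; it constructs the commands rather than executing them, and the function name is my own):

```python
def block_target_rules(target_ip):
    """Return the iptables commands that drop all traffic to/from target_ip,
    simulating loss of the iSCSI target (step 5 of the reproduction)."""
    return [
        ["iptables", "-A", "OUTPUT", "-s", target_ip, "-j", "DROP"],
        ["iptables", "-A", "OUTPUT", "-d", target_ip, "-j", "DROP"],
    ]

if __name__ == "__main__":
    # SS-B target address taken from the reproduction steps above
    for cmd in block_target_rules("10.35.64.10"):
        print(" ".join(cmd))
```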

Actual results:
1. “Detach” of SD-02 fails.
2. On every “Detach” attempt the SPM role moves from host to host (ping-pong).

Expected results:
1. “Detach” of SD-02 succeeds via “forcedDetachStorageDomain”, or the engine produces an error message stating that the user should use the Destroy option.
2. SPM ping-pong between the servers is prevented.
3. If all hosts see a problem with the SD whose “Detach” failed, the engine should do the following:
* Show a warning with a description of how to resolve the issue.
* Prevent SPM ping-pong, and run the “force remove” command.


Additional info:

[root@cougar08 ~]# vgs
  Couldn't find device with uuid qG9rrY-vWos-HC9Z-avqp-KnnN-gqqY-5MKf8K.
  Couldn't find device with uuid JsoOxl-4axo-oHAl-B18k-Q7kl-RTdj-KsV2kf.
  VG                                   #PV #LV #SN Attr   VSize   VFree 
  1f3aed55-74d2-4d7b-b783-d8d8ca20af84   1   6   0 wz--n-  49.62g 45.75g
  2e385b7e-0d91-4e8f-a2a1-2d9ea251a5af   1   6   0 wz--n-  49.62g 45.75g
  535d68ea-fb95-497d-be75-8dda542beb24   1   6   0 wz--n-  49.62g 45.75g
  8178d7c4-4127-40fb-aec4-c4dab61616e8   2   6   0 wz-pn-  76.25g 72.38g
  8dde5007-c9c3-4708-8bd8-48d1a8503566   1   6   0 wz--n-  37.62g 33.75g
  a6311f49-aadd-4bef-9ad2-3d215c18f4e9   1   6   0 wz--n-  49.62g 45.75g
  da68e840-61b2-4e7a-b4b3-92810b24a4af   4   6   0 wz-pn-  18.50g 14.62g
  f386dedb-24f9-465c-8718-099d026fbf8c   1   9   0 wz--n-  49.62g 36.75g
  vg0                                    1   3   0 wz--n- 465.27g     0 

[root@cougar08 ~]# vgs 8178d7c4-4127-40fb-aec4-c4dab61616e8 -o+pv_name
  Couldn't find device with uuid qG9rrY-vWos-HC9Z-avqp-KnnN-gqqY-5MKf8K.
  VG                                   #PV #LV #SN Attr   VSize  VFree  PV                           
  8178d7c4-4127-40fb-aec4-c4dab61616e8   2   6   0 wz-pn- 76.25g 72.38g /dev/mapper/3514f0c56958002e1
  8178d7c4-4127-40fb-aec4-c4dab61616e8   2   6   0 wz-pn- 76.25g 72.38g unknown device    
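In the `vgs` output above, the degraded domain's VG carries the attr string `wz-pn-`: in LVM's 6-character vg_attr field, the 4th character is `p` when a physical volume is missing ("partial"). A small sketch (my own helper, not part of VDSM) that picks out such VGs from `vgs --noheadings -o vg_name,vg_attr`-style output:

```python
def partial_vgs(vgs_output):
    """Return the names of VGs whose vg_attr string marks them partial
    (4th attr character is 'p', i.e. a PV is missing)."""
    partial = []
    for line in vgs_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, attr = fields[0], fields[1]
        if len(attr) >= 4 and attr[3] == "p":
            partial.append(name)
    return partial

# Sample rows condensed from the report above
sample = """
  8178d7c4-4127-40fb-aec4-c4dab61616e8 wz-pn-
  da68e840-61b2-4e7a-b4b3-92810b24a4af wz-pn-
  vg0                                  wz--n-
"""
print(partial_vgs(sample))  # the two wz-pn- VGs
```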


Thread-46644::WARNING::2012-09-03 11:56:06,519::persistentDict::172::Storage.PersistentDict::(transaction) Error in transaction, rolling back changes
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/persistentDict.py", line 169, in transaction
    self.flush(self._metadata)
  File "/usr/share/vdsm/storage/persistentDict.py", line 288, in flush
    self._metaRW.writelines(lines)
  File "/usr/share/vdsm/storage/blockSD.py", line 208, in writelines
    lvm.changeVGTags(self._vgName, delTags=toRemove, addTags=toAdd)
  File "/usr/share/vdsm/storage/lvm.py", line 1130, in changeVGTags
    raise se.VolumeGroupReplaceTagError("vg:%s del:%s add:%s (%s)" % (vgName, ", ".join(delTags), ", ".join(addTags), err[-1]))
VolumeGroupReplaceTagError: Replace Volume Group tag error: ('vg:8178d7c4-4127-40fb-aec4-c4dab61616e8 del:MDT_POOL_UUID=0884dcc5-fa40-46da-a7cd-526c88968fdd, MDT__SHA_CKSUM=659875c6dd81dc090db7a4e50b8bd8f80896ec1f add:MDT_POOL_UUID=, MDT__SHA_CKSUM=1f060762224b543ce6bfb0aa18513836efc3f6bd (  Consider vgreduce --removemissing.)',)
Thread-46644::DEBUG::2012-09-03 11:56:06,519::persistentDict::287::Storage.PersistentDict::(flush) about to write lines (VGTagMetadataRW)=['CLASS=Data', 'DESCRIPTION=sD-iSCSI-TEST-02__Comb', 'IOOPTIMEOUTSEC=1', 'LEASERETRIES=3', 'LEASETIMESEC=5', 'LOCKPOLICY=', 'LOCKRENEWALINTERVALSEC=5', 'LOGBLKSIZE=512', 'MASTER_VERSION=0', 'PHYBLKSIZE=512', 'POOL_UUID=0884dcc5-fa40-46da-a7cd-526c88968fdd', u'PV0=pv:3514f0c56958002e1,uuid:pV3Kwh-4Ae2-3kj4-ONax-vTQY-hff5-71oGr9,pestart:0,pecount:301,mapoffset:0', 'ROLE=Regular', 'SDUUID=8178d7c4-4127-40fb-aec4-c4dab61616e8', 'TYPE=ISCSI', 'VERSION=3', 'VGUUID=02VEoG-4T5K-xt55-maGr-zdvx-kS9m-0PJfUM', '_SHA_CKSUM=659875c6dd81dc090db7a4e50b8bd8f80896ec1f']
Thread-46644::DEBUG::2012-09-03 11:56:06,520::lvm::467::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-46644::DEBUG::2012-09-03 11:56:06,520::lvm::469::OperationMutex::(_invalidatevgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-46644::DEBUG::2012-09-03 11:56:06,520::lvm::478::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-46644::DEBUG::2012-09-03 11:56:06,521::lvm::490::OperationMutex::(_invalidatelvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-46644::DEBUG::2012-09-03 11:56:06,521::lvm::352::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' got the operation mutex
Thread-46644::DEBUG::2012-09-03 11:56:06,522::__init__::1164::Storage.Misc.excCmd::(_log) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"a%3514f0c56958002b7|3514f0c56958002b8|3514f0c56958002b9|3514f0c56958002ba|3514f0c56958002bb|3514f0c56958002bc|3514f0c56958002bd|3514f0c56958002be|3514f0c56958002bf|3514f0c56958002c0|3514f0c56958002c1|3514f0c56958002c2|3514f0c56958002c3|3514f0c56958002c4|3514f0c56958002c5|3514f0c56958002c6|3514f0c56958002c7|3514f0c56958002c8|3514f0c56958002c9|3514f0c56958002ca|3514f0c56958002cb|3514f0c56958002cc|3514f0c56958002cd|3514f0c56958002ce|3514f0c56958002cf|3514f0c56958002e0|3514f0c56958002e1%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 8178d7c4-4127-40fb-aec4-c4dab61616e8' (cwd None)
Thread-46644::DEBUG::2012-09-03 11:56:07,174::__init__::1164::Storage.Misc.excCmd::(_log) SUCCESS: <err> = "  Couldn't find device with uuid qG9rrY-vWos-HC9Z-avqp-KnnN-gqqY-5MKf8K.\n"; <rc> = 0
Thread-46644::DEBUG::2012-09-03 11:56:07,176::lvm::379::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-46644::ERROR::2012-09-03 11:56:07,177::task::853::TaskManager.Task::(_setError) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 638, in detachStorageDomain
    pool.detachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1006, in detachSD
    dom.detach(self.spUUID)
  File "/usr/share/vdsm/storage/sd.py", line 465, in detach
    self.setMetaParam(DMDK_POOLS, pools)
  File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
    self.gen.next()
  File "/usr/share/vdsm/storage/persistentDict.py", line 169, in transaction
    self.flush(self._metadata)
  File "/usr/share/vdsm/storage/persistentDict.py", line 288, in flush
    self._metaRW.writelines(lines)
  File "/usr/share/vdsm/storage/blockSD.py", line 208, in writelines
    lvm.changeVGTags(self._vgName, delTags=toRemove, addTags=toAdd)
  File "/usr/share/vdsm/storage/lvm.py", line 1130, in changeVGTags
    raise se.VolumeGroupReplaceTagError("vg:%s del:%s add:%s (%s)" % (vgName, ", ".join(delTags), ", ".join(addTags), err[-1]))
VolumeGroupReplaceTagError: Replace Volume Group tag error: ('vg:8178d7c4-4127-40fb-aec4-c4dab61616e8 del:MDT_POOL_UUID=0884dcc5-fa40-46da-a7cd-526c88968fdd, MDT__SHA_CKSUM=659875c6dd81dc090db7a4e50b8bd8f80896ec1f add:MDT_POOL_UUID=, MDT__SHA_CKSUM=1f060762224b543ce6bfb0aa18513836efc3f6bd (  Consider vgreduce --removemissing.)',)
Thread-46644::DEBUG::2012-09-03 11:56:07,178::task::872::TaskManager.Task::(_run) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::Task._run: 96d03702-7bbd-46aa-bdd5-3bf3e5987091 ('8178d7c4-4127-40fb-aec4-c4dab61616e8', '0884dcc5-fa40-46da-a7cd-526c88968fdd', '00000000-0000-0000-0000-000000000000', 1) {} failed - stopping task
Thread-46644::DEBUG::2012-09-03 11:56:07,178::task::1199::TaskManager.Task::(stop) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::stopping in state preparing (force False)
Thread-46644::DEBUG::2012-09-03 11:56:07,178::task::978::TaskManager.Task::(_decref) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::ref 1 aborting True
Thread-46644::INFO::2012-09-03 11:56:07,179::task::1157::TaskManager.Task::(prepare) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::aborting: Task is aborted: 'Replace Volume Group tag error' - code 516
Thread-46644::DEBUG::2012-09-03 11:56:07,179::task::1162::TaskManager.Task::(prepare) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::Prepare: aborted: Replace Volume Group tag error
Thread-46644::DEBUG::2012-09-03 11:56:07,179::task::978::TaskManager.Task::(_decref) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::ref 0 aborting True
Thread-46644::DEBUG::2012-09-03 11:56:07,180::task::913::TaskManager.Task::(_doAbort) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::Task._doAbort: force False
Thread-46644::DEBUG::2012-09-03 11:56:07,180::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-46644::DEBUG::2012-09-03 11:56:07,180::task::588::TaskManager.Task::(_updateState) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::moving from state preparing -> state aborting
Thread-46644::DEBUG::2012-09-03 11:56:07,181::task::537::TaskManager.Task::(__state_aborting) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::_aborting: recover policy none
Thread-46644::DEBUG::2012-09-03 11:56:07,181::task::588::TaskManager.Task::(_updateState) Task=`96d03702-7bbd-46aa-bdd5-3bf3e5987091`::moving from state aborting -> state failed
Thread-46644::DEBUG::2012-09-03 11:56:07,181::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.0884dcc5-fa40-46da-a7cd-526c88968fdd': < ResourceRef 'Storage.0884dcc5-fa40-46da-a7cd-526c88968fdd', isValid: 'True' obj: 'None'>, 'Storage.8178d7c4-4127-40fb-aec4-c4dab61616e8': < ResourceRef 'Storage.8178d7c4-4127-40fb-aec4-c4dab61616e8', isValid: 'True' obj: 'None'>}
Thread-46644::DEBUG::2012-09-03 11:56:07,182::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-46644::DEBUG::2012-09-03 11:56:07,182::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.0884dcc5-fa40-46da-a7cd-526c88968fdd'
Thread-46644::DEBUG::2012-09-03 11:56:07,182::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.0884dcc5-fa40-46da-a7cd-526c88968fdd' (0 active users)
Thread-46644::DEBUG::2012-09-03 11:56:07,183::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.0884dcc5-fa40-46da-a7cd-526c88968fdd' is free, finding out if anyone is waiting for it.
Thread-46644::DEBUG::2012-09-03 11:56:07,183::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.0884dcc5-fa40-46da-a7cd-526c88968fdd', Clearing records.
Thread-46644::DEBUG::2012-09-03 11:56:07,184::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.8178d7c4-4127-40fb-aec4-c4dab61616e8'
Thread-46644::DEBUG::2012-09-03 11:56:07,184::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.8178d7c4-4127-40fb-aec4-c4dab61616e8' (0 active users)
Thread-46644::DEBUG::2012-09-03 11:56:07,184::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.8178d7c4-4127-40fb-aec4-c4dab61616e8' is free, finding out if anyone is waiting for it.
Thread-46644::DEBUG::2012-09-03 11:56:07,184::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.8178d7c4-4127-40fb-aec4-c4dab61616e8', Clearing records.
Thread-46644::ERROR::2012-09-03 11:56:07,185::dispatcher::66::Storage.Dispatcher.Protect::(run) {'status': {'message': "Replace Volume Group tag error: ('vg:8178d7c4-4127-40fb-aec4-c4dab61616e8 del:MDT_POOL_UUID=0884dcc5-fa40-46da-a7cd-526c88968fdd, MDT__SHA_CKSUM=659875c6dd81dc090db7a4e50b8bd8f80896ec1f add:MDT_POOL_UUID=, MDT__SHA_CKSUM=1f060762224b543ce6bfb0aa18513836efc3f6bd (  Consider vgreduce --removemissing.)',)", 'code': 516}}
Thread-46647::DEBUG::2012-09-03 11:56:07,212::BindingXMLRPC::164::vds::(wrapper) [10.35.97.56]
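The traceback above shows the persistentDict transaction pattern: the metadata change is applied, then flushed (here via a VG tag replace, which fails because a PV is missing), and on failure the in-memory state is rolled back and the error re-raised, which is why the engine sees a failed detach each time. A simplified, hypothetical sketch of that pattern (names and structure are my own, not actual VDSM code):

```python
from contextlib import contextmanager

class MetadataStore(object):
    """Toy stand-in for persistentDict: dict-backed metadata whose
    transaction snapshots state, flushes, and rolls back on failure."""

    def __init__(self, flush_fn):
        self._metadata = {}
        self._flush_fn = flush_fn  # persists metadata; may raise (e.g. LVM error)

    @contextmanager
    def transaction(self):
        backup = dict(self._metadata)  # snapshot before the change
        try:
            yield self._metadata
            self._flush_fn(self._metadata)  # persist; raises on tag-replace error
        except Exception:
            self._metadata = backup  # roll back in-memory state
            raise

def failing_flush(md):
    # Mimics the VG tag replace failing on the partial VG
    raise RuntimeError("Replace Volume Group tag error")

store = MetadataStore(failing_flush)
store._metadata["POOL_UUID"] = "0884dcc5"
try:
    with store.transaction() as md:
        md["POOL_UUID"] = ""  # detach clears the pool UUID
except RuntimeError:
    pass
print(store._metadata["POOL_UUID"])  # rolled back to "0884dcc5"
```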

Comment 1 Itamar Heim 2013-03-12 09:36:46 UTC
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.