Description of problem: Removing the iSCSI master storage domain after removing the data center fails.

Steps to Reproduce (failure occurred during a VM sanity test run):
1. Put the storage domain in maintenance.
2. Remove the data center.
3. Remove the master storage domain.

Error from vdsm.log (attached):
Thread-4732::WARNING::2012-06-27 14:32:58,483::lvm::443::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] [' /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 107374116864: Input/output error', ' /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 107374174208: Input/output error', ' /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 0: Input/output error', ' WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004f03f94b0045 was disabled', ' /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 107374116864: Input/output error', ' /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 107374174208: Input/output error', ' /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 0: Input/output error', ' WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004fc5fa31000c was disabled', ' /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 107374116864: Input/output error', ' /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 107374174208: Input/output error', ' /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 0: Input/output error', ' WARNING: Error counts reached a limit of 3. 
Device /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154 was disabled', ' Volume group "862bc1dc-6c2a-4170-a20b-722063833fd0" not found'] Thread-4732::DEBUG::2012-06-27 14:32:58,483::lvm::466::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex Thread-4732::WARNING::2012-06-27 14:32:58,483::lvm::153::Storage.LVM::(__getattr__) 862bc1dc-6c2a-4170-a20b-722063833fd0 can't be reloaded, please check your storage connections. Thread-4732::WARNING::2012-06-27 14:32:58,484::hsm::1257::Storage.HSM::(public_formatStorageDomain) Domain 862bc1dc-6c2a-4170-a20b-722063833fd0 can't be formated Traceback (most recent call last): File "/usr/share/vdsm/storage/hsm.py", line 1248, in public_formatStorageDomain File "/usr/share/vdsm/storage/sd.py", line 332, in getPools File "/usr/share/vdsm/storage/sd.py", line 620, in getMetaParam File "/usr/share/vdsm/storage/persistentDict.py", line 63, in __getitem__ File "/usr/share/vdsm/storage/persistentDict.py", line 171, in __getitem__ File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__ File "/usr/share/vdsm/storage/persistentDict.py", line 125, in _accessWrapper File "/usr/share/vdsm/storage/persistentDict.py", line 202, in refresh File "/usr/share/vdsm/storage/blockSD.py", line 124, in readlines File "/usr/share/vdsm/storage/lvm.py", line 154, in __getattr__ AttributeError: Failed reload: 862bc1dc-6c2a-4170-a20b-722063833fd0 Thread-4732::ERROR::2012-06-27 14:32:58,484::task::868::TaskManager.Task::(_setError) Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 876, in _run File "/usr/share/vdsm/storage/hsm.py", line 1258, in public_formatStorageDomain StorageDomainFormatError: Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',) Thread-4732::DEBUG::2012-06-27 14:32:58,485::task::495::TaskManager.Task::(_debug) Task 952e734d-488a-48d9-958b-6fd0619d3186: Task._run: 952e734d-488a-48d9-958b-6fd0619d3186 
('862bc1dc-6c2a-4170-a20b-722063833fd0',) {} failed - stopping task
Created attachment 594956 [details] engine log (failure around 2012-06-27 17:32)
This failure was seen on rhevm 3.0 (ic155.1). Error from rhevm.log (attached): 2012-06-27 17:32:57,953 INFO [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (http-0.0.0.0-8443-7) Running command: RemoveStorageDomainCommand internal: false. Entities affected : ID: 862bc1dc-6c2a-4170-a20b-722063833fd0 Type: Storage 2012-06-27 17:32:57,956 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (http-0.0.0.0-8443-7) START, ConnectStorageServerVDSCommand(vdsId = 9a587842-c059-11e1-9f67-ffec586e5f7e, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: e1b78f8c-09bb-47e5-baf9-4ce5fdd5be3c, connection: 10.35.64.102 };]), log id: 28b83e44 2012-06-27 17:32:59,070 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (http-0.0.0.0-8443-7) FINISH, ConnectStorageServerVDSCommand, return: {e1b78f8c-09bb-47e5-baf9-4ce5fdd5be3c=0}, log id: 28b83e44 2012-06-27 17:32:59,072 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (http-0.0.0.0-8443-7) START, FormatStorageDomainVDSCommand(vdsId = 9a587842-c059-11e1-9f67-ffec586e5f7e, storageDomainId=862bc1dc-6c2a-4170-a20b-722063833fd0), log id: 24044cea 2012-06-27 17:32:59,311 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-7) Failed in FormatStorageDomainVDS method 2012-06-27 17:32:59,311 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-7) Error code StorageDomainFormatError and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',) 2012-06-27 17:32:59,997 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-7) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value Class Name: 
org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 352 mMessage Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',) 2012-06-27 17:32:59,997 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-7) Vds: silver-vdsc.qa.lab.tlv.redhat.com 2012-06-27 17:32:59,997 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (http-0.0.0.0-8443-7) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',) 2012-06-27 17:32:59,998 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (http-0.0.0.0-8443-7) FINISH, FormatStorageDomainVDSCommand, log id: 24044cea 2012-06-27 17:32:59,998 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (http-0.0.0.0-8443-7) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',)
The master storage domain removal failed only once, during a storage test run in which this exact scenario is repeated several times.
The attached tar file is broken. We need the vdsm log to be sure, but from the partial inline log it seems that the storage target is disconnected, and therefore accessing the domain metadata fails. From the line "AttributeError: Failed reload: 862bc1dc-6c2a-4170-a20b-722063833fd0" we can tell that the storage domain requested to be formatted, 862bc1dc-6c2a-4170-a20b-722063833fd0, was used by the host before but is now unreachable. Probably the storage target was disconnected when disconnectStoragePool was issued.
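For reference, the "AttributeError: Failed reload" in the traceback comes from a lazy-reload proxy pattern in vdsm's lvm.py: attribute access on the VG object triggers a metadata reload, and when the reload fails (storage unreachable) the proxy raises AttributeError. A simplified mimic of that pattern; the class and function names here are illustrative, not vdsm's real API:

```python
class LazyVG:
    """Proxy that (re)loads VG metadata on attribute access.

    Simplified mimic of the pattern behind
    'AttributeError: Failed reload: <uuid>' in vdsm's lvm.py.
    """

    def __init__(self, vg_name, reload_fn):
        # Set via object.__setattr__ so these lookups bypass __getattr__.
        object.__setattr__(self, '_vg_name', vg_name)
        object.__setattr__(self, '_reload_fn', reload_fn)

    def __getattr__(self, attr):
        # In vdsm this is where 'lvm vgs' is re-run for the VG.
        vg = self._reload_fn(self._vg_name)
        if vg is None:  # storage unreachable: vgs returned rc=5, VG not found
            raise AttributeError("Failed reload: %s" % self._vg_name)
        return getattr(vg, attr)


def failing_reload(name):
    # Simulates the disconnected target: the VG cannot be reloaded.
    return None


vg = LazyVG("862bc1dc-6c2a-4170-a20b-722063833fd0", failing_reload)
try:
    vg.tags  # any metadata access triggers a reload attempt
except AttributeError as exc:
    print(exc)  # prints: Failed reload: 862bc1dc-6c2a-4170-a20b-722063833fd0
```

This is why formatStorageDomain blows up inside getMetaParam: the first metadata read re-triggers the failed VG reload.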
Created attachment 598598 [details] vdsm log (around 2012-06-27 16:25)
This bug is a consequence of looking for a device after connectStorageServer before device-mapper has updated the /dev/mapper tree. It is a race. This symptom has been observed before in the connectStorageServer -> getDeviceList flow. (BZ#?)

The solution to this bug is to avoid the disconnectStorageServer sent by the engine at the end of the destroyStoragePool flow. In addition:
1) The destroyStoragePool and FormatStorageDomain engine flows should be reviewed.
2) The number of engine disconnectStorageServer commands should be greatly reduced in all flows.

# Log excerpt:
Thread-4728::INFO::2012-06-27 14:32:53,743::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: disconnectStoragePool, args: (spUUID=5205d778-642e-4891-b49f-9a19c61d5786 hostID=1 scsiKey=5205d778-642e-4891-b49f-9a19c61d5786)
Thread-4728::INFO::2012-06-27 14:32:53,744::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: disconnectStoragePool, Return response: {'status': {'message': 'OK', 'code': 0}}
Thread-4731::INFO::2012-06-27 14:32:57,139::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: connectStorageServer, args: (domType=3, spUUID=00000000-0000-0000-0000-000000000000, conList=[{'connection': '10.35.64.102', 'iqn': 'iqn.1986-03.com.sun:02:indigo62234946167288', 'portal': '1', 'user': '', 'password': '******', 'id': 'e1b78f8c-09bb-47e5-baf9-4ce5fdd5be3c', 'port': '3260'}])
Thread-4731::INFO::2012-06-27 14:32:58,248::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: connectStorageServer, Return response: {'status': {'message': 'OK', 'code': 0}, 'statuslist': [{'status': 0, 'id': 'e1b78f8c-09bb-47e5-baf9-4ce5fdd5be3c'}]}
Thread-4732::INFO::2012-06-27 14:32:58,272::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: formatStorageDomain, args: (sdUUID=862bc1dc-6c2a-4170-a20b-722063833fd0)
Thread-4732::DEBUG::2012-06-27 14:32:58,300::lvm::374::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices 
{ preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ \\"a%35000cca0009137a4|3600144f0bdc38d0000004f03f94b0045|3600144f0bdc38d0000004fb3c7fd0154|3600144f0bdc38d0000004fc5fa31000c%\\", \\"r%.*%\\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 } backup { retain_min = 50 retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 862bc1dc-6c2a-4170-a20b-722063833fd0' (cwd None)
Thread-4732::DEBUG::2012-06-27 14:32:58,480::lvm::374::Storage.Misc.excCmd::(cmd) FAILED: <err> = ' /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 107374116864: Input/output error\n /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 107374174208: Input/output error\n /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 0: Input/output error\n WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004f03f94b0045 was disabled\n /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 107374116864: Input/output error\n /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 107374174208: Input/output error\n /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 0: Input/output error\n WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004fc5fa31000c was disabled\n /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 107374116864: Input/output error\n /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 107374174208: Input/output error\n /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 0: Input/output error\n WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154 was disabled\n Volume group "862bc1dc-6c2a-4170-a20b-722063833fd0" not found\n'; <rc> = 5
# And formatStorageDomain fails.
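One generic way to mitigate this kind of race (issuing LVM commands right after connectStorageServer, before device-mapper has repopulated /dev/mapper) is to poll for the expected multipath nodes before scanning. A minimal sketch under the assumption that the WWIDs are known in advance; this is illustrative only, not vdsm's actual fix (which, per the comment above, is to avoid the extra disconnectStorageServer instead):

```python
import os
import time


def wait_for_devices(wwids, base="/dev/mapper", timeout=30.0, interval=0.5):
    """Poll until every expected multipath node exists under `base`.

    Illustrative mitigation for the connectStorageServer -> lvm race:
    give device-mapper time to recreate the /dev/mapper nodes before
    running 'lvm vgs'. Raises RuntimeError on timeout.
    """
    deadline = time.time() + timeout
    while True:
        missing = [w for w in wwids
                   if not os.path.exists(os.path.join(base, w))]
        if not missing:
            return True
        if time.time() >= deadline:
            raise RuntimeError("devices never appeared: %s" % missing)
        time.sleep(interval)
```

For example, calling wait_for_devices(["3600144f0bdc38d0000004f03f94b0045"]) before the vgs invocation would have blocked until the mapper node reappeared (or failed loudly) instead of racing it.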
Created attachment 599612 [details] engine log (previous was broken)
*** This bug has been marked as a duplicate of bug 820486 ***