Bug 836122 - 3.1 - [vdsm] Cannot remove iscsi master storage after removing the storage pool
Summary: 3.1 - [vdsm] Cannot remove iscsi master storage after removing the storage pool
Keywords:
Status: CLOSED DUPLICATE of bug 820486
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.3.0
Assignee: Tal Nisan
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-06-28 07:22 UTC by Ilanit Stein
Modified: 2016-04-26 23:26 UTC (History)
19 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-08-09 08:31:19 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
engine log (failure around 2012-06-27 17:32) (45 bytes, application/x-compressed-tar)
2012-06-28 07:30 UTC, Ilanit Stein
no flags Details
vdsm log (around 2012-06-27 16:25) (887.35 KB, application/x-compressed-tar)
2012-07-17 10:23 UTC, Ilanit Stein
no flags Details
engine log (previous was broken) (139.52 KB, application/x-compressed-tar)
2012-07-22 13:53 UTC, Ilanit Stein
no flags Details

Description Ilanit Stein 2012-06-28 07:22:51 UTC
Description of problem:

Removing iscsi master Storage domain, after removing data center, fail.
  
Steps to Reproduce:

Failure occurred during VM sanity test run
1. Put storage domain in maintanance.
2. Remove data center
3. Remove master storage domain
  
Error from vdsm.log (attached)

Thread-4732::WARNING::2012-06-27 14:32:58,483::lvm::443::Storage.LVM::(_reloadvgs) lvm vgs failed: 5 [] ['  /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 107374116864: Input/output error', '  /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 107374174208: Input/output error', '  /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 0: Input/output error', '  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004f03f94b0045 was disabled', '  /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 107374116864: Input/output error', '  /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 107374174208: Input/output error', '  /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 0: Input/output error', '  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004fc5fa31000c was disabled', '  /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 107374116864: Input/output error', '  /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 107374174208: Input/output error', '  /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 0: Input/output error', '  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154 was disabled', '  Volume group "862bc1dc-6c2a-4170-a20b-722063833fd0" not found']
Thread-4732::DEBUG::2012-06-27 14:32:58,483::lvm::466::OperationMutex::(_reloadvgs) Operation 'lvm reload operation' released the operation mutex
Thread-4732::WARNING::2012-06-27 14:32:58,483::lvm::153::Storage.LVM::(__getattr__) 862bc1dc-6c2a-4170-a20b-722063833fd0 can't be reloaded, please check your storage connections.
Thread-4732::WARNING::2012-06-27 14:32:58,484::hsm::1257::Storage.HSM::(public_formatStorageDomain) Domain 862bc1dc-6c2a-4170-a20b-722063833fd0 can't be formated
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 1248, in public_formatStorageDomain
  File "/usr/share/vdsm/storage/sd.py", line 332, in getPools
  File "/usr/share/vdsm/storage/sd.py", line 620, in getMetaParam
  File "/usr/share/vdsm/storage/persistentDict.py", line 63, in __getitem__
  File "/usr/share/vdsm/storage/persistentDict.py", line 171, in __getitem__
  File "/usr/lib64/python2.6/contextlib.py", line 16, in __enter__
  File "/usr/share/vdsm/storage/persistentDict.py", line 125, in _accessWrapper
  File "/usr/share/vdsm/storage/persistentDict.py", line 202, in refresh
  File "/usr/share/vdsm/storage/blockSD.py", line 124, in readlines
  File "/usr/share/vdsm/storage/lvm.py", line 154, in __getattr__
AttributeError: Failed reload: 862bc1dc-6c2a-4170-a20b-722063833fd0
Thread-4732::ERROR::2012-06-27 14:32:58,484::task::868::TaskManager.Task::(_setError) Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 876, in _run
  File "/usr/share/vdsm/storage/hsm.py", line 1258, in public_formatStorageDomain
StorageDomainFormatError: Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',)
Thread-4732::DEBUG::2012-06-27 14:32:58,485::task::495::TaskManager.Task::(_debug) Task 952e734d-488a-48d9-958b-6fd0619d3186: Task._run: 952e734d-488a-48d9-958b-6fd0619d3186 ('862bc1dc-6c2a-4170-a20b-722063833fd0',) {} failed - stopping task

Comment 1 Ilanit Stein 2012-06-28 07:30:53 UTC
Created attachment 594956 [details]
engine log (failure around 2012-06-27 17:32)

Comment 2 Ilanit Stein 2012-06-28 07:32:37 UTC
This failure was seen on rhevm 3.0 (ic155.1).

Error from rhevm.log (attached): 

2012-06-27 17:32:57,953 INFO  [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (http-0.0.0.0-8443-7) Running command: RemoveStorageDomainCommand internal: false. Entities affected :  ID: 862bc1dc-6c2a-4170-a20b-722063833fd0 Type: Storage
2012-06-27 17:32:57,956 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (http-0.0.0.0-8443-7) START, ConnectStorageServerVDSCommand(vdsId = 9a587842-c059-11e1-9f67-ffec586e5f7e, storagePoolId = 00000000-0000-0000-0000-000000000000, storageType = ISCSI, connectionList = [{ id: e1b78f8c-09bb-47e5-baf9-4ce5fdd5be3c, connection: 10.35.64.102 };]), log id: 28b83e44
2012-06-27 17:32:59,070 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (http-0.0.0.0-8443-7) FINISH, ConnectStorageServerVDSCommand, return: {e1b78f8c-09bb-47e5-baf9-4ce5fdd5be3c=0}, log id: 28b83e44
2012-06-27 17:32:59,072 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (http-0.0.0.0-8443-7) START, FormatStorageDomainVDSCommand(vdsId = 9a587842-c059-11e1-9f67-ffec586e5f7e, storageDomainId=862bc1dc-6c2a-4170-a20b-722063833fd0), log id: 24044cea
2012-06-27 17:32:59,311 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-7) Failed in FormatStorageDomainVDS method
2012-06-27 17:32:59,311 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-7) Error code StorageDomainFormatError and error message VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',)
2012-06-27 17:32:59,997 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-7) Command org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand return value
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         352
mMessage                      Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',)


2012-06-27 17:32:59,997 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-0.0.0.0-8443-7) Vds: silver-vdsc.qa.lab.tlv.redhat.com
2012-06-27 17:32:59,997 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (http-0.0.0.0-8443-7) Command FormatStorageDomainVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',)
2012-06-27 17:32:59,998 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.FormatStorageDomainVDSCommand] (http-0.0.0.0-8443-7) FINISH, FormatStorageDomainVDSCommand, log id: 24044cea
2012-06-27 17:32:59,998 ERROR [org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand] (http-0.0.0.0-8443-7) Command org.ovirt.engine.core.bll.storage.RemoveStorageDomainCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to FormatStorageDomainVDS, error = Error formating a storage domain: ('862bc1dc-6c2a-4170-a20b-722063833fd0',)

Comment 4 Ilanit Stein 2012-06-28 08:00:48 UTC
The master storage domain removal failure occurred only once. During storage tests run, in which this exact scenario repeats several times.

Comment 5 Eduardo Warszawski 2012-07-15 17:09:25 UTC
The attached tar file is broken.

We need the vdsm log to be sure but from the partial inline log seems that the storage target is disconnected (*), therefore accessing the metadata fails.

From the line:
AttributeError: Failed reload: 862bc1dc-6c2a-4170-a20b-722063833fd0

The requested SD to be formatted, 862bc1dc-6c2a-4170-a20b-722063833fd0 was used before by the host but now is unreacheable.

Probably the storage target was disconnected when disconnectStoragePool was issued.

Comment 6 Ilanit Stein 2012-07-17 10:23:41 UTC
Created attachment 598598 [details]
vdsm log (around  2012-06-27 16:25)

Comment 7 Eduardo Warszawski 2012-07-22 12:44:43 UTC
This bug is consequence of looking for a device after connectStorageServer before that the mapper has updated the /dev/mapper tree. It is a race.

This symptom has been observed before after connectStorageServer -> getDeviceList flow. (BZ#?)

The solution to this bug is avoid the disconnectStorageServer sent by engine at the end of the destroyStoragePool flow.

In addition:
1) The destroyStoragePool and FormatStorageDomain engine flows should be reviewed.
2) The number of engine disconnectStorageServer commands should be greatly reduced in all the flows,

# Log excerpt:
Thread-4728::INFO::2012-06-27 14:32:53,743::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: disconnectStoragePool, args: ( spUUID=5205d778-642e-4891-b49f-9a19c61d5786 hostID=1 scsiKey=5205d778-642e-4891-b49f-9a19c61d5786)
Thread-4728::INFO::2012-06-27 14:32:53,744::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: disconnectStoragePool, Return response: {'status': {'message': 'OK', 'code': 0}}

Thread-4731::INFO::2012-06-27 14:32:57,139::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: connectStorageServer, args: (domType=3, spUUID=00000000-0000-0000-0000-000000000000, conList=[{'connection': '10.35.64.102', '
iqn': 'iqn.1986-03.com.sun:02:indigo62234946167288', 'portal': '1', 'user': '', 'password': '******', 'id': 'e1b78f8c-09bb-47e5-baf9-4ce5fdd5be3c', 'port': '3260'}])

Thread-4731::INFO::2012-06-27 14:32:58,248::dispatcher::100::Storage.Dispatcher.Protect::(run) Run and protect: connectStorageServer, Return response: {'status': {'message': 'OK', 'code': 0}, 'statuslist': [{'status': 0, 'id': 'e1b78f8c-
09bb-47e5-baf9-4ce5fdd5be3c'}]}

Thread-4732::INFO::2012-06-27 14:32:58,272::dispatcher::94::Storage.Dispatcher.Protect::(run) Run and protect: formatStorageDomain, args: ( sdUUID=862bc1dc-6c2a-4170-a20b-722063833fd0)

Thread-4732::DEBUG::2012-06-27 14:32:58,300::lvm::374::Storage.Misc.excCmd::(cmd) '/usr/bin/sudo -n /sbin/lvm vgs --config " devices { preferred_names = [\\"^/dev/mapper/\\"] ignore_suspended_devices=1 write_cache_state=0 disable_after_e
rror_count=3 filter = [ \\"a%35000cca0009137a4|3600144f0bdc38d0000004f03f94b0045|3600144f0bdc38d0000004fb3c7fd0154|3600144f0bdc38d0000004fc5fa31000c%\\", \\"r%.*%\\" ] }  global {  locking_type=1  prioritise_write_locks=1  wait_for_locks
=1 }  backup {  retain_min = 50  retain_days = 0 } " --noheadings --units b --nosuffix --separator | -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free 862bc1dc-6c2a-4170-a20b-722063833fd0' (cwd 
None)
Thread-4732::DEBUG::2012-06-27 14:32:58,480::lvm::374::Storage.Misc.excCmd::(cmd) FAILED: <err> = '  /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/3600144f0b
dc38d0000004f03f94b0045: read failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/3600144f0bdc38d0000004f03f94b0045: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3
. Device /dev/mapper/3600144f0bdc38d0000004f03f94b0045 was disabled\n  /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read 
failed after 0 of 4096 at 107374174208: Input/output error\n  /dev/mapper/3600144f0bdc38d0000004fc5fa31000c: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0
bdc38d0000004fc5fa31000c was disabled\n  /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 107374116864: Input/output error\n  /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 1073
74174208: Input/output error\n  /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154: read failed after 0 of 4096 at 0: Input/output error\n  WARNING: Error counts reached a limit of 3. Device /dev/mapper/3600144f0bdc38d0000004fb3c7fd0154 was d
isabled\n  Volume group "862bc1dc-6c2a-4170-a20b-722063833fd0" not found\n'; <rc> = 5

# And formatStorageDomain fails.

Comment 8 Ilanit Stein 2012-07-22 13:53:59 UTC
Created attachment 599612 [details]
engine log (previous was broken)

Comment 10 Eduardo Warszawski 2012-08-09 08:31:19 UTC

*** This bug has been marked as a duplicate of bug 820486 ***


Note You need to log in before you can comment on or make changes to this bug.